Valuable data can be found all over campus. Until recently, databases were a largely unexplored asset in university commercialization. Such data assets are found in research labs in STEM departments, of course, but also in non-STEM departments.

This area of Tech Transfer was not actively pursued because data is not truly an "invention" and is generally not patentable. The field took off when artificial intelligence (AI) algorithms, particularly machine learning algorithms, began to be used widely in medicine. Training those algorithms required real sets of patient data.

3.6.1 Licensing Procedures: Software vs. Data

In the case of data, one must be clear about exactly what is being licensed:
- Define the data. Are there gaps in format or accuracy?
- Determine the format in which the data is displayed.
- Assess how unique the data is.
- Establish whether the database is fixed (static) or regularly updated (dynamic).

In the case of software, one must be clear about the software used to display the data (is it unique or commercially available?) and whether any associated analytical tools are included. If the software is unique, specify:
- which version of the code is being licensed;
- whether the next generation of the software will be made available;
- who the authors are;
- how the Licensee will access updated or new data.

3.6.2 Privacy/Permissions Issues

Privacy and permission issues may arise, particularly if the data concerns human subjects. In that case, confirm that there is written permission to license the data to others and that copies of that permission can be provided. Be clear on the methods used to anonymize individual data points. Also confirm whether the organization that physically holds the database can link the anonymized data points to the patients’ clinical records with complete accuracy.

3.6.3 License Terms

To minimize legal and use complications, it may be best to grant a non-exclusive license. That keeps the Licensor's right to continue using the data for any clinical or research purpose clear. License terms for data are typical of a non-exclusive license, but with some unique issues:

  • Define the fields of information that exist.
  • Define how the information is gathered, reviewed and certified as accurate, including whether a doctor or nurse gathered the data.
  • Note exactly who enters the data and who verifies that each entry is accurate.
  • Note who maintains the database and the server. If the server or its software is changed, note who verifies that the transfer is completely accurate.
  • Define the related documentation. For patient data, is there a properly signed informed consent agreement for every patient entry?
  • Describe where the data resides. It may be prudent to keep a copy of the data on a named Licensor server. The Licensee can access that copy but cannot download the database or take physical possession of it.
  • Note who owns the results derived from use of the data, and whether rights to those results are granted back to the Licensor.
  • Determine whether the Licensee may modify the data (creating derivative data).
  • Determine whether the data may be re-transmitted to third parties for use or analysis.
  • Determine whether the Licensor has the right to audit the use of the data.

3.6.4 Price

Setting a price can be difficult. One simple approach is to ask the potential Licensee what it has paid others for comparable data. Then ask permission to contact one of those entities to verify that price.

Alternatively, research how comparable databases are priced. Compare all relevant variables, such as:
- how long the data was gathered;
- how each data point entry was verified;
- how many fields and data points the other databases contain.

It is also worth noting whether the database in question is unique. For example, it might be the only Spanish-language database of a disease endemic to South America and not found elsewhere. Another example is data gathered by people or an institution certified by a globally recognized international certification organization.