Ten years from now, I expect biomedical research will look very different from how it looks today. I expect researchers will be able to tap a wide range of data streams that are not only accessible but also in formats that can be easily shared and reused. By building upon each other’s data, researchers will be able to collectively accelerate biomedical discovery.
This is often not how research works today. While staggering volumes of research data exist, they are housed in separate silos and stored in a wide range of formats. As a result, researchers typically have access to just a sliver of the data, which restricts the depth of their research and limits their results. If biomedical researchers had access to multiple data sets, more breakthroughs would likely follow.
Breaking down research barriers is the goal of the National Institutes of Health’s (NIH) $95.5 million Data Commons Pilot project. In late 2017, NIH issued 10 awards to launch the pilot phase. Participants will soon begin plotting a course to an open environment where researchers can share and contribute their data, their code, their metadata, and the lessons they’ve learned along the way—all on the same platform.
By using a common platform, researchers can see who is using their data sets and whether those users are sharing their own data in return. There is tremendous anticipation surrounding the potential of this project, and I am excited to be a part of it.
Common platform could reshape biomedical research
In December, I was one of about 70 to 80 people who attended a two-day kickoff event for the first phase of the pilot. NIH Director Francis Collins, M.D., Ph.D., offered his thoughts about the importance of this project, and explained how it could reshape biomedical research as we delve deeper into the digital world.
Prior to the event, each participant, many of whom were from academia, submitted a plan to address key aspects of the Data Commons platform. These individual plans are now being integrated into a master plan, and we will soon begin exploring the building phase of the project. Then, over the next six months, we want to turn ideas and concepts into meaningful minimum viable products (MVPs).
Deloitte is the only large consulting firm involved in the NIH Data Commons Pilot Project Consortium. In the early stages of this project, our role will be to help determine how researchers can ensure that their digital artifacts and objects can be easily found and reused by other researchers. The acronym we are using for this is FAIR, which stands for Findable, Accessible, Interoperable, and Reusable. Another important aspect of our role is to develop a mechanism to identify users and manage access to the platform (Identity and Access Management or IAM).
Who owns the data?
Under the existing system, researchers gather data and analyze them, but generally don’t focus on how others might reuse them. Open science relies on the FAIRness of data: all researchers make their shared data findable, accessible, interoperable, and reusable by following specific guidelines and recommendations. Researchers will need to learn how to make their data FAIR. During the pilot phase, we plan to develop tools that evaluate the FAIRness of data and guide researchers through the process of making their data FAIRer.
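To make the idea of a FAIRness evaluation tool more concrete, here is a minimal sketch in Python. It is not the tooling planned for the pilot. The `DatasetRecord` fields, the `OPEN_FORMATS` set, and the four pass/fail checks are all assumptions made for illustration, and real FAIR assessments use far richer criteria.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; the field names are illustrative only."""
    identifier: str = ""      # e.g. a DOI or other persistent identifier
    description: str = ""
    access_url: str = ""
    data_format: str = ""
    license: str = ""
    keywords: list = field(default_factory=list)


# Assumed list of open, widely readable formats (illustrative, not a standard).
OPEN_FORMATS = {"csv", "tsv", "json"}


def fair_report(record):
    """Return one crude pass/fail check per FAIR principle."""
    return {
        "findable": bool(record.identifier) and bool(record.keywords),
        "accessible": record.access_url.startswith("https://"),
        "interoperable": record.data_format.lower() in OPEN_FORMATS,
        "reusable": bool(record.license) and bool(record.description),
    }


def fair_score(record):
    """Fraction of the four checks that pass, between 0.0 and 1.0."""
    report = fair_report(record)
    return sum(report.values()) / len(report)


def suggestions(record):
    """List the principles a researcher still needs to work on."""
    return [f"Improve: {name}" for name, ok in fair_report(record).items() if not ok]
```

A researcher could run `suggestions(record)` on a draft metadata record and get back a short list of the principles that still need attention, which is the "guide researchers" part of the idea in its simplest form.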
In a nutshell, the Data Commons could eventually serve as a trade platform for digital objects of biomedical research. Unlike trade in commerce, trade in research probably won’t take place in monetary terms. Someone who uses my data may, for example, offer to make me a co-author of their resulting research paper, thus recognizing digital objects as valuable assets.
In this context, data ownership becomes an important ethical question that does not yet have an obvious answer, especially for human data. As the use of data leads to discoveries, new value is generated. If these discoveries could not have been made without the data, the newly generated value should be fairly distributed between those who provided the data and those who made the new discoveries. A blockchain is one technical approach we could employ to establish a fair trading platform for digital objects.
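As a rough illustration of why a blockchain is attractive here, the sketch below shows only its core idea: a hash-chained, tamper-evident log of who used which digital object. It is a toy under stated assumptions, not a design for the Data Commons. The event fields are hypothetical, and a real blockchain would add distribution across many parties and a consensus mechanism.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block):
    """Hash the block's index, previous hash, and event in a stable order."""
    payload = json.dumps(
        {key: block[key] for key in ("index", "prev_hash", "event")},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(chain, event):
    """Append a data-use event, linking it to the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    block = {"index": len(chain), "prev_hash": prev_hash, "event": event}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain):
    """Check that no block was altered and that every link is intact."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical events: the field names are assumptions for illustration.
ledger = []
append_event(ledger, {"object": "dataset-A", "provider": "lab-1", "user": "lab-2"})
append_event(ledger, {"object": "dataset-A", "provider": "lab-1", "user": "lab-3"})
```

Because each block's hash covers the previous block's hash, quietly rewriting an earlier record (for example, changing who provided a data set) makes `is_valid(ledger)` return `False`. That property is what would let data providers and data users trust a shared record of contributions.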
Data Commons could help open an expanding digital world
The timing is perfect for the NIH’s Data Commons Pilot Project. Researchers worldwide have accumulated massive numbers of digital objects, and the sheer volume of data is putting pressure on biomedical researchers to use it. Technology has reached a critical point where it allows the cost-effective sharing and reuse of large data volumes. Moreover, data science and analytic techniques, such as artificial intelligence (AI), have moved into the mainstream and can be used to dig deep into these vast troves of data for new discoveries.
We have an opportunity to think differently about the biomedical research enterprise. Data-driven biomedical research, powered by a Data Commons, could complement traditional research methods. And this trend will likely be seen across the entire field of life sciences and health care.