This week Deloitte and the National Institutes of Health’s National Library of Medicine (NLM) will recognize two student-led research teams from San Diego State University (SDSU) and Kingsborough Community College of The City University of New York (KCC CUNY) for their groundbreaking research in the Microbial Metagenomics Discovery Challenge.
The undergraduate students’ work was impressive by any standard. They identified viruses present in California mosquitos and assessed their threat to humans, investigated the genetic codes of rabies viruses, examined whether antibiotic resistance genes found in human skin could be spread through swimming pools, and analyzed antibiotic resistance in melanoma patients undergoing immune checkpoint therapy, among many other efforts that advance understanding of disease and treatment options.
The diversity of topics students chose reflects the virtually endless possibilities for research and discovery that exist in the vast stores of health care data.
Students were challenged to use novel computational methods in a cloud-based environment to identify new viruses and antibiotic resistance genes among metagenomes in the world’s largest biomedical database of sequences, the NLM’s Sequence Read Archive (SRA). This next generation of scientists is made up of digital natives who are familiar with data science and crowdsourcing and eager to make a difference. They are ready to discover, problem solve, and innovate. That’s why this discovery challenge offers an exciting model for biomedical research of the future.
The Microbial Metagenomics Discovery Challenge shows the promise of “open science,” which encourages all those working to unlock scientific problems to collaborate and share data, algorithms, tried and failed combinations, and more, so that discoveries can be made more quickly.
Deloitte sponsored this challenge because there are so many important lifesaving and life-improving discoveries and breakthroughs just waiting to be found in immense amounts of data. Prizes and challenges are a great way to mobilize talent and fresh thinking. Prize designs succeed where many efforts fail because they activate a crowd of solvers from diverse perspectives and give them access to necessary tools and data.
In this challenge, the simple crowdsourcing mechanism of two institutions (SDSU and KCC CUNY) with professors advocating unconventional study generated many valuable findings on which to build. The professors designed an undergraduate course to teach students computational methods and then set the students loose on the largest repository of DNA sequencing data to apply what they learned and make new discoveries. This is what biomedical research needs to make more medical breakthroughs: more students and young researchers digging into immense health datasets and using the latest computational tools to identify patterns and make discoveries.
The traditional approach to medical research is based on grant-funded studies that test hypotheses with primary data collection and structured, reproducible analyses. A complementary, innovative approach relies on collaborative, iterative, and exploratory analyses of big data that are already available. The computational technologies used by the students, along with advances in cognitive computing and machine learning, present a fresh opportunity in life sciences and health care to finally gain some traction against the explosion of data. Crowdsourcing through a prize challenge scales better than the traditional approach as data volumes grow. It can advance medical research efficiently and economically because it bypasses lengthy grant-writing processes and expensive primary data collection. It simply uses what’s already there and engages young researchers in ways that excite them to learn and discover. This is open science at its best.
And while the NLM is the world’s largest biomedical library, it is really just the tip of the data iceberg, with more data being generated at an exponential rate. More biomedical data are produced each year than ever before: think electronic health records, clinical trials, genomic data, and patient-generated health data from mobile devices, wearables, and the “Internet of Things.”
That is why it is so important that young scientists and the life science workforce of the future be fluent in putting data to work in service of research, discovery, and innovation. Join me in congratulating these pioneering students and professors for their energy, fresh thinking, and foresight. They are creating a future of exciting discoveries that can lead to life-changing cures and treatments.