GOBII Workshop Seeks Solutions to Big Data Problem
The 25 molecular biologists, computational biologists and software developers travelled from breeding centers in the Philippines, India and Mexico, and from just across the street at Cornell University and the U.S. Department of Agriculture, to decide the best way to store and share the trillions of data points generated in the pursuit of breeding better crops. Ultimately, the GOBII project, which stands for Genomic and Open source Breeding Informatics Initiative, seeks to create the architecture for a publicly accessible genomics database to accelerate the development of improved crop varieties.
GOBII works with breeding centers associated with CGIAR, a consortium that supports agricultural research for global development. The centers work to facilitate crop improvement, with the goal of increasing plant yield, nutritional value and resilience in the face of climate change.
The database will need to be robust enough to handle a monumental amount of data of multiple types, while also being user-friendly so that plant breeders can efficiently make use of the information—a task that is equivalent to “finding a shirt that fits everyone,” said Kevin Palis, a software developer at IRRI, the International Rice Research Institute in Los Baños, Philippines.
Breeding centers may sequence tens of thousands of varieties of a single crop to create a catalogue of millions of genetic markers for traits such as disease resistance or heat tolerance. The mountains of data can be used for a plant breeding strategy called genomic selection, which uses statistical modeling to predict how a new plant variety will perform before it is tested in the field. But to use these markers to make better, faster choices, breeders need tools to access and analyze the information. The GOBII project hopes to bridge the gap between plant breeders and the available genomic resources to yield better crops, especially in developing countries.
“There’s so much information that one can store and all the centers have overlapping needs, so the goal is to come up with the core requirements that are going to satisfy all the centers,” said Yaw Nti-Addae, GOBII’s lead software developer. Nti-Addae said that the four-day workshop was successful, both in bringing the interested parties face to face and in planning out a roadmap for the project.
The group has received $18.5 million in funding from the Bill & Melinda Gates Foundation through Cornell to create a breeding database for five major staple crops—wheat, rice, maize, sorghum and chickpea—but ultimately, they hope to develop a system that will work for any crop.
In the past, researchers working on a single crop have maintained their own data sets, using a variety of platforms, formats and terminology, which are not easily shared. IRRI has developed IRIS, the International Rice Information System, but plenty of data is sitting in individual spreadsheets.
“We don’t have [a database] set up yet and we don’t have that much capability to develop something,” said Victor Jun Ulat, a bioinformatician at CIMMYT, the International Maize and Wheat Improvement Center in Texcoco, Mexico. “Working with other colleagues with the same interest will really help us develop something that will be useful for us.”
BTI’s Associate Professor Lukas Mueller is a collaborator on the project. His lab has developed CassavaBase, a database of genomic data and physical traits from thousands of cassava varieties. Peter Bradbury, a USDA computational biologist who works on TASSEL, a software program that analyzes sequence data to find markers associated with plant traits, also attended the workshop.
“There’s no good public, open source solution,” said Mueller. “GOBII will solve the problem of how to manage data efficiently.”
Hima Bindu, a scientist who generates genomics resources for sorghum at ICRISAT, the International Crops Research Institute for the Semi-Arid Tropics, in Hyderabad, India, found the workshop helpful in deciding how to curate the institute's existing and future data so that it fits into the GOBII database. She said that while generating the data is easy, the important part is analyzing huge data sets for associations with complex traits and determining how best to use the data for breeding purposes.
Using the feedback from the workshop, GOBII researchers will begin construction of the database and evaluate their process at a meeting in San Diego in January.
“The plant people moved into the big data realm,” said Ramil Mauleon, a bioinformatics specialist at IRRI, “and now we have to find a way to get a handle on it.”