My colleague Chanci Turner has penned the following guest post to share an exciting new AWS Public Data Set!
— Alex
You can now access the genomic sequence data of 3,024 rice varieties, aligned and analyzed against five distinct reference genomes, as an AWS Public Data Set. The dataset contains over 30 million genetic variations covering all known and predicted rice genes, along with potential regulatory regions surrounding these genes. By analyzing this data, researchers could uncover genes linked to crucial agronomic traits such as crop yield, climate resilience, and disease resistance. Together, the sequence data and the catalogued variations offer an unparalleled resource for advancing rice science and breeding technologies.
Rice is a staple food for half of the global population, accounting for over 20% of all calories consumed per person. To meet the demands of an ever-growing global population, we must find ways to raise rice crop yields by 25% by 2030. Traditional breeding methods alone are currently inadequate for this, especially in light of trends in climate change and pollution. The agricultural community must therefore embrace modern breeding techniques that leverage genetic information.
The 3,000 Rice Genome sequencing initiative is an international collaboration to sequence the genomes of 3,024 rice varieties from 89 countries. Participants include the Chinese Academy of Agricultural Sciences, BGI Shenzhen, and the International Rice Research Institute (IRRI). The consortium partnered with DNAnexus to analyze the sequence data against five published draft builds of the rice genome. Through DNAnexus, they used AWS's scalable computing capabilities to process the genomic data across 37,000 compute cores in just two days, over 200 times faster than traditional local infrastructure would allow. The data is also available via DNAnexus for further exploration; you can find more information about accessing the data in the project documentation.
In-depth analysis of this dataset could show how to achieve higher yields and stronger resilience against pests, diseases, and climate change. For further details and to access the data, visit the 3,000 Rice Genome Public Data Set page.
Exploring the Genomic Data Set on AWS
Because the data is hosted on S3 and is accessible over standard HTTP, researchers have been able to use it with their existing tools. Here are some initial examples, and we will collaborate with IRRI to share more as they become available.
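As a rough illustration of what HTTP access to S3 makes possible, here is a minimal Python sketch that builds the URL for an object in a public bucket and prepares a ranged GET request, so a tool can read a slice of a large file without downloading all of it. The bucket name and object key below are placeholders invented for this example, not the real locations of the rice data; check the Public Data Set page for those.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: both values are hypothetical placeholders for illustration only.
EXAMPLE_BUCKET = "example-rice-genome-bucket"
EXAMPLE_KEY = "variants/example.vcf.gz"


def s3_http_url(bucket: str, key: str) -> str:
    """Build a virtual-hosted-style HTTPS URL for an object in a public bucket."""
    # quote() percent-encodes unsafe characters but leaves "/" separators intact.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def range_request(url: str, start: int, length: int) -> Request:
    """Prepare a GET request for `length` bytes beginning at offset `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # HTTP byte ranges are inclusive on both ends
    return Request(url, headers={"Range": f"bytes={start}-{end}"})


def fetch_bytes(url: str, start: int = 0, length: int = 1024) -> bytes:
    """Download one slice of the object (requires network access)."""
    with urlopen(range_request(url, start, length), timeout=30) as response:
        return response.read()


if __name__ == "__main__":
    url = s3_http_url(EXAMPLE_BUCKET, EXAMPLE_KEY)
    print("Would request:", url)
    # Uncomment once the placeholders point at a real public object:
    # print(len(fetch_bytes(url, start=0, length=1024)), "bytes received")
```

Ranged requests are a large part of why indexed genomics formats work well over HTTP: a client can fetch only the region of interest rather than an entire file.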
Visualizing Data with SNP-Seek
The International Rice Informatics Consortium (IRIC) has made the data available for querying and visualization through their SNP-Seek portal. Users can now query across all strains and pinpoint regions of interest that exhibit diversity across multiple genome references, all while integrating the rice research community’s genomic annotation data.
Open Source Tools
Beyond the extensive offerings from AWS partners for life sciences, the complete open-source genomics ecosystem is at your disposal for this data. From command-line applications like samtools to user-friendly interfaces such as Galaxy or iobio, researchers can dive into analysis immediately.
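For a quick first look before reaching for those tools, a few lines of Python can tally variants directly. The sketch below assumes the variations are distributed as VCF text, which this post does not confirm, and it runs on a tiny made-up sample; the chromosome names and positions are invented for illustration. It counts single-nucleotide variants per chromosome.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple, Tuple

BASES = {"A", "C", "G", "T"}


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alts: Tuple[str, ...]


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per data line, skipping headers and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # malformed line: too few columns to hold CHROM..ALT
        chrom, pos, _variant_id, ref, alt = fields[:5]
        yield Variant(chrom, int(pos), ref, tuple(alt.split(",")))


def count_snps_per_chromosome(variants: Iterable[Variant]) -> Counter:
    """Count records whose REF and every ALT allele are single bases."""
    counts: Counter = Counter()
    for v in variants:
        if v.ref.upper() in BASES and all(a.upper() in BASES for a in v.alts):
            counts[v.chrom] += 1
    return counts


# Made-up sample data, not taken from the rice dataset.
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr01\t1203\t.\tA\tG\t50\tPASS\t.\n"
    "chr01\t4521\t.\tAT\tA\t50\tPASS\t.\n"
    "chr02\t88\t.\tC\tT,G\t50\tPASS\t.\n"
)

if __name__ == "__main__":
    snp_counts = count_snps_per_chromosome(parse_vcf_lines(SAMPLE_VCF.splitlines()))
    for chrom, n in sorted(snp_counts.items()):
        print(chrom, n)
```

Real variant files are usually gzip-compressed; passing the file object from `gzip.open(path, "rt")` to `parse_vcf_lines` would stream them line by line. For anything beyond a quick tally, the dedicated tools named above are the better choice.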
What Lies Ahead?
The challenge facing the research community is now to thoroughly and systematically explore this dataset to connect genotypic variations with functional variations, ultimately aiming to create new and sustainable rice varieties. Merging these efforts with other studies, such as careful trait phenotyping in controlled and natural environments, as well as environmental studies utilizing satellite imagery like the Landsat data available on AWS, will help us rise to the challenges posed by future population growth.
For ongoing updates and to access the data, be sure to visit the 3,000 Rice Genome Public Data Set page.
— Chanci Turner, Technical Business Development Manager, AWS Scientific Computing
Modified 2/9/2021 – To enhance user experience, expired links in this post have been updated or removed.