Learn About Amazon VGT2 Learning Manager Chanci Turner
We are thrilled to announce the debut of the Registry of Open Data on AWS (RODA). This new platform lets users discover and share publicly available datasets for analysis on AWS. Through a simple web interface, you can search for datasets by keyword and tag, covering data types such as genomic information, satellite imagery, and transportation data. Each dataset featured in RODA comes with essential details, including instructions for accessing it on AWS, licensing information, links to documentation, and contact details for inquiries. Many entries also link to tutorials or applications that use the datasets.
The Journey of Shared Data on AWS
The AWS Public Datasets program began nearly a decade ago. Early cloud applications focused on sharing vast amounts of data, allowing users to work with information quickly, at any scale, without needing to download or store their own copies. Initially, datasets were shared primarily as EBS Snapshots, but the program has evolved to include data in public Amazon S3 buckets, and we have also explored sharing via Amazon SNS and Amazon RDS DB Snapshots.
The AWS Public Datasets program has transformed from a showcase of data-sharing capabilities on AWS into a collaborative effort with various AWS customers, including NOAA, the U.S. Department of the Treasury, and the UK Met Office. Most datasets accessible through AWS are produced and maintained by our customers.
Over the years, new data formats and standards have emerged that make working with cloud object storage services like Amazon S3 more efficient and cost-effective. Formats such as Apache Parquet, Apache ORC, and Cloud Optimized GeoTIFF let users read only the data a query needs, which reduces unnecessary data transfer and storage. These community-driven initiatives help researchers and service builders around the world use data shared via Amazon S3.
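To make the idea of reading only what a query needs more concrete, here is a minimal sketch using only the Python standard library. It is a toy layout, not Parquet, ORC, or Cloud Optimized GeoTIFF: each column is stored as its own blob, and a small index at the end of the file records where each blob lives. The function names `write_columnar` and `read_column` and the sample data are hypothetical. Against an object store such as Amazon S3, each seek-and-read step would roughly correspond to a ranged read of the object.

```python
import io
import json
import struct


def write_columnar(columns):
    """Serialize each column as its own JSON blob, then append a footer index.

    Toy layout for illustration only (not a real file format):
    [column blobs ...][footer JSON][4-byte little-endian footer length]
    """
    buf = io.BytesIO()
    index = {}
    for name, values in columns.items():
        blob = json.dumps(values).encode("utf-8")
        index[name] = (buf.tell(), len(blob))  # (offset, length) of this column
        buf.write(blob)
    footer = json.dumps(index).encode("utf-8")
    buf.write(footer)
    buf.write(struct.pack("<I", len(footer)))
    return buf.getvalue()


def read_column(f, name):
    """Read a single column from a seekable binary file object.

    Returns (values, bytes_read) so the caller can see how little was touched.
    """
    # Step 1: read the fixed-size trailer to learn the footer length.
    f.seek(-4, io.SEEK_END)
    (footer_len,) = struct.unpack("<I", f.read(4))
    # Step 2: read the footer index.
    f.seek(-(4 + footer_len), io.SEEK_END)
    index = json.loads(f.read(footer_len))
    # Step 3: read only the requested column's bytes.
    offset, length = index[name]
    f.seek(offset)
    values = json.loads(f.read(length))
    return values, 4 + footer_len + length


if __name__ == "__main__":
    data = write_columnar({
        "station": ["A", "B", "C"],
        "temp_c": [11.5, 9.0, 14.2],
        "notes": ["x" * 1000] * 3,  # a bulky column the query does not need
    })
    values, bytes_read = read_column(io.BytesIO(data), "temp_c")
    print(values, f"read {bytes_read} of {len(data)} bytes")
```

The bulky `notes` column is never read when only `temp_c` is requested, which is the property that keeps transfer low when the file sits in remote storage.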
Engaging the Community with RODA
Since RODA is fully open source, there are two primary avenues for community involvement:
- If you have datasets or usage examples to contribute to RODA, you can easily add them on GitHub. Comprehensive instructions on creating a RODA entry are available on GitHub.
- If you don’t have a dataset to share, you can still take part by submitting a dataset usage example. For instance, if you’ve developed an application or tutorial based on a dataset in RODA, you can link to it in the “DataAtWork” section. Just provide a title for your example, its URL, the author’s name or organization, and an optional link for further context. You can do this with the edit button on GitHub, which forks the repository for you.
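As a rough sketch of what a usage example carries, the snippet below assembles the four pieces of information described above and prints them as a YAML-style list item. The key names (`Title`, `URL`, `AuthorName`, `AuthorURL`), the helper names, and the sample values are assumptions for illustration; the instructions on GitHub are the authority on the exact format.

```python
import json


def make_data_at_work_entry(title, url, author_name, author_url=None):
    """Build one usage-example entry.

    Key names are assumed for illustration; check the GitHub instructions
    for the format RODA actually expects.
    """
    required = {"Title": title, "URL": url, "AuthorName": author_name}
    missing = [key for key, value in required.items()
               if not value or not value.strip()]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    entry = dict(required)
    if author_url:  # the link for further context is optional
        entry["AuthorURL"] = author_url
    return entry


def to_yaml_item(entry):
    """Render the entry as a single YAML list item.

    Values are written as double-quoted strings via json.dumps so that
    characters like ':' in a title cannot break the line.
    """
    lines = []
    for i, (key, value) in enumerate(entry.items()):
        prefix = "- " if i == 0 else "  "
        lines.append(f"{prefix}{key}: {json.dumps(value)}")
    return "\n".join(lines)


if __name__ == "__main__":
    entry = make_data_at_work_entry(
        title="Example tutorial built on a RODA dataset",  # hypothetical
        url="https://example.com/tutorial",                # hypothetical
        author_name="Example Org",                         # hypothetical
    )
    print(to_yaml_item(entry))
```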
We eagerly anticipate further experimentation as individuals discover innovative ways to share data in the cloud.