Amazon VGT2 Las Vegas: Unlocking Dark Data on AWS with Tape-to-Cloud Migration


The oil and gas sector is grappling with significant hurdles in accessing subsurface data stored on tape media. Much of this information was recorded on various tape formats, some dating back over 25 years, encompassing everything from 9-track tapes to LTO and 3592 cartridges. These extensive collections are often held in offsite vaults, cut off from the advantages that cloud computing offers.

Subsurface data held on tape is complex and slow to retrieve. Maintaining data in tape vaults can also cost considerably more than cloud storage. Legacy tape data hinders business innovation and agility, and it cannot benefit from the scalability that Amazon Web Services (AWS) provides. Such tape-bound collections are frequently omitted from business processes, which makes them “dark” data: they exist, but they remain largely invisible to users and cannot take advantage of new scientific algorithms or the capacity of cloud infrastructure.

With deep roots in the oil and gas industry, Tape Ark is well equipped to handle the migration of many tape formats, including those from the 1960s and 70s. Tape Ark is an AWS Select Technology Partner, and transferring this data safely and effectively to AWS is one of its core competencies.

Clients often hold large tape collections, sometimes numbering in the tens of thousands or even millions, with data volumes that can reach tens of petabytes. Large-scale tape migration is complex and resource-heavy, yet it is vital for oil and gas companies that want to unlock the full potential of their data. Tape Ark uses AWS to carry out these mass migrations efficiently.

Migrating Data from Tapes to Cloud

Before cloud technology from AWS was available, oil and gas companies relied on tape to transfer data to and from vessels, platforms, exploration units, and joint venture partners. Because the industry typically has long retention requirements, tapes are frequently moved to long-term archives in offsite storage until they are needed again. It is common practice in the sector to migrate tape-bound data to higher-density tapes every 3 to 5 years so that it remains accessible. Because tape is a sequential rather than a random-access medium, the arrangement of data on each tape is crucial for logical retrieval.

Today, legacy seismic data in the energy sector remains underutilized, primarily because it resides on tape media. By migrating and storing this data in the cloud, oil and gas companies can harness greater value through artificial intelligence (AI) and machine learning (ML) applications, such as seismic interpretation and advanced data processing.

Tape Ark’s solution has been used to migrate data from tape to AWS for various oil and gas operators. In one recent project, an oil company used 1 million virtual CPUs (vCPUs) to achieve results in record time, which would not have been possible with the data still on tape. Tape Ark is currently engaged in additional initiatives involving tens of millions of tapes and over 400 petabytes of exploration data.

Solution Overview

Tape Ark and AWS have developed a streamlined solution to liberate subsurface data to the cloud, making it accessible for analytics, machine learning, and collaborative workflows. The process begins with receiving media and conducting a comprehensive tape media audit, which allows companies to accurately predict their cloud footprint, identify duplicates, and exclude irrelevant data.
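The audit step described above can be illustrated with a small sketch. It is not Tape Ark's implementation: the `TapeFile` record, its fields, and the `audit` function are hypothetical, and the sketch assumes the audit has already produced a content checksum for every file so that duplicates can be detected by matching checksums.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapeFile:
    """One file found during the audit (hypothetical record layout)."""
    tape_id: str      # barcode or label of the source tape
    name: str         # file name recorded on the tape
    size_bytes: int   # size of the file in bytes
    checksum: str     # content hash computed during the audit


def audit(files):
    """Return (unique_files, duplicates, footprint_bytes).

    The first file seen with a given checksum is kept; later files with
    the same checksum are reported as duplicates and excluded from the
    predicted cloud footprint.
    """
    seen = {}
    duplicates = []
    for f in files:
        if f.checksum in seen:
            duplicates.append(f)
        else:
            seen[f.checksum] = f
    unique = list(seen.values())
    footprint = sum(f.size_bytes for f in unique)
    return unique, duplicates, footprint


# Illustrative inventory; names, sizes, and checksums are made up.
inventory = [
    TapeFile("T0001", "survey_a.segy", 4_000, "aaa"),
    TapeFile("T0002", "survey_a_copy.segy", 4_000, "aaa"),
    TapeFile("T0003", "survey_b.segy", 6_000, "bbb"),
]

unique, duplicates, footprint = audit(inventory)
print(f"unique files: {len(unique)}, duplicates: {len(duplicates)}")
print(f"predicted footprint: {footprint} bytes")
```

In this example the copy on tape T0002 is flagged as a duplicate, so only two files count toward the predicted footprint.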

After the audit, all data is ingested into the designated cloud account using Tape Ark’s scalable technology stack. As data is transferred to a client’s AWS account, automated checksum and name validation checks are performed. These checks can be prepared in advance by the oil company, which enables real-time quality control during data ingestion.
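As a rough illustration of checksum and name validation, the sketch below compares each ingested file against a table of expected values prepared in advance. The naming convention, the choice of SHA-256, and the `validate` function are assumptions made for the example; the article does not say which hash or naming rules are actually used.

```python
import hashlib
import re

# Hypothetical naming convention: letters, digits, "_" or "-", then .segy/.sgy
NAME_PATTERN = re.compile(r"^[A-Za-z0-9_\-]+\.(segy|sgy)$")


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def validate(name: str, data: bytes, expected: dict) -> list:
    """Return a list of problems; an empty list means the file passed.

    expected maps file names to the checksums supplied in advance.
    """
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append("bad name")
    if name not in expected:
        problems.append("not in manifest")
    elif sha256_hex(data) != expected[name]:
        problems.append("checksum mismatch")
    return problems


# Illustrative data only.
payload = b"example seismic trace bytes"
expected = {"line_001.segy": sha256_hex(payload)}

ok = validate("line_001.segy", payload, expected)
corrupted = validate("line_001.segy", payload + b"x", expected)
misnamed = validate("line 001.dat", payload, expected)
print(ok, corrupted, misnamed)
```

Returning a list of problems rather than a single boolean lets a quality-control report show every reason a file was rejected.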

Once the ingestion and automated quality control processes are complete, data tiering policies automatically manage the movement of data to the requested storage tier. JSON metadata manifest files are generated and placed into the customer’s AWS accounts to keep their internal databases updated. Completed tapes can then be either disposed of through Tape Ark-certified services or returned to the client.
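A JSON metadata manifest of the kind mentioned above might look like the following sketch. The field names (`tape_id`, `object_count`, `total_bytes`, `objects`) and the object keys are hypothetical; the real manifest schema is not described in the article.

```python
import json


def build_manifest(tape_id, objects):
    """Summarize the objects ingested from one tape (hypothetical schema)."""
    return {
        "tape_id": tape_id,
        "object_count": len(objects),
        "total_bytes": sum(o["size_bytes"] for o in objects),
        "objects": objects,
    }


# Illustrative entries; keys and checksums are placeholders, not real data.
objects = [
    {"key": "tapes/T0001/line_001.segy", "size_bytes": 4000, "sha256": "0" * 64},
    {"key": "tapes/T0001/line_002.segy", "size_bytes": 6000, "sha256": "1" * 64},
]

manifest = build_manifest("T0001", objects)
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

A client system could parse a file like this with any JSON library and update its internal database from the `objects` list.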

The workflow starts with Arkbridge, Tape Ark’s internal platform for rapid mass tape ingestion, which is designed to leverage the security and capabilities of the AWS Cloud. The platform takes a highly automated IoT approach that minimizes manual steps when ingesting data into client AWS accounts, which improves both accuracy and efficiency.

Subsurface tape data is converted into objects and stored in Amazon Simple Storage Service (Amazon S3), whether through a direct tape-to-cloud ingest or via AWS Snowball. The data is stored in Amazon S3 Standard for quick access and Amazon S3 Glacier for deep archiving. Ultimately, this object data in S3 serves as the foundation for clients’ seismic and machine learning applications.
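One common way to move objects from S3 Standard to S3 Glacier is an S3 lifecycle rule. The sketch below only builds the rule as a Python dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the prefix, the 30-day threshold, and the bucket name in the comment are assumptions, and the article does not say that Tape Ark's tiering policies are implemented this way.

```python
def glacier_rule(prefix: str, days: int) -> dict:
    """Build one lifecycle rule that transitions objects under prefix
    to the GLACIER storage class once they are `days` days old."""
    rule_id = "archive-" + prefix.strip("/").replace("/", "-")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


# Hypothetical prefix and threshold, chosen only for illustration.
config = {"Rules": [glacier_rule("tapes/archive/", 30)]}
print(config)

# With boto3 installed and credentials configured, the rule could be applied
# to a bucket (the bucket name here is a made-up placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-subsurface-bucket",
#       LifecycleConfiguration=config,
#   )
```

Objects outside the chosen prefix are unaffected and stay in S3 Standard, so frequently used data remains quick to access.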

Conclusion

This article outlines Tape Ark’s successful approach to unlocking value from subsurface data trapped on legacy tapes by ingesting and cataloging it into AWS. Transitioning subsurface data to the cloud not only boosts productivity but also enhances the speed at which insights can be gathered from “dark” data. Oil and gas operators can reanalyze their data and implement AI/ML workflows, reintegrating their dark data into contemporary operations.


