Facilitating AI/ML Workloads on WITSML Data in the Cloud with PDS WITSMLstudio

Overview

Oil and gas operators want their data scientists to run AI/ML workloads on Wellsite Information Transfer Markup Language (WITSML) data to unlock greater value. WITSML is a widely used data exchange standard in the industry for delivering real-time and historical drilling and wellsite data. However, many popular AI/ML tools do not natively support WITSML, so data scientists spend considerable time retrieving the data, preparing it, and loading it into their tools. Real-time WITSML data adds further delay because data feeds must be established and maintained. A real-time data pipeline that converts the data into AI/ML-friendly formats such as JSON and delivers it to AWS lets data scientists refocus on analysis. AI/ML tools built on such pipelines can help drilling engineers by reducing the time required to interpret well, wellbore, mud log, and trajectory information, which speeds up decision-making.

Introduction

The process of gathering WITSML data, transforming it into JSON, and making it accessible for AI/ML tools involves several steps. The PDS WITSMLstudio StoreSync and WITSMLstudio StoreAdapter applications, paired with AWS services, facilitate and automate this process, thereby reducing the need for manual intervention.

Solution

Used together, PDS WITSMLstudio StoreSync and StoreAdapter query and gather WITSML data in real time, convert it to JSON, and store it in an Amazon S3 bucket. These applications retrieve data reliably and efficiently from leading WITSML data providers and can automate the process across multiple providers. Once the data lands in Amazon S3, an automated workflow built on Amazon S3, AWS Lambda, and AWS Glue processes the JSON files and makes them queryable based on their metadata. Other applications can then access the prepared data through Amazon Athena, and AI/ML models can use it through Amazon SageMaker. The entire pipeline can be customized to control which data is delivered and in what format.

Figure 1. Reference Architecture

WITSML data is generated at the wellsite and transmitted to the WITSML server, which acts as the source for this data. The server may be located at the wellsite, managed centrally by a service company or data provider, or hosted internally by an oil company. The remainder of the solution operates on AWS, including WITSMLstudio StoreSync and StoreAdapter running on Amazon EC2.

Figure 2. WITSMLstudio StoreSync

PDS WITSMLstudio StoreSync acts as a “man-in-the-middle” synchronization tool. It is a .NET-based Windows Service that retrieves data from a source server and writes it to a destination server, both typically WITSML servers. StoreSync is designed to resume data transmission seamlessly after interruptions caused by application crashes, machine reboots, or connection failures. Users can query and transmit either all data or a selected subset. StoreSync also supports standardizing and normalizing data so that it is written with the correct organization, naming conventions, and units. Automation features enable the discovery and transfer of all relevant data, active data only, or specific datasets that match designated naming patterns.
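The resume-after-interruption behavior can be pictured with a small checkpointing sketch. This is only an illustration of the general idea and makes no claim about how StoreSync is implemented internally: the `Checkpoint` class, the `sync` function, and the checkpoint file format are all hypothetical.

```python
import json
from pathlib import Path


class Checkpoint:
    """Persist the last index synced per data object (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload earlier progress if a previous run left a checkpoint behind.
        self.state = json.loads(self.path.read_text()) if self.path.exists() else {}

    def last_index(self, uid):
        return self.state.get(uid)

    def commit(self, uid, index):
        self.state[uid] = index
        # Write to a temporary file and rename it, so a crash in the middle
        # of a write cannot leave a corrupted checkpoint file.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.state))
        tmp.replace(self.path)


def sync(rows, uid, checkpoint, write):
    """Send (index, value) rows newer than the checkpoint to the destination."""
    last = checkpoint.last_index(uid)
    for index, value in rows:
        if last is not None and index <= last:
            continue  # already delivered before the interruption
        write(uid, index, value)
        checkpoint.commit(uid, index)
```

Committing after every row keeps the sketch simple; a real synchronizer would more likely commit in batches to limit disk writes.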

PDS WITSMLstudio StoreAdapter serves as a WITSML “gateway” to or from alternative data stores. This .NET-based IIS Web Application receives WITSML data from StoreSync, transforms it into JSON, and writes it to Amazon S3. The JSON output is configurable, and the data can alternatively be written as CSV or in the original XML format. StoreAdapter can also connect to non-WITSML data sources, including OPC, process historians, and SQL databases, so that their data can be delivered to Amazon S3 as well.
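To give a feel for what an XML-to-JSON transformation of WITSML data involves, here is a minimal sketch using only the Python standard library. It is not StoreAdapter's implementation and does not reproduce its JSON schema: the sample trajectory document is simplified, and the convention of prefixing attributes with `@` and storing element text under `#text` is an assumption chosen for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Simplified, hypothetical WITSML 1.4.1.1 trajectory fragment (illustration only).
SAMPLE_XML = """<trajectorys xmlns="http://www.witsml.org/schemas/1series" version="1.4.1.1">
  <trajectory uidWell="W-1" uidWellbore="WB-1" uid="T-1">
    <nameWell>Well 1</nameWell>
    <name>Survey 1</name>
    <trajectoryStation uid="S-1">
      <md uom="ft">1000.0</md>
      <incl uom="dega">1.5</incl>
    </trajectoryStation>
  </trajectory>
</trajectorys>"""


def strip_ns(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}", 1)[-1]


def element_to_dict(elem):
    """Recursively convert an XML element into dicts, lists, and strings."""
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        key = strip_ns(child.tag)
        value = element_to_dict(child)
        if key in node:
            # Repeated child elements are collected into a list.
            if not isinstance(node[key], list):
                node[key] = [node[key]]
            node[key].append(value)
        else:
            node[key] = value
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text  # leaf element without attributes: plain string
        node["#text"] = text
    return node


def witsml_to_json(xml_text):
    """Return a JSON string for a WITSML XML document."""
    root = ET.fromstring(xml_text)
    return json.dumps({strip_ns(root.tag): element_to_dict(root)}, indent=2)


if __name__ == "__main__":
    print(witsml_to_json(SAMPLE_XML))
```

In this sketch all values stay as strings, and units of measure travel with each value (for example `{"@uom": "ft", "#text": "1000.0"}`), so downstream consumers can decide how to cast and convert them.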

Once the transformed data is stored in the designated Amazon S3 bucket folder, an event notification triggers an AWS Lambda function that converts the JSON to JSON Lines so that it can be queried through Athena. The Lambda function can also organize the JSON Lines files into structured folders within the S3 bucket based on data categories such as well, wellbore, and trajectory. AWS Glue is used to create metadata tables from the JSON data files, and Amazon Athena relies on these metadata tables to retrieve data from the S3 bucket when executing queries. To support AI/ML workloads, a Jupyter notebook in Amazon SageMaker is configured to run queries on Amazon Athena. This requires installing PyAthena, a Python DB API 2.0 (PEP 249) compliant client for Amazon Athena, on the notebook.
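A Lambda function of the kind described above might look roughly like the following sketch. The `jsonl` output prefix, the assumption that incoming keys have the form `<prefix>/<data type>/<file>.json`, and the helper names are all hypothetical; the S3 event structure and the `boto3` calls (`get_object`, `put_object`) are standard.

```python
import json
import urllib.parse

# Hypothetical prefix under which the JSON Lines output is written.
OUTPUT_PREFIX = "jsonl"


def to_json_lines(payload):
    """Convert a JSON document (one object or an array of objects) to JSON Lines."""
    data = json.loads(payload)
    records = data if isinstance(data, list) else [data]
    # One compact JSON object per line, as Athena's JSON SerDes expect.
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"


def output_key(source_key, data_type):
    """Build a destination key grouped by data type, e.g. jsonl/trajectory/t1.jsonl."""
    filename = source_key.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return f"{OUTPUT_PREFIX}/{data_type}/{filename}.jsonl"


def lambda_handler(event, context):
    # Imported here so the helpers above can be exercised without AWS access.
    import boto3

    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # Assumption: the parent folder of the object names the data type.
        data_type = key.split("/")[-2] if "/" in key else "unknown"
        s3.put_object(
            Bucket=bucket,
            Key=output_key(key, data_type),
            Body=to_json_lines(body).encode("utf-8"),
        )
```

Because this sketch writes back to the same bucket, the event notification should be restricted to the input prefix (or the output sent to a separate bucket) so that the function does not trigger itself on its own output.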

Figure 3. Sample of Amazon Athena querying data from Amazon S3 bucket using AWS Glue metadata
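From a SageMaker notebook, a comparable query could be issued with PyAthena along the following lines. This is a sketch under stated assumptions: the table name `trajectory`, the column `uidwell`, the database name, the staging location, and the region are placeholders that depend on how the Glue tables were actually created, and `build_query` and `run_query` are hypothetical helpers. `pyathena.connect` and the cursor methods are part of PyAthena's DB API interface.

```python
import re


def build_query(table, well_uid, limit=10):
    """Build a simple Athena query for one well (hypothetical table layout)."""
    # Table names cannot be bound as parameters, so validate them explicitly.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"invalid table name: {table!r}")
    safe_uid = well_uid.replace("'", "''")  # escape single quotes in the literal
    return f"SELECT * FROM {table} WHERE uidwell = '{safe_uid}' LIMIT {int(limit)}"


def run_query(sql, staging_dir, region, database):
    """Run a query on Athena and return the rows as a list of dicts."""
    from pyathena import connect  # requires: pip install PyAthena

    cursor = connect(
        s3_staging_dir=staging_dir,  # S3 location where Athena writes query results
        region_name=region,
        schema_name=database,        # database holding the Glue metadata tables
    ).cursor()
    cursor.execute(sql)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


if __name__ == "__main__":
    # Placeholder values; replace with your own bucket, region, and database.
    rows = run_query(
        build_query("trajectory", "W-1", limit=5),
        staging_dir="s3://example-bucket/athena-results/",
        region="us-east-1",
        database="witsml",
    )
    print(rows)
```

The returned list of dicts can be passed directly to `pandas.DataFrame` for feature engineering or model training in the notebook.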

Conclusion

By simplifying the process of delivering WITSML data in an AI/ML-friendly format to AWS, WITSMLstudio StoreSync and StoreAdapter allow data consumers and application developers to concentrate on extracting insights from the data rather than wrestling with data ingestion. This accelerates application delivery timelines by eliminating the need for ingestion implementation and enables real-time access to data for development and testing purposes. Timely insights derived from real-time data can help drilling engineers avoid costly mistakes, reduce labor requirements at both the wellsite and office, and enhance the quality of delivered wells, leading to sustained financial and HSE benefits.

For further details about StoreSync and StoreAdapter, please reach out to Mark Johnson or visit the WITSMLstudio homepage.

Facilitating AI/ML Workloads on WITSML Data in the Cloud with PDS WITSMLstudio | AWS for Industries