Research papers and engineering documents are often rich in valuable information, including mathematical formulas, charts, and graphs. However, sifting through these unstructured documents to locate pertinent details can be a laborious and time-consuming process, particularly when handling extensive datasets. Thankfully, with Anthropic’s Claude on Amazon Bedrock, researchers and engineers can automate the indexing and tagging of these technical documents. This innovation facilitates the efficient processing of content, encompassing scientific formulas and data visualizations, while populating Amazon Bedrock Knowledge Bases with relevant metadata.
Amazon Bedrock is a fully managed service offering a unified API to access and utilize various high-performing foundation models (FMs) from top AI companies. It provides a comprehensive set of capabilities for building generative AI applications while adhering to security, privacy, and responsible AI practices. Anthropic’s Claude 3 Sonnet offers vision capabilities that are among the best of the leading models. It can accurately transcribe text from imperfect images, a crucial feature for sectors like retail, logistics, and financial services, where AI can derive more insights from images, graphics, or illustrations than from text alone. The latest Claude models exhibit a robust ability to comprehend a diverse range of visual formats, including photographs, charts, graphs, and technical diagrams. By leveraging Anthropic’s Claude, users can extract deeper insights from documents, process web interfaces and varied product documentation, and generate image catalog metadata, among other functionalities.
In this article, we will delve into how these multi-modal generative AI models can enhance the management of technical documents. By extracting and organizing key information from source materials, these models can construct a searchable knowledge base, allowing quick access to the data, formulas, and visualizations necessary for your projects. With the content neatly arranged in a knowledge base, researchers and engineers can utilize advanced search functionalities to pinpoint the most relevant information to meet their specific requirements. This advancement can significantly speed up research and development processes, as professionals are no longer required to manually scour through vast amounts of unstructured data to find needed references.
Solution Overview
This solution highlights the transformative power of multi-modal generative AI in addressing the challenges faced by scientific and engineering communities. By automating the indexing and tagging of technical documents, these advanced models can foster more efficient knowledge management and expedite innovation across various industries.
This solution incorporates several services in addition to Anthropic’s Claude on Amazon Bedrock:
- Amazon SageMaker JupyterLab: This web-based interactive development environment (IDE) supports notebooks, code, and data. JupyterLab’s flexible interface allows for the configuration and arrangement of machine learning (ML) workflows. We utilize JupyterLab to execute the code for processing formulas and charts.
- Amazon Simple Storage Service (Amazon S3): This object storage service is designed to securely store any amount of data. We use Amazon S3 to house sample documents employed in this solution.
- AWS Lambda: This compute service executes code in response to triggers such as data changes, application state shifts, or user actions. Services like Amazon S3 and Amazon Simple Notification Service (Amazon SNS) can directly activate a Lambda function, enabling the creation of diverse real-time serverless data-processing systems.
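As an illustration of the trigger pattern described in the last item, the following minimal Lambda handler sketch reacts to an S3 object-created event. The function body and bucket layout are illustrative assumptions, not the solution's deployed code:

```python
import json
import urllib.parse


def lambda_handler(event, context):
    """Minimal sketch of a Lambda function invoked by an S3 upload event."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # A real handler could kick off page splitting or metadata extraction here.
        print(f"New document uploaded: s3://{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps("event processed")}
```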
Workflow Steps
The solution workflow includes the following steps:
- Split the PDF into individual pages and save them as PNG files.
- For each page:
  - Extract the original text.
  - Render the formulas in LaTeX.
  - Generate a semantic description of each formula.
  - Generate an explanation of each formula.
  - Generate a semantic description of each graph.
  - Generate an interpretation of each graph.
  - Generate metadata for the page.
- Generate metadata for the entire document.
- Upload the content and metadata to Amazon S3.
- Create an Amazon Bedrock knowledge base.
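The final two steps above make the extracted content and its metadata available to Amazon Bedrock Knowledge Bases. A minimal sketch of the upload step is shown below; it assumes boto3 and uses illustrative bucket, key, and attribute names, with each content object accompanied by a `<key>.metadata.json` sidecar file that the knowledge base ingests as filterable metadata:

```python
import json

import boto3

s3 = boto3.client("s3")
bucket = "scientific-docs-knowledge-base"  # illustrative bucket name

# Extracted content for one page: text, LaTeX formulas, chart descriptions
page_key = "papers/sample-paper/page-003.txt"
page_text = "... extracted text, formulas, and chart interpretations ..."

# Amazon Bedrock Knowledge Bases reads metadata from a <key>.metadata.json sidecar
metadata = {
    "metadataAttributes": {
        "source_document": "sample-paper.pdf",
        "page_number": 3,
        "contains_formulas": True,
        "contains_charts": True,
    }
}

s3.put_object(Bucket=bucket, Key=page_key, Body=page_text.encode("utf-8"))
s3.put_object(
    Bucket=bucket,
    Key=f"{page_key}.metadata.json",
    Body=json.dumps(metadata).encode("utf-8"),
)
```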
Prerequisites
If you are new to AWS, start by creating and setting up an AWS account. You also need access to the anthropic.claude-3-5-sonnet-20241022-v2:0 model in Amazon Bedrock; request access if you do not already have it.
Deploying the Solution
Follow these steps to establish the solution:
- Launch the AWS CloudFormation template by selecting Launch Stack (this creates the stack in the us-east-1 AWS Region).
- Once the stack deployment concludes, access Amazon SageMaker AI.
- Choose Notebooks in the navigation pane.
- Find the notebook named claude-scientific-docs-notebook and select Open JupyterLab.
- Within the notebook, navigate to notebooks/process_scientific_docs.ipynb.
- Choose conda_python3 as the kernel, then choose Select.
- Review the sample code.
Notebook Code Explanation
This section discusses the notebook code.
Loading Data
We utilize example research papers from arXiv to showcase the capabilities outlined here. arXiv is a free distribution service and an open-access archive for approximately 2.4 million scholarly articles in fields like physics, mathematics, and computer science. We download the documents and store them in a local samples folder. Multi-modal generative AI models excel at text extraction from image files, so we begin by converting the PDF into images, one for each page.
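A minimal sketch of this conversion step, assuming the pdf2image library (which requires the poppler utilities) and an illustrative file layout under the samples folder:

```python
from pathlib import Path

from pdf2image import convert_from_path  # needs poppler installed on the system

pdf_path = Path("samples/sample-paper.pdf")  # illustrative file name
output_dir = Path("samples/pages")
output_dir.mkdir(parents=True, exist_ok=True)

# Render each PDF page as a PNG image that the vision model can read
pages = convert_from_path(str(pdf_path), dpi=200)
for page_number, page_image in enumerate(pages, start=1):
    page_image.save(output_dir / f"page-{page_number:03d}.png", "PNG")
```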
Extracting Metadata from Formulas
Once the image documents are ready, you can apply Anthropic’s Claude to extract formulas and metadata using the Amazon Bedrock Converse API. Furthermore, you can leverage the Amazon Bedrock Converse API to obtain plain language explanations of the extracted formulas. Combining the formula and metadata extraction capabilities of Anthropic’s Claude with the conversational abilities of the Amazon Bedrock Converse API allows for a comprehensive solution to process and comprehend the information within the image documents.
Consider the following example PNG file.
We use the following request prompt:
sample_prompt = """
Evaluate this page line by line.
For each line, if it is a formula, convert this math expression to latex format.
Next describe the formula in plain language. Be sure to enclose Latex formulas in double dollar sign for example: $$ <math expression> $$ Use markdown syntax to format your output.
"""
We receive a response that showcases the extracted formula converted into LaTeX format and described in plain language, enclosed in double dollar signs.
You can also use the same approach to extract insights and metadata from charts and graphs.
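A similar prompt can ask the model for a semantic description and an interpretation of each figure; the wording below is an illustrative variant rather than the exact prompt used in the notebook:

```python
chart_prompt = """
Examine this page and identify any charts or graphs.
For each one, describe what is plotted (axes, units, and data series),
then interpret the main trend or finding in plain language.
Use markdown syntax to format your output.
"""
```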