Amazon OpenSearch Service: A Review of H1 2023

Between its launch in January 2021 and June 2023, the OpenSearch project released 14 versions. Amazon OpenSearch Service supports OpenSearch versions up to 2.7.

OpenSearch Service offers two deployment options for running OpenSearch in the cloud. With managed domains, users specify the hardware configuration, and OpenSearch Service handles provisioning, software updates, failure recovery, backups, and monitoring. Managed domains include advanced features at no additional cost, such as cross-cluster search, cross-cluster replication, anomaly detection, semantic search, and security analytics. A large operations team is not necessary, but familiarity with sharding concepts and OpenSearch best practices is advised to get the most out of a domain.

For those seeking simplicity, Amazon OpenSearch Serverless provides a fully auto-scaled deployment option. Users create a collection (a group of indexes that work together to support a single workload) and call OpenSearch’s APIs, leaving sizing, capacity planning, and cluster tuning to OpenSearch Serverless.

In this blog post, we highlight the notable features released in OpenSearch Service during the first half of 2023.

Enhancing Search Solutions

In this section, we will explore the features in OpenSearch Service that empower users to build effective search solutions.

OpenSearch Serverless and the serverless vector engine

Earlier in 2023, OpenSearch Serverless became generally available. The service separates storage from compute and separates indexing compute from query compute, so each can be managed and scaled independently. It uses Amazon Simple Storage Service (Amazon S3) as the primary data storage for indexes, which improves durability. Because collections rely on the S3 storage layer, they need less hot storage, which lowers cost; data is brought into local storage when it is accessed.

When creating a serverless collection, users specify a collection type, and OpenSearch Serverless optimizes resource allocation for that type. At launch, users could create search collections for full-text search and time series collections for log analytics. In July 2023, OpenSearch Serverless previewed a third collection type: vector search. The vector engine for OpenSearch Serverless is a scalable, high-performance vector store that supports generative AI, semantic search, image search, and similar workloads. It adjusts resources automatically as the workload changes, so performance holds up as demand grows or shrinks. For k-NN search, it uses approximate nearest neighbor (ANN) algorithms from the Non-Metric Space Library (NMSLIB) and FAISS libraries.
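As a rough illustration of what a k-NN workload looks like at the API level, the sketch below builds an index definition with a knn_vector field and a matching k-NN query as plain Python dictionaries. The field name "embedding", the 3-dimensional vector, and the choice of HNSW with L2 distance are assumptions made for the example, not settings taken from this post.

```python
import json


def knn_index_body(field, dimension, engine="faiss"):
    """Build an index-creation body with one knn_vector field.

    Assumption for illustration: HNSW graph with L2 (Euclidean) distance.
    """
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": engine,  # "faiss" or "nmslib"
                        "space_type": "l2",
                    },
                }
            }
        },
    }


def knn_query_body(field, vector, k):
    """Build a search body that asks for the k approximate nearest neighbors."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


if __name__ == "__main__":
    # Hypothetical field name and toy vector, printed as the JSON that
    # would be sent to the index-creation and _search endpoints.
    print(json.dumps(knn_index_body("embedding", 3), indent=2))
    print(json.dumps(knn_query_body("embedding", [0.1, 0.2, 0.3], 5), indent=2))
```

The index body would be sent when creating the index, and the query body to the _search endpoint; the query vector must have the same dimension as the mapped field.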

To use the vector engine, select Vector search as the collection type when creating a collection on the OpenSearch Service console. For details, see the blog post Introducing the vector engine for Amazon OpenSearch Serverless.
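The same choice can be made programmatically. The sketch below only assembles the request parameters for a vector search collection; the collection name is hypothetical, and the name check reflects my reading of the CreateCollection API reference (3 to 32 characters, starting with a lowercase letter), so treat it as an assumption. Note that a matching encryption policy generally has to exist before a collection can be created.

```python
import re

# Assumed naming rule: lowercase letter first, then lowercase letters,
# digits, or hyphens, for a total length of 3 to 32 characters.
_NAME_RE = re.compile(r"^[a-z][a-z0-9-]{2,31}$")


def vector_collection_request(name, description=""):
    """Return keyword arguments for creating a vector search collection."""
    if not _NAME_RE.match(name):
        raise ValueError(f"invalid collection name: {name!r}")
    request = {"name": name, "type": "VECTORSEARCH"}
    if description:
        request["description"] = description
    return request


if __name__ == "__main__":
    print(vector_collection_request("product-embeddings", "demo collection"))

# With boto3 installed and AWS credentials configured, the request could be
# sent roughly like this (not executed here):
#   import boto3
#   client = boto3.client("opensearchserverless")
#   client.create_collection(**vector_collection_request("product-embeddings"))
```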

Point in Time Search

Point in Time (PIT) search, introduced in version 2.4 of the OpenSearch Project and supported with OpenSearch 2.5 in OpenSearch Service, keeps search pagination consistent even when documents are added to or removed from an index. For instance, suppose a user searches for “blue couch” and spends some time reviewing the first page of results. In the meantime, new couches are added to the index, changing which documents rank in the top 20. When the user moves from page 1 to page 2, they may see results they have already viewed, because those documents have shifted down in the order. With PIT search, the query runs against a fixed view of the index, so the result order stays stable across pages regardless of index changes. For more information on PIT, see Launch highlight: Paginate with Point in Time.
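A minimal sketch of how PIT pagination is usually combined with search_after is shown below. It only builds request bodies; the PIT ID, the "title" field, and the "sku" tiebreaker field are placeholders invented for the example. In practice the PIT ID comes from a POST to /<index>/_search/point_in_time?keep_alive=<duration>, and the follow-up searches go to /_search without an index name, because the PIT already identifies the index.

```python
def pit_search_body(pit_id, query, page_size, sort, search_after=None, keep_alive="5m"):
    """Build one page request against a fixed point-in-time view.

    sort should end with a unique tiebreaker field so that search_after
    can resume exactly where the previous page stopped.
    """
    body = {
        "size": page_size,
        "query": query,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": sort,
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body


def next_search_after(hits):
    """Return the sort values of the last hit, or None when the page is empty."""
    return hits[-1]["sort"] if hits else None


if __name__ == "__main__":
    # Hypothetical query and sort: relevance first, then a unique keyword field.
    query = {"match": {"title": "blue couch"}}
    sort = [{"_score": {"order": "desc"}}, {"sku": {"order": "asc"}}]

    page1 = pit_search_body("example-pit-id", query, 20, sort)
    # Pretend these hits came back for page 1.
    hits = [{"_id": "a", "sort": [2.5, "sku-001"]}, {"_id": "b", "sort": [2.1, "sku-009"]}]
    page2 = pit_search_body("example-pit-id", query, 20, sort, next_search_after(hits))
    print(page1)
    print(page2)
```

Each page reuses the same PIT ID, so page 2 continues from the last hit of page 1 in the same frozen view of the index.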

Search Relevance Plugin

Have you ever considered how adjusting your relevance function would affect the results? With the search relevance plugin, users can now compare results side by side in OpenSearch Dashboards, allowing for easy adjustments to achieve optimal relevance.

New Field Types

OpenSearch 2.7 (available in OpenSearch Service) introduces two new field types:

Cartesian field type: In addition to the existing geographic field types, this version adds xy point and xy shape fields for data in a two-dimensional Cartesian coordinate system. These are useful for applications such as virtual reality, computer-aided design (CAD), or sporting venue mapping.
Flat object type: When a field’s mapping is set to flat_object, OpenSearch stores the entire JSON object in that field without mapping each subfield individually. Users can search for leaf values without knowing the field name, and can also search by dotted-path notation. For more about how the flat object type simplifies index mappings and the search experience in OpenSearch, see Use flat object in OpenSearch.
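To make the two field types concrete, here is a small sketch of an index mapping that uses them, along with two queries against the flat object field. All field names (seat, section, attributes) and the sample document are hypothetical, chosen to fit the sporting venue example above.

```python
import json

# Hypothetical venue index: Cartesian coordinates plus free-form attributes.
index_body = {
    "mappings": {
        "properties": {
            "seat": {"type": "xy_point"},
            "section": {"type": "xy_shape"},
            "attributes": {"type": "flat_object"},
        }
    }
}

# A sample document; the nested keys under "attributes" need no mapping.
sample_doc = {
    "seat": {"x": 12.5, "y": -3.0},
    "section": {
        "type": "polygon",
        "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
    },
    "attributes": {"ticket": {"tier": "premium", "perks": {"lounge": "yes"}}},
}


def flat_leaf_query(field, value):
    """Search every leaf value inside the flat object, whatever its key."""
    return {"query": {"match": {field: value}}}


def flat_path_query(field, path, value):
    """Search one specific leaf using dotted-path notation."""
    return {"query": {"match": {f"{field}.{path}": value}}}


if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
    print(json.dumps(flat_leaf_query("attributes", "premium")))
    print(json.dumps(flat_path_query("attributes", "ticket.tier", "premium")))
```

The first query matches "premium" anywhere inside attributes; the second only matches it at attributes.ticket.tier.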

Geographical Analysis

Starting with OpenSearch 2.7 in OpenSearch Service, users can run GeoHex grid aggregation queries, which group geographic points into the hexagonal cells defined by the open-source H3 library. H3 offers precision down to a square meter or less, making it well suited to high-precision use cases. Because high-precision requests are computationally expensive, be sure to limit the geographic area with filters.
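The advice to filter before aggregating might look like the following sketch, which pairs a geohex_grid aggregation with a geo_bounding_box filter. The field name "location", the aggregation name "cells", and the coordinates are assumptions for illustration; the 0 to 15 precision range corresponds to H3 resolutions.

```python
def geohex_request(field, precision, top_left, bottom_right):
    """Build a search body that buckets points into H3 cells inside a bounding box.

    top_left and bottom_right are (lat, lon) tuples.
    """
    if not 0 <= precision <= 15:
        raise ValueError("H3 precision must be between 0 and 15")
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                "filter": {
                    "geo_bounding_box": {
                        field: {
                            "top_left": {"lat": top_left[0], "lon": top_left[1]},
                            "bottom_right": {"lat": bottom_right[0], "lon": bottom_right[1]},
                        }
                    }
                }
            }
        },
        "aggs": {
            "cells": {"geohex_grid": {"field": field, "precision": precision}}
        },
    }


if __name__ == "__main__":
    # Hypothetical geo_point field and a small area, so that a fine
    # precision does not produce an unmanageable number of buckets.
    print(geohex_request("location", 8, (47.7, -122.5), (47.5, -122.2)))
```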

Elevating Observability

Observability within OpenSearch comprises a suite of plugins and features that allow users to explore, query, and visualize telemetry data stored in OpenSearch. This section details how OpenSearch Service enhances observability.

Unified Schema for Observability

With version 2.6, the OpenSearch Project introduced a unified schema for observability called Simple Schema for Observability (SS4O), which is supported with OpenSearch 2.7 in OpenSearch Service. SS4O draws inspiration from both OpenTelemetry (OTel) and the Elastic Common Schema (ECS), and uses Amazon Elastic Container Service (Amazon ECS) event logs and OTel metadata. It defines the index structure, index naming conventions, an integration feature for adding preconfigured dashboards, and a JSON schema for enforcing and validating structure. SS4O aligns with the OTel schema for logs, traces, and metrics.

Jaeger Traces Support

With OpenSearch 2.5, users can store Jaeger trace data in OpenSearch and analyze it in Jaeger format using the Observability plugin.

Observability offers insights into the health of systems and microservice applications. OpenSearch Dashboards includes an Observability plugin that provides a comprehensive experience for collecting and monitoring metrics, logs, and traces from common data sources. Through this plugin, users can monitor and set alerts for their logs, metrics, and traces to ensure that applications remain available, performant, and error-free.

In the first half of 2023, we added significant enhancements to Amazon OpenSearch Service for organizations looking to advance their search and observability capabilities.