Over the years, user expectations for search engines have evolved significantly. Returning quick, lexically relevant results is no longer sufficient. Users now expect more pertinent results from methods such as semantic understanding or visual similarity search, rather than from matching on textual metadata alone. Amazon OpenSearch Service offers numerous features to enhance the search experience, and we are thrilled to share the advancements introduced in 2023.
The year 2023 witnessed a rapid surge in innovation within artificial intelligence (AI) and machine learning (ML), with search technology being a notable beneficiary. Throughout the year, Amazon OpenSearch Service focused on empowering search teams to leverage the latest AI/ML technologies to enhance existing search experiences without requiring extensive rewrites of applications or bespoke setups, facilitating swift development, iteration, and productization. These enhancements include the introduction of new search methodologies and functionalities that simplify their implementation, which we explore in this post.
Understanding Lexical and Semantic Search
Before diving in, let’s clarify the concepts of lexical and semantic search.
Lexical Search
Lexical search involves comparing words in the query to those in documents, matching them directly. Traditional lexical search methods, like BM25, remain effective for various applications. However, they often fail to capture highly relevant results that don’t contain the exact terms from the user’s query.
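To make the limitation concrete, here is a minimal sketch of BM25 scoring in plain Python. It is a simplified textbook form of the formula, not the exact implementation used by OpenSearch or Lucene, and the sample documents, whitespace tokenizer, and parameter defaults (k1=1.2, b=0.75) are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no lexical overlap, so no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

# Hypothetical sample corpus.
docs = [
    "comfortable running shoes for men",
    "lightweight sneakers for jogging",
    "stainless steel kitchen knife",
]
scores = bm25_scores("running shoes", docs)
print(scores)
```

The second document is arguably relevant to the query "running shoes", yet it scores zero because it shares no terms with the query. That is the gap semantic search aims to close.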
Semantic Search
In contrast, semantic search uses an ML model to encode text or other media (such as images and videos) from source documents into dense vectors in a high-dimensional vector space. This is known as embedding the content into vector space. The query is encoded as a vector in the same way, and a distance metric is used to find the document vectors closest to it. The algorithm employed is called k-nearest neighbors (k-NN). Unlike lexical search, semantic search identifies documents whose vector embeddings are close to the query's embedding, so it can return semantically relevant results even if they contain none of the specific words from the query.
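The retrieval step described above can be sketched as an exact, brute-force k-NN search using cosine similarity. The three-dimensional vectors below are hand-made stand-ins for real model embeddings (which typically have hundreds of dimensions), and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embeddings: documents 0 and 1 are about footwear, document 2 is not.
doc_vecs = [
    [0.9, 0.1, 0.0],  # "comfortable running shoes for men"
    [0.8, 0.2, 0.1],  # "lightweight sneakers for jogging"
    [0.0, 0.1, 0.9],  # "stainless steel kitchen knife"
]
query_vec = [0.85, 0.15, 0.05]  # stand-in embedding for "running shoes"

top = knn(query_vec, doc_vecs, k=2)
print(top)
```

Both footwear documents are returned as nearest neighbors, including the one that shares no words with the query. Approximate k-NN trades a small amount of this exactness for much faster search over large collections.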
OpenSearch has provided vector similarity search (k-NN and approximate k-NN) for several years, which has proven beneficial for customers. However, the high engineering effort required has limited broader adoption.
2023 Releases: Core Enhancements
In 2023, several features and improvements were rolled out in the OpenSearch Service, serving as foundational elements for ongoing search enhancements.
The OpenSearch Compare Search Results Tool
The Compare Search Results tool, now generally available in OpenSearch Service version 2.11, allows users to compare search results generated from two ranking techniques side by side in OpenSearch Dashboards. This capability is vital for those experimenting with the latest ML-powered search methods, including comparisons between lexical, semantic, and hybrid search techniques. The tool helps users understand the benefits of each technique on their own datasets.
To explore semantic search and cross-modal search while trying out the Compare Search Results tool, visit a demo here.
Search Pipelines
Search practitioners are eager to enhance both queries and results. With the general availability of search pipelines starting in OpenSearch Service version 2.9, users can construct search query and result processing through a composition of modular steps without complicating their application code. By incorporating processors for functions like filtering, and the ability to execute scripts on newly indexed documents, users can make their search applications more accurate and efficient, minimizing the need for custom development.
Search pipelines include three built-in processors: filter_query and script, which operate on the search request, and rename_field, which operates on the search response. New APIs also enable developers to create their own processors. OpenSearch plans to introduce additional built-in processors in future releases.
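As an illustration, the following sketch builds the request body for a search pipeline that combines a filter_query request processor with a rename_field response processor. The pipeline name, index name, and field names (visibility, message, notification) are hypothetical; only the processor names come from the text above. The body is built as a Python dictionary and printed as JSON rather than sent to a cluster.

```python
import json

# Hypothetical pipeline: restrict every query to public documents, then
# rename the "message" field to "notification" in the returned hits.
pipeline_body = {
    "request_processors": [
        {"filter_query": {"query": {"term": {"visibility": "public"}}}}
    ],
    "response_processors": [
        {"rename_field": {"field": "message", "target_field": "notification"}}
    ],
}

# Assumed REST paths: create the pipeline with a PUT to pipeline_path,
# then reference it by name when searching.
pipeline_path = "/_search/pipeline/my_pipeline"
search_path = "/my_index/_search?search_pipeline=my_pipeline"

print("PUT", pipeline_path)
print(json.dumps(pipeline_body, indent=2))
print("GET", search_path)
```

Because the filter and the rename live in the pipeline, the application can send an unchanged query and still receive filtered, renamed results.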
Byte-Sized Vectors in Lucene
Previously, the k-NN plugin in OpenSearch supported indexing and querying vectors consisting of floats, which occupied 4 bytes each. This could be costly in terms of memory and storage, especially for large-scale applications. With the new byte vector feature in OpenSearch Service version 2.9, users can significantly reduce memory usage while decreasing search latency, with minimal impact on quality. For further details, check out Byte-quantized vectors in OpenSearch.
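The sketch below shows where the savings come from. It assumes a simple linear scaling scheme to map floats into the signed byte range [-128, 127]; this scheme is an illustration only, not a method prescribed by OpenSearch, and real applications should choose a quantization approach suited to their embeddings. The index and field names in the mapping are hypothetical, and the memory estimate covers raw vector values only, not index graph overhead.

```python
def quantize_to_bytes(vector):
    """Linearly scale floats into the signed byte range [-128, 127]."""
    max_abs = max(abs(x) for x in vector) or 1.0  # avoid dividing by zero
    return [max(-128, min(127, round(x / max_abs * 127))) for x in vector]

def vector_memory_bytes(num_vectors, dimension, bytes_per_value):
    """Raw storage for the vector values alone."""
    return num_vectors * dimension * bytes_per_value

float_vec = [0.12, -0.98, 0.45, 0.0]
byte_vec = quantize_to_bytes(float_vec)

# One million 768-dimensional vectors (hypothetical workload).
float_bytes = vector_memory_bytes(1_000_000, 768, 4)
byte_bytes = vector_memory_bytes(1_000_000, 768, 1)

# Hypothetical index body using a byte vector field with the Lucene engine.
byte_vector_index = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "my_vector": {
                "type": "knn_vector",
                "dimension": 4,
                "data_type": "byte",
                "method": {"name": "hnsw", "space_type": "l2", "engine": "lucene"},
            }
        }
    },
}

print(byte_vec, float_bytes // byte_bytes)
```

Each value shrinks from 4 bytes to 1, so the raw vector data takes a quarter of the space. The cost is precision: nearby float values can collapse to the same byte, which is why recall should be measured after quantizing.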
Support for New Language Analyzers
OpenSearch Service has expanded its language analyzer options to include Nori (Korean), Sudachi (Japanese), Pinyin (Chinese), and STConvert Analysis (Chinese), in addition to existing plugins like IK (Chinese), Kuromoji (Japanese), and Seunjeon (Korean). These new plugins are available as ZIP-PLUGIN package types, alongside the previously supported TXT-DICTIONARY. Users can easily associate these plugins with their clusters via the OpenSearch Service console or the AssociatePackage API.
2023 Releases: Enhancements for Ease of Use
The OpenSearch Service also made strides in enhancing user-friendliness across key search functionalities.
Semantic Search with Neural Search
Earlier, implementing semantic search required applications to manage middleware for integrating text embedding models, orchestrating corpus encoding, and using k-NN at query time. With the introduction of neural search in OpenSearch Service version 2.9, developers can create and deploy semantic search applications with significantly less effort. The application no longer needs to handle document and query vectorization; neural search automates this, invoking k-NN during queries. This feature was initially released as an experimental option in version 2.4 and is now generally available in version 2.9.
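A rough sketch of the two pieces involved follows: an ingest pipeline that embeds a text field at index time, and a neural query that embeds the query text at search time. The model ID is a placeholder for a model already registered with the cluster, and the pipeline description and field names (passage_text, passage_embedding) are hypothetical. The request bodies are built as Python dictionaries and printed, not sent to a cluster.

```python
import json

MODEL_ID = "<your-model-id>"  # placeholder: ID of a deployed text embedding model

# Ingest pipeline: for each indexed document, embed "passage_text" and
# store the resulting vector in "passage_embedding".
ingest_pipeline = {
    "description": "Hypothetical pipeline that embeds passages at index time",
    "processors": [
        {
            "text_embedding": {
                "model_id": MODEL_ID,
                "field_map": {"passage_text": "passage_embedding"},
            }
        }
    ],
}

def neural_query(query_text, k=10):
    """Build a neural query; the query text is embedded by the cluster, then matched via k-NN."""
    return {
        "query": {
            "neural": {
                "passage_embedding": {
                    "query_text": query_text,
                    "model_id": MODEL_ID,
                    "k": k,
                }
            }
        }
    }

query_body = neural_query("shoes for jogging", k=5)
print(json.dumps(ingest_pipeline, indent=2))
print(json.dumps(query_body, indent=2))
```

Note that the application only ever handles text. Vectorization of both documents and queries happens inside the cluster, which is the middleware the earlier approach required applications to manage themselves.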
AI/ML Connectors for Enhanced AI-Powered Search Features
With OpenSearch Service 2.9, you can utilize ready-to-use AI connectors, streamlining the process of integrating AI capabilities into your applications.