Enhancing Salesforce Einstein’s Code Generation Model Performance with Amazon SageMaker

This article is a collaborative effort between Salesforce and AWS, appearing on both the Salesforce Engineering Blog and the AWS Machine Learning Blog.

Salesforce, Inc., based in San Francisco, California, is a prominent cloud software company that specializes in customer relationship management (CRM) applications. Its focus includes sales, customer service, marketing automation, e-commerce, analytics, and application development. Salesforce is advancing towards artificial general intelligence (AGI) for businesses, incorporating predictive and generative functions into its leading software-as-a-service (SaaS) CRM and striving for intelligent automations through artificial intelligence (AI) and agents.

Salesforce Einstein is a suite of AI technologies integrated with Salesforce’s Customer Success Platform, designed to enhance productivity and customer engagement. With over 60 features spanning machine learning (ML), natural language processing (NLP), computer vision, and automatic speech recognition, Einstein empowers organizations to deliver more personalized and predictive experiences. Notable functionalities include sales email generation in Sales Cloud and automated service responses in Service Cloud. Additionally, tools such as Copilot, Prompt Builder, and Model Builder, available in Einstein 1 Studio, enable businesses to create tailored AI solutions.

The Salesforce Einstein AI Platform team is dedicated to advancing the performance and capabilities of AI models, particularly large language models (LLMs) for Einstein applications. Their objective is to refine these LLMs continually by integrating cutting-edge solutions and collaborating with leading tech providers, including open-source communities and public cloud services like AWS, creating a unified AI platform. This collaboration ensures Salesforce customers benefit from the latest AI technologies.

In this article, we discuss how the Salesforce Einstein AI Platform team enhanced the latency and throughput of their code generation LLM using Amazon SageMaker.

Challenges in Hosting LLMs

At the start of 2023, the team sought a solution for hosting CodeGen, Salesforce’s in-house open-source LLM for code understanding and generation. CodeGen lets users translate natural language, such as English, into programming languages like Python. Already using AWS for inference with smaller predictive models, the team aimed to extend the Einstein platform to host CodeGen. Salesforce developed a suite of CodeGen models (Inline for automatic code completion, BlockGen for code block generation, and FlowGPT for process flow generation) optimized specifically for the Apex programming language, a certified framework for building SaaS apps on top of Salesforce’s CRM features.

The team needed a solution that could securely host their model, handle a high volume of inference requests and multiple concurrent requests, and meet the throughput and latency targets of their copilot application, EinsteinGPT for Developers. This application accelerates development by generating smart Apex code from natural language prompts, helping developers speed up coding tasks and identify code vulnerabilities in real time within the Salesforce integrated development environment (IDE).

The Einstein team assessed various tools and services, including open-source and commercial solutions. They concluded that SageMaker offered superior access to GPUs, scalability, flexibility, and performance optimizations for a variety of scenarios, particularly in addressing their latency and throughput challenges.

Why Salesforce Einstein Selected SageMaker

SageMaker provided several critical features essential to meeting Salesforce’s needs:

  • Multiple Serving Engines: SageMaker includes specialized deep learning containers (DLCs), model parallelism tooling, and large model inference (LMI) containers. These high-performance Docker containers are tailored for LLM inference, bundling a model server with open-source inference libraries such as FasterTransformer and TensorRT-LLM. The Einstein team appreciated SageMaker’s quick-start notebooks, which let them deploy popular open-source models rapidly (a hedged deployment sketch follows this list).
  • Advanced Batching Strategies: The LMI containers in SageMaker let customers optimize LLM performance through batching, which groups multiple requests together before processing. With dynamic batching, the server waits a configurable interval and groups up to 64 requests, optimizing GPU utilization and balancing throughput against latency. The Einstein team found that dynamic batching significantly increased throughput for their CodeGen models while minimizing latency (see the serving configuration sketch after this list).
  • Efficient Routing Strategy: SageMaker endpoints use a random routing strategy by default but also support a least outstanding requests (LOR) strategy, which monitors instance load and the models deployed on each instance to route requests optimally. The Einstein team valued SageMaker’s ability to distribute traffic evenly across model instances, preventing any single instance from becoming a bottleneck (a routing configuration sketch appears below).
  • Access to High-End GPUs: SageMaker offers the high-end GPU instances essential for running LLMs efficiently. Given current market shortages, SageMaker’s auto-scaling enabled the Einstein team to meet demand without manual oversight (an auto-scaling sketch closes the examples below).
  • Rapid Iteration and Deployment: While not directly related to latency, SageMaker notebooks made it quick to test and deploy changes, shortening the overall development cycle and accelerating the rollout of performance improvements. The Einstein team was able to compress deployment timelines and move their models into production more quickly.
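To give a concrete picture of the quick-start experience, here is a minimal sketch of deploying a code generation model behind a SageMaker endpoint with an LMI container and the SageMaker Python SDK. The container version, stand-in model ID, instance type, and endpoint name are illustrative assumptions, not the Einstein team’s actual configuration.

    import sagemaker
    from sagemaker import image_uris
    from sagemaker.model import Model
    from sagemaker.serializers import JSONSerializer
    from sagemaker.deserializers import JSONDeserializer

    session = sagemaker.Session()
    role = sagemaker.get_execution_role()

    # Retrieve a DJL-based LMI container image for this Region; the version
    # shown here is illustrative and may lag current releases.
    image_uri = image_uris.retrieve(
        framework="djl-deepspeed",
        region=session.boto_region_name,
        version="0.23.0",
    )

    model = Model(
        image_uri=image_uri,
        role=role,
        env={
            # Hypothetical open-source stand-in for the proprietary CodeGen suite.
            "OPTION_MODEL_ID": "Salesforce/codegen-2B-mono",
            "OPTION_TENSOR_PARALLEL_DEGREE": "1",
        },
    )

    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.2xlarge",          # illustrative GPU instance
        endpoint_name="codegen-lmi-endpoint",   # illustrative endpoint name
        serializer=JSONSerializer(),
        deserializer=JSONDeserializer(),
    )

    # Smoke test: ask the model to complete a Python function signature.
    print(predictor.predict({"inputs": "def fibonacci(n):"}))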
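Dynamic batching in the LMI containers is configured through the DJL model server’s serving.properties file. The sketch below shows the two settings involved, with an illustrative stand-in model ID and values chosen only to mirror the behavior described above (group up to 64 requests, bounded by a maximum wait time).

    # serving.properties for the DJL model server; all values are illustrative.
    engine=Python
    # Hypothetical open-source stand-in for the proprietary CodeGen suite.
    option.model_id=Salesforce/codegen-2B-mono
    # Dynamic batching: collect up to 64 queued requests into a single batch,
    # but wait at most 100 ms for the batch to fill before running inference.
    batch_size=64
    max_batch_delay=100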
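Switching an endpoint from random routing to LOR is an endpoint-configuration setting. The following boto3 sketch shows where the routing strategy is declared; the model, endpoint, and variant names are assumptions for illustration.

    import boto3

    sm = boto3.client("sagemaker")

    sm.create_endpoint_config(
        EndpointConfigName="codegen-lor-config",   # illustrative name
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": "codegen-model",      # assumes this model already exists
                "InstanceType": "ml.g5.2xlarge",
                "InitialInstanceCount": 2,
                # Route each request to the instance with the fewest
                # in-flight requests instead of picking one at random.
                "RoutingConfig": {"RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"},
            }
        ],
    )

    sm.create_endpoint(
        EndpointName="codegen-endpoint",
        EndpointConfigName="codegen-lor-config",
    )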
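Finally, auto-scaling for a SageMaker endpoint variant is managed through Application Auto Scaling. This sketch registers an illustrative variant as a scalable target and attaches a target-tracking policy keyed to invocations per instance; the resource names, capacity bounds, and target value are assumptions, not Salesforce’s production settings.

    import boto3

    autoscaling = boto3.client("application-autoscaling")
    # Hypothetical endpoint and variant names.
    resource_id = "endpoint/codegen-endpoint/variant/AllTraffic"

    # Register the variant's instance count as a scalable target.
    autoscaling.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=1,
        MaxCapacity=4,
    )

    # Scale out when average invocations per instance exceed the target.
    autoscaling.put_scaling_policy(
        PolicyName="codegen-invocations-target",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 100.0,  # illustrative invocations-per-instance target
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,
            "ScaleOutCooldown": 60,
        },
    )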

These features collectively optimize LLM performance by reducing latency and enhancing throughput, making Amazon SageMaker a robust choice for Salesforce Einstein’s needs.
