In this guest post, we invited Alex Reed, Platform Architect at AWS Partner Network (APN) Partner Polystream, to share how Polystream uses AWS services, including Amazon Elastic Compute Cloud (Amazon EC2), to deliver 3D interactivity at scale.
The cloud gaming landscape is evolving, enabling immersive 3D applications such as interactive car configurators and collaborative development tools, but achieving these experiences at scale has remained a challenge. For instance, how can a cloud-first game support a Fortnite-like peak of 10 million concurrent users when existing technologies typically cap out at a few thousand concurrent users?
The success of mass-market content, particularly in cloud gaming, hinges on large-scale delivery, yet that scale remains hard to achieve today. Traditional approaches that rely on cloud-hosted GPUs rarely deliver the flexibility and elasticity the cloud promises for streaming 3D interactive content: suitable GPUs are in limited supply, and costs can be prohibitive. At Polystream, with the backing of AWS Game Tech, we are reimagining the delivery of 3D interactivity at scale.
Current systems utilize virtual machines (VMs) equipped with GPUs to stream video to each user. Polystream’s innovative Command Streaming technology eliminates this dependency on cloud hardware by utilizing a software-defined architecture that transmits graphics commands instead of video. This approach connects the cloud’s computational power to the billions of GPUs found in gaming consoles, smartphones, and computers, creating scalability previously thought unattainable.
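To make the idea concrete, here is a minimal sketch of what shipping commands rather than pixels can look like. The message format below is invented purely for illustration; Polystream's actual wire protocol is proprietary and far more compact.

```python
# Sketch of the command-streaming idea (hypothetical message format).
# Instead of rendering on a cloud GPU and encoding video, the server
# serializes graphics API calls; the client replays them on its own GPU.

import json
import struct

def encode_command(opcode: int, args: dict) -> bytes:
    """Pack one graphics command as a length-prefixed binary message."""
    payload = json.dumps({"op": opcode, "args": args}).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload

# Server side: emit commands as the application issues draw calls.
frame = [
    encode_command(1, {"call": "BindTexture", "id": 42}),
    encode_command(2, {"call": "DrawIndexed", "count": 3600}),
]

# The point: a frame's worth of commands is typically far smaller than an
# encoded video frame, and no server-side GPU is needed to produce it.
print(sum(len(m) for m in frame), "bytes for this frame")
```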
Our globally distributed, multi-cloud service, the Polystream Platform, supports this revolutionary Command Streaming technology. Designed to leverage our unique capability of not relying solely on cloud-based GPUs, the platform can provision, manage, and orchestrate vast numbers of interactive streams across any cloud provider globally.
In November 2019, we set out to showcase the Command Streaming technology’s ability to deliver significant user concurrency. Our aim was to achieve levels of concurrent usage that would be exceedingly challenging with conventional GPU video streaming. We focused on handling elastic provisioning, deployment, operation, monitoring, intelligence collection, and the seamless teardown of tens of thousands of interactive streams within a single day.
To facilitate this concurrency scaling test, we collaborated with AWS Game Tech, which supported the trial and powered the majority of the streaming compute resources. They provided the necessary virtual machines in Amazon EC2 to accommodate our planned levels of concurrent users.
Prior to this large-scale test, we conducted several smaller trials to shake out provisioning and deployment issues. These began at 1,000 concurrent users, scaled to 5,000 and then 10,000, and culminated in this largest test, aimed at our target concurrent users (CCU).
CCU Testing
Following the successful completion of the smaller tests, we prepared to reach our target level of synthetic stream concurrency. The application we selected required 2 vCPUs and minimal RAM, so we chose t3.micro instances to host the interactive streams. Previous tests had shown that we could run approximately 1,000 synthetic clients on a specially configured virtual machine, so we pre-provisioned 40 client instances, four in each AWS region.
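Packing on the order of a thousand clients onto one machine works because each synthetic client is almost entirely I/O-bound. Here is a minimal sketch of that pattern, assuming a hypothetical length-prefixed command stream endpoint; the hostname, port, and input message format are placeholders, not Polystream's real client protocol.

```python
# Many lightweight synthetic clients sharing one VM as asyncio coroutines.

import asyncio

async def synthetic_client(host: str, port: int) -> None:
    reader, writer = await asyncio.open_connection(host, port)
    try:
        for _ in range(600):                      # one bounded session
            header = await reader.readexactly(4)  # length-prefixed frame
            length = int.from_bytes(header, "big")
            await reader.readexactly(length)      # drain the command payload
            writer.write(b"INPUT noop\n")         # scripted input event
            await writer.drain()
    finally:
        writer.close()
        await writer.wait_closed()

async def main() -> None:
    # ~1,000 I/O-bound coroutines comfortably share a single machine.
    await asyncio.gather(*(
        synthetic_client("streams.example.internal", 9000)
        for _ in range(1000)
    ))

asyncio.run(main())
```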
Because configuration issues in earlier tests had slowed provisioning in specific regions, we decided to over-provision, targeting 40,000 virtual machines across 10 AWS regions so that we could still meet our goal within a reasonable timeframe.
We initiated 40,000 provisioning requests, scheduling them to begin processing at 4 AM. This timing was strategic, as our previous tests indicated that we would have a substantial number of interactive streams ready by the time our team arrived at the office to start synthetic clients promptly.
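For a sense of what a provisioning fan-out of this shape looks like against EC2, here is a hedged sketch using boto3. It is illustrative only: the Polystream Platform drives provisioning across multiple clouds through its own orchestration layer, and the AMI ID, chunk size, and error handling below are placeholder assumptions.

```python
# Fan out RunInstances requests for a region in modest chunks.

import boto3

def provision(region: str, total: int, image_id: str) -> list[str]:
    """Request `total` t3.micro instances in `region`, chunked per call."""
    ec2 = boto3.client("ec2", region_name=region)
    ids: list[str] = []
    remaining = total
    while remaining > 0:
        count = min(100, remaining)    # cap each RunInstances request
        resp = ec2.run_instances(
            ImageId=image_id,
            InstanceType="t3.micro",
            MinCount=1,                # accept partial fulfilment
            MaxCount=count,
        )
        ids += [i["InstanceId"] for i in resp["Instances"]]
        remaining -= count
    return ids
```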
Our objective was not merely to provision a large infrastructure for a few hours; we aimed to demonstrate that we could effectively operate at such high volumes of interactive streams. We began running several thousand client sessions, terminating them after a set duration before launching another batch of a few thousand. This process ensured that our interactive streams could recover from prior sessions and were prepared for new ones.
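The batch-churn pattern is simple to express. The sketch below runs against an invented in-memory stand-in, since the real Polystream Platform API is not public; start_session, end_session, and idle_stream_count are hypothetical names.

```python
# Repeatedly launch and tear down batches of sessions, verifying that
# streams return to the ready pool between batches.

import time

class FakePlatform:
    """Stand-in for the real platform API (method names are invented)."""
    def __init__(self, ready: int) -> None:
        self._ready = ready
    def start_session(self) -> int:
        self._ready -= 1
        return self._ready
    def end_session(self, session: int) -> None:
        self._ready += 1               # stream returns to the ready pool
    def idle_stream_count(self) -> int:
        return self._ready

def churn(platform, batch_size: int, batches: int, duration_s: float) -> None:
    """Run successive batches of short-lived sessions and verify recovery."""
    for n in range(batches):
        sessions = [platform.start_session() for _ in range(batch_size)]
        time.sleep(duration_s)         # let the batch stream for a while
        for s in sessions:
            platform.end_session(s)
        if platform.idle_stream_count() < batch_size:
            raise RuntimeError(f"batch {n}: streams did not recover")

churn(FakePlatform(ready=5000), batch_size=2000, batches=3, duration_s=0.1)
```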
Once we had confirmed our ability to manage and terminate numerous short-lived sessions, we began ramping up the synthetic clients toward our target. By midday the ramp was fully underway, and in less than two hours we were nearing our initial goal.
We had resolved the earlier configuration issues, and with nearly all provisioning requests fulfilled, we had ample capacity to exceed our target. The test was originally set to conclude at 5 PM, but as that deadline approached we decided to queue an additional 2,000 t3.micro requests.
As we neared the end of our testing period, we surpassed the 40,000 concurrent user mark. At this juncture, we had not only reached our interactive stream capacity but had also fully utilized the machines provisioned for our synthetic clients. As the test’s end approached and we exceeded our original target by almost 45%, we initiated the graceful shutdown of all client sessions and terminated our virtual machines.
The streaming sessions that commenced first had been successfully delivering interactive 3D content for over four hours, showcasing our capability to initiate and maintain tens of thousands of sessions seamlessly.
Business Intelligence
The overall scale and performance of the test were meticulously recorded and analyzed through our business intelligence platform. This process enabled us to assess not just the provisioning and streaming capabilities but also to load test our telemetry pipelines, ancillary services, and third-party provider integrations.
These pipelines collected telemetry data and metrics from each interactive stream, its supporting services, and the underlying infrastructure, routing each event in real time to Logz.io, Grafana Cloud, or Power BI and SQL Server according to routing rules configured per event type.
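As a rough illustration of per-event routing rules, here is a simplified sketch. The rule format, event types, and sink stubs are invented for illustration; only the destination products are from the test itself.

```python
# Route each telemetry event to the sinks configured for its type.

class PrintSink:
    """Stub sink; the real destinations were Logz.io, Grafana Cloud,
    and Power BI backed by SQL Server."""
    def __init__(self, name: str) -> None:
        self.name = name
    def send(self, event: dict) -> None:
        print(f"{self.name} <- {event['type']}")

# Invented rule format: event type -> list of sink names.
ROUTES = {
    "stream.heartbeat": ["grafana_cloud"],
    "stream.error": ["logzio", "grafana_cloud"],
    "session.summary": ["powerbi_sqlserver"],
}

SINKS = {name: PrintSink(name) for names in ROUTES.values() for name in names}

def route_event(event: dict) -> None:
    """Fan an event out to every sink configured for its type."""
    for sink_name in ROUTES.get(event["type"], []):
        SINKS[sink_name].send(event)

route_event({"type": "stream.error", "stream_id": "abc123", "msg": "decode stall"})
```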
Once all client sessions were concluded and the telemetry data verified, we were left with valuable insights that will guide our future work.