On January 20th, we introduced the ability for developers to use their own credentials to read and write user profile data in the sync store, along with a data browser in the Amazon Cognito console. Today, we are thrilled to announce a new capability that gives customers greater control of, and visibility into, the data they store in Cognito: Amazon Cognito Streams. You can now configure an Amazon Kinesis stream to receive events as your data is updated and synchronized. In this post, I will explain how the feature works and walk through an example application that uses the stream events to build a view of your application data in Amazon Redshift.
Setting Up Streams
Configuring Amazon Cognito Streams is remarkably simple. From the console, select your identity pool and click on “Edit Identity Pool.” In the edit screen, expand the “Cognito Streams” section to set your options. You’ll need to provide an IAM role and a Kinesis stream, but the Cognito console will guide you through creating these resources. Once you’ve successfully configured Amazon Cognito Streams, all future updates to datasets in this identity pool will be directed to the stream.
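If you prefer to script this setup, the same configuration can be applied through the Amazon Cognito Sync API. Below is a minimal sketch using Python and boto3; the identity pool ID, stream name, and role ARN are placeholders you would replace with your own values:

import boto3

# The SetIdentityPoolConfiguration API applies the same settings the console edits.
sync = boto3.client("cognito-sync", region_name="us-east-1")

sync.set_identity_pool_configuration(
    IdentityPoolId="us-east-1:00000000-0000-0000-0000-000000000000",  # placeholder
    CognitoStreams={
        "StreamName": "CognitoUpdates",  # Kinesis stream that will receive sync events
        "RoleArn": "arn:aws:iam::123456789012:role/CognitoStreamsRole",  # placeholder
        "StreamingStatus": "ENABLED",
    },
)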
Contents of the Stream
Each record sent to the stream corresponds to a single synchronization. Here’s an example of a record sent to the stream:
{
  "identityPoolId": "Pool Id",
  "identityId": "Identity Id",
  "dataSetName": "Dataset Name",
  "operation": "(replace|remove)",
  "kinesisSyncRecords": [
    {
      "key": "Key",
      "value": "Value",
      "syncCount": 1,
      "lastModifiedDate": 1424801824343,
      "deviceLastModifiedDate": 1424801824343,
      "op": "(replace|remove)"
    },
    ...
  ],
  "lastModifiedDate": 1424801824343,
  "kinesisSyncRecordsURL": "S3Url",
  "payloadType": "(S3Url|Inline)",
  "syncCount": 1
}
For updates larger than the Kinesis maximum payload size of 50 KB, the record will instead contain a presigned Amazon S3 URL pointing to the full contents of the update. Now that your data updates are streaming, consider how to handle the data already in your identity pool.
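To illustrate how a consumer might handle both payload types, here is a minimal sketch using Python and boto3 that reads a single shard from the beginning and fetches the full update from Amazon S3 when the payload is not inline. The stream name is an assumption, and a production consumer would normally use the Kinesis Client Library to manage shards and checkpoints:

import json
import urllib.request
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
stream_name = "CognitoUpdates"  # assumed stream name

# Read one shard from its oldest available record.
shard_id = kinesis.list_shards(StreamName=stream_name)["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream_name, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]

for record in kinesis.get_records(ShardIterator=iterator)["Records"]:
    event = json.loads(record["Data"])
    if event["payloadType"] == "S3Url":
        # Large updates carry a presigned S3 URL; assume the object holds the
        # same JSON array of sync records that would otherwise be inline.
        with urllib.request.urlopen(event["kinesisSyncRecordsURL"]) as response:
            event["kinesisSyncRecords"] = json.loads(response.read())
    for update in event["kinesisSyncRecords"]:
        print(event["identityId"], event["dataSetName"], update["key"], update["op"])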
Bulk Publishing
After configuring Amazon Cognito Streams, you can perform a bulk publish operation for the existing data in your identity pool. After initiating a bulk publish operation—either through the console or directly via the API—Cognito will start publishing this data to the same stream that receives your updates. Note that only one bulk publish operation can be ongoing at any time, and you are limited to one successful bulk publish request every 24 hours. Cognito does not guarantee the uniqueness of data sent to the stream during bulk publishing, which means you might receive the same update both as an update and part of a bulk publish. Keep this in mind when processing records from your stream.
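A bulk publish can also be started and monitored programmatically through the Cognito Sync API. Here is a minimal sketch with boto3, again with a placeholder pool ID:

import boto3

sync = boto3.client("cognito-sync", region_name="us-east-1")
pool_id = "us-east-1:00000000-0000-0000-0000-000000000000"  # placeholder

# Start the single bulk publish operation allowed at a time for this pool.
sync.bulk_publish(IdentityPoolId=pool_id)

# Poll for progress; the status moves from IN_PROGRESS to SUCCEEDED or FAILED.
details = sync.get_bulk_publish_details(IdentityPoolId=pool_id)
print(details["BulkPublishStatus"])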
Example Streams Connector for Amazon Redshift
In this launch, we are also including an example application that consumes records from a Kinesis stream linked with a Cognito identity pool and subsequently stores them in an Amazon Redshift cluster for querying. The source code is available in our awslabs GitHub repository, and we have provided an AWS CloudFormation template that will create all necessary assets for this sample, including:
- Amazon Redshift cluster
- Amazon DynamoDB table used by the Kinesis client library
- Amazon S3 bucket for intermediate data staging
- IAM role for EC2
- Elastic Beanstalk application to run the code
Click the button below to launch this stack in the US East (Virginia) region:
Launch Stack – US East (Virginia)
Click the button below to launch this stack in the EU (Ireland) region:
Launch Stack – EU (Ireland)
Click the button below to launch this stack in the Asia Pacific (Tokyo) region:
Launch Stack – Asia Pacific (Tokyo)
Once your stack is created, the output tab in the CloudFormation console will contain a JDBC connection string for direct connection to your Amazon Redshift cluster:
jdbc:postgresql://amazoncognitostreamssample-redshiftcluster-xxxxxxxx.xxxxxxxx.REGION.redshift.amazonaws.com:PORT/cognito?tcpKeepAlive=true
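Because Amazon Redshift speaks the PostgreSQL wire protocol, any PostgreSQL-compatible driver can use the same endpoint details. Here is a minimal sketch using Python and psycopg2; the host, port, and credentials are placeholders taken from your own stack output:

import psycopg2

conn = psycopg2.connect(
    host="amazoncognitostreamssample-redshiftcluster-xxxxxxxx.xxxxxxxx.REGION.redshift.amazonaws.com",
    port=5439,  # default Redshift port; use the port from your connection string
    dbname="cognito",
    user="master",            # placeholder credentials
    password="YOUR_PASSWORD",
)

# Sanity check against the table the sample populates (described below).
with conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM cognito_raw_data")
    print(cur.fetchone()[0])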
Schema
The example stores all event data in a table called cognito_raw_data with the following schema:
| Column Name | Type |
| --- | --- |
| identityPoolId | varchar(1024) |
| identityId | varchar(1024) |
| datasetName | varchar(1024) |
| operation | varchar(64) |
| key | varchar(1024) |
| value | varchar(1024) |
| op | varchar(64) |
| syncCount | int |
| deviceLastModifiedDate | timestamp |
| lastModifiedDate | timestamp |
Extracting Data
Since every key-value update creates a new row in the cognito_raw_data table, retrieving the current state of a dataset requires additional effort. The following query retrieves the state of a specific dataset for a given user:
SELECT DISTINCT temp.*, value
FROM (SELECT DISTINCT identityid,
             datasetname,
             key,
             MAX(synccount) OVER (PARTITION BY identityid, datasetname, key) AS max_synccount
      FROM cognito_raw_data) AS temp
INNER JOIN cognito_raw_data raw_data
  ON (temp.identityid = raw_data.identityid
      AND temp.datasetname = raw_data.datasetname
      AND temp.key = raw_data.key
      AND temp.max_synccount = raw_data.synccount)
WHERE raw_data.identityid = 'IDENTITY_ID'
  AND raw_data.datasetname = 'DATASET_NAME'
  AND op <> 'remove'
ORDER BY datasetname, key
You may want to establish daily extracts of the data to optimize your regular queries.
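One way to set that up, sketched below with psycopg2, is a scheduled job that rebuilds a snapshot table with CREATE TABLE AS, so that routine queries hit the compact snapshot rather than the raw event log. The snapshot table name is an assumption, and the connection details are the same placeholders used earlier:

import psycopg2

# Connect as in the earlier sketch (placeholder connection details).
conn = psycopg2.connect(
    host="amazoncognitostreamssample-redshiftcluster-xxxxxxxx.xxxxxxxx.REGION.redshift.amazonaws.com",
    port=5439, dbname="cognito", user="master", password="YOUR_PASSWORD",
)

# Rebuild a snapshot of the current state of every dataset; run this daily.
with conn.cursor() as cur:
    cur.execute("DROP TABLE IF EXISTS cognito_current_state")
    cur.execute("""
        CREATE TABLE cognito_current_state AS
        SELECT DISTINCT temp.*, value
        FROM (SELECT DISTINCT identityid, datasetname, key,
                     MAX(synccount) OVER (PARTITION BY identityid, datasetname, key) AS max_synccount
              FROM cognito_raw_data) AS temp
        INNER JOIN cognito_raw_data raw_data
          ON (temp.identityid = raw_data.identityid
              AND temp.datasetname = raw_data.datasetname
              AND temp.key = raw_data.key
              AND temp.max_synccount = raw_data.synccount)
        WHERE op <> 'remove'
    """)
conn.commit()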
Conclusions
As demonstrated, Amazon Cognito Streams provides both a comprehensive export of your data and a real-time view of how it changes over time. We are eager to learn how you intend to use this feature in your applications. Please share your thoughts in the comments section. If you encounter any issues or have further questions, visit our forums, and we will be glad to assist you.