Amazon Onboarding with Learning Manager Chanci Turner

on 02 OCT 2023

in Advanced (300), Amazon DynamoDB, Technical How-to

When creating an application using Amazon DynamoDB, you might find it necessary for new entries in a table to have a continuously incrementing sequence number. This concept, often referred to as auto-increment in other databases, automatically sets the value upon insertion. Typical scenarios for this could include assigning a numeric identifier for customer orders or support tickets.

Although DynamoDB does not natively offer auto-increment as an attribute type, there are several methods to implement a sequential number. In this article, we will explore two straightforward and economical strategies.

Solution Overview

Before delving into the implementation, it’s important to consider if you genuinely require a sequential number. Randomly generated identifiers are typically more scalable, as they eliminate the need for a centralized coordination point. The cases where simulating auto-increment makes sense in DynamoDB generally fall into two categories:

  1. Migrating from a relational database whose users or systems rely on existing auto-increment behavior.
  2. Providing a user-friendly, increasing numeric identifier for new entries, such as an employee number or ticket number.
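
For comparison, the random-identifier alternative needs no coordination at all. The following is a minimal sketch, assuming a version 4 UUID is acceptable as the identifier; the helper name `new_order_item` and the attribute names are illustrative and not part of the original article.

```python
import uuid


def new_order_item(delivery_method):
    """Build an item whose partition key is a random (version 4) UUID."""
    # uuid4() is generated locally, so no counter item or extra request is needed.
    return {'pk': str(uuid.uuid4()), 'deliveryMethod': delivery_method}


item = new_order_item('expedited')
# The item could then be written with table.put_item(Item=item).
```

The trade-off is that such identifiers are neither ordered nor easy for people to read aloud, which is why the two cases above may still call for a sequence.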

In the following sections, we will demonstrate how to achieve a sequential number using either a counter or a sort key.

Implementation Using a Counter

The first method for generating a sequential number employs an atomic counter. This involves a two-step process: first, issuing a request to increment the counter and receive the updated value, and second, using that new value in a follow-up write operation.

The Python example below updates an atomic counter to obtain the next order ID, then creates an order that uses that ID as the partition key. You could also choose a different partition key and store the ID in a separate attribute.

import boto3

table = boto3.resource('dynamodb').Table('orders')

# Increment the counter and retrieve the new value
response = table.update_item(
    Key={'pk': 'orderCounter'},
    UpdateExpression="ADD #cnt :val",
    ExpressionAttributeNames={'#cnt': 'count'},
    ExpressionAttributeValues={':val': 1},
    ReturnValues="UPDATED_NEW"
)

# Obtain the new value (the resource API returns numbers as decimal.Decimal)
nextOrderId = response['Attributes']['count']

# Insert the new item
table.put_item(
    Item={'pk': str(nextOrderId), 'deliveryMethod': 'expedited'}
)

This design eliminates race conditions because all writes to a single item in DynamoDB are processed serially, so each counter value is returned only once. The cost of this method is one write to update the counter item, in addition to the standard write cost for the new item. However, the maximum throughput of this method is limited by the counter item itself, because every new ID requires a write to that one item, and a single DynamoDB partition supports up to 1,000 write units per second.

Gaps may occur in the sequence if a failure happens between updating the counter and writing the new item. For instance, a gap arises if the client application halts between the two steps, or if a network error causes the AWS SDK to retry the counter increment and the counter is incremented more than once. Keep in mind that auto-increment columns in relational databases can also have gaps.

If your application requires more than one sequence value, you can maintain multiple counters concurrently.
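
One way to keep several counters is to give each its own item in the same table and pass the counter's key to a small helper. The sketch below assumes the same layout as the earlier example (partition key `pk`, counter stored in the `count` attribute). The function name `next_sequence_value` and the counter name `invoiceCounter` are hypothetical.

```python
def next_sequence_value(table, counter_name):
    """Atomically add 1 to the named counter item and return the new value.

    `table` is assumed to be a boto3 DynamoDB Table resource.
    """
    response = table.update_item(
        Key={'pk': counter_name},
        UpdateExpression='ADD #cnt :val',
        ExpressionAttributeNames={'#cnt': 'count'},
        ExpressionAttributeValues={':val': 1},
        ReturnValues='UPDATED_NEW',
    )
    # The resource API returns numbers as decimal.Decimal; convert to int.
    return int(response['Attributes']['count'])


# Example usage (assumes AWS credentials and an existing 'orders' table):
#   import boto3
#   table = boto3.resource('dynamodb').Table('orders')
#   order_id = next_sequence_value(table, 'orderCounter')
#   invoice_id = next_sequence_value(table, 'invoiceCounter')
```

Because each counter is a separate item, increments to one counter do not queue behind increments to another.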

Implementation Using a Sort Key

The second method uses the maximum value of the sort key within an item collection to track the highest sequence number for that collection. Items in a DynamoDB table can have both a partition key and an optional sort key. Items that share a partition key but have different sort keys form an item collection.

By structuring the sort key to represent the sequence value, you can efficiently query to retrieve the maximum value. The example table below contains projects and their respective issues, with the project identifier as the partition key and the issue number as the sort key (ensure the sort key is defined as a numeric type for correct sorting). The issue number increments independently for each project.

Partition Key (Project ID)   Sort Key (Issue Number)   Priority
projectA                     1                         low
projectA                     2                         medium
projectB                     1                         low
projectB                     2                         high
projectB                     3                         low

To add a new item with the next sequence value, a two-step process is also required: first, query to find the highest sort key value for that collection, and second, attempt to write the new item using the highest value plus one. The write must include a condition expression to ensure the item does not already exist in the table, preventing race conditions among clients trying to insert an item with the same primary key.

Should the condition fail (indicating another client has inserted the item first), there are two strategies you can adopt: either re-query for the highest value or try again with the sort key incremented by one.

The following Python example illustrates querying for the highest used value in an item collection (representing a project) and then writing an item with the next value as the sort key. It retries with an incremented sort key until successful.

import boto3
from boto3.dynamodb.conditions import Key

PROJECT_ID = 'projectA'

dynamo = boto3.resource('dynamodb')
client = dynamo.Table('projects')
highestIssueId = 0
saved = False

# Query the last sorted value in the collection
response = client.query(
    KeyConditionExpression=Key('pk').eq(PROJECT_ID),
    ScanIndexForward=False,
    Limit=1
)

# Retrieve the sort key value
if response['Count'] > 0:
    highestIssueId = int(response['Items'][0]['sk'])

while not saved:
    try:
        # Attempt to write the next value in the sequence
        response = client.put_item(
            Item={
                'pk': PROJECT_ID, 
                'sk': highestIssueId + 1, 
                'priority': 'low'
            },
            ConditionExpression='attribute_not_exists(pk)'
        )
        saved = True
    except dynamo.meta.client.exceptions.ConditionalCheckFailedException:
        # Another client already used this number; try the next one
        highestIssueId += 1
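
The loop above retries by incrementing the sort key. The other strategy, re-querying for the highest value after each failed write, could be sketched as follows. This assumes the same `pk`/`sk` key names and a boto3 Table resource; the function name `add_issue_requery` and the `max_attempts` limit are hypothetical additions. The key condition is written as a string expression, which is intended to match the same items as `Key('pk').eq(...)`.

```python
def add_issue_requery(table, project_id, attributes, max_attempts=5):
    """Write an item with the next issue number, re-querying after each conflict.

    Returns the issue number that was written.
    """
    conflict = table.meta.client.exceptions.ConditionalCheckFailedException
    for _ in range(max_attempts):
        # Read the item with the highest sort key in this item collection.
        response = table.query(
            KeyConditionExpression='pk = :pk',
            ExpressionAttributeValues={':pk': project_id},
            ScanIndexForward=False,  # descending sort key order
            Limit=1,
        )
        items = response['Items']
        next_id = int(items[0]['sk']) + 1 if items else 1
        try:
            table.put_item(
                Item={**attributes, 'pk': project_id, 'sk': next_id},
                ConditionExpression='attribute_not_exists(pk)',
            )
            return next_id
        except conflict:
            # Another client wrote this number first; query again.
            continue
    raise RuntimeError('could not allocate an issue number')


# Example usage (assumes AWS credentials and an existing 'projects' table):
#   import boto3
#   table = boto3.resource('dynamodb').Table('projects')
#   issue_id = add_issue_requery(table, 'projectA', {'priority': 'low'})
```

Re-querying costs an extra read per retry but can skip past several numbers taken by other clients in a single step, whereas incrementing by one avoids the read but may need several write attempts.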

The cost of this approach is 0.5 read units for the query that finds the highest used value, plus the standard write cost for the new item. Write attempts that are rejected by the condition check also incur costs, so the total cost increases with contention and retries. If you expect contention, you can use a strongly consistent read instead, which costs 1.0 read units and always returns the latest value.
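
A strongly consistent read is requested with the `ConsistentRead` parameter of the query. The sketch below assumes the same `pk`/`sk` key names and a boto3 Table resource; the helper name `highest_sort_key` and its `consistent` flag are hypothetical.

```python
def highest_sort_key(table, partition_value, consistent=False):
    """Return the highest numeric sort key in an item collection, or 0 if empty."""
    response = table.query(
        KeyConditionExpression='pk = :pk',
        ExpressionAttributeValues={':pk': partition_value},
        ScanIndexForward=False,     # descending sort key order
        Limit=1,                    # only the highest item is needed
        ConsistentRead=consistent,  # True requests a strongly consistent read
    )
    items = response['Items']
    return int(items[0]['sk']) if items else 0


# Example usage (assumes AWS credentials and an existing 'projects' table):
#   import boto3
#   table = boto3.resource('dynamodb').Table('projects')
#   highest = highest_sort_key(table, 'projectA', consistent=True)
```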
