Large-scale Internet of Things (IoT) applications produce data at rapid rates, and many IoT solutions necessitate the sequential storage of this data based on timestamps generated either at the sensor or ingestion level. Across various business sectors—including industrial, utilities, healthcare, oil and gas, logistics, consumer devices, and smart vehicles—time series data holds significant operational and business value.
Unlike traditional data, time series data is utilized to conduct time-window queries over both extensive and limited timeframes, with data continually being appended at high frequencies. Amazon Timestream is a managed time series database specifically designed for this use case, enabling queries with rolling time windows, handling missing data seamlessly, and integrating effortlessly with standard data processing, operational, and analytical pipelines, including business intelligence (BI) and machine learning (ML).
This article explores critical architectural patterns and considerations for ingesting data via AWS IoT services into Timestream while demonstrating several essential Timestream capabilities. Additionally, it highlights the creation of analytical pipelines that leverage Timestream’s native features for rapid dashboarding as well as more complex analytical applications.
You should consider a time series database if your data storage and query requirements align with one or more of the following scenarios:
- You require interpolation for missing data points at specific times, which may occur when:
  - Data collection or transmission is unreliable, resulting in data gaps.
  - Your data source employs deadbanding, meaning data points are only emitted when the difference from the previous value exceeds a specified threshold.
- You need to perform analyses across multiple data series that may produce data at different rates or at unsynchronized points in time, with time granularity ranging from seconds to weeks.
- You must compute statistical values, such as averages, standard deviations, percentiles, and rankings, for your data series over varying time periods (see the query sketch after this list).
- You need to access data at different levels of granularity, allowing for adjustments based on the zoom level within a specific analytical timeline.
- You seek to develop comprehensive timelines for particular events, such as the period leading up to an automated shutdown of an industrial process or assembly line.
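To make the statistics scenario concrete, the following minimal sketch runs a time-window aggregation through the Timestream query API. The database, table, and measure names (`ExampleDB`, `metrics`, `cpu_usage`) are hypothetical placeholders, not names from this article:

```python
import boto3

# Hourly average and standard deviation of a 'cpu_usage' measure over the
# last day, using Timestream's bin() and ago() time functions.
QUERY = """
SELECT bin(time, 1h) AS hour,
       avg(measure_value::double) AS avg_cpu,
       stddev(measure_value::double) AS stddev_cpu
FROM "ExampleDB"."metrics"
WHERE measure_name = 'cpu_usage' AND time > ago(1d)
GROUP BY bin(time, 1h)
ORDER BY hour
"""

ts_query = boto3.client('timestream-query')
for row in ts_query.query(QueryString=QUERY)['Rows']:
    # Each row holds one Datum per selected column, in column order
    print([datum.get('ScalarValue') for datum in row['Data']])
```

Large result sets are paginated; follow the NextToken value returned by the query call to retrieve subsequent pages.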
Core Timestream Concepts
The fundamental concepts of Timestream include:
- Time Series: A sequence of data points recorded over time.
- Record: A single data point within the time series.
- Dimension: An attribute that describes the metadata of the time series.
- Measure: An attribute that describes the data within the series.
- Timestamp: Each record is tagged with a timestamp indicating when the measure was collected or ingested.
- Table: A container for related time series that includes timestamps, dimensions, and measures.
- Database: A top-level container for tables.
For an in-depth explanation of these concepts, refer to Timestream Concepts.
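To illustrate how these concepts fit together, here is a minimal sketch that writes a single record with boto3; the database and table names, dimensions, and measure are hypothetical:

```python
import time

import boto3

ts_write = boto3.client('timestream-write')

# Dimensions identify the series; the measure is the data point itself.
ts_write.write_records(
    DatabaseName='ExampleDB',   # hypothetical database
    TableName='metrics',        # hypothetical table
    Records=[{
        'Dimensions': [
            {'Name': 'device_id', 'Value': 'sensor-001'},
            {'Name': 'site', 'Value': 'plant-a'},
        ],
        'MeasureName': 'temperature',
        'MeasureValue': '21.7',
        'MeasureValueType': 'DOUBLE',
        'Time': str(int(time.time() * 1000)),  # record timestamp
        'TimeUnit': 'MILLISECONDS',
    }],
)
```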
Architecture Overview for AWS IoT to Timestream
The following diagram outlines a typical architecture that can be employed for the ingestion and consumption of IoT data using Timestream.
In this discussion, we elaborate on several options depicted in the preceding diagram, including:
- Ingesting data from AWS IoT Greengrass
- Utilizing the Timestream AWS IoT rule action for data ingestion
- Consuming data via APIs
- Visualizing data with Grafana
- Storing data in Amazon Simple Storage Service (Amazon S3) in CSV format for use with Amazon Forecast or other downstream analytics
Pattern 1: Data Ingestion into Timestream Using AWS IoT Greengrass
When designing ingestion paths for data generated by devices or sensors connected to AWS IoT Greengrass, multiple options can be utilized to leverage Timestream’s strengths. The choice of option is contingent upon the nature of your IoT data.
For low-volume data ingestion, you can transmit your data from AWS IoT Greengrass to AWS IoT Core utilizing the MQTT protocol and ingest it into Timestream via an IoT rule action.
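As a sketch of this low-volume path, the rule below forwards messages from an MQTT topic into Timestream using the rule's Timestream action; the rule name, topic filter, role ARN, and table names are placeholders you would adapt:

```python
import boto3

iot = boto3.client('iot')

iot.create_topic_rule(
    ruleName='GreengrassMetricsToTimestream',  # placeholder rule name
    topicRulePayload={
        'sql': "SELECT cpu, memory, disk FROM 'dt/greengrass/+/metrics'",
        'awsIotSqlVersion': '2016-03-23',
        'ruleDisabled': False,
        'actions': [{
            'timestream': {
                'roleArn': 'arn:aws:iam::123456789012:role/iot-timestream-role',
                'databaseName': 'ExampleDB',
                'tableName': 'metrics',
                # The device name is extracted from the topic as a dimension
                'dimensions': [{'name': 'device_id', 'value': '${topic(3)}'}],
            },
        }],
    },
)
```

With this action, each attribute returned by the rule's SELECT statement is written to Timestream as a separate measure.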
For high-volume IoT data, the preferred method is to use the AWS IoT Greengrass stream manager. This tool processes data streams locally and automatically exports them to the AWS cloud, allowing you to work without concern for intermittent or limited connectivity. With the recently released AWS IoT Greengrass v2.0, you can add or remove pre-built software components like stream manager based on the specific requirements of your IoT use case and the capabilities of your device’s memory and CPU.
Data ingestion begins with a message stream configured to export to a consuming service such as AWS IoT Analytics, AWS IoT SiteWise, or Amazon Kinesis Data Streams. When data arrives at a Kinesis data stream in the cloud, it is consumed and written to Timestream. You can orchestrate the data pipeline from a Kinesis data stream using an AWS Lambda function or Amazon Kinesis Data Analytics for Apache Flink.
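On the Greengrass side, a sketch using the stream manager SDK for Python might look as follows; the local stream name, Kinesis stream name, and payload shape are assumptions for illustration:

```python
from stream_manager import (
    ExportDefinition,
    KinesisConfig,
    MessageStreamDefinition,
    StrategyOnFull,
    StreamManagerClient,
)

client = StreamManagerClient()

# Define a local stream that stream manager exports to a Kinesis data stream.
client.create_message_stream(MessageStreamDefinition(
    name='LocalMetricsStream',              # hypothetical local stream name
    strategy_on_full=StrategyOnFull.OverwriteOldestData,
    export_definition=ExportDefinition(
        kinesis=[KinesisConfig(
            identifier='KinesisExport',
            kinesis_stream_name='iot-metrics-stream',  # hypothetical stream
        )],
    ),
))

# Append a payload; stream manager buffers locally and retries the export
# transparently across intermittent connectivity.
client.append_message('LocalMetricsStream', b'{"metric": "cpu", "value": 42.5}')
```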
In terms of cost estimation, Lambda charges are based on the number of requests and the duration of execution, whereas Amazon Kinesis Data Analytics charges an hourly rate based on the average number of Kinesis Processing Units (KPUs) used. The specific nature of your data flow will guide your choice: for a steady data flow, Kinesis Data Analytics may be more cost-effective than Lambda, and it is also the better fit if you need to process streaming data before writing the results to Timestream.
Directly ingesting IoT data from AWS IoT Greengrass to Timestream via a Lambda function is not advisable. Doing so requires implementing features already found in stream manager, such as managing intermittent connectivity or buffering data while the transport path is re-established. It is prudent to avoid duplicating existing functionality that already meets your needs.
Example Use Case
For instance, you might gather system metrics like CPU, disk, and memory usage from an AWS IoT Greengrass core. After creating a database and table in Timestream, set up a Kinesis data stream and link a Lambda function to it. This Lambda function will retrieve records from the data stream and write them into a Timestream table.
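Creating the database and table is a one-time setup step, sketched below with boto3; the names and retention periods are illustrative:

```python
import boto3

ts_write = boto3.client('timestream-write')

ts_write.create_database(DatabaseName='GGMetricsDB')  # illustrative name
ts_write.create_table(
    DatabaseName='GGMetricsDB',
    TableName='GGMetrics',
    RetentionProperties={
        'MemoryStoreRetentionPeriodInHours': 24,    # hot tier for recent data
        'MagneticStoreRetentionPeriodInDays': 365,  # cold tier for history
    },
)
```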
The following Python code snippet for Lambda illustrates how to write data to Timestream. In this function, the database name is designated in the variable `DATABASE_GGMETRICS`, and the table name is specified in `TABLE_GGMETRICS`. Furthermore, the Lambda function must be assigned an AWS Identity and Access Management (IAM) role with permissions to write to the corresponding Timestream table.
```python
import base64
import json
import logging

import boto3
from botocore.config import Config

logger = logging.getLogger()
config = Config(retries={"max_attempts": 10, "mode": "standard"})

DATABASE_GGMETRICS = "GGMetricsDB"  # illustrative value
TABLE_GGMETRICS = "GGMetrics"       # illustrative value

def lambda_handler(event, context):
    logger.debug("event:\n{}".format(json.dumps(event, indent=2)))
    records_ggmetrics = []
    try:
        c_ts_write = boto3.client('timestream-write', config=config)
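        # A hypothetical continuation of the handler: it assumes each Kinesis
        # record carries a JSON payload with thing_name, metric, value, and
        # timestamp fields (names are illustrative, not from the original).
        for record in event.get('Records', []):
            payload = json.loads(base64.b64decode(record['kinesis']['data']))
            records_ggmetrics.append({
                'Dimensions': [{'Name': 'thing_name', 'Value': payload['thing_name']}],
                'MeasureName': payload['metric'],   # e.g. 'cpu', 'memory', 'disk'
                'MeasureValue': str(payload['value']),
                'MeasureValueType': 'DOUBLE',
                'Time': str(payload['timestamp']),  # epoch milliseconds
                'TimeUnit': 'MILLISECONDS',
            })
        if records_ggmetrics:
            c_ts_write.write_records(DatabaseName=DATABASE_GGMETRICS,
                                     TableName=TABLE_GGMETRICS,
                                     Records=records_ggmetrics)
    except Exception:
        logger.exception("failed to write records to Timestream")
        raise
```

A single WriteRecords call accepts up to 100 records, so higher-throughput handlers should batch their writes accordingly.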