Apache Flink is an open source, distributed processing engine that provides powerful programming interfaces for both stream and batch processing, with strong support for stateful processing and event time semantics. It supports several programming languages, including Java, Python, Scala, and SQL, and developers can combine these APIs within the same application.
Amazon Managed Service for Apache Flink, which offers a fully managed, serverless experience for running Apache Flink applications, now supports the latest stable version, Apache Flink 1.19.1. This bugfix release addresses several issues found in version 1.19.0, released in March 2024, a release to which AWS contributed as part of its ongoing involvement in the Apache Flink community.
In this post, we explore the most notable new features and configuration changes available with this release. As with every release, some features are experimental; here we focus on the updates that are most significant and easiest to adopt.
Connectors
With the release of version 1.19.1, the Apache Flink community has also published new connector versions for the 1.19 runtime. Starting from version 1.16, Apache Flink adopted a connector version numbering scheme that follows the format <connector-version>-<flink-version>. We recommend using connectors built for your runtime version. For more details, refer to Using Apache Flink connectors, which tracks connector versions and compatibility.
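For example, the dependency below illustrates the naming scheme for the Kafka connector, assuming a Maven build; the connector version shown is illustrative, so check the connector documentation for the version that matches your runtime:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka</artifactId>
    <!-- <connector-version>-<flink-version>: connector 3.2.0 built for the Flink 1.19 runtime -->
    <version>3.2.0-1.19</version>
</dependency>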
SQL Enhancements
Flink 1.19 introduces several enhancements, particularly within its SQL API, aimed at improving flexibility, performance, and usability. Below, we look at some of the most significant SQL features introduced in this release.
State TTL by Operator
While the ability to configure state TTL at the operator level was introduced in Flink 1.18, it was not user-friendly. Users previously had to export the plan at development time, manually edit it, and then force Flink to utilize the modified plan upon application startup. The latest version simplifies this process by allowing TTL configurations via SQL hints, thus eliminating the need for JSON plan manipulations.
For instance, you can now set state TTL directly like this:
-- State TTL for Joins
SELECT /*+ STATE_TTL('Orders' = '1d', 'Customers' = '20d') */
*
FROM Orders
LEFT OUTER JOIN Customers
ON Orders.o_custkey = Customers.c_custkey;
Session Window Table-Valued Functions
Apache Flink 1.19 extends its SQL capabilities with support for session window table-valued functions (TVFs) in streaming mode. This enables more flexible windowing operations directly in SQL queries, creating dynamic windows based on the gaps between events in a session.
Here’s an example:
-- Session window with partition keys
SELECT
*
FROM TABLE(
SESSION(TABLE Bid PARTITION BY item, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES));
Mini-Batch Optimization for Regular Joins
In Flink 1.19, mini-batch processing can also be used with regular joins (FLIP-415). When enabled, Flink processes records in small batches rather than one by one, which significantly reduces the pressure on the RocksDB state backend. To enable mini-batching, set the following options on the table configuration:
// Enable mini-batch processing so operators buffer records and emit results in small batches
TableConfig tableConfig = tableEnv.getConfig();
tableConfig.set("table.exec.mini-batch.enabled", "true");
// Maximum time a mini-batch waits to collect records before firing
tableConfig.set("table.exec.mini-batch.allow-latency", "5s");
// Maximum number of records buffered in a mini-batch
tableConfig.set("table.exec.mini-batch.size", "5000");
In addition, Flink 1.19 introduces the AsyncScalarFunction, a user-defined function type that permits non-blocking calls to external systems, addressing a limitation of earlier versions where scalar functions could only make synchronous, blocking calls. This is particularly useful for improving throughput in SQL and Table API applications.
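As a minimal sketch, assuming a simulated external call in place of a real asynchronous client, an AsyncScalarFunction declares an eval method whose first parameter is a CompletableFuture that receives the result, followed by the SQL arguments:
import java.util.concurrent.CompletableFuture;
import org.apache.flink.table.functions.AsyncScalarFunction;

// Hypothetical function name; the external lookup is simulated for illustration.
public class CustomerTierFunction extends AsyncScalarFunction {

    // The first parameter receives the result asynchronously; the remaining
    // parameters map to the arguments passed in the SQL call.
    public void eval(CompletableFuture<String> result, String customerId) {
        // Simulate a non-blocking call to an external system. In practice, you would
        // complete the future from the callback of an asynchronous client.
        CompletableFuture
            .supplyAsync(() -> "tier-for-" + customerId)
            .whenComplete((tier, error) -> {
                if (error != null) {
                    result.completeExceptionally(error);
                } else {
                    result.complete(tier);
                }
            });
    }
}
Once registered, for example with tableEnv.createTemporarySystemFunction("CUSTOMER_TIER", CustomerTierFunction.class), the function can be called from SQL like any other scalar function, while Flink manages the in-flight asynchronous calls.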
Moreover, Python 3.11 is now supported, while support for Python 3.7 has been discontinued. The Managed Service for Apache Flink currently utilizes Python 3.11 to run PyFlink applications, which can be beneficial for developers looking to leverage advancements in the Python ecosystem.