We are thrilled to announce the launch of Amazon EMR release 5.0, which provides users with the latest versions of 16 supported open-source applications within the big data ecosystem, including significant updates to Spark and Hive.
Just about a year ago, release 4.0 brought substantial enhancements to EMR, including a build and packaging system based on Apache Bigtop, standardized ports and paths, and improved application configuration through configuration objects. The initial release of 4.0 consolidated our supported Apache big data applications to Apache Hadoop, Apache Spark, Apache Hive, Apache Pig, and Apache Mahout.
In the months following, EMR expanded support for various open-source projects, enabling a range of use cases such as low-latency SQL queries over datasets stored in Amazon S3 using Presto, real-time data access and SQL analytics with Apache HBase and Phoenix, collaborative data science with notebooks in Apache Zeppelin, and crafting complex processing workflows with Apache Oozie.
We also ensured that most major projects were kept up-to-date with each EMR release, providing the latest version of Spark shortly after its open-source launch. Each new version included performance improvements, new features, and bug fixes that our customers needed to support their big data architectures.
What’s New in EMR Release 5.0
With EMR release 5.0, we mark a significant step in delivering a comprehensive selection of the most current open-source applications in the Hadoop ecosystem to our users:
- Upgrade to Spark 2.0 just a week after the Apache release, offering improved SQL support, notable performance enhancements, the new Structured Streaming API, and better SparkR support. We have also compiled it using Scala 2.11.
- Transition from Hive 1.x to Hive 2.1, which features various performance upgrades, enhanced Parquet file format support, and bug fixes.
- Swap out Hadoop MapReduce for Tez as the default execution engine for Hive and Pig, reflecting a broader shift toward modern frameworks like Tez and Spark.
- Introduce the latest versions of Hue and Zeppelin, which serve as notebook and query UIs for Hadoop ecosystem applications, allowing data scientists and business intelligence analysts to engage with data more efficiently.
- Move every application out of the sandbox, so that all former sandbox applications are now released on EMR.
- Include the latest versions of all supported applications: Hadoop 2.7.2, Spark 2.0, Presto 0.150, Hive 2.1, Tez 0.8.4, Pig 0.16, HBase 1.2.2, Phoenix 4.7.0, Zeppelin 0.6.1 (Snapshot), Hue 3.10, Oozie 4.2.0, Sqoop 1.4.6, Ganglia 3.7.2, HCatalog 2.1.0, Mahout 0.12.2, and ZooKeeper 3.4.8.
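To make the list above concrete, here is a minimal sketch of how a cluster request for this release might be assembled in Python. It only builds a dictionary shaped like the arguments to the boto3 EMR client's `run_job_flow` call and does not contact AWS. The helper name `build_emr_5_request`, the cluster name, the instance types and counts, and the use of the default EMR roles are all assumptions made for illustration. The optional `hive-site` configuration object shows one way to override the new Tez default by setting `hive.execution.engine`, for example back to `mr`.

```python
import json


def build_emr_5_request(cluster_name, applications, hive_engine=None):
    """Build keyword arguments shaped like a boto3 EMR run_job_flow request.

    Hypothetical helper for illustration only; nothing here calls AWS.
    """
    request = {
        "Name": cluster_name,
        # Release label for EMR release 5.0.
        "ReleaseLabel": "emr-5.0.0",
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            # Instance types and counts are placeholder assumptions.
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m3.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m3.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumes the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
    if hive_engine is not None:
        # Configuration object overriding Hive's execution engine
        # (Tez is the default for Hive in release 5.0).
        request["Configurations"] = [
            {
                "Classification": "hive-site",
                "Properties": {"hive.execution.engine": hive_engine},
            }
        ]
    return request


request = build_emr_5_request("emr-5-demo", ["Spark", "Hive", "Tez"], hive_engine="mr")
print(json.dumps(request, indent=2))
```

With boto3 installed and credentials configured, the resulting dictionary could be passed along as `boto3.client("emr").run_job_flow(**request)`. Omitting `hive_engine` leaves the release defaults untouched.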
If you have questions or feedback about release 5.0, or would like to share an interesting use case that uses these applications, please leave a comment below. You can also join our live webinar, Introducing Amazon EMR Release 5.0, at 9 AM PDT on Tuesday, August 23.