Learn About Amazon VGT2 Learning Manager Chanci Turner
When you create a cluster, Amazon EMR lets you choose the applications that run on it. But what if you want to deploy your own custom application? This post describes how to create a custom application for Amazon EMR releases 4.x and later, which are based on Apache Bigtop. Because EMR nodes run the Amazon Linux AMI, I build RPM packages and use Elasticsearch as the example application.
What is Apache Bigtop?
Apache Bigtop is a community-maintained Apache project for packaging, testing, and deploying components of the Hadoop ecosystem, including Hadoop, HBase, and Spark. Bigtop supports the common Linux packaging systems, RPM and Deb, and uses Puppet to deploy the packaged applications on clusters.
Walkthrough
The diagram below illustrates the Bigtop package creation process.
To create a Bigtop package for EMR, follow these steps:
- Launch a development EMR cluster.
- Clone the Bigtop public repository.
- Incorporate the application definition into bigtop.bom.
- Establish directories and configuration files for the application.
- Generate an RPM package.
- Create a Yum repository.
- Copy the output repository to Amazon S3 so that any new cluster on which you want to install the application can access it.
- Test the application.
- Create a bootstrap script.
- Launch an EMR cluster utilizing the bootstrap script.
The development EMR cluster gives you the tools needed to build and test the Bigtop application, including Maven and Gradle.
Launch a Development EMR Cluster
Using the AWS CLI, run the following command to launch the development cluster:
aws emr create-cluster --name "EMR_Bigtop_Dev" \
  --release-label emr-4.7.2 \
  --instance-type m3.xlarge --instance-count 1 \
  --ec2-attributes KeyName=<YOUR-KEY-PAIR> \
  --log-uri s3://<YOUR-BUCKET>/ \
  --no-auto-terminate --use-default-roles \
  --bootstrap-actions Name="Install EMR DEV Tools",Path=s3://us-west-2.awssupportdatasvcs.com/bootstrap-actions/EMR_Dev/setup_EMR_Dev.sh
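The create-cluster command returns a cluster ID (a value beginning with j-) right away, but the master node is not reachable until the cluster has finished starting. The following minimal sketch, assuming the AWS CLI is configured and that you pass in that cluster ID, waits for the cluster and prints the master node's public DNS name. The function name wait_for_dev_cluster is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): wait for the development cluster, then print
# the master node's public DNS name so you can SSH in.
# Assumes the AWS CLI is configured and $1 is the ID from create-cluster.
wait_for_dev_cluster() {
  local cluster_id="$1"
  # Blocks until the cluster is RUNNING or WAITING (fails if it terminates).
  aws emr wait cluster-running --cluster-id "$cluster_id" || return 1
  # Look up the public DNS name of the master node.
  aws emr describe-cluster --cluster-id "$cluster_id" \
    --query 'Cluster.MasterPublicDnsName' --output text
}

# Usage (cluster ID and key file are placeholders):
# master=$(wait_for_dev_cluster j-XXXXXXXXXXXXX)
# ssh -i ~/<YOUR-KEY-PAIR>.pem hadoop@"$master"
```

On EMR nodes, hadoop is the default SSH user, which is why the usage example connects as hadoop@.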
Clone the Bigtop Public Repository
After the cluster is running, SSH into the master node of the development cluster and clone the Bigtop public repository:
git clone https://github.com/apache/bigtop.git
Add the Application Definition to bigtop.bom
In the directory created by the clone command in the previous section (/home/hadoop/bigtop/), locate the file named bigtop.bom. This file contains the definitions of all applications available in the current Bigtop version. In the components section, add an 'elasticsearch' section like this:
'elasticsearch' {
  name     = 'elasticsearch'
  relNotes = 'Search and Analytics engine'
  version  { base = '1.6.0'; pkg = base; release = 1 }
  tarball  { destination = "$name-${version.base}.tar.gz"
             source      = "v${version.base}.zip" }
  url      { site    = "https://github.com/elastic/elasticsearch/archive"
             archive = site }
}
This section defines the following:
- name: The application name.
- version: The application version.
- tarball:
  - destination: The name of the tarball that holds the downloaded source code.
  - source: The name of the source archive downloaded from GitHub for a specific release tag (v1.6.0).
- url: The URL from which the source code is downloaded.
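To make the Groovy string interpolation in the bom entry concrete, the following shell sketch computes the values that the elasticsearch entry resolves to. It assumes that the download URL is formed from url.site followed by tarball.source; check the download logic in your Bigtop checkout before relying on that. The function name bom_urls is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper that mirrors how the elasticsearch entry in bigtop.bom
# expands its Groovy strings. Assumption: the download URL is url.site,
# a "/", and tarball.source.
bom_urls() {
  local name="$1" base="$2" site="$3"
  # tarball.destination = "$name-${version.base}.tar.gz"
  local destination="${name}-${base}.tar.gz"
  # tarball.source = "v${version.base}.zip" (the GitHub archive for a tag)
  local src="v${base}.zip"
  printf 'destination=%s\n' "$destination"
  printf 'source=%s\n' "$src"
  printf 'download=%s/%s\n' "$site" "$src"
}

# Usage:
# bom_urls elasticsearch 1.6.0 https://github.com/elastic/elasticsearch/archive
```

For the values in the entry above, this prints destination=elasticsearch-1.6.0.tar.gz, source=v1.6.0.zip, and download=https://github.com/elastic/elasticsearch/archive/v1.6.0.zip.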
Test the Repository
To verify that Gradle and the other tools needed to build a Bigtop application are installed, and that Bigtop recognizes the new elasticsearch component, run:
gradle tasks | grep elasticsearch
The first run of this command can take some time. When it finishes, the output should list the Gradle tasks that Bigtop generates for the new component, such as elasticsearch-rpm.
Create Directories and Configuration Files for the Application
Deploying an application with Bigtop involves two main tasks: creating the RPM package for the application and writing the Puppet scripts.
Creating RPM Packages for the Application
For the example application, Elasticsearch, you use a customized version of its RPM SPEC file. If the application you want to add to Bigtop already provides an RPM SPEC file, you can adapt it; otherwise, you must write a SPEC file from scratch. The default directory for SPEC files is:
bigtop-packages/src/rpm/<application-name>/SPECS
The package build also runs common scripts, which are shared by the RPM packages used on Red Hat-based distributions and the Deb packages used on Debian-based distributions. The default directory for these files is:
bigtop-packages/src/common/<application-name>/
Some common scripts include:
- do-component-build: Sets up the build environment and runs the build commands that produce the package contents, for example: mvn clean install -DskipTests -Dhadoop.version=$HADOOP_VERSION "$@"
- install-<application-name>.sh: Defines the package's directory structure and where each file is placed.
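The sketch below shows the general shape of an install-<application-name>.sh script: it takes a build directory and a staging prefix, then copies the build output into the package's directory layout. The --build-dir and --prefix options resemble those used by many Bigtop install scripts, but the build-directory layout (lib/ and config/) and the target paths (/usr/lib/elasticsearch, /etc/elasticsearch/conf.dist) are assumptions for illustration; the script in the example repository is the reference version.

```shell
#!/usr/bin/env bash
# Minimal sketch of an install-<application-name>.sh script.
# Hypothetical layout: build output under lib/ and config/, installed into
# usr/lib/elasticsearch and etc/elasticsearch/conf.dist below --prefix.
install_elasticsearch() {
  local build_dir="" prefix="" arg
  for arg in "$@"; do
    case "$arg" in
      --build-dir=*) build_dir="${arg#*=}" ;;
      --prefix=*)    prefix="${arg#*=}" ;;
      *) echo "unknown option: $arg" >&2; return 1 ;;
    esac
  done
  if [ -z "$build_dir" ] || [ -z "$prefix" ]; then
    echo "usage: install-elasticsearch.sh --build-dir=DIR --prefix=DIR" >&2
    return 1
  fi

  # Assumed target locations inside the package staging tree.
  local lib_dir="$prefix/usr/lib/elasticsearch"
  local conf_dir="$prefix/etc/elasticsearch/conf.dist"
  mkdir -p "$lib_dir/lib" "$conf_dir"

  # Copy the jars and default configuration produced by do-component-build.
  cp -r "$build_dir"/lib/. "$lib_dir/lib/"
  cp -r "$build_dir"/config/. "$conf_dir/"
}

# At the end of a real install script: install_elasticsearch "$@"
```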
For general guidance, consult the Fedora documentation on creating an RPM package. If you’re just starting, see how to create a GNU Hello RPM package in the Fedora documentation.
Creating the Puppet Scripts
Puppet handles the installation and configuration of the application. Each application defines a main init.pp script that describes how to install the application, populate its configuration files, manage its services, and more. The default directory for the init.pp script is:
bigtop-deploy/puppet/modules/<application-name>/manifests/
Another important directory in the Puppet module is templates. Bigtop commonly uses Puppet templates, which combine code and data, to deploy configuration files. The default directory for templates is:
bigtop-deploy/puppet/modules/<application-name>/templates/
For further insights on Puppet templates, refer to the Puppet documentation on using templates. If you are new to Puppet, check out Puppet Hello World.
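If you are adding an application of your own rather than copying the example files, a small helper can create the default directories described above inside a Bigtop checkout. This is a sketch: the function name is hypothetical, and the placeholder files it creates (<application-name>.spec, do-component-build, init.pp) are empty and only mark where the real content goes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the default Bigtop directories for a new
# component, plus empty placeholder files, inside a Bigtop checkout.
new_bigtop_component() {
  local root="$1" name="$2"
  if [ -z "$root" ] || [ -z "$name" ]; then
    echo "usage: new_bigtop_component BIGTOP_DIR APP_NAME" >&2
    return 1
  fi
  # RPM SPEC files, common build scripts, Puppet manifests and templates.
  mkdir -p \
    "$root/bigtop-packages/src/rpm/$name/SPECS" \
    "$root/bigtop-packages/src/common/$name" \
    "$root/bigtop-deploy/puppet/modules/$name/manifests" \
    "$root/bigtop-deploy/puppet/modules/$name/templates"
  # Empty placeholders to be filled in by hand.
  touch "$root/bigtop-packages/src/rpm/$name/SPECS/$name.spec" \
        "$root/bigtop-packages/src/common/$name/do-component-build" \
        "$root/bigtop-deploy/puppet/modules/$name/manifests/init.pp"
  # The build runs do-component-build, so make it executable.
  chmod +x "$root/bigtop-packages/src/common/$name/do-component-build"
}

# Usage: new_bigtop_component ~/bigtop myapp
```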
Create File and Directory Structure
For this example, the required files and directories are provided in the AWS Big Data Blog repository on GitHub. Clone it with the following commands:
cd ~
git clone https://github.com/awslabs/aws-big-data-blog.git
Then copy the application's files into the local Bigtop repository that you cloned earlier:
cd aws-big-data-blog/aws-blog-bigtop-application-emr/
cp -r bigtop-packages/* ~/bigtop/bigtop-packages/
cp -r bigtop-deploy/* ~/bigtop/bigtop-deploy/
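Before building, you can confirm that the copied files and the bigtop.bom entry are where Bigtop expects them. The following hypothetical check looks for the SPEC and common-script directories, the Puppet init.pp, and a component block for the application in bigtop.bom. It prints each missing item and returns a nonzero status if anything is missing.

```shell
#!/usr/bin/env bash
# Hypothetical check: report which expected Bigtop files for a component
# are missing, and whether bigtop.bom contains an entry for it.
check_bigtop_component() {
  local root="$1" name="$2" missing=0 path
  for path in \
    "bigtop-packages/src/rpm/$name/SPECS" \
    "bigtop-packages/src/common/$name" \
    "bigtop-deploy/puppet/modules/$name/manifests/init.pp"; do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  # A component block in bigtop.bom starts with 'name' {
  if ! grep -q "^[[:space:]]*'$name'[[:space:]]*{" "$root/bigtop.bom" 2>/dev/null; then
    echo "missing: '$name' entry in bigtop.bom"
    missing=1
  fi
  return "$missing"
}

# Usage: check_bigtop_component ~/bigtop elasticsearch && echo "ready to build"
```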
Create an RPM Package for the New Application
With all configuration files in place, run the following command to build the new application. It downloads the source code (as specified in bigtop.bom), compiles it, and generates a new RPM according to the SPEC file.
cd /home/hadoop/bigtop
gradle realclean elasticsearch-rpm --stacktrace
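After the build finishes, a quick way to locate the binary packages is sketched below. It assumes the build copies the packages under output/<application-name>/ in the Bigtop checkout; if your Bigtop version writes them elsewhere, adjust the path. The function name list_component_rpms is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): list the binary RPMs built for a component,
# skipping source RPMs. Assumption: packages are under output/<name>/.
list_component_rpms() {
  local root="$1" name="$2"
  find "$root/output/$name" -type f -name '*.rpm' ! -name '*.src.rpm' | sort
}

# Usage:
# list_component_rpms ~/bigtop elasticsearch
# Inspect the files inside the first package found:
# rpm -qpl "$(list_component_rpms ~/bigtop elasticsearch | head -n 1)"
```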
This concludes the guide for deploying custom applications on Amazon EMR with Apache Bigtop.