Update (September 2023)
Added information regarding file deletion capabilities.
Mountpoint for Amazon S3 is an open-source file client designed to connect your file-aware Linux applications to Amazon Simple Storage Service (Amazon S3) buckets. Initially introduced earlier this year as an alpha version, it is now generally available and ready for production use in large-scale, read-heavy applications such as data lakes, machine learning training, image rendering, autonomous vehicle simulations, ETL processes, and more. It supports file-based workloads that perform sequential and random reads and sequential (append-only) writes, and that do not need full POSIX semantics.
Why Files?
Many AWS customers utilize the S3 APIs and AWS SDKs to develop applications capable of listing, accessing, and processing the contents of an S3 bucket. However, numerous customers have existing applications, commands, tools, and workflows that function with UNIX-style file access: reading directories, opening and reading existing files, and creating and writing new ones. These customers have expressed the need for a reliable, enterprise-ready client that provides efficient access to S3 at scale. Through discussions with these customers, we discovered that their main concerns were performance and stability, while POSIX compliance was not a priority.
When I first discussed Amazon S3 back in 2006, I emphasized that it was intended as an object store, not a file system. While it is not advisable to use the Mountpoint/S3 combination for storing Git repositories, leveraging it with tools that can read and write files while taking advantage of S3’s scalability and durability is a practical solution in many scenarios.
All About Mountpoint
Mountpoint operates on a straightforward concept. You create a mount point and link an Amazon S3 bucket (or a specific path within a bucket) to this mount point, allowing you to access the bucket using shell commands (like ls, cat, dd, find), library functions (such as open, close, read, write, creat, opendir), or any equivalent commands and functions supported by your familiar tools and languages.
Under the hood, the Linux Virtual Filesystem (VFS) translates these operations into calls to Mountpoint, which then converts them into calls to S3: LIST, GET, PUT, and so on. Mountpoint aims to efficiently utilize network bandwidth, maximizing throughput and enabling you to lower your computational costs by completing more tasks in less time.
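As a minimal sketch of what this looks like from the application side, the two helpers below use only ordinary file tools. The names count_objects and read_range are hypothetical and not part of Mountpoint; they work on any directory. Pointed at a mounted bucket, the directory walk would be expected to turn into listing requests and the byte-range read into a GET, in line with the description above.

```shell
# count_objects DIR: count regular files under DIR using ordinary file tools.
# (Hypothetical helper name; nothing here is Mountpoint-specific.)
count_objects() {
    find "$1" -type f | wc -l | tr -d ' '
}

# read_range FILE OFFSET LENGTH: write LENGTH bytes of FILE, starting at byte
# OFFSET, to stdout. bs=1 keeps the arithmetic obvious and portable, but it is
# slow for large ranges; treat it as an illustration of a random read only.
read_range() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null
}
```

For example, read_range some-file 3 4 prints the fourth through seventh bytes of some-file, wherever that file happens to live.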
You can use Mountpoint from an Amazon Elastic Compute Cloud (Amazon EC2) instance or within containers managed by Amazon Elastic Container Service (ECS) or Amazon Elastic Kubernetes Service (EKS). Additionally, it can be installed on your existing on-premises systems, with access to S3 either directly or via an AWS Direct Connect connection through AWS PrivateLink for Amazon S3.
Installing and Using Mountpoint for Amazon S3
Mountpoint is available in RPM format and can be easily installed on an EC2 instance running Amazon Linux. Simply download the RPM and install it using yum:
$ wget https://s3.amazonaws.com/mountpoint-s3-release/latest/x86_64/mount-s3.rpm
$ sudo yum install ./mount-s3.rpm
For the past few years, I have been consistently pulling images from several Washington State Ferry webcams and storing them in my wsdot-ferry bucket. I collect these images to monitor ferry schedules and ultimately plan to analyze them for optimal riding times. Today, my goal is to create a movie that compiles an entire day’s worth of images into a time-lapse. First, I establish a mount point and link the bucket:
$ mkdir wsdot-ferry
$ mount-s3 wsdot-ferry wsdot-ferry
Next, I can explore the mount point and inspect the bucket:
$ cd wsdot-ferry
$ ls -l | head -10
total 0
drwxr-xr-x 2 alex alex 0 Aug 7 23:07 2020_12_30
drwxr-xr-x 2 alex alex 0 Aug 7 23:07 2020_12_31
drwxr-xr-x 2 alex alex 0 Aug 7 23:07 2021_01_01
drwxr-xr-x 2 alex alex 0 Aug 7 23:07 2021_01_02
drwxr-xr-x 2 alex alex 0 Aug 7 23:07 2021_01_03
drwxr-xr-x 2 alex alex 0 Aug 7 23:07 2021_01_04
drwxr-xr-x 2 alex alex 0 Aug 7 23:07 2021_01_05
drwxr-xr-x 2 alex alex 0 Aug 7 23:07 2021_01_06
drwxr-xr-x 2 alex alex 0 Aug 7 23:07 2021_01_07
After navigating into a specific directory, I can view its contents:
$ cd 2020_12_30
$ ls -l
total 0
drwxr-xr-x 2 alex alex 0 Aug 7 23:07 fauntleroy_holding
drwxr-xr-x 2 alex alex 0 Aug 7 23:07 fauntleroy_way
drwxr-xr-x 2 alex alex 0 Aug 7 23:07 lincoln
To create my animation, I execute a single command:
$ ffmpeg -framerate 10 -pattern_type glob -i "*.jpg" ferry.gif
The outcome is perfect! I utilized Mountpoint to access the existing image files and write the newly created animation back to S3. This demonstration highlights how you can effectively use your existing tools and skills to process objects stored in an S3 bucket. Given that I have gathered millions of images over the years, the ability to process them without needing to sync them explicitly to my local file system is a significant advantage.
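To repeat this for every camera in a day's directory, a loop along the following lines could work. This is a sketch under a stated assumption: the walkthrough does not show where the .jpg frames live, so the function assumes they sit inside each camera subdirectory. The name make_timelapses and the FFMPEG override are hypothetical, added so the loop can be dry-run before it touches the bucket.

```shell
# make_timelapses DAY_DIR: build one GIF per camera subdirectory of DAY_DIR.
# ASSUMPTION: the .jpg frames sit inside each camera subdirectory; the
# walkthrough does not show where the images live.
# FFMPEG is a hypothetical override so the loop can be dry-run with FFMPEG=echo.
make_timelapses() {
    day_dir="${1%/}"                  # drop one trailing slash, if any
    for cam_dir in "$day_dir"/*/; do
        [ -d "$cam_dir" ] || continue # skip the unexpanded pattern when nothing matches
        cam=$(basename "$cam_dir")
        # The quoted pattern is expanded by ffmpeg's glob matcher, not by the shell.
        "${FFMPEG:-ffmpeg}" -framerate 10 -pattern_type glob \
            -i "${cam_dir}*.jpg" "$day_dir/$cam.gif"
    done
}
```

Setting FFMPEG=echo in the environment and then running make_timelapses 2020_12_30 from inside the mount point prints the commands instead of running them. Each GIF is written as a new file, which fits the sequential-write model described earlier.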
Mountpoint for Amazon S3 Key Facts
Here are some important points to consider when using Mountpoint:
- Pricing – There are no additional fees for using Mountpoint; you only pay for the S3 operations conducted. You can also use Mountpoint to access requester-pays buckets.
- Performance – Mountpoint takes advantage of the elastic throughput provided by S3, enabling data transfers at up to 100 Gb/second between each EC2 instance and S3.
- Credentials – Mountpoint accesses your S3 buckets using the AWS credentials in effect when the bucket is mounted. Refer to the CONFIGURATION doc for more details regarding credentials, bucket configurations, requester pays, and tips for using S3 Object Lambda, among others.
- Operations & Semantics – Mountpoint supports basic file operations and can read files up to 5 TB in size. It can list and read existing files and create new ones. However, it cannot modify existing files or delete directories, and it does not support symbolic links or file locking (for POSIX semantics, consider Amazon FSx for Lustre). To enable deletion of files, pass the --allow-delete flag to the mount-s3 command. For further information on supported operations and their interpretations, refer to the SEMANTICS document.
- Storage Classes – Mountpoint can access S3 objects in all storage classes except for S3 Glacier Flexible Retrieval, S3 Glacier Deep Archive, S3 Intelligent-Tiering Archive Access Tier, and S3 Intelligent-Tiering Deep Archive Access Tier.
- Open Source – Mountpoint is open source, allowing you to customize and enhance its functionality as needed.
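To see how the operations listed above behave on a particular mount, a small probe like the following can help. This is a minimal sketch, not part of Mountpoint: probe_semantics and the probe file name are hypothetical, and the directory you pass in should be a scratch location. It tries to create a new file, append to it, and delete it, and reports whether each step succeeded.

```shell
# probe_semantics DIR: try a few file operations in DIR and report each result.
# DIR and the probe file name are hypothetical; use a scratch location, because
# the probe file is left behind if the delete step is refused.
probe_semantics() {
    dir="$1"
    file="$dir/probe-$$.txt"

    # Create a new file and write it sequentially.
    if { printf 'first line\n' > "$file"; } 2>/dev/null; then
        echo "create: ok"
    else
        echo "create: refused"
    fi

    # Append to the file that now exists (a modification of an existing file).
    if { printf 'second line\n' >> "$file"; } 2>/dev/null; then
        echo "append: ok"
    else
        echo "append: refused"
    fi

    # Delete the file.
    if rm "$file" 2>/dev/null; then
        echo "delete: ok"
    else
        echo "delete: refused"
    fi
}
```

Run against a local directory, all three steps should succeed. Against a Mountpoint mount, based on the semantics described above, the append would be expected to be refused, and the delete as well unless the bucket was mounted with --allow-delete. The SEMANTICS document remains the authority on exact behavior.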