Basic File System Access to StorageGRID via s3fs

Introduction

 

During the past year, I've talked to a lot of customers about object storage. For developers, object storage is a relief, as it takes care of many data management challenges they usually have to deal with when programming against traditional storage systems that offer NFS, CIFS, or block access. However, the touchpoints for people outside development are still fairly limited. One question I hear often is: "Ok great, I see the value of object storage, but how do I get data in?". The obvious answer is through APIs like S3, Swift, or CDMI; however, for existing applications, this might not be the answer. In this post, we'll take a quick look at how we can use the open source tool "s3fs" to ingest files into StorageGRID Webscale, NetApp's highly scalable, software-defined object store.

 

Open Source tool: s3fs

 

s3fs is an open source project that allows Linux and Mac OS X hosts to mount an S3 bucket via FUSE. The advantage of s3fs is that it preserves the native object format for all files, so all files/objects can be accessed both via the file system and via S3. This means that, e.g., data can be dumped into StorageGRID via a copy into the file system, but then read by an application or developer via, e.g., the S3 API:
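The mapping is direct: a file's path relative to the mount point becomes the object key. Here's a minimal local sketch of that mapping (no s3fs involved; the directory and file names are illustrative, and the S3 command in the comment assumes the endpoint used later in this post):

```shell
# Illustrative sketch of the path-to-key mapping s3fs maintains
# (runs locally; the directory stands in for a mounted bucket).
mkdir -p testbucket/reports
echo "hello object storage" > testbucket/reports/q1.txt

# With a real s3fs mount of 'test-bucket', the same data would also be
# readable as object key "reports/q1.txt" via any S3 client, e.g.:
#   aws s3api get-object --bucket test-bucket --key reports/q1.txt \
#       --endpoint-url https://s3.mycompany.com:8082 q1.txt
cat testbucket/reports/q1.txt
```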

 

Representation in the file system:

[Image: s3fs2.png]

 

Representation in the object store:

[Image: s3fs1.png]

 

Installation

 

Installation instructions for s3fs can be found in its GitHub repo at https://github.com/s3fs-fuse/s3fs-fuse. In my example, I've persisted my StorageGRID S3 credentials under '/etc/auth.txt' and I've created a bucket 'test-bucket' in StorageGRID. The hostname and port of my S3 endpoint in StorageGRID is 's3.mycompany.com:8082'.
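For reference, the credentials file s3fs expects contains a single ACCESS_KEY:SECRET_KEY line, and s3fs refuses files that are readable by other users. A minimal sketch (the keys are placeholders; I create a local file here, which can then be moved to /etc/auth.txt):

```shell
# s3fs credentials file: one line in the form ACCESS_KEY:SECRET_KEY
# (placeholder keys -- substitute your StorageGRID tenant's S3 keys).
echo 'MYACCESSKEYID:MYSECRETACCESSKEY' > auth.txt
# s3fs rejects credentials files with group/other permissions.
chmod 600 auth.txt
stat -c '%a %n' auth.txt
```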

 

Mounting a bucket through s3fs is fairly straightforward:

 

s3fs test-bucket /mnt/testbucket \
-o passwd_file=/etc/auth.txt \
-o url=https://s3.mycompany.com:8082 \
-o sigv2 -o no_check_certificate \
-o use_path_request_style

It took me a few tries to get everything working, so if the mount doesn't work, run s3fs in the foreground with debugging enabled:

 

s3fs test-bucket /mnt/testbucket \
-o passwd_file=/etc/auth.txt \
-o url=https://s3.mycompany.com:8082 \
-o sigv2 -o no_check_certificate \
-o use_path_request_style \
-d -d -f -o f2 -o curldbg

 

For testing purposes, I've disabled SSL certificate verification, which is not recommended in a production setup. If the mount doesn't work on the first try (i.e., 'mount' or 'df -k' hang), do a 'sudo umount /mnt/testbucket' before trying again.
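If you want the mount to survive reboots, s3fs can also be listed in /etc/fstab using its fuse helper syntax. A sketch based on the mount options above (untested on my side; adjust bucket, mount point, and endpoint to your setup, and note that 'allow_other' is only needed if users other than the mounting user should see the files):

```
s3fs#test-bucket /mnt/testbucket fuse _netdev,allow_other,passwd_file=/etc/auth.txt,url=https://s3.mycompany.com:8082,sigv2,no_check_certificate,use_path_request_style 0 0
```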

 


Summary

 

s3fs offers a fairly simple way to access objects in StorageGRID via S3 and a file system at the same time. However, don't expect any miracles: performance is quite slow. Furthermore, s3fs doesn't offer important features such as high availability and file locking, and it is only eventually consistent. In my personal view, s3fs is a great tool, but at the same time, it still feels fairly experimental. Nevertheless, it is one way of getting data into StorageGRID and being able to access this data via S3 afterwards.