AWS: Using regular filesystem on an EBS multi-attach volume in a one-writer, many-readers scenario


I want to share data among multiple AWS instances in a high-performance, low-latency manner. Giving all instances read-only access (except one instance that handles writes) is fine. Two points about this use case:

  1. Nodes attached to the volume might come and go at any time (start, stop, be terminated, etc.).
  2. The shared data includes thousands of potentially small files that need to be enumerated and have their metadata checked.

So I initially tried EFS, but it is rather slow for operations that need to enumerate or modify hundreds or thousands of small files.
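For reference, the metadata-heavy workload described above can be reproduced with a quick sketch like this (run it once against an EFS mount and once against an EBS-backed directory to compare; the temp-dir approach is just for a self-contained demo):

```shell
# Create a thousand small files, then time the listing operation
# that was slow on EFS. Point $dir at an EFS vs. EBS path to compare.
dir=$(mktemp -d)
for i in $(seq 1 1000); do printf 'data\n' > "$dir/f$i"; done
count=$(ls "$dir" | wc -l)
time ls -l "$dir" > /dev/null
rm -rf "$dir"
```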

So now I'm considering EBS multi-attach. However, to prevent data corruption, AWS recommends using only a clustered filesystem such as GFS2 or OCFS2. Both appear complex and finicky to configure, and fragile for a cluster whose nodes come and go at any time. For example, GFS2 requires the cluster software on all nodes to be restarted if the node count drops from more than two to exactly two; and adding a new node involves logging in to an existing node, running some commands, and possibly redistributing an updated config file to every other node. It just seems really inflexible, with a lot of extra overhead.

But if I were sure only one instance would write to the disk (or that each instance would write only to its own subfolder, or even its own partition), could I get away with a regular filesystem like XFS on this volume? Or would there be subtle data corruption issues even if access is technically read-only, or write access is restricted to instance-specific subfolders or partitions?
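For context, the reader side of such a setup would look something like the following sketch (device name and mount point are placeholders for your setup; note that XFS refuses a read-only mount of a volume with a dirty log unless `norecovery` is given):

```shell
# On a reader instance: mount the multi-attached XFS volume strictly read-only.
# /dev/nvme1n1 and /mnt/shared are assumptions, not values from this thread.
sudo mount -t xfs -o ro,norecovery /dev/nvme1n1 /mnt/shared
```

Even with `ro`, each instance caches metadata and data independently, so readers will not automatically observe the writer's later changes.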

Or is there a completely different solution I'm missing?

John M
What solution did you go with in the end? I'm looking at something very similar myself.
Short answer: you absolutely need a clustered filesystem for multi-attach, but I don't recommend it at all, due to the pain points described in the original post. For infrequently changing files, you can bake them into custom AMIs or EBS snapshots, download them from S3 on boot, or use rsync or similar. For everything else, just use EFS.
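The S3-on-boot approach mentioned above could be sketched in instance user data roughly like this (bucket name and target directory are hypothetical):

```shell
# Hypothetical boot-time sync of shared data from S3.
# s3://example-shared-data and /opt/shared are placeholders.
aws s3 sync s3://example-shared-data /opt/shared --delete
```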

Sharing static volume content appears to work fine with multi-attach and regular XFS. Hot "adds" to the volume are visible only to the instance that wrote the data. I did not test hot "updates" or "deletes", but I assume they would likewise be seen only by the writer, and might break other instances' access to that data. Instances that are rebooted, restarted, or reconnected do see the latest volume state. So one instance writing new data infrequently, followed by forced reboots so the others eventually see that data, appears to be a use case this technology may support.
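A remount may be a cheaper way than a full reboot to make a reader pick up the latest on-disk state, though I have not verified this in the multi-attach scenario; device name and mount point below are placeholders:

```shell
# Sketch: force a reader to re-read the volume state by remounting
# instead of rebooting. Untested under multi-attach; names are assumptions.
sudo umount /mnt/shared
sudo mount -t xfs -o ro,norecovery /dev/nvme1n1 /mnt/shared
```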


