
AWS storage: slow simultaneous reads


We are finding that using AWS file storage (EFS or EBS using GP2 or GP3) from an EC2 instance is very slow when doing simultaneous reads. Here's an example:

I'm reading 30 binary files into memory, totaling 46 MB.

Doing this once takes about 16 ms. However, if I spawn 8 parallel processes on the same EC2 instance, each reading different sets of 30 binary files, each one takes an average of 105 ms (556% slower than a single process). It's almost like the 8 reads are happening serially instead of in parallel (though not quite). Note: There is no writing happening to these files at the time.
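
The test is essentially the sketch below (a simplified illustration; Python, the paths, and the file layout are placeholders rather than my actual code):

    import time
    from multiprocessing import Pool

    # Hypothetical paths standing in for one set of 30 binary files (~46 MB total).
    FILES = ["/data/set_a/file_%02d.bin" % i for i in range(30)]

    def read_set(file_list):
        # Read every file in the list fully into memory; return elapsed seconds.
        start = time.perf_counter()
        for path in file_list:
            with open(path, "rb") as f:
                f.read()
        return time.perf_counter() - start

    if __name__ == "__main__":
        # One process reading one set: ~16 ms in my test.
        print("single:", read_set(FILES))

        # Eight processes, each reading a set of 30 files at the same time:
        # ~105 ms per process in my test.
        sets = [FILES] * 8  # substitute 8 distinct file sets; per the update it makes no difference
        with Pool(processes=8) as pool:
            print("parallel:", pool.map(read_set, sets))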

If I repeat the same test on my laptop, using local file storage, the same 8 simultaneous reads of the same files are each only about 70% slower than a single read.

Why is the performance hit of simultaneous reads so much greater using AWS storage?

Is there anything I can configure about the volume that would reduce that performance penalty?

Update: This does not seem to be dependent on reading the same files. I get the same performance whether each process is reading the same 30 files or 30 different files. Title and details updated to account for this.

Tim
Interesting question. Not sure of the answer, but I wonder if you could look into disk caching: do the first read, and the subsequent reads should come from the RAM cache and be near instant. I wonder if it's due to the disk being across the network. 105 ms still seems fairly quick; is it being this slow actually causing a problem?
JoeMjr2
@Tim This is not the actual use case. I just simplified it to demonstrate the issue. The actual use case is more involved, and getting the actual data needed out of the 8 files takes about 360 ms one at a time, and an average of 2.5 seconds each when 8 are done at once. This is indeed a problem at scale. The issue with caching is that (in this example) the file set totals 46 MB, and there may be many such sets of files needed at a time, which would be a lot to cache in memory, so keeping them only on disk is ideal.
Tim
Maybe you could work around it somehow - one thread starts, downloads the files, then makes them available locally. Hopefully someone can help answer your question.
Tim P
Have you tested with a larger EC2 instance type, striping the data across multiple EBS volumes, or a larger EBS volume? EBS performance is a function of network capacity, and I'm guessing EFS is as well. If the files are large enough, you might be hitting the throughput limit of the EC2 instance or of a single EBS volume. As an example, we were able to increase DB performance by creating a RAID 0 array across two EBS volumes, coupled with the larger instance we used for the DB. For smaller instances we did not see the same gains.

It turns out that this performance hit was due to a CPU bottleneck on the client. I was trying to read the files with 8 simultaneous processes, but the Docker container I was running them in was limited to only 2 cores. When I raised this to at least 8 cores, performance improved considerably.
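
One thing that made this harder to spot: a limit applied with docker run --cpus is enforced as a CFS quota, so os.cpu_count() inside the container still reports the host's core count. A rough way to check the quota that actually applies is the sketch below (assuming cgroup v2; under cgroup v1 the equivalent values live in /sys/fs/cgroup/cpu/cpu.cfs_quota_us and cpu.cfs_period_us):

    import os

    def effective_cpus():
        # cgroup v2: /sys/fs/cgroup/cpu.max holds "<quota> <period>" or "max <period>".
        try:
            quota, period = open("/sys/fs/cgroup/cpu.max").read().split()
        except (FileNotFoundError, ValueError):
            return os.cpu_count()            # no cgroup v2 quota file found
        if quota == "max":
            return os.cpu_count()            # no quota set: all host CPUs usable
        return int(quota) / int(period)      # e.g. "200000 100000" -> 2.0 CPUs

    print("os.cpu_count():", os.cpu_count()) # reports the host's cores, not the limit
    print("effective CPU quota:", effective_cpus())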
