Pull up "Latency Numbers Every Programmer Should Know" when doing rough latency approximations: not because it is exact, but because it conveys the orders of magnitude.
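For a sense of scale, here is a minimal sketch of a few of those numbers; the values are rounded approximations from the commonly cited list, not measurements of any particular system.

```python
# Rounded orders of magnitude from the commonly cited
# "Latency Numbers Every Programmer Should Know" list.
# Reference points only, not measurements of any real system.
LATENCY_NS = {
    "L1 cache reference":                 1,
    "Main memory reference":              100,
    "Read 4K randomly from SSD":          150_000,
    "Round trip within same data center": 500_000,
    "Disk seek":                          10_000_000,
    "Packet CA -> Netherlands -> CA":     150_000_000,
}

for name, ns in LATENCY_NS.items():
    print(f"{name:36s} {ns / 1000:>14,.1f} us")
```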
Ethernet switches with per-hop latencies in the tens of microseconds exist. Crossing a handful of those gets you to a storage node in the same (enormous) data center. Add a few more microseconds of transit time on the links between those switches.
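A back-of-envelope sketch of that network leg, using assumed hop counts and per-switch figures rather than anything published:

```python
# Assumed figures for the in-data-center network leg; the hop count and
# per-switch latency are guesses consistent with "tens of microseconds"
# class switches, not published numbers.
SWITCH_HOPS = 5        # assumed switches between client and storage node
PER_SWITCH_US = 10     # assumed latency through each switch
LINK_TRANSIT_US = 5    # assumed total propagation time on the links in between

one_way_us = SWITCH_HOPS * PER_SWITCH_US + LINK_TRANSIT_US
print(f"one way: ~{one_way_us} us, round trip: ~{2 * one_way_us} us")
# one way: ~55 us, round trip: ~110 us
```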
On the storage side, a sub-millisecond total rules out spinning rust in the first tier: hard drive arrays take multiple milliseconds to seek, even in the best circumstances. Solid state storage is obviously in use, and it takes maybe tens of microseconds for an I/O operation.
Assuming 250 microseconds for a read doesn't leave much budget for traversing the various layers of the storage stack: converting IP packets into block operations, having some operating system execute those, and the client OS doing something with the results. It presumably takes a considerable amount of engineering work on the hypervisor, OS, drivers, and hardware to fit inside that.
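To see how tight that is, here is a hypothetical breakdown of a 250 microsecond read; every figure is an assumption for illustration, not a published number.

```python
# Hypothetical budget for a ~250 us read. All figures are assumptions
# for illustration only.
BUDGET_US = 250

network_round_trip_us = 110   # assumed, from the switch-hop sketch above
flash_read_us = 80            # assumed flash I/O latency
software_us = BUDGET_US - network_round_trip_us - flash_read_us

print(f"left for client OS, hypervisor, protocol conversion and storage "
      f"software combined: ~{software_us} us")
# ~60 us for every software layer on both ends
```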
Note that the EFS performance table attaches the qualifier "As low as" to those amazingly low latency figures. This is not a service level agreement, although in practice you do seem to get the fast end of that range.
Further note that multi-zone storage can have writes around one millisecond slower. That is the cost of the durability of storing your data in multiple buildings. The speed of light in fiber alone adds maybe 50 us of propagation delay to a zone 10 km away, plus an unknown number of extra switches on that path, and multiple zones need to acknowledge a write for it to be more durable than a single zone. Still quite fast, but not all-flash-array-in-the-same-rack fast.
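That propagation figure is just distance divided by the speed of light in fiber (roughly two thirds of its vacuum speed); a quick check, assuming 10 km between zones:

```python
# One-way propagation delay through optical fiber, where light travels
# at roughly c / 1.47 (about two thirds of its vacuum speed).
C_VACUUM_KM_PER_S = 300_000
FIBER_REFRACTIVE_INDEX = 1.47   # typical for single-mode fiber

def fiber_delay_us(distance_km: float) -> float:
    speed_km_per_s = C_VACUUM_KM_PER_S / FIBER_REFRACTIVE_INDEX
    return distance_km / speed_km_per_s * 1_000_000

print(f"~{fiber_delay_us(10):.0f} us one way over 10 km")   # ~49 us
```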
Disclaimer: all of this is a proprietary black box whose internals they are not sharing. Still, it is worth understanding that, with enough engineering, private networking within the same city can achieve latency more than 10x lower than you might have thought.