Pull up "Latency Numbers Every Programmer Should Know" when doing rough latency approximations: not because it is exact, but because it conveys the orders of magnitude.
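For a sense of scale, here is a minimal sketch of a few of those numbers; the values are rounded approximations from the commonly cited list, not measurements of any particular system.

```python
# Rounded orders of magnitude from the commonly cited
# "Latency Numbers Every Programmer Should Know" list.
# Reference points only, not measurements of any real system.
LATENCY_NS = {
    "L1 cache reference":                 1,
    "Main memory reference":              100,
    "Read 4K randomly from SSD":          150_000,
    "Round trip within same data center": 500_000,
    "Disk seek":                          10_000_000,
    "Packet CA -> Netherlands -> CA":     150_000_000,
}

for name, ns in LATENCY_NS.items():
    print(f"{name:36s} {ns / 1000:>14,.1f} us")
```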
Ethernet switches with per-hop latencies in the tens of microseconds exist. Crossing a handful of those gets you to a storage node in the same (enormous) data center. Add a few more microseconds of transit time on the links between those switches.
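A back-of-envelope sketch of that network leg, using assumed hop counts and per-switch figures rather than anything published:

```python
# Assumed figures for the in-data-center network leg; the hop count and
# per-switch latency are guesses consistent with "tens of microseconds"
# class switches, not published numbers.
SWITCH_HOPS = 5        # assumed switches between client and storage node
PER_SWITCH_US = 10     # assumed latency through each switch
LINK_TRANSIT_US = 5    # assumed total propagation time on the links in between

one_way_us = SWITCH_HOPS * PER_SWITCH_US + LINK_TRANSIT_US
print(f"one way: ~{one_way_us} us, round trip: ~{2 * one_way_us} us")
# one way: ~55 us, round trip: ~110 us
```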
On the storage side, a sub-millisecond total rules out spinning rust in the first tier: hard drive arrays take multiple milliseconds to seek, even in the best circumstances. Solid state storage is obviously in use, and it takes maybe tens of microseconds for an I/O operation.
Assuming 250 microseconds for a read doesn't leave much budget for traversing the various layers of the storage stack: converting IP packets into block operations, having some operating system execute those, and the client OS doing something with the results. It presumably takes a considerable amount of engineering work on the hypervisor, OS, drivers, and hardware to fit inside that.
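To see how tight that is, here is a hypothetical breakdown of a 250 microsecond read; every figure is an assumption for illustration, not a published number.

```python
# Hypothetical budget for a ~250 us read. All figures are assumptions
# for illustration only.
BUDGET_US = 250

network_round_trip_us = 110   # assumed, from the switch-hop sketch above
flash_read_us = 80            # assumed flash I/O latency
software_us = BUDGET_US - network_round_trip_us - flash_read_us

print(f"left for client OS, hypervisor, protocol conversion and storage "
      f"software combined: ~{software_us} us")
# ~60 us for every software layer on both ends
```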
Note that the EFS performance table attaches the qualifier "As low as" to those amazingly low latency figures. This is not a service level agreement, although in practice you do seem to get the fast end of that range.
Further note that multi-zone storage can have writes around one millisecond slower. That is the cost of the durability of storing your data in multiple buildings. The speed of light in fiber alone adds maybe 50 us of propagation delay to a zone 10 km away, plus an unknown number of extra switches on that path, and multiple zones need to acknowledge a write for it to be more durable than a single zone. Still quite fast, but not all-flash-array-in-the-same-rack fast.
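That propagation figure is just distance divided by the speed of light in fiber (roughly two thirds of its vacuum speed); a quick check, assuming 10 km between zones:

```python
# One-way propagation delay through optical fiber, where light travels
# at roughly c / 1.47 (about two thirds of its vacuum speed).
C_VACUUM_KM_PER_S = 300_000
FIBER_REFRACTIVE_INDEX = 1.47   # typical for single-mode fiber

def fiber_delay_us(distance_km: float) -> float:
    speed_km_per_s = C_VACUUM_KM_PER_S / FIBER_REFRACTIVE_INDEX
    return distance_km / speed_km_per_s * 1_000_000

print(f"~{fiber_delay_us(10):.0f} us one way over 10 km")   # ~49 us
```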
Disclaimer: all of this is a proprietary black box whose internals they are not sharing. Still, it is worth understanding that, with enough engineering, private networking within the same city can achieve latency more than 10x lower than you might have thought.