
Mirror Accelerated Parity / NVMe / ReFS / Fast-Tier Issues


I'm currently building a lab server with fairly cheap hardware: 2 NVMe SSDs and a bunch of 3.5" HDDs. After creating tiered storage (NVMe mirror and HDD parity) and formatting it with ReFS, everything behaves mostly as it should:

  • Using performance counters, I can see the fast tier filling up and then starting to destage to the parity tier once it hits 85%.
  • I can verify that new writes always hit the fast tier.
  • Reads are issued to the fast or slow tier, depending on where the data resides.

Only the sizing of the fast tier seems strange: I used 2 × 220 GB NVMe SSDs and created a 215 GB fast tier from them. The HDDs add up to about 6 TB. PowerShell reports these sizes as expected:

FriendlyName             TierClass   MediaType ResiliencySettingName FaultDomainRedundancy   Size FootprintOnPool StorageEfficiency
------------             ---------   --------- --------------------- ---------------------   ---- --------------- -----------------
M. Acc. Parity-NVMe-Tier Performance SSD       Mirror                1                     215 GB          430 GB           50,00 %
NVMe-Tier                Unknown     SSD       Mirror                1                       0 B             0 B
HDD-Tier                 Unknown     HDD       Parity                1                       0 B             0 B
M. Acc. Parity-HDD-Tier  Capacity    HDD       Parity                1                       6 TB            9 TB           66,67 %
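
As a quick sanity check on the output above, the StorageEfficiency column is consistent with Size divided by FootprintOnPool. A minimal Python sketch of that arithmetic (the function name is made up for illustration; the numbers are copied from the table):

```python
def storage_efficiency(size, footprint):
    """Return usable size as a percentage of the raw pool footprint."""
    if footprint <= 0:
        raise ValueError("footprint must be positive")
    return 100.0 * size / footprint

# Values copied from the PowerShell output above (units cancel out).
mirror_eff = storage_efficiency(215, 430)  # two-way mirror tier, GB / GB
parity_eff = storage_efficiency(6, 9)      # single-parity tier, TB / TB

print(f"mirror tier: {mirror_eff:.2f} %")
print(f"parity tier: {parity_eff:.2f} %")
```

So the reported sizes and footprints are internally consistent; the discrepancy only shows up in the fill-level counters.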

The issue I am facing now: when moving data onto that tiered storage, the performance counter shows the fast tier reporting 85% usage and starting to destage files to the slow tier after I have moved only about 40-50 GB to the virtual disk.

I have been thinking about possible reasons for this for a couple of days now. Maybe somebody has an idea?

My current thought: the NVMe SSDs are, as mentioned, quite cheap, so they are TLC SSDs. They can deliver quite nice performance as long as they are operating in pSLC mode. However, that would waste 67% of the disk capacity (1 bit per cell rather than 3), and that would roughly match my observation (33% of 220 GB would be ~73 GB, so we would hit 85% of total usage quite quickly).
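
To make the pSLC guess concrete, here is a rough Python sketch of the arithmetic. It assumes the whole drive runs at 1 bit per cell instead of 3, which is only my hypothesis, and the helper name is hypothetical:

```python
def pslc_capacity_gb(tlc_capacity_gb, bits_per_cell=3):
    """Capacity if every cell stored 1 bit instead of bits_per_cell bits."""
    return tlc_capacity_gb / bits_per_cell

# Fill level at which destaging was observed to start.
DESTAGE_THRESHOLD = 0.85

slc_gb = pslc_capacity_gb(220)               # assumption: entire SSD in pSLC mode
destage_at_gb = slc_gb * DESTAGE_THRESHOLD   # data written before destaging begins

print(f"pSLC-only capacity: {slc_gb:.1f} GB")
print(f"85% of that:        {destage_at_gb:.1f} GB")
```

Under this assumption destaging would begin at roughly 62 GB written, which is in the same ballpark as, but somewhat above, the 40-50 GB I actually see.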

I wouldn't mind if the fast tier is that small, as long as it doesn't have to deal with slow TLC performance. But why is the tier size reported as 215 GB then? And is there a way to set pSLC mode, or is this controlled by ReFS, done by trimming, etc.?

I would be particularly interested in what causes the disk to be stuck in pSLC mode, as to my understanding the disk should automatically switch to TLC mode once it runs out of available space. (I also read that MS disabled trimming with ReFS; maybe it is related to that?)

It seems to be intentional; otherwise, why would the ReFS performance counter know about the actual fast-tier fill level if ReFS assumed a 215 GB disk as well?

Example screenshot: writing 8 GB to the fast tier causes 8 GB of other data to be destaged before the 8 GB are deleted from the fast tier again. It looks like 8 GB = 10%, so I'd say ReFS sees the tier as ~80 GB.
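
The tier size that ReFS seems to work with can be estimated from both observations (8 GB ≈ 10% in the screenshot, and destaging at 85% after 40-50 GB). A small Python sketch with a hypothetical helper; the second estimate additionally assumes the fast tier was empty before the copy started:

```python
def implied_tier_size_gb(written_gb, fill_fraction):
    """Tier size implied if written_gb corresponds to fill_fraction of the tier."""
    if not 0 < fill_fraction <= 1:
        raise ValueError("fill_fraction must be in (0, 1]")
    return written_gb / fill_fraction

REPORTED_TIER_GB = 215  # size shown by PowerShell for the performance tier

from_screenshot = implied_tier_size_gb(8, 0.10)      # 8 GB moved the counter by ~10%
from_destage_low = implied_tier_size_gb(40, 0.85)    # destage began after ~40 GB
from_destage_high = implied_tier_size_gb(50, 0.85)   # ... or after ~50 GB

print(f"from screenshot:    ~{from_screenshot:.0f} GB")
print(f"from destage onset: ~{from_destage_low:.0f}-{from_destage_high:.0f} GB")
print(f"share of reported:  ~{100 * from_screenshot / REPORTED_TIER_GB:.0f} %")
```

Both estimates land well below the reported 215 GB, at somewhere between a quarter and a bit over a third of it.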


  • Windows Server 2019 Standard
  • Building this lab on two nodes; I can see the exact same behaviour on both nodes (identical hardware).