Score:0

How does a striped RAID work with a large number of disks?


I am still trying to configure my large array of 24 disks (2.4TB) as an archive/NAS for a mix of huge and small files. Apart from that, I am now more focused on understanding how striped RAIDs work under the hood, but the more I read, the more confused I get, because most of the examples in the literature are based on a "low" number of disks. (I asked the manufacturer, but they were reluctant to answer some of these questions publicly because it is "reserved information".)

  • Stripe size is usually (number of data disks) x (strip, or chunk, size), e.g. 8x64KB=512KB or 10x256KB=2560KB.
  • How are files split and saved into the stripe? One file per stripe (with the remaining strips filled with zeros), or many files per stripe until all its strips are filled?
  • For a large array, is the stripe size still important? I discovered that my PERC uses a fixed stripe size of 1MB whenever the calculated value is bigger than 1MB (e.g. 8x256KB). In that case, how is the stripe arranged? Is it still 8x256KB=2MB wide and internally divided into 2x 1MB, or is it 1MB wide, divided across the 8 data disks?
  • Nowadays, should I still configure a striped RAID with "power of 2" in mind? My PERC allows me to configure any number of disks for any RAID level, including counts that are not a power of 2.
  • Knowing these limitations(?), is it worth setting up the array as a 2x12-disk RAID60 with a 256KB strip size? We need to avoid wasting too much space.
Score:2

How are files split and saved into the stripe? One file per stripe (with the remaining strips filled with zeros), or many files per stripe until all its strips are filled?

Arrays like this don't think in terms of files, just blocks. The filesystem itself defines which files are made up of which blocks; the underlying disk system doesn't do that.

So don't think of it as files, just blocks. Imagine all the files on your filesystem, but take away all the data about folders and files: it's just one big pile of blocks, and it's those blocks that get striped across the available disks for performance and resilience.
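To make the "pile of blocks" idea concrete, here is a minimal sketch of how a logical byte offset could map onto strips in a plain striped layout. It is only an illustration: the function name `locate` is made up for this example, parity strips and their rotation are ignored, and a real controller such as a PERC may lay data out differently.

```python
KB = 1024

def locate(offset_bytes, strip_size, data_disks):
    """Map a logical byte offset to (stripe index, data disk index, offset in strip).

    Simplified model: data strips only, no parity rotation (an assumption).
    """
    strip_index = offset_bytes // strip_size   # which strip, counting from the start
    stripe = strip_index // data_disks         # which full stripe that strip belongs to
    disk = strip_index % data_disks            # which data disk holds that strip
    within = offset_bytes % strip_size         # position inside the strip
    return stripe, disk, within

# 8 data disks with 64KB strips, i.e. a 512KB stripe
for off_kb in (0, 64, 500, 512):
    print(off_kb, "KB ->", locate(off_kb * KB, 64 * KB, 8))
```

Notice that nothing here knows about files: offsets simply fill one stripe after another, which is why several small files can share a stripe and a large file can span many.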

Generally speaking, the defaults for file systems and RAID arrays like this will fit 95% of all applications just fine. The ability to tune them is great if you have the time to play about and test all the various combinations, or if you have an application with unusual requirements (such as constantly reading or writing either lots of tiny random files or, at the other end, huge sequential files) - in those cases, yes, some of the tuning can have significant benefits. But again, the defaults are usually pretty good for most use cases. I do VoD, so we often tune our storage volumes to have very large strips/blocks because we know they hold nothing but large sequential files, but then we don't put our DB files, logs etc. on those arrays/volumes because they'd be terrible for that use.

Anyway, back to recommendations. Glad you seem to have settled on R60 - we get people here all the time with issues with R5/50. It's dead, don't use it at all - R6/60 and R1/10 are the only game in town, unless you have a boner for ZFS anyway :) If I was doing this I'd do exactly what you suggest: an R60 made up of 2 x 12-disk R6's, with the stripe left at its default. Then, as your application starts to make use of this array, you can look at how it's performing, and if you really feel you need to tune it and will get a lot of benefit from doing so, go ahead. But I bet you'll be just fine with the defaults.

Best of luck.

pink0.pallino
Thanks, another brick in my wall of knowledge. I like ZFS, but I have only used it inside built-in NAS software, never set up on an OS from scratch. Also, this seems to be a powerful controller; what are the benefits of using "software RAID" over hardware RAID? About my other doubts: from your answer I now guess that a "stripe of strips" is filled up until "completion" and then the next one is used. What happens if the controller uses a "stripe size" shorter than the value I set? Would using a "power of 2" number of disks increase performance because the chunk-stripe-PV/LVM-XFS alignment is then perfect?
Zac67
@pink0.pallino Generally, larger stripe sizes perform better with large, sequential writes (because of less processing overhead) and worse with small, random writes (because of write amplification). You should select the stripe size that fits your application. If you don't know, run tests. All RAID variants generally perform best when the number of disks is a power of two plus the disks added for redundancy, e.g. 4, 6, 10, or 18 for RAID 6, or 8, 12, 20, or 36 for RAID 60.
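The "power of two plus redundancy disks" rule of thumb above can be reproduced with a few lines. This is only a sketch of the arithmetic; the helper name `preferred_disk_counts` and its parameters are invented for the example, and the RAID 60 case assumes exactly two RAID 6 sub-arrays.

```python
def preferred_disk_counts(redundancy_disks, subarrays=1, max_exponent=4):
    """Disk counts where each sub-array has 2**k data disks plus its redundancy disks."""
    return [
        subarrays * (2 ** k + redundancy_disks)
        for k in range(1, max_exponent + 1)
    ]

print("RAID 6: ", preferred_disk_counts(2))               # [4, 6, 10, 18]
print("RAID 60:", preferred_disk_counts(2, subarrays=2))  # [8, 12, 20, 36]
```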
pink0.pallino
@Zac67 Yes, I am choosing a 256KB strip size for a 2x (10+2 RAID6) array. This should fit our needs. My curiosity was about how the controller manages this large stripe (2560KB) if it actually uses its own 1MB stripes, since 2560KB is not a multiple of 1MB and I did not want to set it "wrong". Can you explain why the "power of two" rule for RAID60 gives 8, 12, 20, 36 and not the same numbers as RAID6? I always thought that there is a RAID0 on top of two RAID6 arrays, and that the rule 4, 6, 10, 18 applies to each of these.
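As a rough check of the numbers for the 2x (10+2) layout, the sketch below computes the data disks and stripe size per RAID 6 sub-array. The function name `raid60_summary` is hypothetical, and the usable-capacity figure assumes that 2.4TB (taken here as 2400GB) is the size of each individual disk, which the question does not state explicitly. It says nothing about how the controller handles its internal 1MB limit.

```python
def raid60_summary(subarrays, disks_per_subarray, strip_kb, disk_gb):
    """Summarise a RAID 60 layout built from identical RAID 6 sub-arrays."""
    data_disks = disks_per_subarray - 2          # RAID 6 spends two disks' worth on parity
    return {
        "data_disks_per_subarray": data_disks,
        "subarray_stripe_kb": data_disks * strip_kb,
        "usable_gb": subarrays * data_disks * disk_gb,  # assumes disk_gb is per disk
        "data_disks_power_of_two": (data_disks & (data_disks - 1)) == 0,
    }

summary = raid60_summary(subarrays=2, disks_per_subarray=12, strip_kb=256, disk_gb=2400)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With 10 data disks per sub-array the stripe comes out at 2560KB, and 10 is not a power of two, so this layout does not follow the rule of thumb mentioned earlier in the thread.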
Zac67
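RAID 60 (also called RAID 6+0) consists of two RAID 6 subarrays that are striped like RAID 0, so the total disk count is twice that of each RAID 6 subarray: two subarrays of 4, 6, 10, or 18 disks give 8, 12, 20, or 36 disks. See https://en.wikipedia.org/wiki/Nested_RAID_levels#RAID_60_(RAID_6+0)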