Score:0

How can ZFS be configured in an Ubuntu installation to get RaidZ1 and its data security even with only one hard disk?


ZFS offers a valuable feature: with a mirror (Raid1), RaidZ1, RaidZ2 or RaidZ3, data errors caused by defective memory cells in SSDs or defective blocks on classic hard disks are detected and corrected automatically.

However, the installer currently offered by Ubuntu, at least when installing to a single hard disk, places the pool on a single partition (effectively Raid0, with no redundancy) rather than using one of the Raid variants above with the necessary number of partitions.

How can a fresh Ubuntu installation on ZFS be adapted, during or after installation, to use for example RaidZ1 with three data partitions? The aim is not to give away the advantage that ZFS has over, for example, ext4: preserving data integrity even when SSD memory cells or HD blocks are defective.

Depending on the selected RaidZ level and the number of partitions used, the following share of the raw capacity remains usable with compression and deduplication switched off:

  • RaidZ1 with 3 partitions: 67%
  • RaidZ1 with 4 partitions: 75%
  • RaidZ1 with 5 partitions: 80%
  • RaidZ1 with 6 partitions: 83%
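
The percentages in this list follow from a simple ratio: with n equally sized partitions and one parity partition, (n - 1)/n of the raw space is left for data. A minimal Python sketch of that arithmetic follows; the function name `raidz_usable_fraction` is made up for illustration, and the model is idealised (it ignores ZFS metadata, padding and reserved space, so real pools report somewhat less).

```python
from fractions import Fraction


def raidz_usable_fraction(partitions: int, parity: int = 1) -> Fraction:
    """Idealised share of raw space left for data in a single RaidZ vdev.

    Assumes equally sized partitions and ignores metadata and padding.
    """
    if partitions <= parity:
        raise ValueError("need more partitions than parity devices")
    return Fraction(partitions - parity, partitions)


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        share = raidz_usable_fraction(n)
        print(f"RaidZ1 with {n} partitions: {float(share):.1%}")
```

Passing `parity=2` or `parity=3` gives the corresponding idealised figures for RaidZ2 and RaidZ3.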

Source:

By using the compression supported by ZFS, one can save anywhere from 0 to nearly 100% of storage space. Realistically, depending on the application, an average saving of 50% may well be possible.

By using the deduplication supported by ZFS, one can likewise save anywhere from 0 to nearly 100%. The average achieved in practice depends heavily on the data.
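
To see how such savings interact with the parity overhead, the two effects can be combined in one rough estimate. The sketch below is only an illustration under stated assumptions: `logical_capacity` is a hypothetical helper, `saving` is an assumed overall fraction saved by compression and deduplication together, and the parity model is the same idealised (n - 1)/n ratio.

```python
def logical_capacity(raw: float, partitions: int, saving: float, parity: int = 1) -> float:
    """Estimate how much logical data fits on a RaidZ vdev.

    raw        -- total raw size of all partitions (any unit)
    partitions -- number of equally sized partitions in the vdev
    saving     -- assumed fraction saved by compression/deduplication, 0 <= saving < 1
    parity     -- 1 for RaidZ1, 2 for RaidZ2, 3 for RaidZ3
    """
    if partitions <= parity:
        raise ValueError("need more partitions than parity devices")
    if not 0 <= saving < 1:
        raise ValueError("saving must be at least 0 and below 1")
    usable = raw * (partitions - parity) / partitions
    # If a fraction `saving` of the data needs no space, the rest must fit into `usable`.
    return usable / (1 - saving)


if __name__ == "__main__":
    # Hypothetical example: 900 GB split into three partitions, 50% average saving.
    print(logical_capacity(900, 3, 0.5))
```

Under these assumptions, a 50% saving more than offsets the one-third lost to RaidZ1 parity on three partitions, while a saving of 0 leaves only the idealised two-thirds.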

The following source explains the advantage of uncorrupted data and why backups usually do not help once data has been corrupted.

Presumably, RaidZ1 with three rpool and three bpool partitions, and the resulting single redundancy, is a reasonable choice for a system with only one hard disk: ZFS can then detect damage caused by defective SSD memory cells or defective HD blocks on its own and correct the resulting data errors.
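
As a rough illustration of the layout described here, the following Python sketch only assembles (and does not run) a `zpool create` command for one RaidZ1 vdev built from three partitions. The pool name `testpool` and the partition paths are placeholders, not taken from a real system, and the helper `zpool_create_raidz1` is hypothetical. The sketch does not reproduce the pool properties that the Ubuntu installer sets on rpool and bpool, and it does not cover migrating an existing installation.

```python
import shlex


def zpool_create_raidz1(pool: str, partitions: list[str]) -> list[str]:
    """Build the argument list for creating a pool with one RaidZ1 vdev.

    The minimum of three partitions mirrors the question's layout;
    it is this sketch's choice, not a limit queried from ZFS.
    """
    if len(partitions) < 3:
        raise ValueError("this sketch expects at least three partitions")
    return ["zpool", "create", pool, "raidz1", *partitions]


if __name__ == "__main__":
    # Placeholder partitions on a single disk; replace with real ones before use.
    cmd = zpool_create_raidz1("testpool", ["/dev/sdX5", "/dev/sdX6", "/dev/sdX7"])
    # Print only; actually running it needs root and overwrites the partitions.
    print(shlex.join(cmd))
```

Because all three partitions sit on the same physical disk, a pool built this way could repair individual bad blocks but would not survive the failure of the disk itself.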

The GPT on such a hard disk also has only single redundancy, although the GPT is probably not yet self-repairing. Perhaps ZFS will improve on this point one day.

This does not strike me as an effective use of hardware or ZFS. The file system already has a good deal of protection against situations like bit rot and memory cell degradation. Also, as any data worth this effort is worth backing up, I fail to see how it might be superior to ZFS replication or other automated archival processes for irreplaceable data.
Despite all the words, this still comes across as an over-engineered solution for a problem that really doesn’t exist for the average person who lives within the atmosphere of the Earth. Only devices that are regularly subject to the harshness of interplanetary travel would need this level of redundancy but, even then, space agencies would resolve this by shielding their SSDs and installing 3~6 devices to reduce risk of a single point of failure. ZFS is already pretty robust and does not suffer from the problems outlined, which is why I’ve used it for 15+ years on BSD and 5+ years on Ubuntu
I agree completely with regards to cell degradation and general hardware failure. The proposed solution in the question, however, strikes me as over-engineered. Most people (including the data scientists that I support at work) are generally fine with ZFS defaults on Ubuntu. People who want a little bit of additional redundancy will mirror across vdevs, use something like `set copies=X`, archive snapshots on external disks, or use replication. Are none of these standard options, which do not affect write performance, viable for your use case?
Alfred.37
@matigo, as I described, the default settings in Ubuntu (Raid0) neither detect defective storage cells nor provide redundancy for this case.
Artur Meinild
I also think this complicates things way too much. If you want extra redundancy, use ZFS with a mirror disk, that's it.
Alfred.37
@Artur Meinild, thanks to ZFS, there is no need for multiple drives for redundancy, and many devices do not even offer the possibility of installing a second or third drive.
Artur Meinild
In my opinion, you don't have redundancy if you can't swap an entire drive. What are you going to do if you have a data error on one partition? Then the entire pool will be degraded until you can export each partition and rebuild the entire setup. Your idea might sound feasible on paper, but in reality you should look into other existing backup options, like @matigo suggests.