Score:6

xfs: difference between block size and sector size

bd flag

mkfs.xfs has the following two options, among others:

-b block_size_options
      This  option  specifies  the  fundamental  block  size  of  the  filesystem.    The   valid
      block_size_options  are:  log=value  or size=value and only one can be supplied.  The block
      size is specified either as a base two logarithm value with log=, or in bytes  with  size=.
      The  default  value is 4096 bytes (4 KiB), the minimum is 512, and the maximum is 65536 (64
      KiB).  Although mkfs.xfs will accept any of these values and create a valid filesystem, XFS
      on Linux can only mount filesystems with pagesize or smaller blocks.
      
      
-s sector_size
      This option specifies the fundamental sector size of the filesystem.   The  sector_size  is
      specified  either as a value in bytes with size=value or as a base two logarithm value with
      log=value.  The default sector_size is 512 bytes. The minimum value for sector size is 512;
      the maximum is 32768 (32 KiB). The sector_size must be a power of 2 size and cannot be made
      larger than the filesystem block size.      

Well, isn't this description redundant? The only hint that a sector may be something a block uses internally is "The sector_size must be a power of 2 size and cannot be made larger than the filesystem block size". Perhaps sector here means the sector size of the underlying block device? The default of 512 bytes would suggest that.

Obviously, that's just a guess. I would like to know what the differences between a block and a sector are in the context of XFS, and how each one affects filesystem performance.

cn flag
What is the real problem you are trying to solve? https://xyproblem.info/
Score:5
ca flag

The short answer is that block size is the minimum allocation size, while sector size is the underlying physical device sector size. However, such a concise answer fails to convey the true difference between block and sector size.

The key point to understand is that sector size is the atomic write size of the underlying physical device. In other words, it is the unit size that is expected to completely succeed or completely fail, with no intermediate outcome (i.e. no partial writes). This concept is extremely important for XFS journal safeguards: misconfiguring the sector size means venturing into dangerous territory.

Block size is a more "mundane" unit: it describes the minimum filesystem allocation for file data. On a filesystem with a 4K block size, writing a single byte of data (e.g. echo -n 0 > /root/test.file) results in a file that truly occupies 4K:

[root@localhost ~]# echo -n 0 > test.file
[root@localhost ~]# stat test.file
  File: test.file
  Size: 1               Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 100664426   Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:admin_home_t:s0
Access: 2023-06-10 17:47:50.973092242 +0200
Modify: 2023-06-10 17:47:50.974092238 +0200
Change: 2023-06-10 17:47:50.974092238 +0200
 Birth: 2023-06-10 17:47:50.973092242 +0200
[root@localhost ~]# du -hs test.file
4.0K    test.file

Side note: as you can see from stat, Linux internally counts size in 512B "logical sector" units (in the example above, 8x 512B "linux" blocks = 1x 4K XFS block).
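
The same comparison between apparent size and allocated size can be made from a script. This is a minimal Python sketch, assuming a POSIX system where `st_blocks` is reported in 512-byte units (as stat does on Linux); the helper name `size_report` is made up for illustration:

```python
import os


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    Assumption: st_blocks counts 512-byte units, as on Linux,
    regardless of the filesystem block size.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

Calling `size_report("test.file")` on the file from the transcript above should mirror the stat and du figures (1 byte apparent, 8 x 512 = 4096 bytes allocated). Other filesystems may report a different allocated size for such a tiny file.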

The short summary is that while block size is "merely" an optimization parameter, sector size really must be right (hence the autodetection), or filesystem corruption on crash or power loss is possible.

ph flag
So it is not clear: if an HDD has a 512-byte logical sector but a 4K physical sector, which XFS sector size is safer? And which one is faster?
shodanshok avatar
ca flag
A 4Ke disk (4K physical, 512B logical) should have no reliability issue with either 512B or 4K writes. While sub-sector writes cause a read/modify/write cycle, a special non-volatile cache stores the to-be-modified data. Speed is another story: 4Ke disks should really receive only 4K-aligned writes to get good performance.
ph flag
OK, better to force a 4K sector then. Too bad the 4K physical sector is not automatically recognized on my drive, but that might be because of the JMicron USB bridge. Just a nitpick: it is `512e` disks vs `4kn` disks. There are no `4ke` disks AFAIK.
Score:5
cn flag

Sector size refers to the sector size of the underlying block device. It is the allocation unit of the disk, a "hardware" attribute. You can see it with:

lsblk -o NAME,PHY-SEC,LOG-SEC,MAJ:MIN,SIZE,RO,TYPE,MOUNTPOINTS,VENDOR,MODEL,SERIAL

By default, mkfs.xfs uses the sector size advertised by the device. If LOG-SEC is 512 and PHY-SEC is 4096, you should use 4096. When in doubt, use PHY-SEC for performance.
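
The two values that lsblk prints can also be read directly from sysfs. This is a minimal Python sketch, assuming a Linux system that exposes `logical_block_size` and `physical_block_size` under `/sys/block/<device>/queue/`; the function name `sector_sizes` and its `sysfs_root` parameter are made up for illustration:

```python
from pathlib import Path


def sector_sizes(device, sysfs_root="/sys/block"):
    """Return (logical, physical) sector sizes in bytes.

    `device` is a whole-disk kernel name such as "sda" or "loop0",
    not a partition. `sysfs_root` is only a parameter so the function
    can be pointed at a different directory tree.
    """
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical
```

For a loop device set up with `--sector-size=4096`, `sector_sizes("loop0")` should return the same pair that lsblk shows in its LOG-SEC and PHY-SEC columns.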

Please note that a filesystem can't be copied at block level from a device with a 512-byte sector size to a device with a 4096-byte (or 8192-byte) physical sector size. You can copy the files, but you cannot add the new device to an LVM VG as a PV and use pvmove to move the data.

Block size is the allocation unit for the file system, aka cluster size. It is the smallest amount that can be allocated by file system for a file or for metadata.

The block size must be at least as large as the sector size, and both must be powers of 2. If you intend to use the file system only for large files, you should increase the block size; otherwise keep the default.

If you are using a RAID array or any block device abstraction, you should follow the manufacturer's documentation to get optimal performance.

For performance reasons, it is also important to have partitions aligned. Most modern Linux tools create partitions aligned to 1 MiB, which is fine in most cases.

If you do not know what to do, leave the defaults. They are fine for normal use cases. If you want to improve performance, avoid disk storage: use RAM-based storage, zram (compressed RAM-based swap), or SSDs.
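
The limits quoted from the man page in the question can be written down as a small sanity check. This is only a sketch of those documented limits in Python, not what mkfs.xfs itself runs; the function names are made up, and the power-of-two requirement on the block size is an assumption inferred from the `log=` form of the option:

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (positive integers with a single bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def check_sizes(block_size, sector_size):
    """Return a list of problems; an empty list means the pair looks valid.

    Limits are taken from the man page text quoted in the question:
    block size 512..65536, sector size 512..32768, sector <= block.
    """
    problems = []
    if not is_power_of_two(block_size) or not 512 <= block_size <= 65536:
        problems.append("block size must be a power of 2 in 512..65536")
    if not is_power_of_two(sector_size) or not 512 <= sector_size <= 32768:
        problems.append("sector size must be a power of 2 in 512..32768")
    if sector_size > block_size:
        problems.append("sector size is larger than block size")
    return problems
```

For example, `check_sizes(4096, 512)` returns an empty list, while `check_sizes(512, 4096)` reports that the sector size is larger than the block size.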

mkfs.xfs detects the sector size automatically, and the man page is outdated. Here is my test:

[mvutcovi@laptop-rh ~]$ truncate --size=1G xfs-test.img
[mvutcovi@laptop-rh ~]$ ls -lh xfs-test.img 
-rw-r--r--. 1 mvutcovi mvutcovi 1.0G Jun 10 10:12 xfs-test.img
[mvutcovi@laptop-rh ~]$ 

[mvutcovi@laptop-rh ~]$ sudo losetup --sector-size=4096 --find --show xfs-test.img 
/dev/loop0
[mvutcovi@laptop-rh ~]$ lsblk -o NAME,PHY-SEC,LOG-SEC,MAJ:MIN,SIZE,RO,TYPE,MOUNTPOINTS,VENDOR,MODEL,SERIAL /dev/loop0 
NAME  PHY-SEC LOG-SEC MAJ:MIN SIZE RO TYPE MOUNTPOINTS VENDOR MODEL SERIAL
loop0    4096    4096   7:0     1G  0 loop                          
[mvutcovi@laptop-rh ~]$ 

[mvutcovi@laptop-rh ~]$ sudo mkfs.xfs /dev/loop0 
meta-data=/dev/loop0             isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
[mvutcovi@laptop-rh ~]$ 

[mvutcovi@laptop-rh ~]$ sudo wipefs -a /dev/loop0
/dev/loop0: 4 bytes were erased at offset 0x00000000 (xfs): 58 46 53 42
[mvutcovi@laptop-rh ~]$ sudo mkfs.xfs -s size=4096 /dev/loop0 
meta-data=/dev/loop0             isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
[mvutcovi@laptop-rh ~]$

[mvutcovi@laptop-rh ~]$ sudo wipefs -a /dev/loop0
/dev/loop0: 4 bytes were erased at offset 0x00000000 (xfs): 58 46 53 42
[mvutcovi@laptop-rh ~]$ sudo mkfs.xfs -s size=512 /dev/loop0 
illegal sector size 512; hw sector is 4096
Usage: mkfs.xfs
/* blocksize */     [-b size=num]
/* config file */   [-c options=xxx]
/* metadata */      [-m crc=0|1,finobt=0|1,uuid=xxx,rmapbt=0|1,reflink=0|1,
                inobtcount=0|1,bigtime=0|1]
/* data subvol */   [-d agcount=n,agsize=n,file,name=xxx,size=num,
                (sunit=value,swidth=value|su=num,sw=num|noalign),
                sectsize=num
/* force overwrite */   [-f]
/* inode size */    [-i perblock=n|size=num,maxpct=n,attr=0|1|2,
                projid32bit=0|1,sparse=0|1,nrext64=0|1]
/* no discard */    [-K]
/* log subvol */    [-l agnum=n,internal,size=num,logdev=xxx,version=n
                sunit=value|su=num,sectsize=num,lazy-count=0|1]
/* label */     [-L label (maximum 12 characters)]
/* naming */        [-n size=num,version=2|ci,ftype=0|1]
/* no-op info only */   [-N]
/* prototype file */    [-p fname]
/* quiet */     [-q]
/* realtime subvol */   [-r extsize=num,size=num,rtdev=xxx]
/* sectorsize */    [-s size=num]
/* version */       [-V]
            devicename
<devicename> is required unless -d name=xxx is given.
<num> is xxx (bytes), xxxs (sectors), xxxb (fs blocks), xxxk (xxx KiB),
      xxxm (xxx MiB), xxxg (xxx GiB), xxxt (xxx TiB) or xxxp (xxx PiB).
<value> is xxx (512 byte blocks).
[mvutcovi@laptop-rh ~]$ 




[mvutcovi@laptop-rh ~]$ sudo wipefs -a /dev/loop0
[mvutcovi@laptop-rh ~]$ sudo losetup --detach /dev/loop0
[mvutcovi@laptop-rh ~]$ sudo losetup --sector-size=512 --find --show xfs-test.img 
/dev/loop0
[mvutcovi@laptop-rh ~]$ sudo mkfs.xfs /dev/loop0 
meta-data=/dev/loop0             isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
[mvutcovi@laptop-rh ~]$ 

[mvutcovi@laptop-rh ~]$ sudo wipefs -a /dev/loop0
/dev/loop0: 4 bytes were erased at offset 0x00000000 (xfs): 58 46 53 42
[mvutcovi@laptop-rh ~]$ sudo mkfs.xfs -s size=4096 /dev/loop0 
meta-data=/dev/loop0             isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
[mvutcovi@laptop-rh ~]$

Here is the relevant part of the mkfs.xfs source:

  /* set configured sector sizes in preparation for checks */
  if (!cli->sectorsize) {
    /*
     * Unless specified manually on the command line use the
     * advertised sector size of the device.  We use the physical
     * sector size unless the requested block size is smaller
     * than that, then we can use logical, but warn about the
     * inefficiency.
     *
     * Set the topology sectors if they were not probed to the
     * minimum supported sector size.
     */
    if (!ft->lsectorsize)
      ft->lsectorsize = dft->sectorsize;
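
The excerpt above stops partway through the block. As a rough paraphrase of what its comment describes (not a copy of the mkfs.xfs code), the selection could be sketched in Python like this. The function name, parameter names, and the fallback to the logical size when no physical size was probed are assumptions made for illustration:

```python
import sys


def choose_sector_size(block_size, logical=0, physical=0,
                       cli_sector_size=0, default=512):
    """Pick a filesystem sector size, following the comment above.

    A value of 0 means "not specified" or "not probed".
    """
    # A size given on the command line always wins.
    if cli_sector_size:
        return cli_sector_size

    # If the topology was not probed, fall back to the minimum default.
    if not logical:
        logical = default

    # Prefer the physical sector size when one was reported (assumption:
    # otherwise use the logical size).
    sector = physical if physical else logical

    # If the block size is smaller than the physical sector size,
    # drop to the logical size and warn about the inefficiency.
    if block_size < sector and block_size >= logical:
        print(f"block size {block_size} is smaller than physical sector "
              f"size {sector}; using logical sector size {logical}",
              file=sys.stderr)
        sector = logical
    return sector
```

With a 4096-byte block size, a 512e disk (`logical=512, physical=4096`) yields 4096, which agrees with the advice to use PHY-SEC. With `block_size=1024` on the same disk, the sketch falls back to 512 and prints a warning.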
Tom Yan avatar
in flag
Except for `The default sector_size is 512 bytes.`, which means that either that statement or `The sector size is automatically determined by mkfs.xfs` is false. I suppose part of the question is whether we need to specify `-s sector_size` explicitly / manually if, e.g., the logical block/sector size of a drive is 4096 bytes (i.e., AF **4Kn**), and whether it is really an option for specifying the logical block/sector size of a drive. (We can't really assume "sector" means anything in particular: in Linux *code*, "sector" is NOT the same thing as a logical block but, as of today, always a 512-byte block.)
cn flag
Thank you. You are right. I am updating my answer.
cn flag
I just ran a test and it seems that it does indeed detect the sector size. I think the man page is outdated. I need to check the source code too. I am adding the test to the answer so you can check it too.
cn flag
Here is where it's using the sector size from the device, not the default of 512, as documented in the man page. The man page is outdated. https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/mkfs/xfs_mkfs.c#n1993