Synology - BTRFS volume crashed & recovered. Why did it crash?

rgb255_255_255

1/17/23, 1:26 AM

This is a post-mortem; I'm putting the info out there for anyone who runs into this issue in the future.

This happened on a Synology RS2818RP+ running Synology DSM 6.2-25556. The system contains a Xeon CPU and ECC memory. It has 8 x HUH721010ALE604 (HGST WD Ultrastar DC HC510 10TB 7200 RPM SATA) drives comprising a RAID6 md array. The file system is BTRFS.

(NOTE: this is NOT BTRFS RAID, but rather "plain" md RAID with BTRFS on top for benefits like checksumming and snapshotting.)

Last night I got an email from the NAS that said:

volume 1 on has crashed, it is possible that more files may be corrupted under this circumstance. Please go to Storage Manager > Volume for more information.

I visited the WebGUI. There were no errors at all in the system other than the volume being offline.

The SMART status of each disk was checked and seemed OK.
"Storage Manager" showed "Healthy".
"Storage Pool" showed "Healthy".

Only the "Volume" showed "Crashed".

I SSH'ed in and ran mdadm -D /dev/md2 (md2 is where my array was). It was showing State: Clean, Degraded.
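
For reference, mdadm -D is the short form of mdadm --detail. Below is a minimal Python sketch for reading the same information from a script instead of by eye. It assumes the usual "Key : Value" layout of mdadm --detail output; parse_mdadm_detail and array_summary are hypothetical helper names, and the sample text is illustrative, not output from this NAS.

```python
def parse_mdadm_detail(text):
    """Parse the "Key : Value" lines of `mdadm --detail` output into a dict.

    Assumes mdadm's usual layout (keys right-aligned before " : ").
    Lines without " : " (the device path header, the member table) are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            info[key.strip()] = value.strip()
    return info


def array_summary(info):
    """Return (state_flags, raid_devices, active_devices) from parsed detail output."""
    # "State" looks like "clean, degraded"; split it into individual lowercase flags.
    flags = {flag.strip().lower() for flag in info.get("State", "").split(",") if flag.strip()}
    return flags, int(info["Raid Devices"]), int(info["Active Devices"])


# Illustrative sample only, not taken from the NAS in this post.
SAMPLE = """/dev/md2:
        Version : 1.2
     Raid Level : raid6
   Raid Devices : 8
  Total Devices : 7
          State : clean, degraded
 Active Devices : 7
 Failed Devices : 0
"""

flags, raid_devices, active_devices = array_summary(parse_mdadm_detail(SAMPLE))
print(sorted(flags), raid_devices, active_devices)
```

On the NAS itself you could feed it subprocess.run(["mdadm", "-D", "/dev/md2"], capture_output=True, text=True).stdout, run as root; that assumes a Python 3.7+ interpreter is available on the box.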

I checked dmesg, and found this:

[5638907.327288] ------------[ cut here ]------------
[5638907.332247] WARNING: CPU: 3 PID: 10234 at fs/btrfs/extent-tree.c:4207 btrfs_write_dirty_block_groups+0x365/0x390 [btrfs]()
[5638907.343601] BTRFS: Transaction aborted (error -2)
[5638907.343603] Modules linked in: nfsd exportfs rpcsec_gss_krb5 cifs udf isofs loop tcm_loop(O) iscsi_target_mod(O) target_core_ep(O) target_core_multi_file(O) target_core_file(O) target_core_iblock(O) target_core_mod(O) syno_extent_pool(PO) rodsp_ep(O) hid_generic usbhid hid usblp usb_storage denverton_synobios(PO) overlay exfat(O) btrfs synoacl_vfs(PO) hfsplus md4 hmac bnx2x(O) mdio mlx5_core(O) mlx4_en(O) mlx4_core(O) mlx_compat(O) qede(O) qed(O) atlantic(O) tn40xx(O) i40e(O) ixgbe(O) be2net(O) igb(O) i2c_algo_bit e1000e(O) vxlan ip6_udp_tunnel udp_tunnel fuse vfat fat crc32c_intel aesni_intel glue_helper lrw gf128mul ablk_helper arc4 cryptd ecryptfs sha256_generic ecb aes_x86_64 authenc des_generic ansi_cprng cts md5 cbc cpufreq_powersave cpufreq_performance acpi_cpufreq processor cpufreq_stats
[5638907.425092]  dm_snapshot dm_bufio crc_itu_t crc_ccitt quota_v2 quota_tree psnap p8022 llc sit tunnel4 ip_tunnel ipv6 zram sg etxhci_hcd xhci_pci xhci_hcd uhci_hcd ehci_pci ehci_hcd usbcore usb_common [last unloaded: denverton_synobios]
[5638907.448308] CPU: 3 PID: 10234 Comm: btrfs-transacti Tainted: P           O    4.4.59+ #25556
[5638907.457047] Hardware name: Synology Inc. RS2818RP+/Type2 - Board Product Name1, BIOS M.212 2019/11/01
[5638907.466571]  0000000000000000 ffff880068a0fc50 ffffffff812bf70d ffff880068a0fc98
[5638907.474851]  ffffffffa0939b8d ffff880068a0fc88 ffffffff8104b7cd ffff8801704f9e00
[5638907.483132]  ffff88003b616338 0000000000000001 00000000fffffffe ffff8801704f9f50
[5638907.491422] Call Trace:
[5638907.494179]  [<ffffffff812bf70d>] dump_stack+0x4d/0x70
[5638907.499623]  [<ffffffff8104b7cd>] warn_slowpath_common+0x7d/0xc0
[5638907.505932]  [<ffffffff8104b859>] warn_slowpath_fmt+0x49/0x50
[5638907.511996]  [<ffffffffa0892d45>] btrfs_write_dirty_block_groups+0x365/0x390 [btrfs]
[5638907.520056]  [<ffffffffa0932df8>] commit_cowonly_roots+0x230/0x2d1 [btrfs]
[5638907.527250]  [<ffffffffa08a90e8>] btrfs_commit_transaction+0x528/0xcb0 [btrfs]
[5638907.534793]  [<ffffffffa08a9905>] ? start_transaction+0x95/0x3d0 [btrfs]
[5638907.541810]  [<ffffffffa08a387c>] transaction_kthread+0x1ec/0x220 [btrfs]
[5638907.548915]  [<ffffffffa08a3690>] ? btrfs_cleanup_transaction+0x510/0x510 [btrfs]
[5638907.556701]  [<ffffffff810672a6>] kthread+0xc6/0xe0
[5638907.561883]  [<ffffffff810671e0>] ? kthread_create_on_node+0x180/0x180
[5638907.568717]  [<ffffffff81567abf>] ret_from_fork+0x3f/0x80
[5638907.574423]  [<ffffffff810671e0>] ? kthread_create_on_node+0x180/0x180
[5638907.581346] ---[ end trace 27185b26c2db1370 ]---
[5638907.586280] BTRFS: error (device md2) in btrfs_write_dirty_block_groups:4207: errno=-2 No such entry
[5638907.595721] BTRFS info (device md2): forced readonly
[5638907.600997] BTRFS warning (device md2): Skipping commit of aborted transaction.
[5638907.608618] BTRFS: error (device md2) in cleanup_transaction:2019: errno=-2 No such entry
[5638907.617108] BTRFS info (device md2): delayed_refs has NO entry
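
In the log above, errno=-2 is the kernel's -ENOENT, which BTRFS spells out as "No such entry". Per the Call Trace, the first error comes from btrfs_write_dirty_block_groups while the btrfs-transaction thread was committing; the forced read-only and the cleanup_transaction error appear to follow from that aborted transaction. A minimal Python sketch for pulling these lines out of saved dmesg text: parse_btrfs_errors and forced_readonly are hypothetical helper names, and the regex assumes the exact message format shown above, which may differ on other kernel versions.

```python
import errno
import re

# Matches kernel lines such as:
# "BTRFS: error (device md2) in btrfs_write_dirty_block_groups:4207: errno=-2 No such entry"
BTRFS_ERROR_RE = re.compile(
    r"BTRFS: error \(device (?P<device>[^)]+)\) in (?P<func>\w+):(?P<line>\d+): errno=(?P<errno>-?\d+)"
)


def parse_btrfs_errors(dmesg_text):
    """Return (device, function, source_line, errno_name) for each BTRFS error line."""
    errors = []
    for line in dmesg_text.splitlines():
        match = BTRFS_ERROR_RE.search(line)
        if match is None:
            continue
        code = int(match.group("errno"))
        # The kernel reports negative errno values; errno.errorcode is keyed by the positive number.
        name = errno.errorcode.get(abs(code), "UNKNOWN")
        errors.append((match.group("device"), match.group("func"), int(match.group("line")), name))
    return errors


def forced_readonly(dmesg_text):
    """True if BTRFS reported that it forced a device read-only."""
    return "forced readonly" in dmesg_text


# Three lines copied from the dmesg output above.
LOG = """[5638907.586280] BTRFS: error (device md2) in btrfs_write_dirty_block_groups:4207: errno=-2 No such entry
[5638907.595721] BTRFS info (device md2): forced readonly
[5638907.608618] BTRFS: error (device md2) in cleanup_transaction:2019: errno=-2 No such entry"""

for err in parse_btrfs_errors(LOG):
    print(err)
print("forced read-only:", forced_readonly(LOG))
```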

So the data was there, but the filesystem had been forced read-only. My research took me to this SUSE KB article: https://www.suse.com/support/kb/doc/?id=000018769

It references the same error I got: BTRFS: Transaction aborted (error -2).

The article states:

Good thing is, it's a WARNING, not a fatal error. WARNINGs like this one, e.g. regarding quota, typically are runtime only things that are fixed by BTRFS after the WARNING is issued. Not a bad problem.

That was somewhat reassuring.

I ran

syno_poweroff_task -d

to shut down all the Synology services that may be accessing the volume. This stops the web UI etc., but keeps SSH running.

I then ran

umount /volume1

to stop I/O to the volume (although it was already read-only per the dmesg output above). Then I ran btrfsck on the md2 device. Output below:

# btrfsck /dev/md2
Syno caseless feature on.
Checking filesystem on /dev/md2
UUID: 7a29febb-e9b5-4f77-afd7-4e1e10971340
checking extents
checking free space tree
checking fs roots
checking csums
checking root refs
found 30769182859264 bytes used err is 0
total csum bytes: 44052
total tree bytes: 50331648
total fs tree bytes: 24477696
total extent tree bytes: 23691264
btree space waste bytes: 1190925
file data blocks allocated: 30769149861888
referenced 30769131089920
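
To put the btrfsck numbers in perspective: "bytes used" is a raw byte count. The Python sketch below pulls it out of the output, converts it, and compares it with the nominal RAID6 capacity of 8 x 10 TB drives. RAID6 spends two drives' worth of space on parity, so usable space is (8 - 2) x 10 TB = 60 TB. That capacity is a rough assumption: it ignores the DSM system/swap partitions on each disk and the drives' exact byte counts. parse_bytes_used and raid6_usable_bytes are hypothetical helper names.

```python
import re


def parse_bytes_used(btrfsck_output):
    """Extract the integer from btrfsck's "found N bytes used" line, or None if absent."""
    match = re.search(r"found (\d+) bytes used", btrfsck_output)
    return int(match.group(1)) if match else None


def raid6_usable_bytes(disk_count, disk_bytes):
    """Nominal RAID6 capacity: two disks' worth of space holds parity."""
    if disk_count < 4:
        raise ValueError("RAID6 needs at least 4 member disks")
    return (disk_count - 2) * disk_bytes


OUTPUT = "found 30769182859264 bytes used err is 0"  # line from the btrfsck output above
used = parse_bytes_used(OUTPUT)
usable = raid6_usable_bytes(8, 10 * 10**12)  # 8 x 10 TB (decimal, nominal) drives

print(f"{used / 10**12:.2f} TB used ({used / 2**40:.2f} TiB)")
print(f"about {used / usable:.0%} of a nominal {usable // 10**12} TB")
```

This prints 30.77 TB used (27.98 TiB), which is about 51% of a nominal 60 TB.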

And in dmesg:

[5644451.646580] BTRFS info (device md127): using free space tree
[5644451.652561] BTRFS info (device md127): has skinny extents
[5644459.827213] BTRFS info (device md127): checking UUID tree

Once that completed, I simply rebooted the system and everything came up as normal.

I'm currently running a "volume consistency check" on it, which I believe is an mdadm "resync". It has been running for almost 24 hours now with no issues.
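
The progress of an md resync or check is visible in /proc/mdstat, so it can be watched over SSH without the web UI. Below is a minimal Python sketch that pulls the action, percentage and [total/active] member counts for one array out of /proc/mdstat text. It assumes the standard Linux /proc/mdstat layout; whether DSM's consistency check shows up there as "resync" or "check" is an assumption, so the pattern accepts either. md_status is a hypothetical helper name and the sample text is made up for illustration.

```python
import re

# Progress line, e.g. "[====>......]  resync = 25.0% (1000000/4000000) finish=..."
PROGRESS_RE = re.compile(r"\b(resync|recovery|check|reshape)\s*=\s*([\d.]+)%")
# Member summary, e.g. "[8/8] [UUUUUUUU]"; an underscore marks a missing member.
MEMBERS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def md_status(mdstat_text, array):
    """Return (total, active, progress) for `array`, or None if it isn't listed.

    progress is (action, percent) while a sync is running, otherwise None.
    """
    lines = mdstat_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith(array + " :"):
            continue
        # The array's detail lines are the indented lines that follow its header.
        block = []
        for detail in lines[i + 1:]:
            if not detail.strip() or not detail[0].isspace():
                break
            block.append(detail)
        text = "\n".join(block)
        members = MEMBERS_RE.search(text)
        total, active = (int(members.group(1)), int(members.group(2))) if members else (None, None)
        progress = PROGRESS_RE.search(text)
        action = (progress.group(1), float(progress.group(2))) if progress else None
        return total, active, action
    return None


# Made-up sample in /proc/mdstat format (not from this NAS).
SAMPLE = """Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid6 sda5[0] sdb5[1] sdc5[2] sdd5[3] sde5[4] sdf5[5] sdg5[6] sdh5[7]
      24000000 blocks super 1.2 level 6, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]
      [====>................]  resync = 25.0% (1000000/4000000) finish=600.0min speed=100000K/sec

md1 : active raid1 sda2[0] sdb2[1]
      2097088 blocks [8/2] [UU______]

unused devices: <none>
"""

print(md_status(SAMPLE, "md2"))
```

On the NAS, the real input would be the contents of /proc/mdstat, e.g. read with open("/proc/mdstat").read().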

I guess the goal of this post is to find out whether anyone with a bit more know-how on btrfs has experienced this, and whether they have any idea what caused it.

spaceman-spiff

8/2/23, 2:20 AM

It seems this was done on DSM 6, as DSM 7 has removed the syno_poweroff_task command. Any idea how to carry this out on DSM 7?

rgb255_255_255

8/3/23, 3:04 AM

@spaceman-spiff - have a look at this https://community.synology.com/enu/forum/1/post/146217
