Score:2

ceph stuck in active+remapped+backfill_toofull after lvextend an OSD's volume


I have a two-OSD Ceph cluster. The initial size of the backing volumes was 16GB. I shut the OSDs down, ran `lvextend` on both, and started the OSDs again. Now `ceph osd df` shows:

ceph osd df

But `ceph -s` shows 50 PGs stuck in active+remapped+backfill_toofull: ceph -s

I tried to understand the mechanism by reading about the CRUSH algorithm, but it seems a lot of effort and background knowledge is required. I would appreciate it if anyone could explain this behaviour (why the cluster is stuck in toofull even though free space increased significantly) and help me resolve this state.

Martian2020: do you have `ceph osd df` output from before you "did a lvextend on both" and the free space "increased significantly"?

Martian2020: and I might say 90% use is high.

Ahmad Ahmadi: If I understand you correctly, yes, the cluster had data, and the data is still available to clients.

Martian2020: No, you wrote "Now ceph osd df shows:". I meant the output of `ceph osd df` as in your question, but from before you "did a lvextend on both".

Ahmad Ahmadi: No, unfortunately.

Martian2020: Why did you do `lvextend`? What metrics pointed you to that? Also, maybe you recall some numbers, like %USE before.

Ahmad Ahmadi: The usage was about 11GB of 32GB. I wanted to increase the size of the cluster, so I did `lvextend`, apparently the wrong way.

Ahmad Ahmadi: Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/131964/discussion-between-ahmad-ahmadi-and-martian2020).
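For reference, growing a BlueStore OSD after an `lvextend` normally also requires telling BlueStore about the new device size. A sketch of the procedure, assuming OSD id 0 and the default data path (the VG/LV names are hypothetical — adjust everything for your cluster):

```shell
# Stop the OSD before touching its backing device (assumed id: 0).
systemctl stop ceph-osd@0

# Grow the backing LV first (hypothetical VG/LV names):
lvextend -L +16G /dev/ceph-vg/osd-0

# Tell BlueStore to expand onto the newly available space:
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0

# Bring the OSD back up.
systemctl start ceph-osd@0
```

Without the `bluefs-bdev-expand` step, BlueStore may keep using the old device size even though the LV itself is larger.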
Score:0
ec flag

Your RAW USE is several times larger than your DATA. Note: I haven't tried this solution myself; it is just what I've found.

Re: Raw use 10 times higher than data use

Probably the first thing to check is if you have objects that are under the min_alloc size. Those objects will result in wasted space as they will use the full min_alloc size.

Similar advice: https://stackoverflow.com/questions/68185503/ceph-df-octopus-shows-used-is-7-times-higher-than-stored-in-erasure-coded-pool/68186461#68186461

This is related to bluestore_min_alloc_size_hdd=64K (default on Octopus).

If you are using Erasure Coding, data is broken up into smaller chunks, each of which takes a full 64K on disk.
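To see why small objects inflate RAW USE, here is a minimal sketch (my own illustration, not Ceph code) of the round-up-to-min_alloc effect, using the 64 KiB Octopus default for `bluestore_min_alloc_size_hdd`:

```python
# Model of BlueStore allocation rounding: every object (or EC chunk)
# consumes a whole number of min_alloc-sized units on disk.
MIN_ALLOC = 64 * 1024  # bytes; bluestore_min_alloc_size_hdd default on Octopus

def on_disk_size(object_size: int, min_alloc: int = MIN_ALLOC) -> int:
    """Space consumed on disk: object size rounded up to a multiple of min_alloc."""
    if object_size == 0:
        return 0
    blocks = -(-object_size // min_alloc)  # ceiling division
    return blocks * min_alloc

# Example: 1000 objects of 4 KiB each hold ~3.9 MiB of data
# but consume ~62.5 MiB of raw space: a 16x inflation.
data = 1000 * 4 * 1024
raw = 1000 * on_disk_size(4 * 1024)
print(data, raw, raw / data)  # → 4096000 65536000 16.0
```

This is why a pool full of sub-64K objects (or EC chunks) can show RAW USE many times higher than DATA.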

Another way to try to fix the problem is to follow the instructions shown on your second screenshot:

add storage if this doesn't resolve itself
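If the PGs stay in backfill_toofull even though free space is available, the usual knobs to inspect are the OSD full ratios. A sketch of the relevant commands (the 0.95 value is illustrative, not a recommendation — revert it once the cluster is healthy):

```shell
# Show the current full / backfillfull / nearfull ratios:
ceph osd dump | grep ratio

# Show per-OSD size, use, and %USE to confirm the new capacity is visible:
ceph osd df

# Temporarily raise the backfillfull threshold so backfill can proceed
# (illustrative value; lower it back afterwards):
ceph osd set-backfillfull-ratio 0.95
</imports>
```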
