After rebooting the k8s nodes, the OSDs failed to join the cluster, with errors related to authentication. I added them to the auth list
and that error disappeared.
Now the OSDs join the cluster, but they are not shown as up, and the PGs do not show up in ceph -s.
I have spent 2 weeks on this issue, but I still don't understand why the OSDs are not shown as up.
When setting ms subsystem logging to 20, there is an error showing OSD >> MGR - Operation not permitted:
4038023360,v1:10.244.135.63:6801/4038023360] conn(0x55c2deb3a000 0x55c2dd0ee000 crc :-1 s=READY pgs=2984 cs=0 l=1 rev1=1 rx=0 tx=0).handle_read_frame_preamble_main read frame preamble failed r=-1
((1) Operation not permitted)
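For anyone trying to reproduce this, here is a rough sketch of how the ms debug level can be raised at runtime and how the resulting log can be narrowed to the failing reads. It assumes it runs inside the OSD pod (the same place `ceph daemon` works), and the helper names `raise_ms_debug` and `filter_preamble_errors` are made up for illustration. A value set through the admin socket is runtime-only and does not survive a daemon restart.

```shell
# Raise messenger (ms) debug logging on a running OSD via its admin socket.
# Assumption: executed inside the OSD pod; $1 is the OSD id, e.g. 3.
raise_ms_debug() {
  ceph daemon "osd.$1" config set debug_ms 20
}

# Keep only the messenger lines that report a failed frame read.
# Reads log text on stdin, so it works with a log file or with `kubectl logs`.
filter_preamble_errors() {
  grep -F 'read frame preamble failed'
}
```

Possible usage, where the `rook-ceph` namespace is an assumption: `kubectl -n rook-ceph logs rook-ceph-osd-3-79b4cddd7f-52kwm | filter_preamble_errors`.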
Checking the OSD status directly from its daemon shows the state as booting:
[root@rook-ceph-osd-3-79b4cddd7f-52kwm ceph]# ceph daemon osd.3 status
{
"cluster_fsid": "6078f23a-41af-4f36-aa54-ddc67de63c18",
"osd_fsid": "b89817d2-3752-4f20-a916-b992990dee8d",
"whoami": 3,
"state": "booting",
"oldest_map": 9094,
"newest_map": 9677,
"num_pgs": 33
}
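To repeat this check across several OSDs, the state field can be pulled out of that JSON with a small filter. This is only a sketch: `osd_state` is a hypothetical helper name, and it assumes the pretty-printed, one-field-per-line layout shown above. As far as I understand, an OSD that has finished booting and been marked up reports "active" here rather than "booting".

```shell
# Print the value of the "state" field from `ceph daemon osd.N status` output.
# Reads the JSON on stdin; assumes each field sits on its own line, as above.
osd_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

Possible usage from inside the OSD pod: `ceph daemon osd.3 status | osd_state`, which for the output above would print the state string only.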
What I tried so far:
- Check dmesg | scsi - seems fine
- Check network - exec into OSD and ping mgr and mons, OK.
- Check iostat -x, util is low
- Upgraded Rook -> 1.9, Ceph -> 17
What I'll try (unfortunately... :( ):
- Zap the OSD disks, clear all Ceph cluster components, and redeploy
Any clues on how to fix this issue would be much appreciated.