I'm having problems mounting a Ceph cluster on Debian machines. I don't know whether I'm doing something wrong, whether it's a version problem, or something else.
I'm using a Ceph cluster from OVH and mounting it via fstab on around 20 VMs (2 bare-metal servers, each running a Proxmox instance).
The problem appears when there is a network failure between the Ceph cluster and our bare-metal servers. From that point on, the Ceph mounts are completely unusable and can only be brought back by restarting the server.
Versions being used:
- Ceph cluster: 14.2.16
- Debian 10 (Buster)
- Ceph installed on Debian: 14.2.21 Nautilus (stable)
Ceph configuration:
[global]
fsid = xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
mon_host = XX.XX.XXX.XX XX.XX.XXX.XX XX.XX.XXX.XX
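The `[global]` section above is INI-style, so the monitor list can be read back with Python's `configparser`, for example to compare it with the monitors the kernel actually mounted from. This is a minimal sketch: `read_mon_hosts` is a hypothetical helper, it takes the file's text rather than a path, and it assumes plain addresses separated by spaces or commas as in the masked config. Bracketed v2/v1 address vectors would need more careful parsing.

```python
from __future__ import annotations

import configparser
import re


def read_mon_hosts(conf_text: str) -> list[str]:
    """Return the monitor addresses listed in the [global] section of ceph.conf-style text."""
    # interpolation=None so that "%" characters in values are not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = parser["global"]
    # Ceph treats "mon_host" and "mon host" as the same option; configparser does not,
    # so check both spellings.
    raw = section.get("mon_host") or section.get("mon host") or ""
    # Assumption: plain addresses separated by whitespace and/or commas.
    return [addr for addr in re.split(r"[\s,]+", raw) if addr]
```

With `mon_host = 192.0.2.1 192.0.2.2 192.0.2.3` (documentation addresses standing in for the masked ones), it returns those three addresses as a list.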
fstab configuration:
:/ /mnt/ceph ceph name=ceph_user,_netdev,noatime 0 0
Running mount:
xx.xx.xx.xx:6789,xx.xx.xx.xx:6789,xx.xx.xx.xx:6789:/ on /mnt/ceph type ceph (rw,noatime,name=ceph_user,secret=<hidden>,acl)
Edit: it just happened again, so I'm adding some more info.
When this happens, this is what I see when I run ls on /mnt/:
d????????? ? ? ? ? ? ceph
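The `?` fields mean `ls` could read the `ceph` entry from `/mnt` but the stat call on it failed. To check a mount in this state without risking a shell that blocks, the stat can be run in a background thread with a timeout. A minimal sketch; `probe_mount` and its 5-second default are hypothetical choices, not anything from the Ceph tooling. A thread blocked in the kernel cannot be cancelled, so the sketch abandons it as a daemon thread instead of waiting for it.

```python
import errno
import os
import threading


def probe_mount(path: str, timeout: float = 5.0) -> str:
    """Stat `path` in a background thread.

    Returns "ok" if stat() succeeds, the errno name (e.g. "EIO") if it fails,
    or "hung" if it has not returned within `timeout` seconds.
    """
    result = {}

    def worker() -> None:
        try:
            os.stat(path)
            result["status"] = "ok"
        except OSError as exc:
            result["status"] = errno.errorcode.get(exc.errno, str(exc.errno))

    # A daemon thread lets us stop waiting for a stat() that never returns,
    # instead of joining it forever.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return "hung"
    return result["status"]
```

On an affected VM, `probe_mount("/mnt/ceph")` would distinguish a mount that fails immediately with an error from one where stat blocks; which of the two the output above corresponds to is not visible from the `ls` listing alone.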
If I try mount -a:
mount error 16 = Device or resource busy
Log from /var/log/messages:
Jul 23 21:48:27 prod7-2 kernel: [28344.425057] libceph: mon2 xx.xx.xxx.xx:6789 session lost, hunting for new mon
Jul 23 21:48:27 prod7-2 kernel: [28344.427340] libceph: mon1 xx.xx.xxx.xx:6789 session established
Jul 23 21:48:54 prod7-2 kernel: [28371.560529] ceph: mds0 caps stale
Jul 23 21:52:53 prod7-2 kernel: [28610.660328] ceph: mds0 hung
Jul 23 21:53:25 prod7-2 kernel: [28642.659775] libceph: mon1 xx.xx.xxx.xx:6789 session lost, hunting for new mon
Jul 23 21:53:25 prod7-2 kernel: [28642.677667] libceph: mon0 xx.xx.xxx.xx:6789 session established
Jul 23 21:53:39 prod7-2 kernel: [28656.231175] libceph: mds0 xx.xx.xxx.xx:6801 socket closed (con state OPEN)
Jul 23 21:53:40 prod7-2 kernel: [28657.459175] libceph: reset on mds0
Jul 23 21:53:40 prod7-2 kernel: [28657.459179] ceph: mds0 closed our session
Jul 23 21:53:40 prod7-2 kernel: [28657.459180] ceph: mds0 reconnect start
Jul 23 21:53:40 prod7-2 kernel: [28657.498027] ceph: mds0 reconnect denied
Jul 23 21:53:40 prod7-2 kernel: [28657.513419] libceph: mds0 xx.xx.xxx.xx:6801 socket closed (con state NEGOTIATING)
Jul 23 21:53:41 prod7-2 kernel: [28658.454421] ceph: mds0 rejected session
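The log shows the client failing over between monitors, the MDS capabilities going stale and the MDS being reported hung, and finally the MDS closing the session and denying the reconnect. To pull that sequence out of a longer /var/log/messages, here is a minimal sketch that matches the libceph/ceph kernel messages quoted above. `ceph_events` is a hypothetical helper, and its pattern list only covers the message shapes that appear in this post.

```python
import re
from typing import List, Tuple

# Syslog-style kernel line from libceph/ceph, e.g.
# "Jul 23 21:48:54 prod7-2 kernel: [28371.560529] ceph: mds0 caps stale"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"\[\s*[\d.]+\]\s+(?:lib)?ceph:\s+(?P<msg>.*)$"
)

# Only the message shapes that appear in the log above; other messages are ignored.
EVENTS = [
    (re.compile(r"mon\d+ .* session lost"), "mon session lost"),
    (re.compile(r"mon\d+ .* session established"), "mon session established"),
    (re.compile(r"mds\d+ caps stale"), "caps stale"),
    (re.compile(r"mds\d+ hung"), "mds hung"),
    (re.compile(r"mds\d+ closed our session"), "session closed by mds"),
    (re.compile(r"mds\d+ reconnect denied"), "reconnect denied"),
    (re.compile(r"mds\d+ rejected session"), "session rejected"),
]


def ceph_events(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, event) pairs for the recognised ceph kernel messages."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        for pattern, label in EVENTS:
            if pattern.search(match.group("msg")):
                events.append((match.group("ts"), label))
                break
    return events
```

Running it over the log lines from each VM (for example `open("/var/log/messages").readlines()`) would show whether every outage ends with "reconnect denied" and "session rejected", as this one does.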
Am I doing something wrong? Thanks