We are in the process of deploying a small OpenStack Yoga cluster on CentOS 9 and
HP Gen9 servers (a single non-HA controller and about 10 compute nodes; currently
only one compute node is attached).
In general, the current 1 controller + 1 compute setup behaves well: creating new
volumes, ports, and VMs and connecting them all together works fine, and the
OpenStack Dashboard is accessible with no issues.
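(For context, we have no custom tooling here; we simply check the control plane with
the standard client commands, e.g.:

  openstack compute service list
  openstack network agent list
  openstack volume service list

and nothing looks abnormal there.)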
During VM deletion, however, the nova/cinder/neutron services on the compute node
seem to lose their connection to RabbitMQ on the controller. As a result, not all
VM-related resources are cleaned up properly, and the VM itself ends up in ERROR
state.
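In case it is relevant, these are the oslo.messaging options we intend to experiment
with on the compute node. The values below are only candidates we picked, not
something we have validated; the file shown is the standard /etc/nova/nova.conf, and
the same [oslo_messaging_rabbit] options would apply to the neutron agents:

  [DEFAULT]
  # how long RPC calls wait for a reply before raising MessagingTimeout
  rpc_response_timeout = 120

  [oslo_messaging_rabbit]
  # AMQP heartbeat threshold towards RabbitMQ, in seconds
  heartbeat_timeout_threshold = 60
  # run AMQP heartbeats in a native thread instead of the eventlet hub
  heartbeat_in_pthread = true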
Below are some log samples from the instance deletion:
[instance: f664dbae-9053-457a-b233-4d7348b5abc6, ~2022-10-11 10:40]
Do you have any suggestions on how to deal with this issue?
thanks,
Wojtek PL
LOGS - RabbitMQ logs on the controller (host-3) - during cnSBC AIO VM deletion:
[root@host-3 /var/log/rabbitmq]# grep error rabbit@host-3.log
(IP mapping: 135.210.211.57 = host-11 (compute), 135.210.211.73 = host-3 (controller))
2022-10-11 10:40:45.270606+02:00 [error] <0.25259.4> closing AMQP connection <0.25259.4> (135.210.211.57:50872 -> 135.210.211.73:5672 - neutron-openvswitch-agent:215678:8000e9b1-dc06-46d2-b8bf-9267d42af525):
2022-10-11 10:40:45.270606+02:00 [error] <0.25259.4> {inet_error,etimedout}
2022-10-11 10:40:45.334723+02:00 [error] <0.25280.4> closing AMQP connection <0.25280.4> (135.210.211.57:50886 -> 135.210.211.73:5672 - neutron-openvswitch-agent:215678:dd2031af-6622-4c2e-b051-afae4884b92a):
2022-10-11 10:40:45.334723+02:00 [error] <0.25280.4> {inet_error,etimedout}
2022-10-11 10:40:45.398605+02:00 [error] <0.25322.4> closing AMQP connection <0.25322.4> (135.210.211.57:50894 -> 135.210.211.73:5672 - neutron-openvswitch-agent:215678:f79fbbf2-9898-470d-ad4e-f5cdba818277):
2022-10-11 10:40:45.398605+02:00 [error] <0.25322.4> {inet_error,etimedout}
2022-10-11 10:40:45.526524+02:00 [error] <0.31938.6> closing AMQP connection <0.31938.6> (135.210.211.57:56004 -> 135.210.211.73:5672 - neutron-openvswitch-agent:215678:129f82c8-1ede-4432-9487-cf8a52f33be7):
2022-10-11 10:40:45.526524+02:00 [error] <0.31938.6> {inet_error,etimedout}
2022-10-11 10:40:45.846559+02:00 [error] <0.32092.6> closing AMQP connection <0.32092.6> (135.210.211.57:32912 -> 135.210.211.73:5672 - neutron-sriov-nic-agent:215662:debfc1e1-a3b7-487e-9db7-0cbb7771265d):
2022-10-11 10:40:45.846559+02:00 [error] <0.32092.6> {inet_error,etimedout}
2022-10-11 10:40:46.550629+02:00 [error] <0.31764.6> closing AMQP connection <0.31764.6> (135.210.211.57:44946 -> 135.210.211.73:5672 - nova-compute:1948:315b488e-4e39-4afa-963b-dcd8749a20f4):
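If more detail from the broker side would help, we can also provide output of the
standard rabbitmqctl commands on the controller, for example:

  [root@host-3 ~]# rabbitmqctl list_connections name peer_host peer_port state
  [root@host-3 ~]# rabbitmqctl list_queues name messages consumers
  [root@host-3 ~]# rabbitmqctl status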
LOGS - NOVA logs on compute (host-11) - during cnSBC AIO VM deletion:
[root@host-11 /var/log/nova]# tail nova-compute.log
2022-10-11 10:40:40.735 1948 INFO nova.virt.libvirt.driver [req-ba7e35f9-47a4-4cbe-866b-443b7c4a72e7 02f14854c6814feba0b29f6d56f48bbd 6a14bb3901cd4f74a6adc2983be0a52e - default default] [instance: f664dbae-9053-457a-b233-4d7348b5abc6] Deletion of /var/lib/nova/instances/f664dbae-9053-457a-b233-4d7348b5abc6_del complete
2022-10-11 10:40:52.738 1948 INFO nova.compute.manager [-] [instance: f664dbae-9053-457a-b233-4d7348b5abc6] VM Stopped (Lifecycle Event)
2022-10-11 10:41:02.465 1948 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Server unexpectedly closed connection
2022-10-11 10:41:40.498 1948 ERROR oslo_service.periodic_task [req-30cbba9a-b129-45a9-a0e2-ab1b8780a51d - - - - -] Error during ComputeManager._sync_scheduler_instance_info: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 76875df5592a4048878f4111908f65b5
2022-10-11 10:41:40.498 1948 ERROR oslo_service.periodic_task Traceback (most recent call last):
2022-10-11 10:41:40.498 1948 ERROR oslo_service.periodic_task File "/usr/lib/python3.9/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 441, in get
2022-10-11 10:41:40.498 1948 ERROR oslo_service.periodic_task return self._queues[msg_id].get(block=True, timeout=timeout)
2022-10-11 10:41:40.498 1948 ERROR oslo_service.periodic_task File "/usr/lib/python3.9/site-packages/eventlet/queue.py", line 322, in get
2022-10-11 10:41:40.498 1948 ERROR oslo_service.periodic_task return waiter.wait()
2022-10-11 10:41:40.498 1948 ERROR oslo_service.periodic_task File "/usr/lib/python3.9/site-packages/eventlet/queue.py", line 141, in wait
2022-10-11 10:41:40.498 1948 ERROR oslo_service.periodic_task return get_hub().switch()
2022-10-11 10:41:40.498 1948 ERROR oslo_service.periodic_task File "/usr/lib/python3.9/site-packages/eventlet/hubs/hub.py", line 313, in switch
2022-10-11 10:41:40.498 1948 ERROR oslo_service.periodic_task return self.greenlet.switch()
2022-10-11 10:41:40.498 1948 ERROR oslo_service.periodic_task _queue.Empty
[...]
2022-10-11 10:41:40.498 1948 ERROR oslo_service.periodic_task
2022-10-11 10:41:41.437 1948 WARNING nova.servicegroup.drivers.db [-] Lost connection to nova-conductor for reporting service status.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 97d6d1842ade42a28b4aa6f01eb331e6
2022-10-11 10:41:41.437 1948 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 50.00 sec
2022-10-11 10:41:47.467 1948 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Server unexpectedly closed connection
[...]
2022-10-11 10:41:52.741 1948 ERROR nova.virt.driver [req-ffced6ba-33ca-4573-9e1c-436097b7cd17 - - - - -] Exception dispatching event <LifecycleEvent: 1665477637.737095, f664dbae-9053-457a-b233-4d7348b5abc6 => Stopped>: Timed out waiting for a reply to message ID 0484d7b9c3e946f0a6b77ad2df3021c2: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0484d7b9c3e946f0a6b77ad2df3021c2
2022-10-11 10:42:27.327 1948 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : 76875df5592a4048878f4111908f65b5
2022-10-11 10:42:27.328 1948 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : e83baa1bf7a6486b94739c2fd8407182
2022-10-11 10:42:27.329 1948 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : 97d6d1842ade42a28b4aa6f01eb331e6
2022-10-11 10:42:27.330 1948 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : 0484d7b9c3e946f0a6b77ad2df3021c2
2022-10-11 10:42:27.333 1948 INFO nova.servicegroup.drivers.db [-] Recovered from being unable to report status.
2022-10-11 10:42:27.333 1948 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 35.90 sec
2022-10-11 10:42:27.351 1948 INFO nova.compute.manager [-] [instance: f664dbae-9053-457a-b233-4d7348b5abc6] Took 46.61 seconds to deallocate network for instance.
2022-10-11 10:42:27.352 1948 ERROR nova.compute.manager [req-ba7e35f9-47a4-4cbe-866b-443b7c4a72e7 02f14854c6814feba0b29f6d56f48bbd 6a14bb3901cd4f74a6adc2983be0a52e - default default] [instance: f664dbae-9053-457a-b233-4d7348b5abc6] Setting instance vm_state to ERROR: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e83baa1bf7a6486b94739c2fd8407182
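For completeness, the failing deletion is triggered with the standard client, and
the sequence below is what we assume is the correct way to retry it afterwards
(admin credentials; not a fix, just the workaround we fall back to):

  openstack server delete f664dbae-9053-457a-b233-4d7348b5abc6
  # after the failure the instance is left in ERROR; reset the state and retry:
  openstack server set --state active f664dbae-9053-457a-b233-4d7348b5abc6
  openstack server delete f664dbae-9053-457a-b233-4d7348b5abc6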