Score:0

OpenStack Nova cold migration failing on Rocky Linux 9 due to SFTP subsystem error


I have deployed a multinode OpenStack cloud using Kolla Ansible (following the OpenStack deployment guide) on two Rocky Linux 9.1 machines. When I attempt to migrate an instance between nodes, the migration fails and the instance enters the error state. I get the following error in the logs:

2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 11467, in migrate_disk_and_power_off
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     self._cleanup_remote_migration(dest, inst_base,
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     raise self.value
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 11445, in migrate_disk_and_power_off
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     libvirt_utils.copy_image(from_path, img_path, host=dest,
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/nova/virt/libvirt/utils.py", line 243, in copy_image
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     remote_filesystem_driver.copy_file(src, dest,
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/nova/virt/libvirt/volume/remotefs.py", line 104, in copy_file
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     self.driver.copy_file(src, dst, on_execute=on_execute,
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/nova/virt/libvirt/volume/remotefs.py", line 196, in copy_file
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     processutils.execute(
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.9/site-packages/oslo_concurrency/processutils.py", line 438, in execute
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server     raise ProcessExecutionError(exit_code=_returncode,
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server Command: scp -r /var/lib/nova/instances/d043223f-ab70-4f55-bd2b-897768681094_resize/disk 10.0.102.1:/var/lib/nova/instances/d043223f-ab70-4f55-bd2b-897768681094/disk
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server Exit code: 255
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server Stdout: ''
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server Stderr: "Warning: Permanently added '[10.0.102.1]:8022' (ED25519) to the list of known hosts.\r\nsubsystem request failed on channel 0\r\nConnection closed\r\n"
2023-01-30 14:52:49.767 7 ERROR oslo_messaging.rpc.server

My hypothesis is that there is a mismatch between the SSH client and server somewhere, with one using the legacy SCP protocol and the other using SFTP. However, I'm not sure how to correct this.

pt flag
What errors (if any) do you see from `sshd`?
N. Komodo avatar
eh flag
@larsks It looks like port 8022, mentioned in the logs, is controlled by the nova-ssh Docker container, which helpfully doesn't seem to keep SSH logs.
us flag
The hypervisors require passwordless ssh access to live-migrate instances.
N. Komodo avatar
eh flag
@eblock Live migration works fine. As the title says, the issue is with cold migration.
us flag
It was not clear that live migration works. What happens if you try to scp (maybe a test file within that directory or an actual instance) between the nodes manually?
us flag
As nova user, of course.
N. Komodo avatar
eh flag
@eblock It fails with the same error when I just run the command manually. When I add -O to run it in legacy mode, the transfer works fine.
us flag
Interesting, this seems to explain it a bit: https://www.redhat.com/en/blog/openssh-scp-deprecation-rhel-9-what-you-need-know. I'm not sure whether nova.conf has a setting to enable the legacy mode; I will check tomorrow.
Score:0

Appending

Subsystem       sftp    /usr/libexec/openssh/sftp-server

to /etc/kolla/nova-ssh/sshd_config and restarting the nova_ssh container solves this.
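
As a rough sketch, the fix above could be scripted so that it is safe to run more than once. The function name `ensure_sftp_subsystem` is made up for illustration, and the commented usage assumes Docker is the container engine and that the commands run as root on each compute host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the sftp Subsystem directive to an sshd_config
# file unless an uncommented "Subsystem sftp" line is already present.
ensure_sftp_subsystem() {
    local config="$1"
    local line='Subsystem       sftp    /usr/libexec/openssh/sftp-server'

    # Nothing to do if the directive already exists.
    if grep -Eqi '^[[:space:]]*Subsystem[[:space:]]+sftp[[:space:]]' "$config"; then
        return 0
    fi

    # Make sure the file ends with a newline before appending.
    if [ -n "$(tail -c1 "$config")" ]; then
        printf '\n' >> "$config"
    fi
    printf '%s\n' "$line" >> "$config"
}

# Assumed usage on each compute host (not executed here):
#   ensure_sftp_subsystem /etc/kolla/nova-ssh/sshd_config
#   docker restart nova_ssh
```

Two caveats. The `sftp-server` path is the one used on RHEL-family systems, so it may differ if the nova-ssh image is built on another base distribution. Files under /etc/kolla are generated by Kolla Ansible, so a later reconfigure may overwrite the change.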
