
Ansible playbook: download a file from a repository only if it differs from the copy on a target machine


I am setting up an autoinstall environment with PXE boot. To install Linux this way, one basically needs to download the linux and initrd.gz files, which are buried deep in the repository structure, and also supply a pxelinux and/or pxegrub configuration that references these files. This is for Debian; other distros should be similar. Until now I did this by hand.

I wrote the following basic playbook to download the files from the repository and put them onto the TFTP server:

- name: test download
  hosts: test-netinstsrv
  gather_facts: false
  vars:
    netboot:
      base: https://deb.debian.org/debian/dists/bullseye/main/installer-amd64/current/images/
      files_base: netboot/debian-installer/amd64/
      files: [ linux, initrd.gz ]
  tasks:
  - name: Create a temporary directory
    ansible.builtin.tempfile:
      state: directory
      suffix: netboot
    register: tmpdir
    delegate_to: localhost
    run_once: true
    changed_when: false
  - name: Set permissions
    ansible.builtin.file:
      path: "{{ tmpdir.path }}"
      mode: "0755"
    delegate_to: localhost
    run_once: true
    changed_when: false

  - name: Download files
    ansible.builtin.uri:
      url: "{{ netboot.base }}{{ netboot.files_base }}{{ item }}"
      dest: "{{ tmpdir.path }}/{{ item }}"
      method: "get"
    loop: "{{ netboot.files }}"
    delegate_to: localhost
    run_once: true
    register: downloaded_file
    changed_when: false

  - name: Copy files to the remote
    ansible.builtin.copy:
      src: "{{ tmpdir.path }}/{{ item }}"
      dest: "/tmp/"
    loop: "{{ netboot.files }}"
  - name: Remove temporary directory
    ansible.builtin.file:
      path: "{{ tmpdir.path }}"
      state: absent
    delegate_to: localhost
    run_once: true
    changed_when: false

(The actual playbook is much more massive, this is an extract.)

This works, but it is inefficient: every run downloads both files, even if nothing has changed. I had to put changed_when almost everywhere so as not to see bogus changes in the play recap; the only real change should be the one the copy module makes. As I plan to extend this to several Debian versions (all currently supported ones) and to add other distributions, the amount of downloading will become excessive.

I use the uri module to download the files, but I couldn't find in the documentation how to make the download conditional. What I want is for the download module to send something like an If-Modified-Since request, to which the repository web server can reply 304 Not Modified; that would mean the boot images on the target servers need no update, and the download could be skipped.
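For reference, the exchange described here is an ordinary HTTP conditional GET. Below is a minimal sketch in plain Python (standard library only, not an Ansible module) of what such a download does: send If-Modified-Since based on the local copy's modification time and treat a 304 reply as "nothing to do". Note that it needs a local copy that survives between runs; in the playbook above the temporary directory is created and removed on every run, so there would never be anything to compare against. The function names are hypothetical.

```python
import email.utils
import os
import shutil
import urllib.error
import urllib.request


def http_date(timestamp: float) -> str:
    """Format a POSIX timestamp as an HTTP date, which is always in GMT."""
    return email.utils.formatdate(timestamp, usegmt=True)


def conditional_request(url: str, dest: str) -> urllib.request.Request:
    """Build a GET request; add If-Modified-Since only if dest already exists."""
    req = urllib.request.Request(url, method="GET")
    if os.path.exists(dest):
        req.add_header("If-Modified-Since", http_date(os.path.getmtime(dest)))
    return req


def download_if_modified(url: str, dest: str) -> bool:
    """Fetch url into dest. Return False if the server answered 304."""
    tmp = dest + ".part"
    try:
        with urllib.request.urlopen(conditional_request(url, dest)) as resp, \
                open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)
            last_modified = resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the local copy is current
            return False
        raise
    os.replace(tmp, dest)  # replace the old copy only after a full download
    if last_modified:
        # Stamp the local copy with the server's Last-Modified time so the
        # next If-Modified-Since compares the server's own timestamps.
        ts = email.utils.parsedate_to_datetime(last_modified).timestamp()
        os.utime(dest, (ts, ts))
    return True
```

A call would look like `download_if_modified(base + files_base + "linux", "/var/cache/netboot/linux")`, where /var/cache/netboot is a hypothetical persistent directory on the controller.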

This is slightly complicated by the fact that the target servers are expected to have no Internet access. All downloading therefore has to be done by the controller machine, so I delegate most tasks to localhost.

One idea was to first fetch those files from one of the target servers to the controller and hope that the uri module detects that the files are already there and does not re-download them unless they have changed. Will this work? Is there another way to achieve this?


Update

I tried the get_url module (even before the answer appeared). The changes to the playbook are basically:

  - name: Create required subdirectories
    ansible.builtin.file:
      path: "{{ tmpdir.path }}/{{ netboot.files_base }}"
      state: directory
    delegate_to: localhost
    run_once: true

  - name: Download files
    ansible.builtin.get_url:
      url: "{{ netboot.base }}{{ netboot.files_base }}{{ item }}"
      dest: "{{ tmpdir.path }}/{{ netboot.files_base }}{{ item }}"
      checksum: "sha256:{{ netboot.base }}SHA256SUMS"
    loop: "{{ netboot.files }}"
    delegate_to: localhost
    run_once: true
    register: downloaded_file

This fails:

failed: [test-netinstsrv -> localhost] (item=linux) => {"ansible_loop_var": "item", "changed": false, "item": "linux", "msg": "Unable to find a checksum for file 'linux' in 'https://deb.debian.org/debian/dists/bullseye/main/installer-amd64/current/images/SHA256SUMS'"}
failed: [test-netinstsrv -> localhost] (item=initrd.gz) => {"ansible_loop_var": "item", "changed": false, "item": "initrd.gz", "msg": "Unable to find a checksum for file 'initrd.gz' in 'https://deb.debian.org/debian/dists/bullseye/main/installer-amd64/current/images/SHA256SUMS'"}

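Probably because the entries in SHA256SUMS are paths relative to the images/ directory rather than bare file names such as linux; the file has the following structure: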

...
52eb21964231223563a59656708270c5708c8dcf5b3a1c5cccb1924af9964332  ./netboot/debian-installer/amd64/initrd.gz
b00b339f8b1aada1841d86650377dd8e7299eaa7f34d0bbf21deb561467015cd  ./netboot/debian-installer/amd64/linux
...

Perhaps downloading that file and reformatting it would help...
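A minimal Python sketch of that reformatting, assuming the two-column `<digest>  ./path` layout shown above: it keeps only the entries directly under files_base and maps each bare file name to its digest. A single digest can then be handed to get_url in the plain `sha256:<hex>` form of its checksum parameter instead of the URL form. The function name is hypothetical.

```python
def checksums_for(sums_text: str, files_base: str) -> dict[str, str]:
    """Map bare file names directly under files_base to their SHA-256 digests.

    sums_text is the content of a Debian-style SHA256SUMS file whose lines
    look like "<hex digest>  ./netboot/debian-installer/amd64/linux".
    """
    prefix = "./" + files_base  # files_base ends with "/", as in the playbook
    result: dict[str, str] = {}
    for line in sums_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # blank or unexpected line
        digest, path = parts
        if not path.startswith(prefix):
            continue  # entry belongs to another directory
        name = path[len(prefix):]
        if "/" not in name:  # skip files in deeper subdirectories
            result[name] = digest
    return result
```

With the two lines quoted above and files_base set to netboot/debian-installer/amd64/, this yields a dictionary with the keys initrd.gz and linux.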

Comment by gxx:
This is probably not helpful, not an answer to your actual question, and probably only of value if you run this on a Debian machine: the netboot installer is available as a binary package, see https://packages.debian.org/bookworm/debian-installer-12-netboot-amd64. This makes installation (and verification) via APT pretty easy.
Answer

ansible.builtin.get_url is a generic file download module that does precisely that: it can skip downloads if certain conditions are met.

It sends If-Modified-Since headers based on the destination file's modification time, and skips the download (with the module returning not changed) if the response says Not Modified.

In addition, you can provide a checksum parameter: if the destination file exists and matches it, the file is not downloaded. Further, if the downloaded (temporary) file does not match the provided checksum, the module fails. The latter can be useful for pinning known versions, so that any change to the downloaded file is obvious.
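The skip-or-fail behaviour described above can be illustrated in plain Python. This is a hypothetical sketch of the described semantics, assuming SHA-256 digests; it is not get_url's implementation, and the function names are made up for the example.

```python
import hashlib
import os


def file_sha256(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_download(dest: str, expected: str) -> bool:
    """Skip the download only if dest exists and already has the expected digest."""
    return not (os.path.isfile(dest) and file_sha256(dest) == expected.lower())


def verify_download(tmp_path: str, expected: str) -> None:
    """Reject a freshly downloaded file whose digest differs (the pinning case)."""
    actual = file_sha256(tmp_path)
    if actual != expected.lower():
        raise ValueError(f"checksum mismatch for {tmp_path}: got {actual}")
```

Pinning then simply means keeping the expected digest fixed in the playbook: a changed upstream file makes verify_download fail instead of silently replacing the known version.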

Comment by Nikita Kipriyanov: I found it and tried to use it right before the answer appeared; I only saw the answer when I came back here to post an update. I have updated the question with my findings. Can you suggest something for that too?
Answer

get_url doesn't recognize the format of the SHA256SUMS file, which is disappointing given how common this format is for checksum files. Additionally, it requires the file to exist locally, which is not efficient either.

I was able to achieve exactly what I wanted with the following task set:

  - name: Download SHA256SUMS file to compare against
    ansible.builtin.get_url:
      url: "{{ netboot.base }}SHA256SUMS"
      dest: "{{ tmpdir.path }}/"
    delegate_to: localhost
    run_once: true
    become: false
    changed_when: false
    
  - name: Find checksums of files on the server
    ansible.builtin.stat:
      path: "{{ netboot.remote_location }}{{ item }}"
      checksum_algorithm: sha256
      get_checksum: yes
    loop: "{{ netboot.files }}"
    register: remote_stat
    changed_when: false
    
  - name: Compare server checksums against SHA256SUMS file
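    # -F matches the text literally (with -E, "." is a regex wildcard); -q suppresses output
    ansible.builtin.command: "grep -qF '{{ item.stat.checksum }}  ./{{ netboot.files_base }}{{ item.item }}' {{ tmpdir.path }}/SHA256SUMS"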
    loop: "{{ remote_stat.results }}"
    when: item.stat.exists
    delegate_to: localhost
    become: false
    ignore_errors: true
    changed_when: false
    register: comparison
    
  - name: Download missing or different files
    ansible.builtin.get_url:
      url: "{{ netboot.base }}{{ netboot.files_base }}{{ item.item.item }}"
      dest: "{{ tmpdir.path }}/{{ item.item.item }}"
    loop: "{{ comparison.results }}"
    when: (item.skipped is defined) or item.failed
    delegate_to: localhost
    run_once: true
    become: false
    
  - name: Copy files to the remote
    ansible.builtin.copy:
      src: "{{ tmpdir.path }}/{{ item.item.item }}"
      dest: "{{ netboot.remote_location }}"
    loop: "{{ comparison.results }}"
    when: (item.skipped is defined) or item.failed

It downloads SHA256SUMS, checks the files on the server against it, and then downloads only those files that are missing or differ. In the ideal case it downloads only this small file and transfers nothing else unless something is wrong; there is not even a need to fetch files to the controller machine.
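The decision made by the grep comparison and the when: conditions above can be written out as a small pure function, which might make it easier to reuse for other distributions' layouts. A hypothetical Python sketch: files is the list from netboot.files, upstream maps file names to the digests listed in SHA256SUMS, and on_target maps file names to the digest stat reported on the server (None when the file does not exist).

```python
from typing import Optional


def files_to_update(files: list[str],
                    upstream: dict[str, str],
                    on_target: dict[str, Optional[str]]) -> list[str]:
    """Return the files that must be downloaded and copied to one target."""
    wanted = []
    for name in files:
        expected = upstream.get(name)  # None: no matching SHA256SUMS entry
        actual = on_target.get(name)   # None: file missing on the target
        # Same outcome as the playbook: the check was skipped (missing file)
        # or grep found no matching line (unknown or different checksum).
        if expected is None or actual is None or actual != expected:
            wanted.append(name)
    return wanted
```

Since the result depends on each host's stat output, it is naturally computed once per host.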

It can also "pin" a known set of images, by fixing the checksum of the SHA256SUMS file itself.

I am not yet sure whether it works when there are multiple servers in the play (I have only tested it on one). run_once should probably be removed from the second-to-last task above. I also hope it can be made generic enough to accept the directory structures used by other distros.
