You can ask the find module to calculate a checksum for each file (get_checksum: true). For example, given the files
shell> tree dir1
dir1
├── a.txt
├── b.txt
├── c.txt
├── d.txt
├── e.txt
└── f.txt
and their content
shell> find dir1 -type f | sort | xargs cat
123
123
456
789
789
789
the playbook below
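- hosts: localhost

  vars:
    dir1: "{{ playbook_dir }}/dir1"
    dir2: "{{ playbook_dir }}/dir2"
    # Group the found files by checksum and keep the path of the
    # first file in each group
    files_unique: "{{ out.files|groupby('checksum')|
                      map(attribute='1.0.path')|
                      list }}"

  tasks:
    # Collect all regular files in dir1 together with their checksums
    - find:
        paths: "{{ dir1 }}"
        file_type: file
        get_checksum: true
      register: out
    - debug:
        var: files_unique
    # Copy one file per distinct checksum into dir2
    - copy:
        src: "{{ item }}"
        dest: "{{ dir2 }}"
      loop: "{{ files_unique }}"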
copies one file per distinct checksum from the directory dir1 to the directory dir2
shell> ansible-playbook pb.yml
PLAY [localhost] *****************************************************************************
TASK [find] **********************************************************************************
ok: [localhost]
TASK [debug] *********************************************************************************
ok: [localhost] =>
  files_unique:
  - /export/scratch/tmp7/test-033/dir1/a.txt
  - /export/scratch/tmp7/test-033/dir1/d.txt
  - /export/scratch/tmp7/test-033/dir1/c.txt
TASK [copy] **********************************************************************************
changed: [localhost] => (item=/export/scratch/tmp7/test-033/dir1/a.txt)
changed: [localhost] => (item=/export/scratch/tmp7/test-033/dir1/d.txt)
changed: [localhost] => (item=/export/scratch/tmp7/test-033/dir1/c.txt)
PLAY RECAP ***********************************************************************************
localhost: ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
shell> tree dir2
dir2
├── a.txt
├── c.txt
└── d.txt
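
To see why files_unique ends up as a.txt, d.txt, c.txt, here is a minimal Python sketch of what the groupby('checksum') | map(attribute='1.0.path') chain does. It is only an illustration, not Ansible's or Jinja2's actual code: the list files stands in for out.files, the checksum strings are made-up placeholders (not real SHA-1 values), and unique_by_checksum is a hypothetical helper name.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical stand-in for out.files. The checksum values are placeholders,
# chosen only so that files with equal content share a value.
files = [
    {"path": "dir1/a.txt", "checksum": "aaa"},
    {"path": "dir1/b.txt", "checksum": "aaa"},
    {"path": "dir1/c.txt", "checksum": "ccc"},
    {"path": "dir1/d.txt", "checksum": "bbb"},
    {"path": "dir1/e.txt", "checksum": "bbb"},
    {"path": "dir1/f.txt", "checksum": "bbb"},
]


def unique_by_checksum(files):
    """Return the path of the first file in each checksum group."""
    # Like the groupby filter, sort by the grouping key first. sorted() is
    # stable, so files keep their original relative order inside a group.
    by_checksum = sorted(files, key=itemgetter("checksum"))
    # For each (checksum, group) pair take the first member's path,
    # which is what map(attribute='1.0.path') selects.
    return [
        next(group)["path"]
        for _, group in groupby(by_checksum, key=itemgetter("checksum"))
    ]


files_unique = unique_by_checksum(files)
print(files_unique)
```

The result is ordered by checksum, not by file name, which is why d.txt can come before c.txt. Which file represents a group depends on the order in which find returned the files, so if you need a specific representative, sort out.files by path before grouping.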