Score:1

rsync: Remember copied files, even if they are deleted at the destination

I want to run an rsync job that copies data from A to B. The data at the destination is deleted after processing, and data that has already been processed should not be copied again. Is there a way to make rsync remember the files that were deleted at the destination and copy only the new data from the source? Maybe there is a way to keep an ongoing list and exclude everything on that list?

Michael Hampton
Move the source files to another directory.
Score:3

Two different approaches are possible:

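  • Select the files that need transferring and feed that list to rsync, so that only those files are copied. The find command is particularly useful for that.
    For example: find /path/ -type f -ctime 1 -printf '%P\0' | rsync -a --from0 --files-from=- /path/ /destination/
    Here /destination/ is a placeholder, %P (GNU find) prints each name relative to /path/ as --files-from expects, and --from0 tells rsync that the list is NUL-terminated. -ctime 1 selects files whose status changed between 24 and 48 hours ago; adjust the test to your schedule.
    Other sources are also good candidates for selecting the files to copy: filename patterns, a database query, input from the application that creates the files, and so on.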

  • rsync can keep a log of the actions it performed (see man rsync for the --log-file=FILE and --log-file-format=FMT options). After a successful batch, append the file/path names from that log to a list of previously copied files. Then pass that accumulated list as --exclude-from=FILE in the next rsync run to prevent those files from being copied again. Keep in mind that rsync reads each line of that file as an exclude pattern, so names containing *, ? or [ are not matched literally.

Note that neither approach is 100% foolproof out of the box. You need to carefully consider the edge cases: files that never get copied, files that get copied a second time, and what happens when the state/history is lost.
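The second approach could be sketched roughly as follows. This is only a sketch under some assumptions: SRC, DEST and HISTORY are hypothetical placeholders, and instead of parsing the --log-file output it uses rsync's --out-format='%n' option, which prints the name of each item that gets updated. Names are stored with a leading slash so that they are anchored to the root of the transfer. Names that contain rsync wildcard characters (*, ? or [) are not handled, and a file that changes at the source after it was copied will not be sent again.

```shell
#!/bin/bash
# Sketch only: SRC, DEST and HISTORY are hypothetical placeholders.
SRC="${SRC:-/path/}"
DEST="${DEST:-/destination/}"
HISTORY="${HISTORY:-$HOME/.rsync_copied_files}"

# Turn transferred names (one per line on stdin) into anchored exclude rules.
# Directory entries (trailing slash) are dropped, so new files that appear
# later inside an already known directory are still picked up.
names_to_excludes() {
    grep -v '/$' | sed 's|^|/|'
}

# Run one sync and remember what was copied.
sync_once() {
    touch "$HISTORY"
    local transferred
    transferred=$(mktemp) || return 1
    # --out-format='%n' prints the name of every item rsync updates.
    # The history is only extended when rsync reports success.
    if rsync -a --exclude-from="$HISTORY" --out-format='%n' "$SRC" "$DEST" > "$transferred"
    then
        names_to_excludes < "$transferred" >> "$HISTORY"
    fi
    rm -f "$transferred"
}
```

Calling sync_once (for example from a cron job) after each processing cycle should then copy only files that are not yet in the history file, even if they have been deleted at the destination. Deleting the history file resets the state.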

Score:0

I stumbled upon this question while searching for a solution to a similar use case.

I wrote a script (I'm not a bash expert, so use with caution) that syncs two directories and keeps a history of what has been transferred. It's a one-way sync: once a file from the source directory has been transferred, it will never be synced to the destination directory again.

#!/bin/bash
#
# Sync two directories with rsync, but keep history to optimize the process.
# On subsequent runs it will only sync files added since last sync.
#
# Accepts two arguments source and destination directory.
# Make sure source and destination do not end with a slash.
# Script assumes that both source and destination directory already exist. It
# is meant only to sync source content to destination content.
#
# When started it will output where it saves history. Do not delete that file!
# Delete history file if you want to re-sync from clean state.
#
# Example usage
# In parent directory of Audiobooks run:
# ./sync_with_history.sh Audiobooks user@xhostname:/media/ServerMedia/Audiobooks
# to sync files to a remote server. You'll need ssh access set up.
#
# To sync local directories simply run:
# ./sync_with_history.sh Audiobooks /path/to/destination/Audiobooks
# in the parent directory of Audiobooks.

# Create history file name
escaped1=$(printf '%s' "$1" | tr / -)
escaped2=$(printf '%s' "$2" | tr / -)
sync_with_history_done_list="sync_with_history_done_list-$escaped1-to-$escaped2"
echo "Saving history to $sync_with_history_done_list"
# Ensure sync_with_history_done_list exists
touch "$sync_with_history_done_list"

# List all files that have not been rsync-ed yet.
# -x matches whole lines only, so "a.txt" in the history does not hide "data.txt".
find "$1" -mindepth 1 -type f -printf '%P\n' | grep -vxFf "$sync_with_history_done_list" > sync_with_history_todo_list
while IFS= read -r line
do
    echo "Sending: $line"
    printf '%s\n' "$line" > files-to-include
    # NOTE: use rsync -a if you want to keep permissions, owner, group etc.
    # I use -r because I don't need those.
    # stdin is redirected so that rsync/ssh cannot swallow the rest of the todo list.
    if rsync -r --files-from=files-to-include "$1/" "$2/" < /dev/null
    then
        # Record the file only after a successful transfer
        printf '%s\n' "$line" >> "$sync_with_history_done_list"
    else
        echo "Failed: $line" >&2
    fi
done < sync_with_history_todo_list

# Clean up. Leave only sync_with_history_done_list
rm -f files-to-include sync_with_history_todo_list

The script is also on GitHub, with more details that I left out here: https://gist.github.com/Spoygg/f6cdfbe6627a41fcf75fa7320b9dee3d
