Score:1

What is the fastest method of getting a small amount of data from whitelisted servers?


I have a bunch of servers that build various programs for a range of different systems.

Once a build has completed, it gets archived into a single file and compressed, then an md5sum is created of the file. One server might build multiple different versions, resulting in multiple archive files and archive.md5 files.

Finally, a script runs on various other servers that checks the md5sums of each file, compares them to the local md5sums, and if they are different, downloads and unpacks the updated build.
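The comparison step described above can be sketched roughly as follows in Python. This is only an illustration, not the existing script: it assumes each .md5 file contains standard `md5sum` output (`<hash>  <filename>`), and the function names `parse_md5_lines` and `changed_builds` are hypothetical.

```python
def parse_md5_lines(text):
    """Parse md5sum-style lines ("<hash>  <filename>") into {filename: hash}."""
    sums = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, _, name = line.partition(" ")
        # md5sum puts a space plus a mode marker (space or '*') between the
        # hash and the name; strip those leading markers from the name.
        sums[name.lstrip(" *")] = digest.lower()
    return sums


def changed_builds(remote_text, local_text):
    """Return sorted names whose remote hash is new or differs from the local one."""
    remote = parse_md5_lines(remote_text)
    local = parse_md5_lines(local_text)
    return sorted(name for name, digest in remote.items()
                  if local.get(name) != digest)
```

Because the function works on text rather than files, it does not matter whether the remote md5 data arrives as files or as the output of something like `cat /path/to/builds/*.md5`.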

Currently, the md5sum check happens as scp [email protected] /path/to/builds/*.md5 . followed by a comparison of the md5s for each build.

99% of the runtime of the script is the scp (even though it only takes a few seconds). I am looking to optimize that data transfer as much as possible. The request comes from servers that are whitelisted (or can be, if the solution has its own port), and the data itself is meaningless anyway, so I don't need to worry about either authentication or encryption. I believe that scp was used out of convenience by my predecessor when there were far fewer servers, versions, and builds.

I have full root access to all servers, so I can do whatever I like. What would be the fastest way to get the .md5 data from the remote server? It could be either the files themselves, or the content of the files (e.g. from cat /path/to/builds/*.md5).

Thanks!

Score:0

The options which come to mind for me are:

  • Export the directory with the MD5 files via NFS or SMB (Samba) and mount on whichever machine you are doing the comparisons on
  • Run a small webserver e.g. lighttpd to serve the files over HTTP
  • Work out how to make SCP faster - it should not take "a few seconds" unless you have DNS issues, or slow authentication for some reason

These assume you are doing this on a local network and not across the internet, otherwise there are additional security considerations even if the files are "meaningless".

Ben Holness
The servers are spread around the world and connect over the internet, not a local network, which is probably why SCP is a little slow. That also makes NFS or SMB less than ideal. HTTP is a possibility, but I was thinking more along the lines of a small daemon that doesn't need much in the way of protocol or handshake and just sends the data back.
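A daemon of the kind described here might look something like the following Python sketch, using only the standard library. It is an illustration under stated assumptions, not a tested production service: the `ALLOWED_IPS` allowlist, the default port, and the names `make_server` and `fetch_md5_blob` are all hypothetical. The server sends the concatenated .md5 contents as soon as a whitelisted client connects, then closes the connection. There is no request, authentication, or encryption, which matches the constraints stated in the question.

```python
import glob
import socket
import socketserver

ALLOWED_IPS = {"127.0.0.1"}          # hypothetical allowlist of client addresses
MD5_GLOB = "/path/to/builds/*.md5"   # path pattern taken from the question


def read_md5_blob(pattern):
    """Concatenate the contents of every file matching the pattern."""
    parts = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as fh:
            parts.append(fh.read())
    return b"".join(parts)


class Md5Handler(socketserver.BaseRequestHandler):
    def handle(self):
        # Silently drop connections from addresses that are not allowlisted.
        if self.client_address[0] not in ALLOWED_IPS:
            return
        # Send everything, then return; socketserver closes the socket,
        # so the client simply reads until EOF.
        self.request.sendall(read_md5_blob(self.server.md5_glob))


def make_server(host, port, pattern=MD5_GLOB):
    """Build (but do not start) a threaded TCP server for the given pattern."""
    server = socketserver.ThreadingTCPServer((host, port), Md5Handler)
    server.md5_glob = pattern
    return server


def fetch_md5_blob(host, port, timeout=5.0):
    """Client side: connect and read until the server closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

On the build server this could be started with `make_server("0.0.0.0", 8373).serve_forever()` (the port number is arbitrary), and each client would call `fetch_md5_blob(host, 8373)`. The cost per check is then one TCP handshake plus the payload, with no SSH key exchange or login.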
Mintra
I suppose TFTP in read-only mode would be as small as it gets! However it might be a bit unreliable across the internet since it uses UDP. As for the slow SCP, I would have a look at https://linux-tips.com/t/disabling-reverse-dns-lookups-in-ssh/222 and see if any of the discussion there sounds relevant.
Ben Holness
The hostnames of the various servers are all in /etc/hosts, so I would be surprised if it was DNS related, but I will look at the reverse DNS thing just in case.
Ben Holness
Also, I don't necessarily need the actual files; their content would be fine.
Mintra
You could also consider not rolling your own MD5 checksum process and using rsync on the main build files instead. It works out for itself whether anything has changed and then copies only the changes over. It usually runs over SSH under the hood, however, so you may end up with the same login delay.
Ben Holness
If the files have changed, that triggers a bunch of other actions, so I need to know which ones are different: not only to decide whether to download them, but also to decide whether to take those further actions. That's super easy (and working right now) by comparing md5s. I'm not sure if I can do that with rsync, and besides, it's probably similarly slow if it uses SSH.
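On the question of whether rsync can report which files differ: rsync has an `--itemize-changes` (`-i`) option that prints one line per affected item, and combined with `--dry-run` (`-n`) it reports what would be transferred without copying anything. A rough Python sketch for pulling the changed file names out of that output is shown below. The function name `updated_files` and the sample file names are hypothetical, and the sketch only parses text; it does not run rsync itself. Note also that rsync by default decides on size and modification time rather than checksums unless `-c` is given, and the SSH login delay would still apply.

```python
def updated_files(itemized_output):
    """Return paths that rsync reports it would transfer to the local host."""
    updated = []
    for line in itemized_output.splitlines():
        flags, sep, path = line.partition(" ")
        # In itemized output, ">" means the item is received by the local
        # side and "f" means it is a regular file.
        if sep and flags.startswith(">f"):
            updated.append(path)
    return updated
```

The returned list could then drive the same follow-up actions that the md5 comparison triggers today.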