Problem:
I should start by saying that this is for a software house, and it's all internal. None of the people involved are "users"; they are all staff.
- We test on servers, including upgrading existing installations to prove that the upgrade process works, etc.
- People sometimes log into these servers to test changes.
- They don't put the server back to how it is expected to be in production, so the environment is then considered "dirty".
- Checking this involves considerably more than "this app is installed correctly". It means ensuring that at least the following match production (see the sketch after this list):
- network interfaces and routing
- configuration files
- packages deployed to the server
- the scripts on the server
- VM config
- disk usage, permissions, locations etc.
- stuff I've not thought of
- A dirty environment costs time and money spent working out why an action didn't behave as expected.
- For those who've helpfully suggested CI/CD to solve this (which I'm a big fan of and agree with for every other use case), a "wipe and redeploy" takes around 4 hours.
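To make the list above concrete, here is roughly the kind of "state snapshot" I picture comparing against production. This is a sketch only: the commands assume a Debian-style box, and the watched paths (`/etc/myapp`, `/opt/myapp/scripts`) are hypothetical placeholders, not the real list.

```python
#!/usr/bin/env python3
"""Sketch: capture a server "state snapshot" for later comparison.

Illustrative only: the command list and watched paths are assumptions
for a Debian-style box, not a complete inventory.
"""
import hashlib
import json
import os
import subprocess

# Command output that should match production (network, routing, packages).
COMMANDS = {
    "interfaces": ["ip", "-o", "addr"],
    "routes": ["ip", "route"],
    "packages": ["dpkg-query", "-W", "-f", "${Package} ${Version}\n"],
}

# Directories whose contents and permissions should match production.
WATCHED_DIRS = ["/etc/myapp", "/opt/myapp/scripts"]  # hypothetical paths


def file_state(path):
    """sha256 + mode + ownership for one file; timestamps deliberately ignored."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {"sha256": digest, "mode": oct(st.st_mode),
            "uid": st.st_uid, "gid": st.st_gid}


def snapshot():
    """Collect command output and file states into one comparable dict."""
    state = {"commands": {}, "files": {}}
    for name, cmd in COMMANDS.items():
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        state["commands"][name] = result.stdout
    for top in WATCHED_DIRS:
        for root, _dirs, files in os.walk(top):
            for fname in files:
                path = os.path.join(root, fname)
                state["files"][path] = file_state(path)
    return state


if __name__ == "__main__":
    print(json.dumps(snapshot(), indent=2, sort_keys=True))
```

The idea would be to capture this at build time, store the JSON, and diff it after someone has been on the box; timestamps are deliberately excluded for the reason given below.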
Question:
- Is there a method of easily verifying that an installation is "as it should be"?
- Note that things like md5summing the whole disk won't help, as there are time-dependent files on it.
- Note that if someone monkeys around with the server, I don't care provided they put it back, which means file timestamps aren't going to help with this either.
Before I get my hands dirty scripting an endless list of hash checks of every "essential" file, a list from which I will inevitably miss one or two, and someone will naturally change those and only those (all of the angry emojis): is there a better way of doing this that I can append to a build or upgrade script, and that will tell me whether the installation is reliable or not?
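To show what I mean by "append to a build or upgrade script", here is a minimal sketch of the comparison step, assuming a snapshot like the one above was stored as `baseline.json` (a placeholder name):

```python
#!/usr/bin/env python3
"""Sketch: diff a fresh snapshot (on stdin) against a stored baseline.

Exits non-zero on drift so a build or upgrade script can fail fast.
"baseline.json" is a placeholder for wherever the known-good state lives.
"""
import json
import sys


def compare(baseline, current):
    """Return a human-readable list of differences between two snapshots."""
    drift = []
    for section in ("commands", "files"):
        base, cur = baseline.get(section, {}), current.get(section, {})
        for key in sorted(set(base) | set(cur)):
            if base.get(key) != cur.get(key):
                drift.append(f"{section}: {key} differs from baseline")
    return drift


if __name__ == "__main__":
    with open("baseline.json") as f:
        baseline = json.load(f)
    current = json.load(sys.stdin)
    problems = compare(baseline, current)
    for line in problems:
        print(line)
    sys.exit(1 if problems else 0)
```

Something like `snapshot.py | check_baseline.py` (both file names hypothetical) at the end of the upgrade script would fail the run on any drift. My worry remains the hand-curated list feeding it.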