I'm not sure whether this falls under crypto from a contextual point of view, but it is about hashing algorithms. I have two directories: assets/ and cache/. Whenever a file is added, deleted, or changed in the assets/ directory, a corresponding, application-specific file is generated in the cache/ directory. On top of that, an additional "cache file" gets created that stores the following information:
assetFile: my-file.png # relative to assets directory
cacheFile: my-file.ktx2 # relative to cache directory
assetHash: 709a0aef5d1ecda90fb3f3542aa71bef08b9fab8 # hash from contents of asset file
cacheHash: 0511420356589c5669c83daeff059d68078aef80 # hash from contents of cache file
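For concreteness, here is a minimal sketch of how I could represent and write out one of these entries (Python just for illustration; the names CacheEntry and write_entry are hypothetical, the fields are the ones shown above):

    from dataclasses import dataclass

    @dataclass
    class CacheEntry:
        asset_file: str  # path relative to the assets/ directory
        cache_file: str  # path relative to the cache/ directory
        asset_hash: str  # hex digest of the asset file's contents
        cache_hash: str  # hex digest of the generated cache file's contents

    def write_entry(path: str, entry: CacheEntry) -> None:
        # Persist the entry in the simple "key: value" format shown above.
        with open(path, "w") as f:
            f.write(f"assetFile: {entry.asset_file}\n")
            f.write(f"cacheFile: {entry.cache_file}\n")
            f.write(f"assetHash: {entry.asset_hash}\n")
            f.write(f"cacheHash: {entry.cache_hash}\n")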
These files are generally large; some of them can be 15-20 MB in size.
The purpose is not security or privacy; it is about detecting whether the source data has changed. So, in the example above, if my-file.png changes, the application will hash the file's contents, compare the result against the hash stored in the "cache file", and, if the hashes do not match, recreate my-file.ktx2 and update the "cache file" with the new hash. The assets are generally added by the users themselves, so if they willfully tamper with this system, they are only breaking their own workflow.
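To make the flow concrete, here is a rough sketch of the check-and-rebuild step (Python just for illustration; regenerate_cache_file is a hypothetical stand-in for my actual conversion to .ktx2):

    import hashlib

    def hash_file(path: str) -> str:
        # Hash the full file contents; the chunked variant further below
        # avoids loading large files into memory all at once.
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def regenerate_cache_file(asset_path: str) -> str:
        # Hypothetical placeholder for the real conversion step; returns
        # the path of the regenerated cache file.
        raise NotImplementedError

    def refresh_if_stale(asset_path: str, entry: dict) -> dict:
        # entry holds the four fields from the "cache file" shown above.
        current = hash_file(asset_path)
        if current != entry["assetHash"]:
            # The source changed: rebuild the derived file, store new hashes.
            cache_path = regenerate_cache_file(asset_path)
            entry["assetHash"] = current
            entry["cacheHash"] = hash_file(cache_path)
        return entry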
What kind of hashing algorithm can I use here that is fast to compute but also reliable in the sense of not producing false negatives (i.e., collisions that make a changed file look unchanged)? I am currently using SHA-256 and it is quite slow, especially on large files.
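For reference, this is roughly how I compute the hashes now, streaming the file in chunks so the 15-20 MB files are not read into memory in one go (Python's hashlib; the 1 MiB chunk size is an arbitrary choice):

    import hashlib

    def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
        # Feed the file through the hash 1 MiB at a time instead of
        # reading the whole thing at once.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()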