Does anyone have any experience or recommendations about how to compare two files and check if they are identical? I don’t mean data files; I’m thinking about media files here, so something along the lines of creating a hash of each file and comparing the hashes?
With some assistance from our good friends at SnapLogic I have a solution to this now. To get a hash of a file, you can use a Binary to Document snap, and then use a Mapper on the resulting document. The expression “Digest.sha256($content)” will generate a SHA-256 hash from the file. This hash can then be compared with other hashes to recognise duplicate files.
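For anyone who wants to sanity-check the idea outside SnapLogic, here is a rough Python sketch of the same technique: hash each file with SHA-256 and compare the digests. The function names `sha256_of_file` and `same_content` are made up for illustration, and this is not how the snaps work internally. The digests it produces may also differ from the Mapper output if the snap hashes an encoded form of the content rather than the raw bytes.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the raw bytes of the file at `path`."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so a large media file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files produce the same SHA-256 digest."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

Two files with different names but the same bytes give the same digest, which is what makes this useful for spotting duplicate media files.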
In my use case I ran this process on the files in two different directories and then did an outer Join on the hash values to produce a report showing which files match and which do not.
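The reporting step can be sketched in Python as well. This is only an illustration of a full outer join on hash values, not the Join snap itself. It assumes you already have a filename-to-hash mapping for each directory, and `outer_join_on_hash` is a hypothetical helper name.

```python
from collections import defaultdict


def outer_join_on_hash(left, right):
    """Full outer join of two {filename: hash} mappings on the hash value.

    Returns rows of (hash, left_files, right_files) sorted by hash.
    An empty list on one side means no file on that side has that hash.
    """
    left_by_hash = defaultdict(list)
    right_by_hash = defaultdict(list)
    for name, digest in left.items():
        left_by_hash[digest].append(name)
    for name, digest in right.items():
        right_by_hash[digest].append(name)

    rows = []
    # The union of keys keeps unmatched hashes from both sides.
    for digest in sorted(set(left_by_hash) | set(right_by_hash)):
        rows.append((
            digest,
            sorted(left_by_hash.get(digest, [])),
            sorted(right_by_hash.get(digest, [])),
        ))
    return rows


# Placeholder hash strings stand in for real SHA-256 digests.
dir_a = {"cat.jpg": "h1", "dog.jpg": "h2"}
dir_b = {"kitten.jpg": "h1", "bird.jpg": "h3"}
for digest, in_a, in_b in outer_join_on_hash(dir_a, dir_b):
    status = "match" if in_a and in_b else "no match"
    print(digest, in_a, in_b, status)
```

Rows with files on both sides are the duplicates; rows with an empty side are files that exist in only one directory.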