Forum Discussion

tom_saunders
New Contributor
5 years ago
Solved

Compare two files to see if they are the same

Does anyone have experience with, or recommendations for, comparing two files to check whether they are identical? I don’t mean data files; I’m thinking of media files here, so something along the lines of creating a hash of each file and comparing the hashes.

  • With some assistance from our good friends at SnapLogic, I now have a solution to this. To get a hash of a file, you can use a Binary to Document Snap and then a Mapper on the resulting document. The expression “Digest.sha256($content)” generates a SHA-256 hash from the file. This hash can then be compared with other hashes to recognise duplicate files.

    In my use case, I ran this process on the files in two different directories and then did an outer Join on the hash values to produce a report of which files match and which do not.
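
For anyone who wants to sanity-check the results outside SnapLogic, the same idea (hash every file, then do an outer join on the hash values) can be sketched in plain Python. This is only an illustration under my own assumptions, not the pipeline described above: the function names `sha256_of_file`, `hash_directory` and `outer_join_on_hash` are made up for this example, and it only looks at files directly inside each directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large media files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(directory):
    """Map each digest to the sorted file names in `directory` that have it."""
    by_hash = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():  # hypothetical choice: skip subdirectories
            by_hash.setdefault(sha256_of_file(path), []).append(path.name)
    return by_hash


def outer_join_on_hash(left_dir, right_dir):
    """Full outer join of two directories on file hash.

    Returns rows of (digest, left_names, right_names). An empty list on
    one side means no file with that hash exists in that directory.
    """
    left = hash_directory(left_dir)
    right = hash_directory(right_dir)
    return [
        (digest, left.get(digest, []), right.get(digest, []))
        for digest in sorted(left.keys() | right.keys())
    ]
```

Rows with names on both sides are files whose content hashes match, even if the file names differ. Rows with an empty list on one side are files that exist in only one of the two directories.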
