02-26-2020 03:01 PM
There’s a way to calculate the total number of input or output documents, which is monumentally handy.
However, is there a way to calculate the number of bytes read on input or written on output? This would be particularly useful for file chunking, i.e. if you want to split the output written to S3 into 100 MB files, as opposed to having to guess how many documents you need to read.
I can imagine a rough way to do this right now by building a Script snap that calculates each document's flattened JSON size, but I worry this would slow the pipeline down a lot, since it requires either traversing every document dynamically or re-serializing it to produce a byte count.
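To make the idea concrete, here is a rough sketch of the re-serialization approach in plain Python. It is standalone and does not use any actual Script snap API; the helper names `doc_size_bytes` and `chunk_by_bytes` are made up for illustration, and it assumes the output is compact, UTF-8 encoded, newline-delimited JSON, so the real size on S3 could differ depending on the formatter and any compression.

```python
import json


def doc_size_bytes(doc):
    """Estimate the serialized size of one document in bytes.

    Assumption: compact JSON, UTF-8 encoded, one document per line
    (hence the +1 for the trailing newline).
    """
    text = json.dumps(doc, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8")) + 1


def chunk_by_bytes(docs, max_bytes):
    """Group documents into chunks whose estimated size stays within max_bytes.

    A single document larger than max_bytes still gets its own chunk.
    """
    chunk, size = [], 0
    for doc in docs:
        n = doc_size_bytes(doc)
        # Start a new chunk if adding this document would exceed the limit.
        if chunk and size + n > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(doc)
        size += n
    if chunk:
        yield chunk


if __name__ == "__main__":
    docs = [{"id": i, "name": "row %d" % i} for i in range(1000)]
    # Tiny limit for demonstration; for 100 MB files use 100 * 1024 * 1024.
    for i, chunk in enumerate(chunk_by_bytes(docs, max_bytes=8 * 1024)):
        print(i, len(chunk), sum(doc_size_bytes(d) for d in chunk))
```

This is exactly the part I'm worried about: every document gets serialized once just to measure it, and then presumably again when it is actually written.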
Is there another way we can get the number of bytes read or written on a snap?