I have three binaries as three separate documents. Is it possible to merge the three binaries into a single document and write it to a file?
I used a Union snap to gather three CSV files, as binaries, into three documents, but I need to write all three documents to the same file. Is that possible?
Just a suggestion: have you tried the Group By N snap?
I have not tried that. I will give that a shot today.
After messing around with Group By N, I’m not sure that will work. You can group by content, but I could never find a way to write the combined binaries to a file. This is what my flow looks like right now:
If I add a File Writer after the Document to Binary snap, it writes a file, but only the last document ends up in it, not the other two binaries.
Would a Join Snap set to Merge work?
Whether I use a Union --> Document to Binary --> File Writer or Join (Merge) --> Pivot --> Document to Binary --> File Writer, I run into the same issue. I have three documents and only the final document is written to the file. How do I write all three documents to a single file?
In the File Writer, are you using the "Append" setting for the "File Action"? What protocol are you using? Are you sure that protocol supports appending?
Append in the File Writer was the first thing I tried last week, but Append isn't supported in S3. I would like to avoid having to use SFTP to accomplish this; that complicates things in other ways.
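A local-filesystem analogy in Python (not SnapLogic itself) shows why the File Action matters: if each incoming binary document is written to the same file name with an overwrite-style action, only the last one survives, whereas appending, or combining the content before a single write, keeps all three. Combining first is the option that does not rely on append support, which is the relevant one for S3. The sample contents and file names below are hypothetical.

```python
import os
import tempfile

# Hypothetical stand-ins for the three binary documents (header, detail, trailer).
DOCS = [b"H|desc\n", b"D|a|b\n", b"T|1|EOF\n"]


def write_each(path, docs, mode):
    """Write every document to the same path, reopening the file each time."""
    for doc in docs:
        with open(path, mode) as f:  # "wb" truncates on open, "ab" appends
            f.write(doc)
    with open(path, "rb") as f:
        return f.read()


def write_once(path, docs):
    """Combine all documents first, then write a single time (no append needed)."""
    with open(path, "wb") as f:
        f.write(b"".join(docs))
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Overwrite mode: only the trailer is left in the file.
        print(write_each(os.path.join(tmp, "overwrite.csv"), DOCS, "wb"))
        # Append mode: all three documents, in write order.
        print(write_each(os.path.join(tmp, "append.csv"), DOCS, "ab"))
        # Single combined write: same result as append.
        print(write_once(os.path.join(tmp, "combined.csv"), DOCS))
```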
Note that a Union snap will not work the way you want, since it immediately passes whatever input arrives to its output. Because Snaps all run in parallel, there is no guarantee you will get the documents on the output in the order you want (i.e. header, detail, trailer). They could theoretically come out as trailer, header, detail.
How much data is in the Detail branch of the screenshot above? Hundreds of rows, millions? If it's on the order of tens or hundreds, the new Gate snap might work for this use case. You can replace the CSV Formatters, Bin2Doc, and Union with the Gate snap followed by a Mapper, a JSON Splitter, and a CSV Formatter. I'm attaching an example pipeline that does just that.
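Conceptually, the Gate approach waits until every input has finished, collects each input's documents into one combined document, and only then emits them in a fixed order, which sidesteps the Union ordering problem. The Python sketch below models that idea with plain lists; the view names (`header`, `detail`, `trailer`) and the function names are hypothetical, and this is not how the Gate snap is actually implemented.

```python
from typing import Dict, Iterator, List

Row = List[str]


def gate(views: Dict[str, List[Row]]) -> Dict[str, List[Row]]:
    """Model of a gate: called only once every input is complete, it
    returns a single document holding all inputs' rows."""
    return {name: list(rows) for name, rows in views.items()}


def mapper(doc: Dict[str, List[Row]]) -> List[Row]:
    """Concatenate the collected arrays in an explicit order."""
    return doc["header"] + doc["detail"] + doc["trailer"]


def splitter(rows: List[Row]) -> Iterator[Row]:
    """Emit each row as its own record again, preserving order."""
    yield from rows


if __name__ == "__main__":
    # The order the views arrive in doesn't matter; the keys carry each row's role.
    views = {
        "trailer": [["T", "2", "EOF"]],
        "detail": [["D", "x"], ["D", "y"]],
        "header": [["H", "desc"]],
    }
    for row in splitter(mapper(gate(views))):
        print("|".join(row))
```

Because the ordering comes from the explicit concatenation in `mapper` rather than from arrival time, the output is always header, detail, trailer. The trade-off is that everything is held in memory at once, which is why the data volume question matters.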
CommunityMergeBinary_2020_03_02.slp (10.1 KB)
I'll give this a shot. My concern is that the header, detail, and trailer records all have different numbers of columns: 2, roughly 40, and 3, respectively.
So close. I got to this point last week, too. When the rows have different numbers of columns, the CSV Formatter adds extra columns to the front of each detail and trailer row. It doesn't handle rows with differing column counts.
What are the contents of the header and trailer? Are they valid CSV rows or plain text?
The header row has 2 columns (no column-name header row): record_type and description. The trailer row has 3 columns (also no column-name header): record_type, number of detail rows, and an end-of-file field. There can be any number (N) of detail records; each starts with a record_type column followed by roughly 40 more columns. The header and trailer rows are also pipe-delimited.
The pipeline you shared above gave me an idea. I tested that idea, and it looks like it will work for what I need, although I still need to figure out how to get the total number of detail rows into the trailer.
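Outside of SnapLogic, a file with this layout (a 2-column header, N detail rows of roughly 40 columns, and a 3-column trailer carrying the detail count, all pipe-delimited) can be produced by writing each row at its own length and deriving the count from the detail rows themselves. In this Python sketch the record-type codes (`H`, `D`, `T`), the `EOF` marker, and the function name are placeholders, not values from the thread.

```python
import csv
import io
from typing import Sequence


def build_file(description: str, details: Sequence[Sequence[str]]) -> str:
    """Return pipe-delimited text: header, detail rows, then trailer.

    csv.writer does not require every row to have the same number of
    fields, so the 2-, ~40-, and 3-column rows can share one file.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="|", lineterminator="\n")
    writer.writerow(["H", description])  # header: record_type, description
    for row in details:
        writer.writerow(["D", *row])  # detail: record_type + data columns
    # Trailer: record_type, number of detail rows, end-of-file field.
    writer.writerow(["T", str(len(details)), "EOF"])
    return buf.getvalue()


if __name__ == "__main__":
    print(build_file("monthly extract", [["a", "b", "c"], ["d", "e", "f"]]), end="")
```

The count is taken from the same `details` sequence that is written out, so the trailer cannot disagree with the number of detail rows actually in the file.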
Here's what I ended up with: get your header/detail/trailer rows --> CSV Format them however you need --> Bin2Doc --> Join (Merge) --> Mapper (Base64.decode($content).concat(Base64.decode($detail_content), Base64.decode($trailer_content))) --> DocToBin --> File Writer
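The Mapper step decodes three Base64 strings and joins them into one payload. A rough Python equivalent, assuming each field holds Base64-encoded content as the thread's `$content`, `$detail_content`, and `$trailer_content` expression implies, might look like this; the function name is hypothetical, and it mirrors only the logic, not SnapLogic's expression language.

```python
import base64


def merge_encoded(content: str, detail_content: str, trailer_content: str) -> bytes:
    """Decode three Base64 fields and concatenate them header -> detail -> trailer.

    Each part should already end with its own line break; otherwise the
    last line of one part would run into the first line of the next.
    """
    return b"".join(
        base64.b64decode(part) for part in (content, detail_content, trailer_content)
    )


def encode(text: str) -> str:
    """Helper for the demo: Base64-encode a UTF-8 string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    merged = merge_encoded(encode("H|desc\n"), encode("D|a|b\n"), encode("T|1|EOF\n"))
    print(merged.decode("utf-8"), end="")
```

Decoding each part before concatenating matters: joining the Base64 strings directly does not, in general, produce valid Base64 for the combined bytes, because padding characters can end up in the middle of the string.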
I doubt this is the most efficient way to accomplish the task, but it looks like it will work. What do you think, @tstack?