SnapLogic uses bcp in the SQL Server Bulk Load snap; however it does not make use of the -b batch_size argument. As a result, all rows are imported as one batch. We were attempting to load 582 million rows (temp file was 426 GiB in size) and bcp failed because the database’s transaction log ran out of space. Adding support for bcp’s -b batch_size argument would allow for better database resource utilization and would also yield better performance of the bulk load.
We have created an enhancement request to make the batch size configurable in the snap. As a workaround, you can create a
/tmp/mybcp.sh script on the Groundplex nodes with the contents:
/<REAL_BCP_PATH>/bcp "$@" -b 1000000
Then make it executable:
chmod +x /tmp/mybcp.sh
Finally, set /tmp/mybcp.sh as the BCP absolute path in the snap. The steps would be similar on Windows. Argument encoding (for double quotes) is the main thing to watch out for with this workaround.
Thank you. This worked; I just needed to swap the order of the arguments.
/<REAL_BCP_PATH>/bcp "$@" -b 1000000
Thanks for the note; I have updated the steps above.
I have a similar issue moving a large number of rows. We get table locking and also memory problems once the file exceeds 110 million rows. Could you provide more information about setting up the interim workaround on the Groundplex and in the pipeline?
Is your pipeline also writing to SQL Server? Is the Groundplex on Windows or Linux?
On Linux, you can create the shell script as mentioned and change the "SQL Server - Bulk Load" snap's "BCP absolute path" property to point to the path of the script you created.
On Windows, create a batch file and use %* instead of "$@".
Windows GP running as a Service, SQL Server 2017.
I have a basic bulk copy going.
Created the file bcp.bat, which contains bcp.exe %* -b 100000, and referenced it on the GP. Running a test now.