Creating output file names dynamically for each SQL Server table data stream taken as input from configuration File

I am trying to create a pipeline which will take a configuration file as an input and it will have Table name and column names that i will need for that table. You can put any no of tables and for each table no of column selected will be different. I want to write the data in output file and output file will be created for each table separate with naming pattern out_.csv. I used a SQL execute snap to get the data and pass to target but not able to create separate files. All data are getting appended in one file and header row is coming only for the first table. Any suggestions on how this can be done? Please feel free to ask question regarding the requirement. This will help to do one time loads from a DB to a file system and i will be able to control the table names and column names on the fly.

you can go for a parent child pipeline, in your parent pipeline, read the configuration file and then pass these as parameters to the child pipeline where for each run you will be passing a table name and corresponding columns and then in your file write snap you can use the parameter to name your file.

if possible please share example of your config file, that will make it easier to understand the problem.

Regards
Anubhav

Example of the configuration file:

Table_1, (Col A, Col B, Col C)
Table_2, (Col D, Col E, Col A)
Table_3, All

Here All means all the columns for Table_3 so in the select query it should be replaced with ‘*’. Col_A i have kept it in both Table_1 and Table_2 because some columns might be common working as foreign key to another table.

Out put will be <Table_Name>_OUT.csv/Amazon S3, Table_2.csv/Amazon S3 etc.

I was doing it using one pipeline and kind of works only problem i am facing is the output part whether instead of creating separate files it is writing the out put in the same file. In the above example if i am running my pipeline with first two tables then it is writing the header for the first file and all it’s data correctly but for the second table it is writing the data in the same file without the header row and row values are starting after putting just the delimiters of for the first table…example below.

Current output: Only 1 file with name Table_1.csv
Col A, Col B, Col C
1, Rakesh, 1140 easton avenue
,1234,4567,1

Expected output: 2 file with names Table_1.csv and Table_2.csv
Col A, Col B, Col C
1, Rakesh, 1140 easton avenue

Col D, Col E, Col A
1234,4567,1

As suggested by Anubhav

  1. Create one main pipeline that will read the table name and column names from csv file
  2. Create one sub pipeline that will be called from the main pipeline. Child pipeline will have two parameters (tableName and columnNames). This subpipeline will execute SQL for the given table and columns and will write the result to csv

Main pipeline:
main-pipeline

Sub-pipeline
Sub-pipeline

File-writer

1 Like

I did that and it worked. Just wondering why it will not work if i have one pipeline and use the same parameters. It creates the output in the file but the problem i am seeing it is not able to cretae multiple files based on different input table names and appending everything in opne file. I tried the Append, overwrite options in the file writer snap and did not make it work.

Once you “go binary” (triangle shape output) in the CSV Formatter, there is no data retained other than the binary data itself. So you don’t have the TableName anymore. You can still reference values that were passed into the pipeline as a parameters though (_TableName). So if you call a child pipeline, passing the table name as a pipeline parameter, then the child pipeline can use that parameter in the downstream File Writer.

Ok. Any way you assign a parameter in the same pipeline and access it from file writer snap?

You can’t change a pipeline parameter within the pipeline. You might be able to pass a value through as a binary header in the CSV Formatter and then reference it downstream, but that might also make your file a little wonky, not sure. Calling child pipelines is pretty straightforward though, as this is a typical reason to do so.

Thanks for the detailed inputs. Certainly appreciate that everyone.