09-20-2021 09:53 AM
I’m trying to do ETL to move source data to a target db. I don’t want to create a pipeline for every single table; there are hundreds of tables.
To start, I want to keep this simple and assume a truncate and reload of all the tables. I will worry about incremental loads later.
I want to run a SQL Execute snap with this code to get a list of all the tables.
SELECT
TOP 1
s.name AS srcSchema,
t.name AS srcTbl,
'mynewschema' AS targSchema,
t.name AS targTbl
FROM
MyDb.sys.schemas s
JOIN MyDb.sys.tables t ON
s.schema_id = t.schema_id
WHERE
s.name IN ('dbo')
ORDER BY
s.name,t.name
I then want the pipeline to go through the list of tables, truncate the target, select the source data, and BCP it into the target staging area. Once the data is there, I have a SQL Merge routine already built that will pull the differences into our lake, and I can easily call that with a SQL Execute.
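To make the per-table work concrete, this is roughly the SQL each iteration would run against the target (the staging table and merge procedure names below are just placeholders for illustration; the real values would come from the table list above):

TRUNCATE TABLE mynewschema.MyTable;        -- clear the staging table before the reload
-- (the BCP / bulk copy of the source rows into the staging table happens here, outside of T-SQL)
EXEC dbo.usp_MergeToLake                   -- the existing merge routine, called from a SQL Execute snap
    @schemaName = 'mynewschema',
    @tableName  = 'MyTable';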
I’m struggling to find an example of this in any documentation or online.
Creating hundreds of pipelines is not going to work.
11-23-2021 06:07 AM
If you can get the list of tables into documents (one document per table), it should be pretty simple to build a parameterized child pipeline for this and call it once per document (i.e. once per table).
You can also parallelize that with the settings on the Pipeline Execute snap, assuming both the source and the destination can handle the concurrency.
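Each row from your driver query would arrive as one document, for example (the table name here is made up):

{ "srcSchema": "dbo", "srcTbl": "Customer", "targSchema": "mynewschema", "targTbl": "Customer" }

Those fields can then be mapped to pipeline parameters in the Pipeline Execute snap, so the child pipeline gets one table’s names per invocation.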
Your child pipeline:
- Start with a SQL Select snap, which has a second (optional) output view that returns the table schema.
- In our case we convert that schema a little (uppercase the column names, remove the indexes) before ingesting it with a load snap (in our case a Snowflake Bulk Load), which again has two input views, the second one being for the schema. Tick the “create table” checkbox in the load snap. The sketch below shows roughly what that schema conversion amounts to in SQL terms.
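Purely as an illustration of that schema handling (the table and column names are made up, and the exact DDL the load snap generates will differ per database), the conversion is roughly:

-- Source table as it exists on the source side:
CREATE TABLE dbo.Customer (
    CustomerId   int           NOT NULL,
    CustomerName nvarchar(200) NULL,
    CONSTRAINT PK_Customer PRIMARY KEY (CustomerId)  -- indexes/constraints get stripped in the conversion
);

-- Table the load snap creates on the target (column names uppercased, no indexes):
CREATE TABLE CUSTOMER (
    CUSTOMERID   int           NOT NULL,
    CUSTOMERNAME nvarchar(200) NULL
);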