Create and load table for different files from same pipeline


I have multiple different files with different structure. My pipeline should read each file, identify the column names, column length and data type and create the table in given db connection. Once table is created it should be loaded with data present in the incoming file.

How can I achieve this. Please help.

What are the file formats (e.g. CSV, JSON, XML, …)?

Do you need to do any transformations to the values before insertion?

The DB Insert snaps (e.g. MySQL Insert) have an option to create the table if it is not present. I think the table creation is done based on the values in the first record to be, so string properties will be VARCHAR columns in the DB, numbers will be a number column, and so on. Is that sufficient or are you looking for more?

Auto create table with the perspective data typing and field length dynamically isn’t something you can easily do today. My experience with Redshift so far is Snap would default all fields to varchar and having max length of 32000 which isn’t ideal. What you can do is perform a dynamic max(length([field]) via SQL to obtain the length the field should be then try to port the data from the original table to the ideal one.