Forum Discussion

vsunilbabu
New Contributor II
5 years ago

Change datatypes dynamically of every column

Hello everybody,

Need help creating a pipeline. The data I am working on looks like below:

Header
Datatype
Data
Data
Data

This is a CSV file, so every column comes in as a string. I want to create a pipeline that looks at the first row and changes the datatype of each column using a Mapper.

My thought was to initially extract only the column names in the form of a table (group) and pass them one by one as a parameter into a Mapper placed within a Pipeline Execute. The Mapper within the pipeline would have something similar to: group[1].contains('date') ? Date.parse(_parameter) : group[1].contains('char') ? _parameter : parseInt(_parameter)

Can anyone help me extract just the column names as an array/table so that I can try out my logic?

(OR) if you have any other easier method to do this, please share.
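For reference, the transformation described above (first row is the header, second row holds the type labels, remaining rows are data) can be sketched outside SnapLogic in plain Python. The column names, type labels, and date format below are hypothetical examples, not taken from the actual file:

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV: row 1 = headers, row 2 = type labels, rest = data.
raw = """Name,Priority,Created
char,integer,date
Alice,3,16/09/2019
Bob,1,17/09/2019
"""

def convert(value, type_label):
    # Mirrors the ternary logic from the question: date -> parse,
    # char -> keep as string, anything else -> treat as integer.
    if "date" in type_label:
        return datetime.strptime(value, "%d/%m/%Y")
    if "char" in type_label:
        return value
    return int(value)

rows = list(csv.reader(io.StringIO(raw)))
header, types, data = rows[0], rows[1], rows[2:]

# Build one dict per data row, converting each field by its declared type.
records = [
    {h: convert(v, t) for h, v, t in zip(header, row, types)}
    for row in data
]
```

The key point is that the type row travels with the data, so the converter has to be looked up per column at runtime rather than fixed in the pipeline.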

Thanks,
Sunil

6 Replies

  • vsunilbabu
    New Contributor II

    Because I was stuck and unable to come up with the list of column names to pass into Pipeline Execute as parameters, I initially used a CSV Generator to test my logic, but I could not get it working.

    Any help is appreciated. The pipeline should look at the first row and change the datatype of each column accordingly.
    Thanks in advance.

    Regards,
    Sunil

    • tstack
      Former Employee

      The CSV Parser has some functionality for doing type conversion, but the types are expected to be in a separate file (see the Input Views section of the documentation).

      If you are not able to get your data in that form, I’m attaching an example pipeline that might do what you want. This pipeline uses a Router snap to split the first row off and then a Join to merge it back in with all the remaining rows. A Mapper snap is then used to do the type conversion with the following expression:

      $.mapValues(
          (value, key) => match $types.get(key) {
              'char' => value,
              'integer' => parseInt(value),
              'date' => Date.parse(value, "dd/mm/YY"),
              _ => value
          }
      )
      

      Since that’s a little involved, I’ll go into some more detail. First, the mapValues() method is used to rewrite the value of each property in the input document. That method takes a callback that does the actual work. The callback uses the match operator to check the type of each property and then executes the conversion expression (e.g. the type of “Priority” is “integer”, so the match arm with parseInt(value) is executed).
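      In plain Python terms, mapValues() with the match over the type lookup behaves roughly like the dict comprehension below. The $types document is stood in for by an ordinary dict; its contents and the date format are hypothetical:

```python
from datetime import datetime

# Stand-in for the $types document produced by splitting off the first row:
# a mapping from column name to declared type.
types = {"Name": "char", "Priority": "integer", "Created": "date"}

# One incoming document, all values still strings from the CSV parser.
doc = {"Name": "Alice", "Priority": "3", "Created": "16/09/2019"}

def convert(value, key):
    # Each branch corresponds to one match arm in the Mapper expression.
    t = types.get(key)
    if t == "char":
        return value
    if t == "integer":
        return int(value)
    if t == "date":
        return datetime.strptime(value, "%d/%m/%Y")
    return value  # the `_` fallback arm

# Equivalent of $.mapValues(...): rewrite every value, keyed by its name.
converted = {key: convert(value, key) for key, value in doc.items()}
```

So for a "Priority" field typed "integer", the integer branch runs and the string "3" becomes the number 3, just as the parseInt(value) arm does in the pipeline.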

      TypeConversion_2019_09_16.slp (10.5 KB)

  • @vsunilbabu If the "Create new table if not present" option is selected without providing the schema in a secondary input view, varchar will be used for all of the columns' data types.

    In your use case, you can provide the schema of the table that you want to create in a secondary input view to get exactly the data types that you want in the Snowflake table.

    The first example in the Snowflake Bulk Load Snap's documentation covers a similar use case: https://docs-snaplogic.atlassian.net/wiki/spaces/SD/pages/1438549/Snowflake+-+Bulk+Load

    cc: @dmiller