Change datatypes dynamically of every column

Hello everybody,

Need help creating a pipeline. The data am working on looks like below:
image

Header
Datatype
Data
Data
Data

This is a CSV file so every column comes as a string. I want to create a pipeline which will look at the first row and change the datatype of that particular column using mapper.

My thought was to initially find only columns in the form of a table(group) and pass one by one as a parameter into a mapper placed within a pipeline execute. The mapper within the pipeline will have something similar to group[1].contains(‘date’)?Date.parse(_parameter):group[1].contains(‘char’)?_parameter:group[1].contains(‘char’)?_parameter:parseInt(_parameter)

Can anyone help me how to bring out only column names as an array/table so that I can try out my logic.

(OR) if you have any other easier method to do this, then can you please share.

Thanks,
Sunil

Because I was stuck, unable to come up with list of column names to pass into pipeline execute as parameters, I used CSV generator initially to test my logic. I could not do it.

Any help is appreciated. Pipeline should look at the first row and change the datatype of that column.
Thanks in advance.

Regards,
Sunil

The CSVParser has some functionality for doing type conversion, but the types are expected to be in a separate file (see the Input Views section of the doc).

If you are not able to get your data in that form, I’m attaching an example pipeline that might do what you want. This pipeline uses a Router snap to split the first row off and then a Join to merge it back in with all the remaining rows. A Mapper snap is then used to do the type conversion with the following expression:

$.mapValues(
    (value, key) => match $types.get(key) {
        'char' => value,
        'integer' => parseInt(value),
        'date' => Date.parse(value, "dd/mm/YY"),
        _ => value
    }
)

Since that’s a little involved, I’ll go into some more detail. First, the mapValues() method is used to rewrite the value of each property in the input document. That method takes a callback that does the actual work. The callback uses the match operator to check the type of each property and then executes the conversion expression (e.g. the type of “Priority” is “integer”, so the match arm with parseInt(value) is executed).

TypeConversion_2019_09_16.slp (10.5 KB)

Thank you @tstack . This worked perfectly.

Can I ask a question on top of this.

I output of my mapper has correct datatypes,
image

I placed a Snowflake Bulk Load snap after the mapper with below configuration

This snap should create a table in the database if it is not already existing.

Pipeline ran successfully and the table was also created.

But when I see the datatypes in my database, all the columns are Varchar
image

Is there any specific reason for this. How can I ensure that the datatypes at the end of mapper to reflect in my database(snowflake)?

Thanks,
Sunil

Hello,

Can anyone look into this please. Is this an issue with Snowflake snaps?

I used Snowflake Bulk Load snap many times before to upload data. Usually, I create necessary tables using DDL statements before hand in snowflake and then use Bulk Load snap in my pipelines, but this is the first time am using ‘Create table if not present’.

Thanks,
Sunil

@vsunilbabu If Create new table if not present option is selected without providing the schema in a secondary input view, varchar will be used for all the column’s data types.

In your use case, you can provide the schema of the table that you want to create in a second input view to get exactly the data type that you want in the Snowflake Table.

The first example mentioned in the Snowflake Bulk Load snap’s documentation covers a similar use case with an example: https://docs-snaplogic.atlassian.net/wiki/spaces/SD/pages/1438549/Snowflake+-+Bulk+Load

cc: @dianevmiller