Change datatypes dynamically of every column

vsunilbabu
New Contributor II

Hello everybody,

I need help creating a pipeline. The data I am working on looks like this:
image

Header
Datatype
Data
Data
Data

This is a CSV file, so every column comes in as a string. I want to create a pipeline that looks at the first row (the datatype row) and changes the datatype of each column accordingly using a Mapper.

My initial thought was to extract just the columns as a table (group) and pass them one by one as a parameter into a Mapper placed inside a pipeline called by Pipeline Execute. The Mapper inside that pipeline would have something like group[1].contains('date') ? Date.parse(_parameter) : group[1].contains('char') ? _parameter : parseInt(_parameter)

Can anyone help me extract only the column names as an array/table so that I can try out my logic?

Or, if you know an easier way to do this, please share it.

Thanks,
Sunil

6 REPLIES 6

vsunilbabu
New Contributor II

Because I was stuck and could not come up with a list of column names to pass into Pipeline Execute as parameters, I initially used a CSV Generator to test my logic, but I could not get it to work.

Any help is appreciated. Pipeline should look at the first row and change the datatype of that column.
Thanks in advance.

Regards,
Sunil

The CSVParser has some functionality for doing type conversion, but the types are expected to be in a separate file (see the Input Views section of the doc).

If you are not able to get your data in that form, I’m attaching an example pipeline that might do what you want. This pipeline uses a Router snap to split the first row off and then a Join to merge it back in with all the remaining rows. A Mapper snap is then used to do the type conversion with the following expression:

$.mapValues(
    (value, key) => match $types.get(key) {
        'char' => value,
        'integer' => parseInt(value),
        'date' => Date.parse(value, "dd/mm/YY"),
        _ => value
    }
)

Since that’s a little involved, I’ll go into some more detail. First, the mapValues() method is used to rewrite the value of each property in the input document. That method takes a callback that does the actual work. The callback uses the match operator to check the type of each property and then executes the conversion expression (e.g. the type of “Priority” is “integer”, so the match arm with parseInt(value) is executed).
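For readers who want to try the same idea outside of a pipeline, here is a minimal Python sketch of the logic described above: split the type row off from the data rows, then look up each column's type and apply the matching conversion. It is not SnapLogic code. The sample data, the column names other than "Priority", the `CONVERTERS` table, the `convert_rows` helper, and the day/month/four-digit-year date format are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample: header row, then a type row, then the data rows.
SAMPLE = """Name,Priority,Created
char,integer,date
Alice,3,16/09/2019
Bob,1,01/10/2019
"""

# One converter per type name (plays the role of the match arms).
# The date format is an assumption about the data, not taken from the thread.
CONVERTERS = {
    "char": lambda v: v,
    "integer": int,
    "date": lambda v: datetime.strptime(v, "%d/%m/%Y").date(),
}


def convert_rows(text):
    """Return the data rows with each value converted per the type row."""
    reader = csv.DictReader(io.StringIO(text))
    types = next(reader)  # first row after the header holds the type names
    converted = []
    for row in reader:
        converted.append({
            # Unknown type names fall through unchanged, like the `_` arm.
            key: CONVERTERS.get(types[key], lambda v: v)(value)
            for key, value in row.items()
        })
    return converted


rows = convert_rows(SAMPLE)
```

Here `types` plays the role of `$types` in the Mapper expression, and the dictionary comprehension corresponds to `mapValues()`.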

TypeConversion_2019_09_16.slp (10.5 KB)

vsunilbabu
New Contributor II

Thank you, @tstack. This worked perfectly.

Can I ask a follow-up question?

The output of my Mapper has the correct datatypes:
image

I placed a Snowflake Bulk Load snap after the Mapper with the configuration below:
image

This snap should create the table in the database if it does not already exist.

Pipeline ran successfully and the table was also created.

But when I check the datatypes in my database, all the columns are Varchar:
image

Is there a specific reason for this? How can I ensure that the datatypes at the output of the Mapper are reflected in my database (Snowflake)?

Thanks,
Sunil