10-03-2020 12:00 AM
Hello everybody,
I need help creating a pipeline. The data I am working on looks like this:
Header
Datatype
Data
Data
Data
This is a CSV file, so every column comes in as a string. I want to create a pipeline that looks at the first row (the datatype row under the header) and changes the datatype of each column accordingly, using a Mapper.
My thought was to first extract just the columns as a table (group) and pass them one by one as a parameter into a Mapper placed inside a Pipeline Execute. The Mapper within that pipeline would have something like: group[1].contains('date') ? Date.parse(_parameter) : group[1].contains('char') ? _parameter : parseInt(_parameter)
Can anyone help me extract only the column names as an array/table so that I can try out my logic?
Or, if you have an easier method to do this, could you please share it?
Thanks,
Sunil
10-03-2020 06:30 AM
Because I was stuck and unable to come up with a list of column names to pass into Pipeline Execute as parameters, I initially used a CSV Generator to test my logic, but I could not get it to work.
Any help is appreciated. The pipeline should look at the first row and change the datatype of each column accordingly.
Thanks in advance.
Regards,
Sunil
10-03-2020 12:07 PM
The CSVParser has some functionality for doing type conversion, but the types are expected to be in a separate file (see the Input Views section of the doc).
If you are not able to get your data in that form, I’m attaching an example pipeline that might do what you want. This pipeline uses a Router snap to split the first row off and then a Join to merge it back in with all the remaining rows. A Mapper snap is then used to do the type conversion with the following expression:
$.mapValues(
    (value, key) => match $types.get(key) {
        'char' => value,
        'integer' => parseInt(value),
        'date' => Date.parse(value, "dd/mm/YY"),
        _ => value
    }
)
Since that’s a little involved, I’ll go into some more detail. First, the mapValues() method is used to rewrite the value of each property in the input document. That method takes a callback that does the actual work. The callback uses the match operator to check the type of each property and then executes the conversion expression (e.g. the type of “Priority” is “integer”, so the match arm with parseInt(value) is executed).
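If it helps to see the same idea outside SnapLogic, here is a minimal Python sketch of the logic: split the type row off, then convert every value according to the type declared for its column. This is only an illustration, not what the attached pipeline does internally. The sample data, the column names other than "Priority", the function name convert_rows, and the date format are all assumptions made up for the example.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample: header row, then a type row, then data rows.
SAMPLE = """Name,Priority,Due
char,integer,date
Fix login,1,03/10/20
Write docs,2,12/10/20
"""

# Converters keyed by the type names used in the type row.
# The date format (day/month/two-digit year) is an assumption for this sample.
CONVERTERS = {
    "char": lambda v: v,
    "integer": int,
    "date": lambda v: datetime.strptime(v, "%d/%m/%y").date(),
}


def convert_rows(text):
    """Return the data rows with each value converted per the type row."""
    rows = iter(csv.DictReader(io.StringIO(text)))
    types = next(rows)  # first row after the header holds the type names
    result = []
    for row in rows:
        result.append({
            # Unknown type names leave the value as a string,
            # like the "_ => value" arm in the expression above.
            key: CONVERTERS.get(types[key], lambda v: v)(value)
            for key, value in row.items()
        })
    return result


if __name__ == "__main__":
    for converted in convert_rows(SAMPLE):
        print(converted)
```

The dictionary of converters plays the role of the match arms: adding a new type means adding one entry, and anything unrecognized passes through unchanged.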
TypeConversion_2019_09_16.slp (10.5 KB)
10-12-2020 01:56 AM
Thank you @tstack . This worked perfectly.
10-12-2020 03:48 AM
Can I ask a follow-up question?
The output of my Mapper has the correct datatypes.
I placed a Snowflake Bulk Load snap after the Mapper with the configuration below.
This snap should create the table in the database if it does not already exist.
The pipeline ran successfully and the table was created.
But when I check the datatypes in the database, all the columns are VARCHAR.
Is there any specific reason for this? How can I ensure that the datatypes at the output of the Mapper are reflected in my database (Snowflake)?
Thanks,
Sunil