Forum Discussion

vsunilbabu
New Contributor II
5 years ago

Change datatypes dynamically of every column

Hello everybody,

Need help creating a pipeline. The data I am working on looks like below:

Header
Datatype
Data
Data
Data

This is a CSV file, so every column comes in as a string. I want to create a pipeline that looks at the first row and changes the datatype of each column using a Mapper.

My thought was to initially extract only the column names in the form of a table (group) and pass them one by one as a parameter into a Mapper placed within a Pipeline Execute. The Mapper within the pipeline would have something similar to: group[1].contains('date') ? Date.parse(_parameter) : group[1].contains('char') ? _parameter : parseInt(_parameter)

Can anyone help me extract just the column names as an array/table so that I can try out my logic?

(OR) if you have any other easier method to do this, please share.
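For reference, the transformation described above (first row is the header, second row holds the type labels, remaining rows are data) can be sketched outside SnapLogic in plain Python. The column names, type labels, and date format below are hypothetical examples, not taken from the actual file:

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV: row 1 = headers, row 2 = type labels, rest = data.
raw = """Name,Priority,Created
char,integer,date
Alice,3,16/09/2019
Bob,1,17/09/2019
"""

def convert(value, type_label):
    # Mirrors the ternary logic from the question: date -> parse,
    # char -> keep as string, anything else -> treat as integer.
    if "date" in type_label:
        return datetime.strptime(value, "%d/%m/%Y")
    if "char" in type_label:
        return value
    return int(value)

rows = list(csv.reader(io.StringIO(raw)))
header, types, data = rows[0], rows[1], rows[2:]

# Build one dict per data row, converting each field by its declared type.
records = [
    {h: convert(v, t) for h, v, t in zip(header, row, types)}
    for row in data
]
```

The key point is that the type row travels with the data, so the converter has to be looked up per column at runtime rather than fixed in the pipeline.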

Thanks,
Sunil

6 Replies

  • vsunilbabu
    New Contributor II

    Because I was stuck and unable to come up with the list of column names to pass into Pipeline Execute as parameters, I initially used a CSV Generator to test my logic, but I could not get it working.

    Any help is appreciated. The pipeline should look at the first row and change the datatype of each column accordingly.
    Thanks in advance.

    Regards,
    Sunil

    • tstack
      Former Employee

      The CSV Parser has some functionality for doing type conversion, but the types are expected to be in a separate file (see the Input Views section of the documentation).

      If you are not able to get your data in that form, I’m attaching an example pipeline that might do what you want. This pipeline uses a Router snap to split the first row off and then a Join to merge it back in with all the remaining rows. A Mapper snap is then used to do the type conversion with the following expression:

      $.mapValues(
          (value, key) => match $types.get(key) {
              'char' => value,
              'integer' => parseInt(value),
              'date' => Date.parse(value, "dd/mm/YY"),
              _ => value
          }
      )
      

      Since that’s a little involved, I’ll go into some more detail. First, the mapValues() method is used to rewrite the value of each property in the input document. That method takes a callback that does the actual work. The callback uses the match operator to check the type of each property and then executes the conversion expression (e.g. the type of “Priority” is “integer”, so the match arm with parseInt(value) is executed).
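      In plain Python terms, mapValues() with the match over the type lookup behaves roughly like the dict comprehension below. The $types document is stood in for by an ordinary dict; its contents and the date format are hypothetical:

```python
from datetime import datetime

# Stand-in for the $types document produced by splitting off the first row:
# a mapping from column name to declared type.
types = {"Name": "char", "Priority": "integer", "Created": "date"}

# One incoming document, all values still strings from the CSV parser.
doc = {"Name": "Alice", "Priority": "3", "Created": "16/09/2019"}

def convert(value, key):
    # Each branch corresponds to one match arm in the Mapper expression.
    t = types.get(key)
    if t == "char":
        return value
    if t == "integer":
        return int(value)
    if t == "date":
        return datetime.strptime(value, "%d/%m/%Y")
    return value  # the `_` fallback arm

# Equivalent of $.mapValues(...): rewrite every value, keyed by its name.
converted = {key: convert(value, key) for key, value in doc.items()}
```

So for a "Priority" field typed "integer", the integer branch runs and the string "3" becomes the number 3, just as the parseInt(value) arm does in the pipeline.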

      TypeConversion_2019_09_16.slp (10.5 KB)

  • @vsunilbabu If the "Create new table if not present" option is selected without providing the schema in a secondary input view, varchar will be used for all of the columns' data types.

    In your use case, you can provide the schema of the table that you want to create in a secondary input view to get exactly the data types that you want in the Snowflake table.

    The first example in the Snowflake Bulk Load Snap's documentation covers a similar use case: https://docs-snaplogic.atlassian.net/wiki/spaces/SD/pages/1438549/Snowflake+-+Bulk+Load

    cc: @dmiller