Forum Discussion

davidmil
New Contributor
2 years ago

Postgres to parquet writer

I am trying to write data from Postgres to Parquet. The source has columns of the numeric data type, which can hold int, decimal, or float values. I have mapped the numeric type to decimal in Parquet, but the issue is that this converts int values to decimal as well: for example, 1 becomes 1.00. When I map the numeric data type to int, I lose the decimal values. This will be a general pipeline for many objects, and I won't have the column schema at runtime. Is there any workaround in the Parquet writer that lets us distinguish between int and decimal for the numeric data type?

5 Replies

    • SpiroTaleski
      Valued Contributor

      davidmil 

      If that is the case, then you should probably check whether each incoming numeric value is an integer or a float. One way is to check for a remainder when dividing by 1:

      n % 1 == 0 --> integer
      n % 1 != 0 --> float

      BR,

      Spiro Taleski
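The remainder check suggested above can be sketched in Python as follows. This is only an illustration of the idea, not a Snap or expression from the thread: the function names `is_integral` and `integral_columns` and the sample rows are hypothetical, and it assumes that numeric values arrive as `decimal.Decimal` objects and are finite (no NaN or Infinity). Because a Parquet column has a single type, the sketch applies the check to whole columns and discovers the column names from the data instead of hardcoding them.

```python
from decimal import Decimal


def is_integral(value):
    """Return True when the value has no fractional part (n % 1 == 0)."""
    return value % 1 == 0


def integral_columns(rows):
    """Return the names of columns whose Decimal values are all whole numbers.

    Column names are taken from the rows themselves, so nothing is hardcoded.
    """
    all_whole = {}
    for row in rows:
        for name, value in row.items():
            if isinstance(value, Decimal):
                # A single fractional value disqualifies the whole column.
                all_whole[name] = all_whole.get(name, True) and is_integral(value)
    return {name for name, ok in all_whole.items() if ok}


# Hypothetical sample rows, as they might come out of a numeric-typed table.
rows = [
    {"id": Decimal("1"), "price": Decimal("9.99"), "qty": Decimal("3.00")},
    {"id": Decimal("2"), "price": Decimal("5"), "qty": Decimal("7")},
]

int_columns = integral_columns(rows)
```

Here `int_columns` contains `id` and `qty`, which could be written as an integer type, while `price` stays decimal. Note that this requires seeing all the values of a column (or a trusted sample) before the output schema is fixed.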

      • davidmil
        New Contributor

        Hi SpiroTaleski

        Can you suggest a snap with which I can achieve this? It is a dynamic pipeline that iterates over all the tables in a schema, so I won't be able to hardcode column names.

  • manichandana_ch
    New Contributor III

    Hi davidmil 

    The best option is to read the data types from the source and convert them. As SpiroTaleski mentioned, you can use an expression file to maintain the data type conversions; in that case you need not worry about individual sources or their metadata. You can maintain all possible source data types in the file, convert them to Parquet-supported data types, and pass the result to the Parquet writer's schema input, and everything will be handled.
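As a rough sketch of such a conversion table, written in Python and not in the expression-file syntax of the integration tool: the names `PG_TO_PARQUET`, `parquet_type`, and `FALLBACK_DECIMAL` are hypothetical, the type labels are illustrative strings, and the fallback precision and scale are arbitrary assumptions. It assumes the column metadata is read from Postgres's `information_schema.columns` (`data_type`, `numeric_precision`, `numeric_scale`), where an unconstrained `numeric` column reports NULL for precision and scale.

```python
# Hypothetical lookup table: Postgres data_type -> Parquet type label.
PG_TO_PARQUET = {
    "smallint": "INT32",
    "integer": "INT32",
    "bigint": "INT64",
    "real": "FLOAT",
    "double precision": "DOUBLE",
    "boolean": "BOOLEAN",
    "text": "STRING",
    "character varying": "STRING",
    "date": "DATE",
}

# Arbitrary assumption for numeric columns declared without precision/scale.
FALLBACK_DECIMAL = "DECIMAL(38,10)"


def parquet_type(data_type, precision=None, scale=None):
    """Pick a Parquet type label for one Postgres column."""
    if data_type != "numeric":
        # Unknown types fall back to STRING in this sketch.
        return PG_TO_PARQUET.get(data_type, "STRING")
    if precision is None:
        # Unconstrained numeric: the metadata cannot tell int from decimal.
        return FALLBACK_DECIMAL
    if scale == 0:
        # Whole numbers only: use an integer type when the digits fit.
        if precision <= 9:
            return "INT32"
        if precision <= 18:
            return "INT64"
    return f"DECIMAL({precision},{scale})"


# Hypothetical metadata rows: (column_name, data_type, precision, scale).
columns = [
    ("id", "numeric", 10, 0),
    ("price", "numeric", 12, 2),
    ("name", "text", None, None),
    ("ratio", "numeric", None, None),
]

schema = {name: parquet_type(dt, p, s) for name, dt, p, s in columns}
```

For columns declared as plain `numeric` the metadata carries no scale, so a lookup like this cannot separate integers from decimals by itself; those columns would still need a check on the values.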