05-12-2023 06:11 AM
Hello!
I have REST API source from which I ingest data and load it to BigQuery. I am executing this dynamically with parameters as there are multiple endpoints to query. I am using mapper to define the schema before loading, in this case with the pass-through enabled. However, I have run into a problem, that sometimes the API doesn’t return certain object in the json. Then, when I auto-create the table during load, it may not define this missing column in the schema based on the sample, but then breaks when this column suddenly appears. Due to this being a dynamic pipeline, I cannot hard-code any single columns to the mapper, as the column names are different for each job. Also, the number of columns might differ.
So is there a way to dynamically define the column names I want to keep from the source? For example I a good solution could be one where I can define a single json, array or some other sort of string that contains the namespace of columns I’d like to keep, and just give that as a parameter to the mapper.
Just to showcase with an example what I would want out with the dynamic expression:
Source 1:
[
{
“a”:“123”,
“b”:“qwerty”,
“c”:“aaa”,
“d”:asd",
“e”:“111”
},
{
“a”:“456”,
“b”:“zxcvbn”,
“c”:“bbb”,
“d”:fgh",
“e”:“222”
}
]
Target schema 1:
a, b, c, e
Source 2:
{
“h”:“123”,
“i”:“qwerty”,
“j”:“aaa”,
“k”:asd",
“l”:“111”
},
Target schema 2:
i, j, k
Hopefully you get what I’m after 🙂 Thanks
-Tuomas
Solved! Go to Solution.
05-12-2023 06:31 AM
Hi @Tuomas,
One way that you can try is to define the columns for each schemas and to append that object to the response documents.
Schemas columns object example:
[
{
"schema1":['a','b'],
"schema2":['c','d'],
"schema3":['e','f']
}
]
Also, you can make this as part of a config file.
Then you can append this object to every document using the Join Snap with the following configuration:
So, finally you can dynamically filter out the columns from the root object using the following expression in a Mapper Snap:
$.filter((v,k) => $.get(_schema).indexOf(k) != -1)
Sample pipeline:
–sl-comm-dynamic-columns-parameter_2023_05_12.slp (6.8 KB)
Let me know if this helps you @Tuomas.
BR,
Aleksandar.
05-12-2023 06:31 AM
Hi @Tuomas,
One way that you can try is to define the columns for each schemas and to append that object to the response documents.
Schemas columns object example:
[
{
"schema1":['a','b'],
"schema2":['c','d'],
"schema3":['e','f']
}
]
Also, you can make this as part of a config file.
Then you can append this object to every document using the Join Snap with the following configuration:
So, finally you can dynamically filter out the columns from the root object using the following expression in a Mapper Snap:
$.filter((v,k) => $.get(_schema).indexOf(k) != -1)
Sample pipeline:
–sl-comm-dynamic-columns-parameter_2023_05_12.slp (6.8 KB)
Let me know if this helps you @Tuomas.
BR,
Aleksandar.
05-14-2023 10:30 PM
Seems to work like a charm 🙂 Thank you!
05-17-2023 01:42 AM
A follow-up question:
Any chance I could rename some of the columns at the same time using similar logic? 🙂
BR,
Tuomas
05-17-2023 02:57 AM
Hi @Tuomas,
Please take a look at the attached pipeline below:
–sl-comm-dynamic-columns-parameter_2023_05_17.slp (8.3 KB)
Let me know if this helps.
BR,
Aleksandar.