Hello!
I have REST API source from which I ingest data and load it to BigQuery. I am executing this dynamically with parameters as there are multiple endpoints to query. I am using mapper to define the schema before loading, in this case with the pass-through enabled. However, I have run into a problem, that sometimes the API doesn’t return certain object in the json. Then, when I auto-create the table during load, it may not define this missing column in the schema based on the sample, but then breaks when this column suddenly appears. Due to this being a dynamic pipeline, I cannot hard-code any single columns to the mapper, as the column names are different for each job. Also, the number of columns might differ.
So is there a way to dynamically define the column names I want to keep from the source? For example I a good solution could be one where I can define a single json, array or some other sort of string that contains the namespace of columns I’d like to keep, and just give that as a parameter to the mapper.
Just to showcase with an example what I would want out with the dynamic expression:
Source 1:
[
{
“a”:“123”,
“b”:“qwerty”,
“c”:“aaa”,
“d”:asd",
“e”:“111”
},
{
“a”:“456”,
“b”:“zxcvbn”,
“c”:“bbb”,
“d”:fgh",
“e”:“222”
}
]
Target schema 1:
a, b, c, e
Source 2:
{
“h”:“123”,
“i”:“qwerty”,
“j”:“aaa”,
“k”:asd",
“l”:“111”
},
Target schema 2:
i, j, k
Hopefully you get what I’m after Thanks
-Tuomas