Counting headers in CSV

Hi Team,

I was trying to develop a pipeline where it compares the data of two systems . I have collected huge data from those two systems and I need to find the mismatch of data . I thought of splitting this work in two ways . First to find the mismatch of data under the same header and this is easy by using joins and the other is there are some headers present in system 1 which are not in system 2 and vice versa . So it could be useful if we have any header comparison technique handled by Snap or atleast anything like which displays the count of headers or rows ?

Thanks

Hi @Harsha3,

You can create array *field_list** of columns from the header from the 1st system, by using the following statement: *.keys().filter(x => x.toLowerCase() != ‘original’).map(x => x.toLowerCase())

The same could be done for the 2nd system and lets call this array $target_field_list.
The columns (fields) that exist in the 1st system and don’t exist in the 2nd one you can find by using the following:
$field_list.filter(x => $target_field_list.indexOf(x) == -1)

Similar to this, by using the following statement you can find the fields that exist in 2nd system and don’t exist in the 1st system:
$target_field_list.filter(x => $field_list.indexOf(x) == -1)

Recently I’ve resolved the same problem and below is a screenshot of the pipeline that I developed for detecting differences in table structures between source and target table:
image
Note: Join type in Join Snap is outer join set as true = true:
image

If both datasets (data streams) have same structure and if you want to compare content of the streams you can use Diff snap. You can find more info at the following link.

Regards,
Lazo

2 Likes