Hi Team, I’m using an Inner Join and observing that the output preview doesn’t appear. Is this an expected behaviour? I however, have access to the data I’m expecting out of Inner join (as I’m writing to an excel file) but using a mapper after Inner Join doesn’t help as there is no data in the input schema of the mapper. How do I filter the data that I don’t need out of the Inner Join when the preview is not available and Input schema is showing nothing? I also have access to the other filtered data as I have enabled the error view in Inner Join and that does have an output preview to it. Appreciate your help and time on this. Best Regards, Darsh

It’s quite common when developing and testing software to use very small sample data sets, either dummy data (made up for testing purposes) or a subset of the real data. This also lets you verify the expected results more easily. You can’t do that if you’re testing with millions of rows. After you validate the correct functioning of your pipelines against the test data, you create a version that runs against the real data.

@darshthakkar - as @ptaylor mentioned, when you are working with Pipeline Preview data and need to see data movement through the entire pipeline, you need to reduce your source data sets to a limited enough set that any snaps that can potentially filter data (i.e. Join, Filter, etc.) will still have data to pass. In your case, I would suggest putting a temporary filtering condition (i.e. Where Clause) on your database Select snap so that your data will match after the Join snap. Once you have verified that the pipeline logic is correct, simply remove the filter from your select and execute the entire pipeline and validate the complete results.

@darshthakkar - I think you may be running into a common misunderstanding with the Data Preview functionality. By default, SnapLogic pulls only the first 50 records for preview into each snap. If the two sources you are reading from do not have any records that align to meet your join criteria with an inner join, then nothing will be output. For example, if SourceA contains records such as CustomerId from 1 to 50, and SourceB contains records with CustomerId from 2000 - 2050, then the inner join doesn’t match anything and all records are effectively filtered. You can try updating your User Settings / Preview Document Count as high as 2000 to see if that helps: Or you can set filter conditions on your source data to ensure alignment of data that will match in your Join - in the case of the CustomId values I gave, you could set a filter condition in your database select to pull only records with CustomerId values of 1-50. To prove if my theory is correct, you can switch your Join to a Left Outer join type and see that documents will flow through to the output view of the Join.

That is correct - if the Mapper has no input, it can’t determine the schema.

@koryknick: Thank you for your patience. I was able to test with 2 dummy files and could literally see the output preview on the Inner Join. Those dummy files had a max of 27 rows (which is very less in comparison to the data I deal with everyday) However, with the kind of data I’m dealing with my use case, its in hundreds of thousands of records for which a large chunk of data wouldn’t suffice the condition of Inner Join → In such cases even if I select the “Preview Document Count” as 2000, the output of Inner Join wouldn’t be there. What would you suggest me here? What else can I use instead of an Inner Join?

Output preview not available for Inner Join | SnapLogic - Integration Nation

26 Replies

koryknick
Employee
4 years ago
@darshthakkar - as @ptaylor mentioned, when you are working with Pipeline Preview data and need to see data movement through the entire pipeline, you need to reduce your source data sets to a limited enough set that any snaps that can potentially filter data (i.e. Join, Filter, etc.) will still have data to pass. In your case, I would suggest putting a temporary filtering condition (i.e. Where Clause) on your database Select snap so that your data will match after the Join snap. Once you have verified that the pipeline logic is correct, simply remove the filter from your select and execute the entire pipeline and validate the complete results.
- darshthakkar
  Valued Contributor
  4 years ago
  Thank you @koryknick for your suggestions. I will follow this practice going forward so that I’m not stuck with my development.
  
  I’ll be closing this thread soon listing the best practices and solution provided. As both @ptaylor and you were really instrumental in providing the suggestions, I wouldn’t be able to mark a solution for either of your comments but will be giving credits on my closing comment coming soon (may be end of the day today)
  
  Thank you again both of you for your help on this one.
  
  Regards,
  Darsh
koryknick
Employee
4 years ago
@darshthakkar - I think you may be running into a common misunderstanding with the Data Preview functionality. By default, SnapLogic pulls only the first 50 records for preview into each snap. If the two sources you are reading from do not have any records that align to meet your join criteria with an inner join, then nothing will be output.

For example, if SourceA contains records such as CustomerId from 1 to 50, and SourceB contains records with CustomerId from 2000 - 2050, then the inner join doesn’t match anything and all records are effectively filtered.

You can try updating your User Settings / Preview Document Count as high as 2000 to see if that helps:

Or you can set filter conditions on your source data to ensure alignment of data that will match in your Join - in the case of the CustomId values I gave, you could set a filter condition in your database select to pull only records with CustomerId values of 1-50.

To prove if my theory is correct, you can switch your Join to a Left Outer join type and see that documents will flow through to the output view of the Join.
- darshthakkar
  Valued Contributor
  4 years ago
  Thank you @koryknick for responding to this.
  I will try with 2 sample flat files and test the functionality of Inner Join. While I was reading your explanation, I felt it was right, Left Outer Join always gives me an output however that is not the case with Inner Join.
  
  What am I going to do next would be:
  
  Prepare 2 flat files with IDs similar
  
  Use a sort before Join
  
  Use Inner Join
  
  Check the Output preview
  
  If (4) fails (highly unlikeable), go to settings and change the preview count to max
  
  Would you suggest any other steps than the ones stated above? I can keep you posted once I test this.
  
  Regards,
  Darsh
- darshthakkar
  Valued Contributor
  4 years ago
  
  koryknick:
  
  By default, SnapLogic pulls only the first 50 records for preview into each snap. If the two sources you are reading from do not have any records that align to meet your join criteria with an inner join, then nothing will be output.
  
  If you don’t mind, can I ask you something?
  Why would I not be able to see anything under input schema for the mapper attached to the Inner Join? Is the data not matching for the first 50 records the reason?
koryknick
Employee
4 years ago
That is correct - if the Mapper has no input, it can’t determine the schema.
ptaylor
Employee
4 years ago
If the numbers in the one input view start at 1000 and the numbers in the other input view are all just three digits, I would expect no output from the Join.
- ptaylor
  Employee
  4 years ago
  What happens if you change the Preview Document Count to 2000?
  - darshthakkar
    Valued Contributor
    4 years ago
    
    ptaylor:
    
    What happens if you change the Preview Document Count to 2000?
    
    No change as I’m dealing with a lot of data and not all would suffice the join condition (as mentioned before)
darshthakkar
Valued Contributor
4 years ago
Seems like using mapper after Inner Join is practically impossible. Is there a workaround?
darshthakkar
Valued Contributor
4 years ago
I haven’t gotten a chance to test this out but it’s on my to-do list so should be completed before the end of this week and I can keep you posted @koryknick on how it goes.

Thank you for your help on this one.

Regards,
Darsh
- darshthakkar
  Valued Contributor
  4 years ago
  @koryknick: Thank you for your patience.
  I was able to test with 2 dummy files and could literally see the output preview on the Inner Join. Those dummy files had a max of 27 rows (which is very less in comparison to the data I deal with everyday)
  
  However, with the kind of data I’m dealing with my use case, its in hundreds of thousands of records for which a large chunk of data wouldn’t suffice the condition of Inner Join → In such cases even if I select the “Preview Document Count” as 2000, the output of Inner Join wouldn’t be there.
  
  What would you suggest me here? What else can I use instead of an Inner Join?
darshthakkar
Valued Contributor
4 years ago
Output previews are very helpful while developing the pipeline as we can literally see the ouput of what we are developing, the workaround for this would be exporting the results in a flat file and analyzing the flat file thereafter, isn’t this time consuming and prone to errors (as there would be human intervention)?

If you were thinking of any other ways, I would request you to please share.
darshthakkar
Valued Contributor
4 years ago
@koryknick: What if the inputs that go into an Inner Join have been sorted, would the ouput preview be randomized or would it be based on the sorted input?

I sorted both the inputs and I was expecting some values in preview but I didn’t. If output preview is purely randomized then we can disregard this scenario. Some snaps below for your reference:

Pipeline showing no output preview on Inner Join:

Snowflake Sort snap exported results for references:

Flat file Sort snap exported results for references:

Preview count is 50, if it would have been 1, I could understand nothing coming on an output preview:

Screenshot of Join settings:

Without the output preview, its challenging to go ahead with other transformations using mapper, sort, etc. What would you suggest using instead of this?

CC: @ptaylor
- ptaylor
  Employee
  4 years ago
  Look at the preview data for the Join’s two inputs. Can you find a pair of documents from each view that should be joined?
  
  If not, try increasing the preview count to 2000.
  - darshthakkar
    Valued Contributor
    4 years ago
    Thank you @ptaylor for your quick response.
    This opens up a new discussion as Sort snap’s output preview is not aligned with the exported results. I have tried validating it with Shift + Validate but haven’t found any luck. Some screenshots might help you understand this in detail:
    
    Output preview of snowflake execute:
    
    As the Name field is string in snowflake, used the following in the mapper:
    
    Sorting it in ascending order so that the order matches with the flat file:
    
    Output preview of Snowflake Sort:
    
    This is amusing as if you check the exported flat file, the results will come in ascending order (snaps shared in my previous comment)
darshthakkar
Valued Contributor
4 years ago
@ptaylor: What would you suggest to use instead of an Inner join then? I would need the output as there is a lot of transformations that I would need to do after the output of Inner join.

I believe, this might have been the case with Distinct results while saving/validating pipeline vs executing pipeline - #10 by ptaylor too but getting Null values was very surprising. Once I get a chance to test that, I will keep you posted on that thread itself. For now, let’s just stay focused on the workaround for Inner Join (it’s very easy for me to get distracted when the 2 phenomenon are somewhat similar)
- ptaylor
  Employee
  4 years ago
  You need to develop the pipeline using a smaller set of data so that each stage will have output.
  - darshthakkar
    Valued Contributor
    4 years ago
    That is a challenge as I cannot meddle with the source system which has huge amount of data (millions of rows) and with all the joins in place, we are reducing it to a couple of thousands.
    
    Would you suggest me to fetch smaller sets of data and then merge it at the end while designing a pipeline?

Forum Discussion

Output preview not available for Inner Join

26 Replies

Recent Discussions

Way to lock down in Prod org to "Monitor" only access?

trace API and proxy calls

Pagination Logic Fails After Migrating from REST GET to HTTP Client Snap

Pipeline Execute Pool size

Concat values of a field based on value of another field