Forum Discussion

darshthakkar's avatar
darshthakkar
Valued Contributor
4 years ago

Output preview not available for Inner Join

Hi Team,

I’m using an Inner Join and observing that the output preview doesn’t appear. Is this an expected behaviour?
I however, have access to the data I’m expecting out of Inner join (as I’m writing to an excel file) but using a mapper after Inner Join doesn’t help as there is no data in the input schema of the mapper. How do I filter the data that I don’t need out of the Inner Join when the preview is not available and Input schema is showing nothing?

I also have access to the other filtered data as I have enabled the error view in Inner Join and that does have an output preview to it.

Appreciate your help and time on this.

Best Regards,
Darsh

26 Replies

  • @darshthakkar - as @ptaylor mentioned, when you are working with Pipeline Preview data and need to see data movement through the entire pipeline, you need to reduce your source data sets to a limited enough set that any snaps that can potentially filter data (i.e. Join, Filter, etc.) will still have data to pass. In your case, I would suggest putting a temporary filtering condition (i.e. Where Clause) on your database Select snap so that your data will match after the Join snap. Once you have verified that the pipeline logic is correct, simply remove the filter from your select and execute the entire pipeline and validate the complete results.

    • darshthakkar's avatar
      darshthakkar
      Valued Contributor

      Thank you @koryknick for your suggestions. I will follow this practice going forward so that I’m not stuck with my development.

      I’ll be closing this thread soon listing the best practices and solution provided. As both @ptaylor and you were really instrumental in providing the suggestions, I wouldn’t be able to mark a solution for either of your comments but will be giving credits on my closing comment coming soon (may be end of the day today)

      Thank you again both of you for your help on this one.

      Regards,
      Darsh

  • @darshthakkar - I think you may be running into a common misunderstanding with the Data Preview functionality. By default, SnapLogic pulls only the first 50 records for preview into each snap. If the two sources you are reading from do not have any records that align to meet your join criteria with an inner join, then nothing will be output.

    For example, if SourceA contains records such as CustomerId from 1 to 50, and SourceB contains records with CustomerId from 2000 - 2050, then the inner join doesn’t match anything and all records are effectively filtered.

    You can try updating your User Settings / Preview Document Count as high as 2000 to see if that helps:

    Or you can set filter conditions on your source data to ensure alignment of data that will match in your Join - in the case of the CustomId values I gave, you could set a filter condition in your database select to pull only records with CustomerId values of 1-50.

    To prove if my theory is correct, you can switch your Join to a Left Outer join type and see that documents will flow through to the output view of the Join.

    • darshthakkar's avatar
      darshthakkar
      Valued Contributor

      Thank you @koryknick for responding to this.
      I will try with 2 sample flat files and test the functionality of Inner Join. While I was reading your explanation, I felt it was right, Left Outer Join always gives me an output however that is not the case with Inner Join.

      What am I going to do next would be:

      1. Prepare 2 flat files with IDs similar
      2. Use a sort before Join
      3. Use Inner Join
      4. Check the Output preview
      5. If (4) fails (highly unlikeable), go to settings and change the preview count to max

      Would you suggest any other steps than the ones stated above? I can keep you posted once I test this.

      Regards,
      Darsh

    • darshthakkar's avatar
      darshthakkar
      Valued Contributor

      If you don’t mind, can I ask you something?
      Why would I not be able to see anything under input schema for the mapper attached to the Inner Join? Is the data not matching for the first 50 records the reason?

  • That is correct - if the Mapper has no input, it can’t determine the schema.

  • If the numbers in the one input view start at 1000 and the numbers in the other input view are all just three digits, I would expect no output from the Join.

    • ptaylor's avatar
      ptaylor
      Employee

      What happens if you change the Preview Document Count to 2000?

      • darshthakkar's avatar
        darshthakkar
        Valued Contributor

        No change as I’m dealing with a lot of data and not all would suffice the join condition (as mentioned before)

  • darshthakkar's avatar
    darshthakkar
    Valued Contributor

    Seems like using mapper after Inner Join is practically impossible. Is there a workaround?

  • darshthakkar's avatar
    darshthakkar
    Valued Contributor

    I haven’t gotten a chance to test this out but it’s on my to-do list so should be completed before the end of this week and I can keep you posted @koryknick on how it goes.

    Thank you for your help on this one.

    Regards,
    Darsh

    • darshthakkar's avatar
      darshthakkar
      Valued Contributor

      @koryknick: Thank you for your patience.
      I was able to test with 2 dummy files and could literally see the output preview on the Inner Join. Those dummy files had a max of 27 rows (which is very less in comparison to the data I deal with everyday)

      However, with the kind of data I’m dealing with my use case, its in hundreds of thousands of records for which a large chunk of data wouldn’t suffice the condition of Inner Join → In such cases even if I select the “Preview Document Count” as 2000, the output of Inner Join wouldn’t be there.

      What would you suggest me here? What else can I use instead of an Inner Join?

  • darshthakkar's avatar
    darshthakkar
    Valued Contributor

    Output previews are very helpful while developing the pipeline as we can literally see the ouput of what we are developing, the workaround for this would be exporting the results in a flat file and analyzing the flat file thereafter, isn’t this time consuming and prone to errors (as there would be human intervention)?

    If you were thinking of any other ways, I would request you to please share.

  • darshthakkar's avatar
    darshthakkar
    Valued Contributor

    @koryknick: What if the inputs that go into an Inner Join have been sorted, would the ouput preview be randomized or would it be based on the sorted input?

    I sorted both the inputs and I was expecting some values in preview but I didn’t. If output preview is purely randomized then we can disregard this scenario. Some snaps below for your reference:

    Pipeline showing no output preview on Inner Join:

    Snowflake Sort snap exported results for references:

    Flat file Sort snap exported results for references:

    Preview count is 50, if it would have been 1, I could understand nothing coming on an output preview:

    Screenshot of Join settings:

    Without the output preview, its challenging to go ahead with other transformations using mapper, sort, etc. What would you suggest using instead of this?

    CC: @ptaylor

    • ptaylor's avatar
      ptaylor
      Employee

      Look at the preview data for the Join’s two inputs. Can you find a pair of documents from each view that should be joined?

      If not, try increasing the preview count to 2000.

      • darshthakkar's avatar
        darshthakkar
        Valued Contributor

        Thank you @ptaylor for your quick response.
        This opens up a new discussion as Sort snap’s output preview is not aligned with the exported results. I have tried validating it with Shift + Validate but haven’t found any luck. Some screenshots might help you understand this in detail:

        Output preview of snowflake execute:

        As the Name field is string in snowflake, used the following in the mapper:

        Sorting it in ascending order so that the order matches with the flat file:

        Output preview of Snowflake Sort:

        This is amusing as if you check the exported flat file, the results will come in ascending order (snaps shared in my previous comment)

  • darshthakkar's avatar
    darshthakkar
    Valued Contributor

    @ptaylor: What would you suggest to use instead of an Inner join then? I would need the output as there is a lot of transformations that I would need to do after the output of Inner join.

    I believe, this might have been the case with Distinct results while saving/validating pipeline vs executing pipeline - #10 by ptaylor too but getting Null values was very surprising. Once I get a chance to test that, I will keep you posted on that thread itself. For now, let’s just stay focused on the workaround for Inner Join (it’s very easy for me to get distracted when the 2 phenomenon are somewhat similar)

    • ptaylor's avatar
      ptaylor
      Employee

      You need to develop the pipeline using a smaller set of data so that each stage will have output.

      • darshthakkar's avatar
        darshthakkar
        Valued Contributor

        That is a challenge as I cannot meddle with the source system which has huge amount of data (millions of rows) and with all the joins in place, we are reducing it to a couple of thousands.

        Would you suggest me to fetch smaller sets of data and then merge it at the end while designing a pipeline?