Performance Optimization: Use Static Lookups Instead of Joins
Hi!
I wanted to share a powerful and lightweight approach I recently implemented for using static reference data in pipelines, without memory-intensive joins or separate file/database reads at runtime.
The Challenge
In typical scenarios, we handle static or reference data (like lookup tables or code descriptions) by:
- Reading it from a file or database
- Adding a Join Snap to enrich the main data stream
While effective, joins:
- Can be memory-heavy, especially with large datasets
- Add complexity to your pipeline
- Require both sources to be aligned in structure and timing
The New Approach
Instead of performing a join, we can:
- Store static reference data as a JSON file in SnapLogic’s SLDB
- Load this JSON file in an Expression Library
- Use the filter()/map() functions in your pipeline expressions to fetch data from the JSON based on a key
No joins. No file readers. Just fast in-memory lookups!
Example
Sample JSON file (staticData.json)
[
{ "code": "A1", "desc": "Alpha" },
{ "code": "B2", "desc": "Beta" },
{ "code": "C3", "desc": "Gamma" }
]
Define in Pipeline: Add the JSON file to the pipeline's Expression Libraries (under Pipeline Properties) so it can be referenced through the lib object, e.g. lib.static in the expression below (match the library name to how it is registered in your pipeline).
Usage in Pipeline:
lib.static.filter(x => x.code == $code_from_source).length > 0 ? lib.static.filter(x => x.code == $code_from_source)[0].desc : "Unknown"
This setup allows you to quickly enrich your data using a simple expression, and the same logic can be reused across multiple pipelines via the library.
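If you prefer a single pass over the list (the ternary above scans it twice), you can also move the lookup itself into a small reusable expression library so every pipeline calls the same function. The sketch below is illustrative only: the file name staticLookup.expr, the lookup function, and the reliance on the find() array method with a || fallback are my assumptions, so confirm they behave this way in your SnapLogic environment.

staticLookup.expr (hypothetical reusable library; returns the desc of the first matching entry, or a default when nothing matches):
{
  lookup: (list, key, dflt) => (list.find(x => x.code == key) || { desc: dflt }).desc
}

With both files added as Expression Libraries (for example as lib.static and lib.staticLookup), the lookup expression becomes:
lib.staticLookup.lookup(lib.static, $code_from_source, "Unknown")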
Benefits
- Faster: No join processing overhead
- Simpler pipelines: Fewer snaps and data dependencies
- Reusable: One JSON file + one function = many pipelines
- Memory-efficient: Especially helpful when Snaplex memory is a constraint
Things to Consider
- SLDB file size limit: The JSON file stored in SLDB must be under 100MB (SnapLogic’s file size limit for SLDB uploads).
- Data updates: If your reference data changes frequently (e.g., weekly/monthly), you’ll need to build a separate job or pipeline to overwrite the SLDB file.
- Search performance: The filter() method scans the list item by item, which can be slow if your JSON has many records. For faster lookups, consider converting the data into a key-value map (see the sketch after this list).
- Governance: SLDB files have limited access control compared to databases. Ensure your team is aligned on ownership and update responsibility.
- Maintainability: The lookup logic is hardcoded in expressions, so changes to the JSON structure or the lookup logic require updating the expression library and possibly redeploying affected pipelines.
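To make the key-value map idea from the search performance point concrete, here is a minimal sketch. The file name staticMap.json and the use of the object get() method with a default value are my assumptions, so verify that get() is available and behaves this way in your SnapLogic release; the reshaped file is just the same records keyed by code.

staticMap.json (hypothetical, same data reshaped as a map):
{ "A1": "Alpha", "B2": "Beta", "C3": "Gamma" }

Lookup expression:
lib.staticMap.get($code_from_source, "Unknown")

This turns each lookup into a single keyed access instead of a scan over the whole list.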
I’ve found this approach especially useful for small to medium-sized static datasets where performance, simplicity, and reusability are key. If you're looking to reduce joins and streamline your pipelines, I highly recommend giving this method a try.
To make it easier, I’ve attached a sample pipeline, JSON lookup file, and input CSV so you can see the setup in action. Feel free to explore, adapt, and let me know how it works for you!