05-08-2025 12:49 PM - edited 05-08-2025 12:50 PM
Hi!
I wanted to share a lightweight approach I recently implemented for using static reference data in pipelines, without needing memory-intensive joins or separate file or database reads at runtime.
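In typical scenarios, we handle static or reference data (like lookup tables or code descriptions) by reading it from a file or database at runtime and joining it with the main data.
While effective, joins can be memory-intensive.
Instead of performing a join, we can define the static data as a library in the pipeline and look values up with a simple expression.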
No joins. No file readers. Just fast in-memory lookups!
Sample JSON file (staticData.json)
[
{ "code": "A1", "desc": "Alpha" },
{ "code": "B2", "desc": "Beta" },
{ "code": "C3", "desc": "Gamma" }
]
Define in Pipeline:
Usage in Pipeline:
lib.static.filter(x => x.code == $code_from_source).length > 0 ? lib.static.filter(x => x.code == $code_from_source)[0].desc : "Unknown"
This setup allows you to quickly enrich your data using a simple expression, and the same logic can be reused across multiple pipelines via the library.
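For anyone who wants to check the lookup logic outside the pipeline, here is a minimal Python sketch of what the expression above does. It is only an illustration, not the pipeline implementation: the function names, the `rows` input, and the dictionary-based variant are my own assumptions, and the JSON is inlined rather than read from staticData.json.

```python
import json

# Contents of staticData.json from the post, inlined so the sketch is self-contained.
STATIC_JSON = """
[
  { "code": "A1", "desc": "Alpha" },
  { "code": "B2", "desc": "Beta" },
  { "code": "C3", "desc": "Gamma" }
]
"""

static = json.loads(STATIC_JSON)


def lookup_with_filter(code):
    """Mirror the pipeline expression: filter by code, else fall back to "Unknown"."""
    matches = [x for x in static if x["code"] == code]
    return matches[0]["desc"] if len(matches) > 0 else "Unknown"


# Assumed alternative (not from the post): index the list once by code,
# so each lookup is a single dictionary access instead of a scan.
desc_by_code = {x["code"]: x["desc"] for x in static}


def lookup_with_index(code):
    return desc_by_code.get(code, "Unknown")


# Hypothetical input rows standing in for the source documents.
rows = [{"code_from_source": "A1"}, {"code_from_source": "Z9"}]
enriched = [{**r, "desc": lookup_with_index(r["code_from_source"])} for r in rows]
print(enriched)
```

Both functions return the same result for any code. The filter version scans the list on every call, which is fine for small lookup tables; the indexed version is a possible variation if the static dataset grows.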
I’ve found this approach especially useful for small to medium-sized static datasets where performance, simplicity, and reusability are key. If you're looking to reduce joins and streamline your pipelines, I highly recommend giving this method a try.
To make it easier, I’ve attached a sample pipeline, JSON lookup file, and input CSV so you can see the setup in action. Feel free to explore, adapt, and let me know how it works for you!
05-09-2025 07:27 AM
Very good post! Thank you.