a week ago - last edited a week ago
Hi!
I wanted to share a lightweight approach I recently implemented for using static reference data in pipelines, without memory-intensive joins or separate file/database reads at runtime.
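In typical scenarios, we handle static or reference data (like lookup tables or code descriptions) by reading it from a file or database at runtime and joining it to the main data.
While effective, joins can be memory-intensive and add extra reads to every run.
Instead of performing a join, we can keep the reference data in a JSON file, define it in the pipeline as a library, and look values up with an expression.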
No joins. No file readers. Just fast in-memory lookups!
Sample JSON file (staticData.json)
[
{ "code": "A1", "desc": "Alpha" },
{ "code": "B2", "desc": "Beta" },
{ "code": "C3", "desc": "Gamma" }
]
Define in Pipeline:
Usage in Pipeline:
lib.static.filter(x => x.code == $code_from_source).length > 0 ? lib.static.filter(x => x.code == $code_from_source)[0].desc : "Unknown"
This setup allows you to quickly enrich your data using a simple expression, and the same logic can be reused across multiple pipelines via the library.
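For anyone who wants to try the lookup logic outside a pipeline, here is a minimal sketch in plain JavaScript. It is not pipeline expression syntax. The names staticData, lookupDesc, descByCode, and lookupDescIndexed are hypothetical and only for illustration. It shows the same "match or Unknown" behavior as the expression above, plus an optional index by code so the list is not scanned twice per record.

```javascript
// Plain JavaScript sketch of the lookup logic; not pipeline expression syntax.
// Same records as the sample staticData.json in this post.
const staticData = [
  { code: "A1", desc: "Alpha" },
  { code: "B2", desc: "Beta" },
  { code: "C3", desc: "Gamma" },
];

// Hypothetical helper: same result as the filter-based expression,
// but it scans the list once and stops at the first match.
function lookupDesc(code, fallback = "Unknown") {
  const match = staticData.find((x) => x.code === code);
  return match !== undefined ? match.desc : fallback;
}

// Hypothetical alternative: build an index once, keyed by code,
// so each lookup no longer scans the whole list.
const descByCode = new Map(staticData.map((x) => [x.code, x.desc]));

function lookupDescIndexed(code, fallback = "Unknown") {
  return descByCode.has(code) ? descByCode.get(code) : fallback;
}

console.log(lookupDesc("B2"));        // Beta
console.log(lookupDescIndexed("Z9")); // Unknown
```

Whether an index like this is worth it depends on the size of the reference data. For a handful of rows, the single scan is probably fine.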
I’ve found this approach especially useful for small to medium-sized static datasets where performance, simplicity, and reusability are key. If you're looking to reduce joins and streamline your pipelines, I highly recommend giving this method a try.
To make it easier, I’ve attached a sample pipeline, JSON lookup file, and input CSV so you can see the setup in action. Feel free to explore, adapt, and let me know how it works for you!
a week ago
Very good post! Thank you.