Forum Discussion
So ok, I have a growing collection of data files in a “data lake”. I’ve already created metadata describing where they are, what they contain, who owns them, etc. It would be very compelling if I could register these files in SnapLogic in the same way as I upload actual “Files” to project space. Then I could build patterns and pipelines that unlock the underlying data in my “data lake” without the end user needing to understand where the lake is, or what the naming conventions, security keys, formats, encryption, compression, etc. are. I’m very new here, so perhaps this concept already exists… (something along the lines of a Metadata Registry / Metadata Repository)
SnapLogic supports parameterizing the input to pipelines, and you can leverage that in both Patterns and Pipelines. As for a metadata repository/registry, the best way to expose the data to end users without publishing the underlying details is through Hive (which SnapLogic supports): you define tables over the files in the lake, and users query those tables by name rather than dealing with paths and formats.
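
To make the parameterization idea a bit more concrete, here is a rough sketch in plain Python of how a metadata registry could turn a logical dataset name into a set of pipeline parameters. This is not SnapLogic's API. The registry contents, the `DatasetEntry` fields, the bucket path, and the `resolve_parameters` helper are all hypothetical and only illustrate the lookup step that would run before a parameterized pipeline is called.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DatasetEntry:
    """One registry record describing a dataset in the lake (hypothetical fields)."""
    location: str
    file_format: str
    compression: str
    owner: str


# Hypothetical registry contents; in practice this would be loaded from
# wherever the existing metadata is stored.
REGISTRY = {
    "sales_daily": DatasetEntry(
        location="s3://example-lake/sales/daily/",
        file_format="parquet",
        compression="snappy",
        owner="finance-team",
    ),
}


def resolve_parameters(logical_name: str) -> dict:
    """Map a logical dataset name to the values a parameterized pipeline needs."""
    try:
        entry = REGISTRY[logical_name]
    except KeyError:
        raise KeyError(f"unknown dataset: {logical_name!r}") from None
    return asdict(entry)


if __name__ == "__main__":
    # The end user only knows the logical name, not the path or format.
    print(resolve_parameters("sales_daily"))
```

The end user only ever supplies the logical name (`sales_daily` here); the location, format, and compression come out of the registry and would be handed to the pipeline as its parameters.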