Forum Discussion
Hi bojanvelevski,
Thanks for the reply. I tried the Group by N snap too, but I forgot that we can read the columns using the index path. Now, to get my output, I need to group by some column, but there is no column to group on. I tried the Aggregate snap instead (since I can't group by), but I only got the date in the output and I need the file name as well. Any advice here?
The output should be as below. Thanks
I have to ask, why do you think you need to know? Are you seeing memory-related issues right now when running pipelines? Ideally, this is not something that you should be worrying about.
Also, keep in mind that something like the Sort snap may not allocate that much memory since it is not creating documents. The memory for the documents is allocated earlier in the pipeline by Snaps like Parsers, DB Selects, and so on.
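As a loose illustration (plain Python, not SnapLogic internals; the sizes are made up), a downstream stage that buffers documents mostly holds references; the bulk of the memory was allocated when the documents were created upstream:

```python
import sys

# The "parser" allocates the document payloads up front...
docs = [{"row": i, "payload": "x" * 10_000} for i in range(1_000)]

# ...while a downstream collecting stage (say, a sort buffer) only holds
# references to them. The buffer's own overhead is tiny by comparison.
sort_buffer = list(docs)
print("buffer overhead:", sys.getsizeof(sort_buffer), "bytes")        # ~8 KB of pointers
print("one payload:    ", sys.getsizeof(docs[0]["payload"]), "bytes") # ~10 KB each
```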
That being said, it’s a bit hard to know the peak usage for a single snap in isolation. You can probably get an idea of how much the whole pipeline is consuming by running only that pipeline on a Snaplex and observing the overall memory usage for the node that the pipeline ran on.
Note that the amount of memory used by a pipeline can vary depending on how fast it runs. For example, if a pipeline is mostly or entirely streaming (i.e. it doesn't use collecting snaps like Sort), it will consume more memory when the data sources are fast than when they are slow. In other words, faster data sources mean more documents in flight, which means more memory consumption.
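To make the in-flight idea concrete, here is a minimal sketch (plain Python with a bounded buffer, not SnapLogic's actual mechanics; the queue size and document shape are assumptions). A fast producer keeps the buffer, and therefore memory, full, while a slow one leaves it nearly empty:

```python
import queue
import threading
import time

# Assumed capacity; the point is that in-flight documents are what consume memory.
buffer = queue.Queue(maxsize=1024)

def producer(delay_s):
    for i in range(10_000):
        buffer.put({"id": i, "payload": "x" * 1_000})  # doc stays alive while queued
        time.sleep(delay_s)  # delay_s ~ 0 simulates a fast source: the queue fills up
    buffer.put(None)  # sentinel to stop the consumer

def consumer():
    while (doc := buffer.get()) is not None:
        pass  # downstream work; the doc becomes collectable once consumed

threading.Thread(target=producer, args=(0.0,)).start()
consumer()
```

With delay_s near zero the queue sits at its 1024-document cap for most of the run; with a larger delay it rarely holds more than a handful of documents.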
The Sort snap has some idea of how much memory the documents it has ingested are keeping alive; that is how it knows when to spill to disk. It might be a good idea for the snap to surface this number so that you can tune the Max mem property appropriately.
Otherwise, yes, the snap should limit its memory usage based on the Max mem property.
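For intuition, here is a minimal external-sort sketch (plain Python, not the Sort snap's actual code; the memory budget, sort key, and spill format are assumptions). It shows how a collecting sort can cap its memory at roughly a configured budget by spilling sorted runs to disk and merging them at the end:

```python
import heapq
import json
import tempfile

def _spill(sorted_batch):
    """Write one sorted run to a temp file, one JSON document per line."""
    f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
    for doc in sorted_batch:
        f.write(json.dumps(doc) + "\n")
    f.close()
    return f.name

def external_sort(docs, key, max_mem_bytes=1 << 20):
    """Sort an iterable of dicts, spilling to disk when the (crudely
    estimated) in-memory batch size exceeds the budget."""
    runs, batch, size = [], [], 0
    for doc in docs:
        batch.append(doc)
        size += len(json.dumps(doc))          # rough size estimate, an assumption
        if size >= max_mem_bytes:             # budget hit: spill a sorted run
            runs.append(_spill(sorted(batch, key=key)))
            batch, size = [], 0
    if batch:
        runs.append(_spill(sorted(batch, key=key)))
    # k-way merge of sorted runs: only one document per run is in memory at a time
    readers = [(json.loads(line) for line in open(path)) for path in runs]
    return heapq.merge(*readers, key=key)
```

Peak memory stays near max_mem_bytes regardless of input size, which is the behavior the Max mem property is asking for.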
Yes, this is somewhat similar to Sort, except that it keeps alive the documents arriving on the right input view, rather than everything arriving on the left input view.
I think Join uses a different mechanism and might use the disk more aggressively.
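As a rough analogy (a generic hash-join sketch in Python, not necessarily the Join snap's actual mechanism; the field names are made up), the right input is buffered in a hash table while the left input streams through:

```python
from collections import defaultdict

def hash_join(left_docs, right_docs, key):
    """Inner join: the whole right side is held in memory (this is what the
    snap keeps alive), while the left side streams one document at a time."""
    table = defaultdict(list)
    for r in right_docs:                      # build phase: right input stays resident
        table[r[key]].append(r)
    for l in left_docs:                       # probe phase: left input is streaming
        for r in table.get(l[key], []):
            yield {**l, **r}

rows = hash_join(
    left_docs=[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    right_docs=[{"id": 1, "size": 10}],
    key="id",
)
print(list(rows))  # [{'id': 1, 'name': 'a', 'size': 10}]
```

This is also why it helps to put the smaller data set on the buffered input: memory scales with that side, not with the streaming side.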
An Aggregate probably doesn’t keep much memory alive compared to the others.
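The intuition (again a generic Python sketch, not the Aggregate snap's implementation; the field names are assumptions) is that an aggregation only needs one small accumulator per group, so documents can be discarded as soon as they are folded in:

```python
def sum_by_group(docs, group_field, value_field):
    """Streaming aggregation: memory grows with the number of distinct
    groups, not with the number of incoming documents."""
    totals = {}
    for doc in docs:                       # each doc is dropped right after
        g = doc[group_field]               # updating its group's accumulator
        totals[g] = totals.get(g, 0) + doc[value_field]
    return totals

print(sum_by_group(
    [{"file": "a.csv", "rows": 5}, {"file": "a.csv", "rows": 3},
     {"file": "b.csv", "rows": 7}],
    group_field="file", value_field="rows",
))  # {'a.csv': 8, 'b.csv': 7}
```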
There’s so much variability here, with the size of documents, the speed of the endpoints, and the design of the pipeline, that this is pretty hard to do generically. For the most part, I think you just have to try it out and check the resource graphs of the nodes in the dashboard.