Snaplex Node Freezing with High Memory

One of our SL Nodes freezes on high memory and won’t start any new pipelines or accept any webhooks. The only way we’ve found to fix this is a manual reboot.

Memory fluctuates between 70-85% and CPU between 10-60%, and everything works fine. Then suddenly the memory % stops changing and CPU falls to 2-3%. The node stays in this state until we manually reboot it. We aren’t receiving any alerts for this either.

Is anyone having the same problem? Any ideas on how to fix this issue?

It would be great if SL would catch this issue and automatically restart the node.

I’d recommend that you contact Support.

Hi,

This is a known issue, a memory leak, that we have been experiencing for the last couple of months and that SnapLogic Support is investigating. I’d reiterate the suggestion of raising a support ticket, as any additional information from other orgs will help with the diagnosis.

We are currently restarting Groundplex nodes manually approx. every 2 weeks and have considered automating the restarts. However, we consider this a workaround and are looking for a fix to the root cause.

We’ve also scripted a notification when groundplex nodes pass a certain memory threshold for a sustained period, as these notifications weren’t available OOTB.

Cheers,
C.J.

Thanks C.J.

We have submitted a support ticket and are currently working with SL support. We’re now waiting for the next memory leak so we can get some better info to them.

You mentioned you have scripted a notification? Would you mind explaining how you did this? The “…for a sustained period” seems like the important part there. I guess we could also set up an alert if CPU usage doesn’t go over 10% for more than 1 hour or something similar.
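For illustration, the "frozen node" pattern described at the top of the thread (memory % stops changing while CPU stays very low) could be checked with something like the sketch below. This is only a rough idea, not anything SnapLogic provides: the `Sample` type, the `looks_frozen` function, the 1 hour window, the 10% CPU ceiling, and the 0.5-point memory tolerance are all assumptions to tune for your own nodes, and the samples would have to come from whatever monitoring source you already have.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    """One usage reading for a node (hypothetical shape, not a SnapLogic type)."""
    timestamp: float  # seconds since epoch
    cpu_pct: float
    mem_pct: float


def looks_frozen(
    samples: Sequence[Sample],
    window_s: float = 3600.0,   # assumed: "more than 1 hour"
    cpu_max: float = 10.0,      # assumed: CPU never goes over 10%
    mem_jitter: float = 0.5,    # assumed: memory % moves by at most 0.5 points
) -> bool:
    """Return True if CPU stayed low and memory stayed flat for the whole window.

    `samples` must be ordered oldest to newest.
    """
    if not samples:
        return False
    latest = samples[-1].timestamp
    # Refuse to decide until the history actually spans the full window.
    if latest - samples[0].timestamp < window_s:
        return False
    recent = [s for s in samples if s.timestamp >= latest - window_s]
    cpu_quiet = all(s.cpu_pct <= cpu_max for s in recent)
    mem_values = [s.mem_pct for s in recent]
    mem_flat = max(mem_values) - min(mem_values) <= mem_jitter
    return cpu_quiet and mem_flat
```

Checking both conditions together should cut down on false alarms from a node that is simply idle overnight, since an idle node's memory usually still drifts a little.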

I had a similar issue, and the node used to restart because it would crash (it ran out of memory). I have a ticket for this issue.

I would be glad if SnapLogic were able to free up memory on its own. (I tried to do it with a Jython script, which didn’t work either.)

Hi,

Our notification script is just a simple Python script running as a cron job on one of our servers - it hits the SnapLogic Public API to gather node CPU & memory usage information. You could probably implement something similar as a SnapLogic Pipeline running on the Cloudplex if you wanted to.

Cheers,
C.J.
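
For anyone wanting a starting point, here is a minimal sketch of the "threshold for a sustained period" logic. It is not C.J.'s actual script. The `fetch_usage` and `notify` callables are hypothetical placeholders: `fetch_usage` stands in for whatever call you make to the SnapLogic Public API and is assumed to return a dict of node name to memory %, and `notify` stands in for email, chat, or whatever alerting you use. The 85% threshold and 15 minute period are assumed values as well.

```python
import time
from typing import Callable, Dict, Optional, Set


class SustainedThreshold:
    """Tracks how long each node has been at or above a memory threshold."""

    def __init__(self, threshold_pct: float = 85.0, sustain_s: float = 900.0):
        self.threshold_pct = threshold_pct  # assumed value
        self.sustain_s = sustain_s          # assumed value (15 minutes)
        self._breach_started: Dict[str, float] = {}
        self._alerted: Set[str] = set()

    def update(self, node: str, mem_pct: float, now: float) -> bool:
        """Record a reading; return True once per sustained breach."""
        if mem_pct < self.threshold_pct:
            # Back under the threshold: reset so a later breach alerts again.
            self._breach_started.pop(node, None)
            self._alerted.discard(node)
            return False
        started = self._breach_started.setdefault(node, now)
        if now - started >= self.sustain_s and node not in self._alerted:
            self._alerted.add(node)
            return True
        return False


def check_once(
    fetch_usage: Callable[[], Dict[str, float]],  # hypothetical: node -> memory %
    tracker: SustainedThreshold,
    notify: Callable[[str], None],                # hypothetical alert hook
    now: Optional[float] = None,
) -> None:
    """Poll once and notify for any node that has breached long enough."""
    now = time.time() if now is None else now
    for node, mem_pct in fetch_usage().items():
        if tracker.update(node, mem_pct, now):
            minutes = tracker.sustain_s / 60
            notify(f"{node}: memory at {mem_pct:.0f}% for at least {minutes:.0f} min")
```

The tracker keeps its state in memory, so it suits a long-running loop that calls `check_once` every few minutes. If you run it from cron instead, each run starts fresh, so you would need to save the breach start times somewhere (a small file, for example) between runs.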