What happens during the Snaplex Upgrade

When the Snaplex is upgraded:
a) JCCs go through a rolling restart. While one JCC is being restarted, the others continue processing pipelines on the older version. Before each remaining JCC is restarted, it waits up to 15 minutes (configurable at the Snaplex level) for its running pipelines to finish.
b) FeedMasters (FMs) go through a rolling restart. One FM stays up and running while the other restarts. The running FM queues messages in the inbound queue, up to 10 GB. When an FM restarts, it persists the queued messages to disk and loads them back into the queue afterwards. Once the JCCs are up, the FM starts sending the queued messages to be processed.
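
To make the JCC sequence in (a) concrete, here is a minimal Python sketch of a rolling restart. It is only a simulation, not SnapLogic code: the `Node` class, its `drain_minutes` field, and the `rolling_restart` function are hypothetical names made up for illustration. The only figure taken from the post is the default 15 minute maximum wait.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: str
    drain_minutes: int  # hypothetical: minutes until this node's running pipelines finish


def rolling_restart(nodes, new_version, max_wait_minutes=15):
    """Upgrade nodes one at a time and log the state after each step.

    Each log entry is (node name, minutes waited, versions of all nodes).
    """
    log = []
    for node in nodes:
        # The node waits for its running pipelines to finish,
        # but never longer than the configured maximum.
        waited = min(node.drain_minutes, max_wait_minutes)
        # Only this node restarts; the others keep their current version.
        node.version = new_version
        log.append((node.name, waited, [n.version for n in nodes]))
    return log


nodes = [Node("jcc1", "old", 5), Node("jcc2", "old", 40)]
for entry in rolling_restart(nodes, "new"):
    print(entry)
```

After the first step the two nodes run different versions, which is the mixed-version window discussed in the replies below.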

Thanks for this post @vdandu. I never considered the behavior during the rolling restart period from node 1 to node 4.

Would it be advisable to put nodes in maintenance mode during the restart so that there is no confusion as to which version is being utilized during this restart period?

Nodes should not be placed in maintenance mode for a rolling restart. During an upgrade, if the version change is across a platform release (like upgrading from the Feb 2018 to the May 2018 version), the nodes will continue to run pipelines based on their current version. One node at a time will go into upgrade mode, stop accepting new execution requests, wait for currently running pipelines to finish, and then upgrade.

The nodes which are yet to upgrade will continue to run pipelines, with the snap pack version corresponding to what they were using before. The node which upgrades and restarts will start running with the new version of the snap packs.

If you use a local load balancer for FeedMaster Ultra requests or Groundplex triggered pipeline requests, health checking must be enabled on the load balancer. Without it, the load balancer cannot detect that an upgrade is in progress, which can cause failures. See https://docs-snaplogic.atlassian.net/wiki/spaces/SD/pages/1439325/Snaplex+Health
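
As a rough illustration of why the health check matters, the Python sketch below routes requests only to nodes that pass a probe. Everything here is an assumption for illustration: `healthy_targets`, `route`, and the `is_healthy` callback are hypothetical names, and the probe is a stand-in for whatever check your load balancer performs against the Snaplex health endpoint described in the linked page.

```python
def healthy_targets(nodes, is_healthy):
    """Return the nodes that currently pass the health probe."""
    return [n for n in nodes if is_healthy(n)]


def route(request_id, nodes, is_healthy):
    """Pick a node for the request, skipping nodes that fail the probe."""
    targets = healthy_targets(nodes, is_healthy)
    if not targets:
        raise RuntimeError("no healthy node available")
    # Simple round-robin over the healthy nodes only.
    return targets[request_id % len(targets)]


# Hypothetical state: fm1 is mid-upgrade, so its probe fails.
upgrading = {"fm1"}
probe = lambda node: node not in upgrading

print(route(0, ["fm1", "fm2"], probe))
print(route(1, ["fm1", "fm2"], probe))
```

Without the probe, a plain round-robin would keep sending every other request to the node that is restarting.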

@akidave, Thank you for your response; it makes sense.

My thoughts were around the (unlikely) “what-if” scenario of a defect being introduced with the upgrade (or even if not, but…). If a pipeline fails or generates errors around the same time as the rolling restart, we would need to use the Dashboard to research which nodes have been restarted and which are pending. It’s not a lot of sweat to research, so pro/con arguments can be made either way. I’m just chasing the least risk and effort in the case of the “what-if”.

We can leave it as you answered; this is just food for thought…