Apache Log4j (CVE-2021-44228) mitigation

A zero-day exploit affecting Apache Log4j (CVE-2021-44228), which could result in remote code execution, was made public on December 9, 2021.

Our Python-based SnapLogic Control Plane is not impacted by this vulnerability. While we investigate the potential impact on our Java-based Snaplexes, we are actively working on a patch; our support team will provide an update as soon as it is available. The patch is recommended for all customer-managed Groundplexes, and we will also apply it to all SnapLogic-managed Cloudplexes as soon as it is available.

In the meantime, customers with Groundplexes can mitigate this vulnerability by setting the system property log4j2.formatMsgNoLookups to true.

Please do not hesitate to contact support if you require assistance with this vulnerability.


A Snaplex patch with the fix for this issue has been published. Instructions for updating the Snaplex version are at https://docs-snaplogic.atlassian.net/wiki/spaces/SD/pages/137429020/Updating+a+Snaplex+War+File.

4.27 Patch 1: main-10894 is the new version.


For customers who cannot upgrade to the 4.27 patch version immediately, the Log4j system property can be used to mitigate the vulnerability. To do this, add a Snaplex property with the key jcc.jvm_options and the value -Dlog4j2.formatMsgNoLookups=true. See the Snaplex update documentation for details on updating properties.

When you save the Snaplex properties in Manager, if the nodes are running with slpropz configuration, a UI prompt will ask to restart the nodes with the configuration change. Accepting the prompt initiates a rolling restart of the Snaplex nodes with the change; no manual restarts are required.

If the node restart UI prompt does not show up, the Snaplex either has no nodes or its nodes are running with a global.properties file (the older Snaplex configuration mechanism, which does not use slpropz files). To update such nodes with the new property, manually edit /opt/snaplogic/etc/global.properties and add a line like

jcc.jvm_options = -Dlog4j2.formatMsgNoLookups=true

to the file. If a jcc.jvm_options entry already exists, append the new property using a space as the delimiter, for example

jcc.jvm_options = -Dmykey=myvalue -Dlog4j2.formatMsgNoLookups=true

The JCC must then be restarted manually: place each node in the Snaplex into maintenance mode one at a time and restart it to load the property update.

Note that for nodes using slpropz config, manual restarts are not required. The property updates and rolling restarts are automatic.
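For nodes on global.properties, the edit above can be sketched as an idempotent shell step. This is a sketch only, demonstrated on a temporary file so it is self-contained; on a real node you would point FILE at /opt/snaplogic/etc/global.properties and have the node in maintenance mode first.

```shell
# Sketch only: ensure -Dlog4j2.formatMsgNoLookups=true is present in the
# jcc.jvm_options entry. Demonstrated on a temp file; on a real node set
# FILE=/opt/snaplogic/etc/global.properties instead.
FILE="$(mktemp)"
echo "jcc.jvm_options = -Dmykey=myvalue" > "$FILE"    # simulate an existing entry

PROP="-Dlog4j2.formatMsgNoLookups=true"
if ! grep -q '^jcc.jvm_options' "$FILE"; then
    # No entry yet: add one.
    echo "jcc.jvm_options = $PROP" >> "$FILE"
elif ! grep -qF -- "$PROP" "$FILE"; then
    # Entry exists without the flag: append it, space-delimited.
    sed -i "s|^jcc.jvm_options *=.*|& $PROP|" "$FILE"
fi

cat "$FILE"
```

Running the step twice leaves the file unchanged, so it is safe to re-run across a fleet of nodes.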

The auto-upgrade mechanism updates only the JCC process; the Monitor process is not upgraded automatically. The Monitor is a lightweight process that tracks the health of the JCC process. It does not bind to any ports and therefore does not accept external requests, so the risk of running with the older Monitor version is low.

On regular Groundplexes, upgrading the Monitor process to the new version requires restarting the service on the node. Put each node into maintenance mode, then run jcc.sh restart on Linux. On Windows, run jcc.bat restart if the Snaplex runs as an application, or jcc.bat update_service if it runs as a Windows service.

For Groundplexes running on Kubernetes or standalone Docker, updating the container image version to main-10894 will update both the Monitor process and the JCC process.
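For Docker-based Groundplexes, the upgrade amounts to changing the image tag. Below is a minimal illustrative docker-compose fragment; only the main-10894 tag comes from this thread, while the repository name, service name, and restart policy are assumptions, not official values.

```yaml
# Illustrative fragment only: the repository and service names are assumed.
services:
  snaplex:
    image: snaplogic/snaplex:main-10894   # updating this tag updates both JCC and Monitor
    restart: unless-stopped
```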

On Cloudplexes, the Snaplex version update will automatically upgrade both the JCC and the Monitor process.

A second patch, 4.27 Patch 2: main-10904, has been published. It addresses CVE-2021-45046 by updating the Log4j version to 2.16.0.

Customers can upgrade Cloudplexes and Groundplexes by changing the version in the Snaplex properties to the new version. main-10904 is also the Docker Hub tag to use for the new Docker image.

An issue was found in the second patch and is currently under investigation. Please stay on the first patch until you receive a new update.


Any updates on this or the newest log4j vulnerability (addressed in 2.17.0)?

I expect an update later this morning (Pacific time).

UPDATE: Our QA team is doing the final testing on the next patch.


4.27 Patch 3: main-10919 has been published. This patch updates the Log4j version to 2.17.0 and addresses an issue in Patch 2 that caused PipeExec pipeline instances to hang.


I have updated our Snaplexes to the 4.27 Patch 3 main-10919.

Our security infrastructure team reports that their scanning tool, Nessus, still shows an installed C:\opt\snaplogic\run\lib\monitor\log4j-core-2.11.2.jar, and that is indeed what I see in the folder at that path.

Does this mean that our updating of the Snaplex didn’t work, or is there something else that needs to be done?

Check the comment above about the Monitor process update.

As part of the automatic version upgrade, the C:\opt\snaplogic\run\lib\jcc directory used by the JCC process is updated to the new Log4j version. The Monitor process is not touched by the auto-update mechanism.
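One way to verify which Log4j jars each component is actually carrying is to list them under the install's lib directory. The sketch below uses a synthetic directory layout so the commands are self-contained; on a real Linux node you would run the find against /opt/snaplogic/run/lib, and on Windows check under C:\opt\snaplogic\run\lib.

```shell
# Sketch: list log4j-core jars per component directory. The synthetic layout
# mimics a node whose JCC was auto-updated but whose Monitor was not.
ROOT="$(mktemp -d)"
mkdir -p "$ROOT/jcc" "$ROOT/monitor"
touch "$ROOT/jcc/log4j-core-2.17.0.jar"      # JCC: picked up by the auto-update
touch "$ROOT/monitor/log4j-core-2.11.2.jar"  # Monitor: old until a service restart

# On a real node: find /opt/snaplogic/run/lib -name 'log4j-core-*.jar'
find "$ROOT" -name 'log4j-core-*.jar' | sort
```

A 2.11.x jar remaining under the monitor directory after an upgrade is exactly the situation described above: the JCC is patched, but the Monitor still needs a service restart.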

To update the Monitor process to the new Log4j version as well, the service must be restarted on the node. Use jcc.bat restart for a Windows application install, or jcc.bat update_service for a Windows service install.

Thank you, the “jcc.bat update_service” took care of it for me. Don’t know how I missed that one 🙂

@akidave - Will a “snaplex restart” from the GUI be as effective as jcc.bat update_service? Or must we execute jcc.bat from a pipeline on every one of those nodes?

We have dozens of customers who are running our pipelines on snaplexes that they host On Prem. There is no way we can ask someone on site to run jcc.bat on each node. Most of them are “naive-level users” who would be intimidated by the very suggestion of doing something at the command line. Even getting every practice to do a simple reboot of their server would take a lot of work, resulting in pushback from our customers and a certain number of them who might simply ignore the message.

I can confirm that the GUI “restart node” does not fix the issue (it appears the node restart does not run the jcc.bat update_service script). I performed “restart node…” on my test server, and the multiple Log4j 2.14 files still remained in the /opt/snaplogic/run/lib/monitor folder.

Once I manually rebooted the server itself (a hard reboot), the Log4j files in the monitor folder were replaced with the 2.17 versions.

My team uses GroundPlex servers on AWS EC2 Linux instances.
As always, just make sure the servers are in maintenance mode and no jobs are running before doing the hard reboot.


@lzapart - thanks for the feedback! (As unhappy as the contents might be…)

Looks like we’ll need to work up a way to run jcc.bat from a pipeline script.

There’s no way we can get 100 servers (that don’t even belong to us) rebooted without a ton of messaging going back and forth… and a ton of unhappy customers… and some that still wouldn’t have got the message… and even some who might say “okay, it’s rebooted” having not done anything.

Sorry for the delayed response.

Doing a node/Snaplex restart from the Dashboard UI, or via the public API, restarts only the JCC process. The Monitor runs in the background, checks the health of the JCC, and restarts the JCC when required, so the Monitor itself does not restart as part of a UI- or API-driven upgrade or restart operation.

To restart the Monitor process, the command-line approach must be used. As mentioned previously, the Monitor does not accept external requests, so there is no way to trigger the Log4j exploit against it. Restarting the Monitor is recommended, but it can be done when convenient. Updating the JCC is the higher priority, and that can be done through the UI or the public API.


Hi @akidave -

We replaced all snaplexes with 4.27 Patch 1 as soon as it came out, then Patch 3 upon its availability as well, so we were pretty well covered there.

We’ve worked up a simple pipeline to run jcc.bat update_service remotely on all our snaplex nodes. Since (as reported) the additional exposure is minimal, we’ve elected to wait until all hands are back on deck following the holidays, so that we can adequately QA our solution, and then have staff standing by to call/help individual practices if anyone’s node fails to come back online.

Thanks for confirming we have to do this, though!

For future situations like this, I would highly recommend implementing some kind of “process supervisor” or a component that follows something like the Sidecar Pattern: an auxiliary application that the main Snaplex service knows how to run, whose sole purpose is to cycle the main service off, let it update (or perform the update), cycle it back on, and message the backplane if anything fails.

This would provide SnapLogic administrators with the ability to do what we’re talking about, remotely.

regards,
– johnb aka forbin