Adobe: Building a Metrics and Monitoring System

What were the underlying reasons or business implications for the need to automate business processes?

As the business grew, business teams wanted to monitor processes effectively, be alerted to failures, and stay ahead of bottlenecks. Services governed by SLAs must meet them to avoid revenue impact or other financial losses. Effective monitoring needed to be in place to reduce the risk of failure and to support business continuity.

Describe your automation use case(s) and business process(es).

The metrics and monitoring system we implemented helps the teams understand the state of the infrastructure and services. A standard solution collates the metrics from the platform. At the service level, we can collect data at one-minute granularity to understand how it trends over time.

Describe how you implemented your automations and processes.

We implemented a Service Exporter, a pull-based metric exporter that reports metrics by responding to scraper requests. The exporter service is written in Python; it triggers the implemented services and makes the resulting data available to Prometheus. The data in Prometheus is used to create alerts and dashboards.
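To illustrate the pull model described above, here is a minimal sketch using only the Python standard library. It is not the exporter Adobe built: the metric names, the `service` label, the placeholder values in `collect_samples`, and port 8000 are all assumptions made for the example. It shows only the general shape of an endpoint that answers scrape requests in the Prometheus text exposition format.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def collect_samples():
    """Hypothetical collector: returns {(metric_name, service): value}."""
    # Placeholder values; a real collector would query the monitored services.
    return {
        ("service_delayed_executions", "orders"): 2,
        ("service_missed_schedules", "orders"): 0,
    }


def render_metrics(samples):
    """Render samples as gauges in the Prometheus text exposition format."""
    lines = []
    seen = set()
    # Sorting keeps all samples of one metric together under a single TYPE line.
    for (name, service), value in sorted(samples.items()):
        if name not in seen:
            lines.append(f"# TYPE {name} gauge")
            seen.add(name)
        lines.append(f'{name}{{service="{service}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers scrape requests on /metrics; every other path returns 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect_samples()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the exporter; blocks until the process is stopped."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint, and a Prometheus scrape job pointed at `/metrics` on that port would then pull the values at each scrape interval. Collecting fresh samples inside the request handler is what makes this a pull exporter: nothing is sent until the scraper asks.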

Technologies that helped us build the M&M system:

SnapLogic for middleware
Python to build the custom app
Kubernetes to deploy the app
Prometheus and Grafana dashboards to view the data
Alertmanager for alerting
Self-service configuration of alert thresholds
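The post does not describe how the self-service thresholds are stored or evaluated. As a rough sketch, assuming a simple mapping from metric name to threshold (the names `THRESHOLDS` and `find_breaches` and the values are hypothetical), the check could look like this:

```python
# Hypothetical self-service configuration: metric name -> alert threshold.
THRESHOLDS = {
    "delayed_executions": 5,
    "missed_schedules": 1,
}


def find_breaches(current_values, thresholds):
    """Return {metric: (value, limit)} for metrics at or above their threshold."""
    breaches = {}
    for name, limit in thresholds.items():
        value = current_values.get(name)
        # Metrics with no current reading are skipped rather than treated as zero.
        if value is not None and value >= limit:
            breaches[name] = (value, limit)
    return breaches
```

In a Prometheus setup, comparisons like this are more commonly expressed as alerting rules evaluated by Prometheus itself, with Alertmanager handling routing. The sketch only shows the idea of keeping thresholds as data that teams can edit without code changes.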

More than 50 metrics are collected to monitor the platform and the onboarded services. A few of the metrics that help the business monitor and alert on service progress:

Delayed Executions
Missed Schedules
Network issues
Failures traced to a change in an asset
Spikes in resources over a period
Continuous failure of a service
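As an illustration of how a few of the metrics listed above might be derived, the sketch below computes delayed executions, missed schedules, and continuous failures from a list of run records. The `Execution` record, its fields, and the delay tolerance are assumptions made for the example and are not taken from the Adobe implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Execution:
    """Hypothetical record of one scheduled run of a service."""
    scheduled_at: datetime
    started_at: Optional[datetime]  # None means the run never started
    succeeded: bool = False


def count_delayed(runs: List[Execution], tolerance: timedelta) -> int:
    """Runs that started later than the schedule allows."""
    return sum(
        1 for r in runs
        if r.started_at is not None and r.started_at - r.scheduled_at > tolerance
    )


def count_missed(runs: List[Execution]) -> int:
    """Scheduled runs that never started."""
    return sum(1 for r in runs if r.started_at is None)


def trailing_failures(runs: List[Execution]) -> int:
    """Consecutive failed runs at the end of the history, ignoring missed runs."""
    started = [r for r in runs if r.started_at is not None]
    count = 0
    for r in sorted(started, key=lambda r: r.scheduled_at, reverse=True):
        if r.succeeded:
            break
        count += 1
    return count
```

Each function returns a plain number, so the results could be exposed as gauges by an exporter and compared against alert thresholds.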

What were the business results after executing the strategy?

The framework enables us to identify infrastructure-related issues during deployments.
It made releases smoother by letting us check for failures before and after SnapLogic Cloud/Data releases.
It lets us isolate the services that contribute to resource spikes.
Business teams are alerted when a service fails and receive additional insight, including whether asset-level changes contributed to the issue or downstream systems were unavailable.
Identifying and alerting on issues early lets the business debug a problem before it grows.

Who was involved in building out the solution, and how?

Adobe Team

Anything else you would like to add?

Anomaly detection across the platform and the onboarded services would be a valuable addition if it were provided in-house.
