10-25-2022 04:28 PM
As the business grew, business teams wanted to monitor processes effectively, be alerted to failures, and stay ahead of bottlenecks. Services bound by SLAs must meet them to avoid revenue impact or other financial losses. Effective monitoring needs to be in place to reduce the risk of failure and to support business continuity.
The metrics and monitoring (M&M) implementation helped the teams understand the state of the infrastructure and services. A standard solution was implemented to collate metrics from the platform. At the service level, we are able to collect data at one-minute granularity to understand how it trends over time.
A Service Exporter was implemented as a pull-based metric exporter: it reports metrics in response to scrape requests. The exporter is written in Python; it calls the onboarded services to gather their metrics and exposes them for Prometheus to collect. The data in Prometheus is then used to build alerts and dashboards.
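To illustrate the pull model, here is a minimal sketch of an exporter that answers scrape requests with metrics in the Prometheus text exposition format. It is not our production exporter: it uses only the Python standard library, and the metric names, label values, port, and the `collect_metrics` helper are hypothetical placeholders.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def collect_metrics():
    """Hypothetical collector returning (name, help, labels, value) tuples.

    The values are placeholders; a real exporter would query the
    monitored services here on every scrape.
    """
    return [
        ("service_jobs_failed", "Jobs that failed in the last run",
         {"service": "example"}, 2),
        ("service_job_duration_seconds", "Duration of the last run",
         {"service": "example"}, 41.5),
    ]


def render_metrics(samples):
    """Render samples as gauges in the Prometheus text exposition format."""
    lines = []
    for name, help_text, labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        # Omit the braces entirely when a sample has no labels.
        label_part = f"{{{label_str}}}" if label_str else ""
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name}{label_part} {float(value)}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Responds to scraper GET requests on /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect_metrics()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type",
                         "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for this sketch.
    HTTPServer(("0.0.0.0", 8000), MetricsHandler).serve_forever()
```

With this shape the exporter never contacts Prometheus itself. Prometheus is configured with the exporter's address as a scrape target and requests `/metrics` on its own schedule, for example once a minute to match the collection interval described above.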
Technologies that helped us build an M&M system
More than 50 metrics are collected to monitor the platform and the onboarded services. A few of the metrics that help the business monitor and alert on service progress -
Adobe Team
Anomaly detection across the platform and the onboarded services, if provided in-house, would make this a better solution.