Author: Niraj Kumar Gupta, Cloud Consulting at Powerupcloud Technologies.
Contributors: Mudit Jain, Hemant Kumar R and Tiriveedi Srividhya
INTRODUCTION TO SERVICES USED
Metrics are abstract data points indicating performance of your systems. By default, several AWS services provide free metrics for resources (such as Amazon EC2 instances, Amazon EBS volumes, and Amazon RDS DB instances).
Amazon CloudWatch is a service provided by AWS for monitoring and managing AWS resources. It provides data and actionable insights that we can use to monitor our applications and websites, understand and respond to critical changes, optimize resource utilization, and get a consolidated view of the entire account. CloudWatch collects monitoring and operational information in the form of logs, metrics, and events. CloudWatch Alarms let you initiate an action when a condition is satisfied, such as a metric reaching a pre-configured threshold.
Amazon CloudWatch Dashboards is a feature of AWS CloudWatch that offers basic monitoring home pages for your AWS accounts. It provides resource status and performance views via graphs and gauges. Dashboards can monitor resources in multiple AWS regions to present a cohesive account-wide view of your accounts.
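As a rough illustration of what such a dashboard looks like in code, the sketch below builds a dashboard body with two graph widgets for a CloudFront distribution. The function name, the widget layout, and the distribution ID are assumptions made for this example; the resulting JSON string would typically be passed to the CloudWatch `put_dashboard` API.

```python
import json


def build_dashboard_body(distribution_id, region="us-east-1"):
    """Return a dashboard body (JSON string) with two time-series widgets.

    distribution_id is a placeholder; CloudFront metrics are reported
    with the dimensions DistributionId and Region=Global.
    """
    dims = ["DistributionId", distribution_id, "Region", "Global"]

    def widget(title, metric_name, stat, x):
        # One graph widget, 12 grid units wide, placed at column x.
        return {
            "type": "metric",
            "x": x, "y": 0, "width": 12, "height": 6,
            "properties": {
                "title": title,
                "metrics": [["AWS/CloudFront", metric_name, *dims]],
                "stat": stat,
                "period": 300,
                "region": region,
                "view": "timeSeries",
            },
        }

    body = {"widgets": [
        widget("Requests", "Requests", "Sum", 0),
        widget("Cache hit rate (%)", "CacheHitRate", "Average", 12),
    ]}
    return json.dumps(body)


# Possible usage (requires boto3 and AWS credentials, not executed here):
# boto3.client("cloudwatch").put_dashboard(
#     DashboardName="cdn-overview",            # hypothetical name
#     DashboardBody=build_dashboard_body("E123EXAMPLE"))
```

Keeping the body in a small builder function makes it easy to generate one dashboard per distribution.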
CloudWatch Composite Alarms
Composite alarms enhance the existing alarm capability by giving customers a way to logically combine multiple alarms. A single infrastructure event may generate multiple alarms, and the volume of alarms can overwhelm operators or mislead the triage and diagnosis process. When this happens, operators can end up dealing with alarm fatigue or wasting time reviewing a large number of alarms to identify the root cause. Composite alarms let operators add logic and group alarms into a single high-level alarm, which is triggered only when the underlying conditions are met. This helps operators make intelligent decisions and reduces the time to detect, diagnose, and resolve performance issues when they happen.
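To make the idea of combining alarms concrete, here is a minimal sketch that builds a composite alarm rule expression and the parameters one might pass to the CloudWatch `put_composite_alarm` API. The helper names, alarm names, and topic ARN are placeholders chosen for this example, and alarm names are assumed not to contain double quotes.

```python
def composite_rule(alarm_names, operator="AND"):
    """Join child alarms into a rule such as ALARM("A") AND ALARM("B")."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not alarm_names:
        raise ValueError("at least one child alarm is required")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


def composite_alarm_params(name, child_alarms, sns_topic_arn):
    """Build keyword arguments for put_composite_alarm."""
    return {
        "AlarmName": name,
        "AlarmRule": composite_rule(child_alarms),
        "AlarmActions": [sns_topic_arn],  # notified on the ALARM state
        "ActionsEnabled": True,
    }


params = composite_alarm_params(
    "Alarm3",
    ["Alarm1", "Alarm2"],
    "arn:aws:sns:us-east-1:123456789012:SNS2",  # placeholder ARN
)
print(params["AlarmRule"])  # ALARM("Alarm1") AND ALARM("Alarm2")

# Possible usage (requires boto3 and AWS credentials, not executed here):
# boto3.client("cloudwatch").put_composite_alarm(**params)
```

With AND, the composite alarm fires only when every child alarm is in the ALARM state at the same time, which is what filters out single-metric noise.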
What are Anomaly detection-based alarms?
Amazon CloudWatch Anomaly Detection applies machine-learning algorithms to continuously analyze system and application metrics, determine a normal baseline, and surface anomalies with minimal user intervention. You can use Anomaly Detection to isolate and troubleshoot unexpected changes in your metric behavior.
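An anomaly detection alarm compares a metric against a band of expected values rather than a fixed threshold. The sketch below builds parameters for the CloudWatch `put_metric_alarm` API using the `ANOMALY_DETECTION_BAND` metric math function. The function name, alarm names, distribution ID, and default periods are assumptions for illustration. Note that the band width is expressed in standard deviations, not as a percentage.

```python
def anomaly_alarm_params(alarm_name, metric_name, stat, distribution_id,
                         comparison, band_width=2, period=300,
                         evaluation_periods=3):
    """Build keyword arguments for an anomaly detection alarm.

    comparison is one of GreaterThanUpperThreshold, LessThanLowerThreshold,
    or LessThanLowerOrGreaterThanUpperThreshold.
    """
    return {
        "AlarmName": alarm_name,
        "ComparisonOperator": comparison,
        "EvaluationPeriods": evaluation_periods,
        "ThresholdMetricId": "band",  # compare m1 against the band below
        "TreatMissingData": "notBreaching",
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/CloudFront",
                        "MetricName": metric_name,
                        "Dimensions": [
                            {"Name": "DistributionId", "Value": distribution_id},
                            {"Name": "Region", "Value": "Global"},
                        ],
                    },
                    "Period": period,
                    "Stat": stat,
                },
            },
            {
                "Id": "band",
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
                "Label": "expected range",
                "ReturnData": True,
            },
        ],
    }


# Alarm on request count rising above the expected band.
requests_alarm = anomaly_alarm_params(
    "Alarm1", "Requests", "Sum", "E123EXAMPLE", "GreaterThanUpperThreshold")

# Alarm on cache hit rate falling below the expected band.
cache_alarm = anomaly_alarm_params(
    "Alarm2", "CacheHitRate", "Average", "E123EXAMPLE", "LessThanLowerThreshold")

# Possible usage (requires boto3 and AWS credentials, not executed here):
# cw = boto3.client("cloudwatch", region_name="us-east-1")
# cw.put_metric_alarm(**requests_alarm)
```

A wider band (a larger `band_width`) tolerates more deviation and produces fewer alarms.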
Why Composite Alarms?
- Simple alarms monitor a single metric. Because of this design limitation, many of the alarms they trigger turn out to be false positives on further triage, which adds maintenance overhead and noise.
- Advanced use cases cannot be modeled or achieved with simple alarms alone.
Why Anomaly Detection?
- Static alarms trigger based on fixed upper and/or lower limits. There is no direct way to change these limits based on the day of the month, the day of the week, or the time of day. For most businesses, normal values vary widely across these periods, especially for metrics driven by user behavior, such as incoming or outgoing traffic. This leaves static alarms ineffective much of the time.
- It is an inexpensive, machine-learning-based regression on the metrics.
- Request count > monitored by anomaly detection based Alarm1.
- Cache hit > monitored by anomaly detection based Alarm2.
- Alarm1 and Alarm2 > monitored by composite Alarm3.
- Alarm3 > Sends notification(s) to SNS2, which has a Lambda function as its subscription endpoint.
- Lambda function > Sends a custom notification with the CloudWatch Dashboard link to the distribution lists subscribed to SNS1.
- Enable additional CloudFront Cache-Hit metrics.
This applies to all of the enterprise's CloudFront CDN distributions.
1. We will configure an Anomaly Detection alarm on the request count rising more than 10% (example) above the expected average.
2. We will add an Anomaly Detection alarm on the CacheHitRate percentage dropping more than 10% (example) below the expected average.
3. We will create a composite alarm for the above-mentioned alarms using a logical AND operation.
4. Create a CloudWatch Dashboard with all required information in one place for quick access.
5. Create a Lambda function:
This function is triggered by SNS2 (an SNS topic) when the composite alarm changes to the “ALARM” state. It sends custom notifications (email alerts) to the users via SNS1 (an SNS topic).
The target ARN should be that of SNS1, where the users' email addresses are configured as endpoints.
In the message section, enter the custom message to send to the users; here we have included the CloudWatch dashboard URL.
6. Create two SNS topics:
- SNS1 – With EMAIL alerts to users [preferably to email distribution list(s)].
- SNS2 – With a Lambda function subscription, whose code sends custom notifications via SNS1 with links to the CloudWatch dashboard(s). The same Lambda function can pick a different dashboard link for each composite alarm that triggers, using a DynamoDB table that maps the SNS target topic ARN to the CloudWatch Dashboard link.
7. Add a notification action to the composite alarm so that it publishes to SNS2.
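The Lambda function from step 5 could look roughly like the sketch below. It assumes the standard shape of an SNS-triggered Lambda event, where the alarm details arrive as a JSON string in `Records[0].Sns.Message`. The topic ARN, the dashboard URL, and the in-memory `DASHBOARD_LINKS` mapping are placeholders for this example; the mapping stands in for the DynamoDB table mentioned in step 6.

```python
import json

# Hypothetical values: replace with the real SNS1 ARN and dashboard links.
SNS1_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:SNS1"
DASHBOARD_LINKS = {
    "Alarm3": ("https://console.aws.amazon.com/cloudwatch/home"
               "?region=us-east-1#dashboards:name=cdn-overview"),
}


def build_notification(event):
    """Turn the SNS event from the composite alarm into (subject, message)."""
    alarm = json.loads(event["Records"][0]["Sns"]["Message"])
    name = alarm["AlarmName"]
    link = DASHBOARD_LINKS.get(name, "(no dashboard mapped)")
    # SNS email subjects are limited to 100 characters.
    subject = f"{alarm['NewStateValue']}: {name}"[:100]
    message = (
        f"Composite alarm {name} changed state.\n"
        f"Reason: {alarm.get('NewStateReason', 'n/a')}\n"
        f"Dashboard: {link}\n"
    )
    return subject, message


def lambda_handler(event, context, publish=None):
    """Entry point; publish can be injected so the logic is testable offline."""
    subject, message = build_notification(event)
    if publish is None:
        import boto3  # available in the Lambda Python runtime
        publish = boto3.client("sns").publish
    publish(TopicArn=SNS1_TOPIC_ARN, Subject=subject, Message=message)
    return {"subject": subject}
```

Injecting `publish` keeps the message-building logic separate from the AWS call, so the function can be exercised locally with a fake event before it is deployed.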
Possible False Positives
- A new promotional campaign is running, with newly developed pages for the promotion.
- A hotfix went wrong during a traffic spike.
This is one example of a simple setup that combines composite alarms and anomaly detection-based alarms to achieve advanced security monitoring. Our case is that these are very powerful tools that can be used to design many advanced functionalities.