How to Turn CloudWatch Logs into Real-Time Alerts Using Metric Filters

Why Alarms Matter in Cloud Infrastructure 

In any modern cloud-based architecture, monitoring and alerting play a critical role in maintaining reliability, performance, and security. 

It's not enough to just have logs—you need a way to act on those logs when something goes wrong. That's where CloudWatch alarms come in. 

Imagine a situation where your application starts throwing 5xx errors, and you don't know until a customer reports it. By the time you act, you've already lost trust. 

Alarms prevent this reactive chaos by enabling proactive monitoring—you get notified the moment an issue surfaces, allowing you to respond before users even notice. 

Without proper alarms: 

  • You might miss spikes in 4xx/5xx errors. 

  • You're always reactive instead of proactive. 

  • Your team lacks visibility into critical system behavior. 

  • Diagnosing issues becomes more difficult due to a lack of early signals. 

For all of these reasons, I decided to implement AWS CloudWatch Alarms using Metric Filters, a cost-effective and powerful way to monitor logs and trigger alerts based on specific patterns. 

Why Choose CloudWatch 

We chose Amazon CloudWatch because: 

  • It offers native support for metric filtering and alarms. 

  • It's scalable and works well with EC2 instances, ECS, EKS, Lambda, etc. 

  • It's cost-effective for log-based monitoring. 

Instead of pushing logs to third-party tools, we kept everything inside AWS for tighter security and easier maintenance. 

Want to know how I integrated CloudWatch? Follow the steps below. 

Step 1: Install the CloudWatch Agent (For EC2 Logs) 

👉 Official AWS Setup Link: 

After installation: 

  • Create a custom config file to push only needed logs (to reduce noise and cost). 

👉 Full example config and metric filters are available on my GitHub: https://github.com/Naresh-b17/cloud-watch-alarms/tree/main 
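As a rough illustration of what such a custom config might contain, the sketch below builds a minimal CloudWatch Agent `logs` section in Python and prints it as JSON. The file path and log group name are hypothetical placeholders, not values from the author's repository; substitute the logs you actually want to ship.

```python
import json


def build_agent_config(files):
    """Build a minimal CloudWatch Agent config that ships only the listed files.

    files: list of (file_path, log_group_name) pairs.
    """
    collect_list = [
        {
            "file_path": path,
            "log_group_name": group,
            # {instance_id} is a placeholder the agent expands at runtime.
            "log_stream_name": "{instance_id}",
        }
        for path, group in files
    ]
    return {"logs": {"logs_collected": {"files": {"collect_list": collect_list}}}}


if __name__ == "__main__":
    # Hypothetical path and log group, for illustration only.
    config = build_agent_config([("/var/log/nginx/access.log", "nginx-access")])
    print(json.dumps(config, indent=2))
```

Listing only the files you need keeps ingestion volume, and therefore cost, down.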

Step 2: Create an IAM Role for the CloudWatch Agent 

Minimum IAM Permissions: 

  • logs:CreateLogGroup 

  • logs:CreateLogStream 

  • logs:PutLogEvents 

  • cloudwatch:PutMetricData 

Attach this role to your EC2 instance running the CloudWatch Agent. 
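The four permissions above could be expressed as an IAM policy document along these lines. This is a minimal sketch: the wildcard `Resource` is an assumption made for brevity, and you may want to scope the `logs:*` actions to specific log group ARNs in a real setup.

```python
import json

# The four actions listed in this step.
AGENT_ACTIONS = [
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
    "cloudwatch:PutMetricData",
]


def build_agent_policy(resource="*"):
    """Return an IAM policy document granting the agent's minimum actions.

    resource="*" is an assumption for brevity; narrow it where possible.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(AGENT_ACTIONS),
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_agent_policy(), indent=2))
```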

Step 3: Create Metric Filters for Your Log Group 

Go to: [screenshot of the console navigation path not reproduced]

Examples: 

4xx Errors: 

[...statusCode=%4[0-9]{2}%...] 

 
5xx Errors: 

[...statusCode=%5[0-9]{2}%...] 
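Before pasting a pattern into the console, it can help to sanity-check the regular expression part locally. The sketch below pulls out the text between the `%` delimiters and tests it against sample status codes with Python's `re` module. It assumes full-string matching on the status field and does not reproduce CloudWatch's own matching engine, so treat it as a quick check rather than a substitute for the console's Test Pattern button.

```python
import re


def extract_regex(filter_pattern):
    """Return the text between the first pair of % delimiters, or None."""
    match = re.search(r"%(.+?)%", filter_pattern)
    return match.group(1) if match else None


def status_matches(filter_pattern, status_code):
    """Check a status code string against the regex embedded in the pattern.

    Assumption: the whole field value must match (re.fullmatch).
    """
    regex = extract_regex(filter_pattern)
    return regex is not None and re.fullmatch(regex, status_code) is not None


if __name__ == "__main__":
    pattern_4xx = "[...statusCode=%4[0-9]{2}%...]"
    pattern_5xx = "[...statusCode=%5[0-9]{2}%...]"
    for code in ["200", "404", "502"]:
        print(code, status_matches(pattern_4xx, code), status_matches(pattern_5xx, code))
```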

 
👉 Full example config and metric filters are available on my GitHub: https://github.com/Naresh-b17/cloud-watch-alarms/tree/main 

For Each Filter: 

  • Click Test Pattern 

  • Create Metric → Give Namespace (Example: Custom/ALB/ErrorMetrics) 

  • Metric Value: Set as 1 
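The same filter can also be created programmatically. The following sketch builds the parameters for boto3's `put_metric_filter` call, using the namespace from the example above and a metric value of 1. The log group name, filter name, and metric name are hypothetical, and the API call itself needs boto3 installed plus AWS credentials, so it is kept in a separate function.

```python
def build_metric_filter(log_group, filter_name, pattern, metric_name,
                        namespace="Custom/ALB/ErrorMetrics"):
    """Build keyword arguments for the CloudWatch Logs put_metric_filter API."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": pattern,
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,
                # Each matching log event adds 1 to the metric.
                "metricValue": "1",
            }
        ],
    }


def apply_metric_filter(params):
    """Send the filter to AWS (requires boto3 and configured credentials)."""
    import boto3

    boto3.client("logs").put_metric_filter(**params)


if __name__ == "__main__":
    # Hypothetical names; the pattern is copied as written in this post.
    params = build_metric_filter(
        log_group="alb-access-logs",
        filter_name="alb-5xx-errors",
        pattern="[...statusCode=%5[0-9]{2}%...]",
        metric_name="5xxErrorCount",
    )
    print(params)
```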


Step 4: Set Up SNS (Simple Notification Service) for Alerts 

SNS = Your Notification Pipeline 

If you don't have an SNS topic, create one first: [screenshot of topic creation not reproduced]

To Add Email Recipients: 

  • Open the SNS Topic 

  • Create Subscription → Protocol: Email → Enter Email IDs → Create 

  • Team members will get a confirmation email (They must click confirm!) 

In your alarm setup, select this SNS topic under the Notification section. 
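If you prefer scripting over the console, the topic and its email subscriptions could be created with boto3 roughly as follows. The topic name and email addresses are hypothetical. Recipients still have to confirm the subscription from the email they receive.

```python
def build_email_subscriptions(topic_arn, emails):
    """Build one set of sns.subscribe() arguments per email recipient."""
    return [
        {"TopicArn": topic_arn, "Protocol": "email", "Endpoint": email}
        for email in emails
    ]


def create_topic_with_subscribers(topic_name, emails):
    """Create the topic and subscribe each email (requires boto3 and credentials)."""
    import boto3

    sns = boto3.client("sns")
    # create_topic returns the existing topic's ARN if the name is already in use.
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    for params in build_email_subscriptions(topic_arn, emails):
        sns.subscribe(**params)
    return topic_arn


if __name__ == "__main__":
    # Hypothetical ARN and addresses, for illustration only.
    arn = "arn:aws:sns:us-east-1:123456789012:log-alerts"
    for sub in build_email_subscriptions(arn, ["oncall@example.com"]):
        print(sub)
```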

Step 5: Create CloudWatch Alarms on These Metrics 

Once your Metric Filters are in place and actively generating metrics, it's time to put CloudWatch Alarms into action. This is where the magic happens—turning raw log patterns into real-time alerts.  

Example Alarm Scenarios: 

  • If 5xx errors > 5 within 5 minutes → Trigger an alert 

  • If 4xx errors > 10 within 10 minutes → Trigger an alert 

  • If large payloads (>100KB) exceed 50 events in 1 hour → Trigger an alert 

Detailed Steps to Create Alarms: 

  1. Go to AWS Console → CloudWatch → Alarms → Create Alarm 

  2. Click Select metric → Choose “Log Metrics” 

  3. Browse and select your custom metric (created earlier using Metric Filters) 

  4. Click Select Metric 

  5. Under Conditions: 

     • Choose the threshold type (Static/Anomaly detection) 

     • Set the threshold value (example: Greater than 5 errors) 

     • Choose datapoint evaluation (e.g., 1 out of 1 datapoint breaching threshold) 

  6. Under Actions: 

     • Select an existing SNS topic or create a new one (SNS setup is covered in Step 4) 

  7. Add a name and description for easy identification (e.g., ALB-5XX-Error-Alarm) 
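The console steps above map onto a single `put_metric_alarm` call in boto3. The sketch below builds the parameters for the three example scenarios using a static threshold, a Sum statistic, and 1 out of 1 datapoint evaluation. The metric names, the SNS topic ARN, and the choice to treat missing data as not breaching are assumptions for illustration.

```python
def build_alarm(name, metric_name, threshold, period_seconds, topic_arn,
                namespace="Custom/ALB/ErrorMetrics"):
    """Build keyword arguments for the CloudWatch put_metric_alarm API."""
    return {
        "AlarmName": name,
        "AlarmDescription": f"{metric_name} > {threshold} within {period_seconds}s",
        "Namespace": namespace,
        "MetricName": metric_name,
        # Sum counts matching log events, since each event emits a value of 1.
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        # Assumption: periods with no matching log lines should not trigger the alarm.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }


def apply_alarm(params):
    """Create or update the alarm (requires boto3 and configured credentials)."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    topic = "arn:aws:sns:us-east-1:123456789012:log-alerts"  # hypothetical ARN
    scenarios = [
        # (alarm name, hypothetical metric name, threshold, period in seconds)
        ("ALB-5XX-Error-Alarm", "5xxErrorCount", 5, 300),
        ("ALB-4XX-Error-Alarm", "4xxErrorCount", 10, 600),
        ("Large-Payload-Alarm", "LargePayloadCount", 50, 3600),
    ]
    for name, metric, threshold, period in scenarios:
        print(build_alarm(name, metric, threshold, period, topic))
```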
