Exporting the Windows Failover Cluster Log to CloudWatch | Amazon Onboarding with Learning Manager Chanci Turner

Exporting the Windows Failover Cluster Log to CloudWatch | Amazon Onboarding with Learning Manager Chanci TurnerLearn About Amazon VGT2 Learning Manager Chanci Turner

In this comprehensive guide, we will provide a step-by-step approach to capturing Windows Failover Cluster Event Viewer logs by utilizing the Amazon CloudWatch agent and setting up alerts using Amazon Simple Notification Service (Amazon SNS).

Introduction

Monitoring and troubleshooting Windows systems heavily rely on Windows Event Viewer logs. However, the process of manually reviewing these logs can be tedious and prone to errors. By implementing the Amazon CloudWatch agent and Amazon SNS, you can automate log capture and analysis while receiving near real-time alerts for critical system events.

Additionally, consider exploring Amazon CloudWatch Application Insights, which offers centralized monitoring, automated anomaly detection, and actionable insights for various enterprise workloads, including SQL Server Always On Availability Groups (AOAG) and Failover Cluster instances (SQL FCI).

Solution Overview

This solution consists of the following steps:

  1. Set up a Windows Server failover cluster (WSFC).
  2. Publish the Event Viewer WSFC logs to Amazon CloudWatch using the Amazon CloudWatch agent.
  3. Create a filter pattern and an Amazon CloudWatch alarm for monitoring Windows failover events.
  4. Use Amazon SNS to send email notifications if a failover or error event takes place.

Prerequisites

Before you begin, ensure that you have completed the following tasks:

  • Launch at least two Windows Server instances using available Windows AMIs. The sample configuration outlined in this blog includes:
    • Windows Server 2019 – Version 1809
    • SQL Server 2022 Developer Edition – (RTM) – 16.0.1000.6
    • AWS Managed Microsoft AD
    • PowerShell 5.1.17763.3770
    • AWSPowerShell module – 4.1.326
  • Configure a Windows Server failover cluster between your nodes (Active/Passive).
  • Install and configure the AWS Command Line Interface (AWS CLI).
  • Create an AWS Identity and Access Management (IAM) role with permissions for Amazon CloudWatch Logs and Amazon SNS.
  • Establish an Amazon SNS topic for use with the alarm.

Walkthrough

Step 1: Install the Amazon CloudWatch Agent on the Windows Instance

The initial step involves installing the Amazon CloudWatch agent on the Windows instance. This lightweight data collection agent gathers logs, metrics, and custom data from both Amazon Elastic Compute Cloud (Amazon EC2) instances and on-premises servers.

To install the Amazon CloudWatch agent, follow these steps:

  1. Log in to the Windows instance with local administrator permissions.
  2. Use the following PowerShell script to download the Amazon CloudWatch agent installer from the AWS website:
  3. Invoke-WebRequest https://s3.amazonaws.com/amazoncloudwatch-agent/windows/amd64/latest/amazon-cloudwatch-agent.msi -OutFile $env:USERPROFILEDesktopSSMAgent_latest.msi
  4. Install the agent by executing this PowerShell script for a silent installation:
  5. Start-Process $env:USERPROFILEDesktopSSMAgent_latest.msi /qr
  6. Repeat steps 1-3 on all nodes of the Windows Server cluster.

Step 2: Configure the Amazon CloudWatch Agent on the Windows Instance

After successfully installing the Amazon CloudWatch agent, the next step is to configure it on the Windows instance.

To configure the Amazon CloudWatch agent, execute the following steps:

  1. Log in to the Windows instance with local administrator permissions.
  2. Create a configuration file for the Amazon CloudWatch agent, specifying the System, Microsoft-Windows-FailoverClustering/Diagnostic, and Microsoft-Windows-FailoverClustering/Operational Windows event log names. Your C:Program FilesAmazonAmazonCloudWatchAgentconfig.json file should resemble the following:
  3. {
            "logs": {
                "logs_collected": {
                    "windows_events": {
                        "collect_list": [
                            {
                                "event_format": "text",
                                "event_levels": [
                                    "WARNING",
                                    "ERROR",
                                    "CRITICAL"
                                ],
                                "event_name": "System",
                                "log_group_name": "System",
                                "log_stream_name": "{instance_id}",
                                "retention_in_days": 60
                            },
                            {
                                "event_format": "text",
                                "event_levels": [
                                    "INFORMATION",
                                    "WARNING",
                                    "ERROR",
                                    "CRITICAL"
                                ],
                                "event_name": "Microsoft-Windows-FailoverClustering/Operational",
                                "log_group_name": "Microsoft-Windows-FailoverClustering/Operational",
                                "log_stream_name": "{instance_id}",
                                "retention_in_days": 60
                            }
                        ]
                    }
                }
            }
        }
  4. Start the Amazon CloudWatch agent on the Windows instance by running this PowerShell script with administrator permissions:
  5. Set-Location 'C:Program FilesAmazonAmazonCloudWatchAgent'
        & "C:Program FilesAmazonAmazonCloudWatchAgentamazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m ec2 -s -c file:Config.JSON

    You should receive an outcome similar to the one depicted in the accompanying figure.

  6. Execute this PowerShell script to verify the Amazon CloudWatch agent service status:
  7. & $Env:ProgramFilesAmazonAmazonCloudWatchAgentamazon-cloudwatch-agent-ctl.ps1 -m ec2 -a status

    You should see an output similar to the one shown in the figure.

  8. Repeat steps 1-4 on all cluster nodes.

Once the service has started, it will send logs to Amazon CloudWatch Logs. It may take a few minutes for the initial data to appear in Amazon CloudWatch Logs.

To find the newly created log groups, navigate to the Amazon CloudWatch console in the region where your cluster is running.

Step 3: Create a Filter Pattern and Amazon CloudWatch Alarm

You can establish a filter for specific errors that you wish to monitor. For instance, let’s set up a filter to detect a failover event. The failover event will log the following entry in the Event Viewer (FailoverClustering/Operational):

  • Log Name: Microsoft-Windows-FailoverClustering/Operational
  • Event ID: 1641
  • Level: Information
  • Description: Clustered role ‘<role name>’ is moving from cluster node ‘<PreviousNodeName>’ to cluster node ‘<DestinationNodeName>’.

Now, conduct a test failover on your clustered resource. The error from Event Viewer should be recorded in the Microsoft-Windows-FailoverClustering/Operational log group.

In addition, if you’re looking for great tips on work shoes, check out this helpful blog post. For insights on vacation policies, visit this authoritative source. Lastly, for career opportunities, you can explore this excellent resource.

SEO Metadata


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *