Conversation

@vivekashok1221
Contributor

@vivekashok1221 vivekashok1221 commented Aug 25, 2025

Fixes issue where anomaly detection band alarm's period defaulted to 300 seconds regardless of the metric's period. AnomalyDetectionAlarmProps didn't have a period property because it was deprecated in the parent interface and removed by jsii during compilation, causing the internal MathExpression to default to 300 seconds.

I am an Amazon employee (see commit email).

Issue

Should fix #34614 (the issue was closed as duplicate but I'm not sure the issue it was linked to is actually the same issue)

Reason for this change

Although I passed in the duration of 1 day as the period for the alarm metric, the actual period for the evaluation of the anomaly detection band is overridden to 5 minutes (300 seconds).
That is, cdk synth shows this warning:

[Warning at /TestCdkStack/TestAnomalyAlarm] Periods of metrics in 'usingMetrics' for Math expression 'ANOMALY_DETECTION_BAND(m0, 2)' have been overridden to 300 seconds. [ack: CloudWatch:Math:MetricsPeriodsOverridden]

And, on deployment, I can see on the CloudWatch alarm dashboard that the period is 5 minutes.

Example alarm:

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { AnomalyDetectionAlarm, ComparisonOperator, Metric, Stats } from 'aws-cdk-lib/aws-cloudwatch';

export class TestCdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new AnomalyDetectionAlarm(this, 'TestAnomalyAlarm', {
      alarmName: 'TestAnomalyDetectionAlarm',
      metric: new Metric({
        namespace: 'TestNamespace',
        metricName: 'TestMetric',
        statistic: Stats.SUM,
        period: cdk.Duration.days(1), // This will get overridden
      }),
      // period: cdk.Duration.days(1),  I can't add period here since AnomalyDetectionAlarmProps doesn't have period prop
      stdDevs: 2,
      comparisonOperator: ComparisonOperator.GREATER_THAN_UPPER_THRESHOLD,
      evaluationPeriods: 1,
    });
  }
}

This happens because AnomalyDetectionAlarm creates an internal MathExpression for the anomaly detection band, and since AnomalyDetectionAlarmProps doesn't have a period property, no period gets passed to that math expression. The math expression then defaults to 300 seconds, which overrides the period I've set on the metric used within it (a math expression always overrides the periods of the metrics passed into it).
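
To make the override concrete, here is a minimal, self-contained sketch of the behaviour described above. It is a toy model rather than aws-cdk-lib code: ToyMetric, ToyMathExpression, and resolvePeriods are hypothetical names, and only the 300-second default and the expression string are taken from the warning quoted earlier.

```typescript
// Toy model of the period override. All names here are hypothetical and
// exist only for illustration; this is not aws-cdk-lib code.
interface ToyMetric {
  readonly metricName: string;
  readonly periodSeconds: number;
}

interface ToyMathExpression {
  readonly expression: string;
  readonly usingMetrics: Record<string, ToyMetric>;
  // Optional: when omitted, the expression falls back to 300 seconds.
  readonly periodSeconds?: number;
}

const DEFAULT_EXPRESSION_PERIOD_SECONDS = 300;

// Returns the metrics as they would be evaluated: every metric inside the
// expression is forced to the expression's period, whatever its own period was.
function resolvePeriods(expr: ToyMathExpression): Record<string, ToyMetric> {
  const period = expr.periodSeconds ?? DEFAULT_EXPRESSION_PERIOD_SECONDS;
  const resolved: Record<string, ToyMetric> = {};
  for (const id of Object.keys(expr.usingMetrics)) {
    resolved[id] = { ...expr.usingMetrics[id], periodSeconds: period };
  }
  return resolved;
}

const dailyMetric: ToyMetric = { metricName: 'TestMetric', periodSeconds: 86400 };

// No period is passed to the expression, so the one-day period is lost.
const withoutPeriod = resolvePeriods({
  expression: 'ANOMALY_DETECTION_BAND(m0, 2)',
  usingMetrics: { m0: dailyMetric },
});

// Passing the metric's own period to the expression preserves it.
const withPeriod = resolvePeriods({
  expression: 'ANOMALY_DETECTION_BAND(m0, 2)',
  usingMetrics: { m0: dailyMetric },
  periodSeconds: dailyMetric.periodSeconds,
});

console.log(withoutPeriod.m0.periodSeconds, withPeriod.m0.periodSeconds); // 300 86400
```

In this model the only way to keep the one-day period is to hand it to the expression explicitly, which is the gap the two possible fixes below try to close.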

I believe this is a bug in aws-cdk-lib where the metric's period isn't being respected by the Anomaly Detection Alarm.

If I try to include the period property in AnomalyDetectionAlarmProps, I get the error "Object literal may only specify known properties, and 'period' does not exist in type 'AnomalyDetectionAlarmProps'", even though AnomalyDetectionAlarmProps extends CreateAlarmOptionsBase, which has the period property.

This is because the period property is marked as deprecated, so jsii removed it during compilation. As a result, AnomalyDetectionAlarmProps did not receive this property from the parent interface.

Possible Fixes

  1. Add a period property to AnomalyDetectionAlarmProps
  2. Have the AnomalyDetectionAlarm use the period set in the metric passed into it

This PR implements approach 2 (see discussion)

Describe any new or updated permissions being added

N/A

Description of how you validated changes

Added two new unit tests
Added an integration test

Steps I took for testing

(listing it here to catch if I missed anything/did something wrong)

  • I made a testCDK package with the example alarm I've provided above.
  • Made the change to aws-cdk-lib and ran npx lerna run build --scope=aws-cdk-lib
  • Ran ../aws/link-all.sh in testCDK directory.
  • Also had to follow the second option mentioned here since I faced the same issue.
  • Ran npx cdk synth to make sure the generated file is correct.
  • Also deployed to my aws account with npx cdk deploy and verified that the dashboard is displaying the correct duration.
  • Ran yarn test in aws-cdk/packages/aws-cdk-lib
  • Ran npx lerna run build --scope=@aws-cdk-testing/framework-integ followed by yarn integ --update-on-failed

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added the beginning-contributor, bug, and p2 labels Aug 25, 2025
@aws-cdk-automation aws-cdk-automation requested a review from a team August 25, 2025 07:42
*
* @default Duration.minutes(5)
*/
readonly period?: cdk.Duration;
Contributor Author

I'm not sure if it's a bad pattern to add a property to a child interface when the same property exists on the parent, especially when the parent property is marked as deprecated. Maybe I could rename this to anomalyDetectionPeriod or something similar.

Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

(This review is outdated)

Comment on lines 653 to 655
+ const { period, ...alarmProps } = props;
  super(scope, id, {
-   ...props,
+   ...alarmProps,
Contributor Author

Without this, the period prop is passed into the Alarm constructor, which led to this error on deployment:

❌  TestCdkStack failed: ToolkitError: The stack named TestCdkStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Resource handler returned message: "Period should not be set if list of Metrics is set. (Service: CloudWatch, Status Code: 400, Request ID: <request id>) (SDK Attempt Count: 1)" (RequestToken: <request token>, HandlerErrorCode: GeneralServiceException) 

... which I believe is understandable since period has to be passed to the MathExpression "metric", not the alarm itself.
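
The destructuring in the snippet under review relies on ordinary TypeScript object rest syntax to keep one key away from the base constructor. A minimal sketch of that pattern follows; ToyAnomalyAlarmProps and splitProps are hypothetical, simplified stand-ins and not the real AnomalyDetectionAlarmProps or any CDK function.

```typescript
// Hypothetical, simplified props; the real AnomalyDetectionAlarmProps has more fields.
interface ToyAnomalyAlarmProps {
  readonly alarmName: string;
  readonly evaluationPeriods: number;
  readonly periodSeconds?: number;
}

// Splits the props into the period (destined for the math expression) and
// everything else (destined for the base alarm).
function splitProps(props: ToyAnomalyAlarmProps) {
  const { periodSeconds, ...alarmProps } = props;
  return { periodSeconds, alarmProps };
}

const split = splitProps({
  alarmName: 'TestAnomalyDetectionAlarm',
  evaluationPeriods: 1,
  periodSeconds: 86400,
});

console.log(split.periodSeconds);                  // 86400: routed to the expression
console.log('periodSeconds' in split.alarmProps);  // false: the alarm never sees it
```

Because the rest object is a fresh copy without the extracted key, spreading it into the base constructor's props cannot reintroduce the period.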

@aws-cdk-automation aws-cdk-automation dismissed their stale review August 25, 2025 09:35

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@abidhasan-aws abidhasan-aws self-requested a review September 2, 2025 15:00
@aemada-aws aemada-aws removed the request for review from abidhasan-aws September 10, 2025 09:56
evaluationPeriods: 1,

// The period over which the anomaly detection band's statistics are applied (default: 5 minutes)
period: Duration.minutes(5),

We shouldn't be passing in period here. Period is an attribute of metric. Most (all) other CloudWatch alarms use the metric period as monitoring period.

Contributor Author

@vivekashok1221 vivekashok1221 Sep 30, 2025

Math expressions have their own period attribute which overrides the period attributes of the metrics passed into it (which makes sense since you can pass multiple metrics with different period attributes into it).

I felt this was a way to transparently pass this period into the math expression without changing existing behaviour.

Maybe we can rename period here to anomalyDetectionPeriod.
Or I guess we could have the math expression pick up the monitoring period from the metric passed into it.

One thing that confused me when I was trying to debug why the alarm wasn't picking up the period of my metric was that AnomalyDetectionAlarmProps through CreateAlarmOptionsBase already had a period member, but it is marked as deprecated, and I can't even find it as an option in my version of cdk-lib for some reason. (I see you mentioned it here as well). If the base interface marks period as deprecated, we probably shouldn't reintroduce it.

It does seem the easier (and better) way would be to remove the period field from the MathExpression. If we do that, we also won't need this workaround.

Contributor

Hi folks, thank you for looking into this. I have proposed a fix here: https://github.com/aws/aws-cdk/pull/35319/files#r2394391941

Contributor Author

One thing that confused me when I was trying to debug why the alarm wasn't picking up the period of my metric was that AnomalyDetectionAlarmProps through CreateAlarmOptionsBase already had a period member, but it is marked as deprecated, and I can't even find it as an option in my version of cdk-lib for some reason. (I see you mentioned it here as well).

Hey @Miles123K,
I believe this is because, as I mentioned in my PR description, jsii removes deprecated properties during compilation.

@aemada-aws aemada-aws added p1 and removed p2 labels Sep 30, 2025
-   ...props,
+   ...alarmProps,
    comparisonOperator: props.comparisonOperator ?? ComparisonOperator.LESS_THAN_LOWER_OR_GREATER_THAN_UPPER_THRESHOLD,
    metric: Metric.anomalyDetectionFor(props),
Contributor

Hi @vivekashok1221, I started working on fixing the bug before I saw this PR. The correct fix should be:

Suggested change
- metric: Metric.anomalyDetectionFor(props),
+ metric: Metric.anomalyDetectionFor({
+   ...props,
+   // AnomalyDetectionAlarmProps.period is deprecated - the guidance recommends encoding the period in the metric itself
+   period: metricPeriod(props.metric),
+ }),

This uses the period provided in the metric object, which is in line with the guidance:

* @deprecated Use `metric.with({ period: ... })` to encode the period into the Metric object
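
As a rough illustration of what the suggested change does, the sketch below models the band construction before and after the fix. It is a toy model under stated assumptions: every name is hypothetical, toyMetricPeriod merely stands in for the metricPeriod helper named in the suggestion, and toyAnomalyDetectionFor is not the real Metric.anomalyDetectionFor.

```typescript
// Toy model of the suggested change. Every name below is hypothetical and
// exists only for illustration; this is not aws-cdk-lib code.
interface ToyMetric {
  readonly metricName: string;
  readonly periodSeconds: number;
}

interface ToyBandProps {
  readonly metric: ToyMetric;
  readonly stdDevs: number;
  // Mirrors the deprecated alarm-level period: usually absent.
  readonly periodSeconds?: number;
}

interface ToyBand {
  readonly expression: string;
  readonly periodSeconds: number;
}

const FALLBACK_PERIOD_SECONDS = 300;

// Stand-in for the metricPeriod helper: reads the period off the metric.
function toyMetricPeriod(metric: ToyMetric): number {
  return metric.periodSeconds;
}

// Builds the band; without an explicit period it falls back to 300 seconds.
function toyAnomalyDetectionFor(props: ToyBandProps): ToyBand {
  return {
    expression: `ANOMALY_DETECTION_BAND(m0, ${props.stdDevs})`,
    periodSeconds: props.periodSeconds ?? FALLBACK_PERIOD_SECONDS,
  };
}

const bandProps: ToyBandProps = {
  metric: { metricName: 'TestMetric', periodSeconds: 86400 },
  stdDevs: 2,
};

// Before the fix: props are forwarded as-is, so the fallback applies.
const before = toyAnomalyDetectionFor(bandProps);

// After the fix: the metric's own period is forwarded explicitly.
const after = toyAnomalyDetectionFor({
  ...bandProps,
  periodSeconds: toyMetricPeriod(bandProps.metric),
});

console.log(before.periodSeconds, after.periodSeconds); // 300 86400
```

The design point is that the caller's API stays unchanged: the period still lives on the metric, and the alarm forwards it instead of exposing a second knob.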

Contributor Author

Thanks, I'll try to implement your suggestion.

I wanted to confirm one thing: correcting the anomaly detection alarm to use the metric's period will change the alarm duration for everyone who has specified a period in the metric object passed into the alarm. It will switch from the current 5-minute default (set by the math expression here) to match their metric period. (This is why I added an explicit way to specify the duration in the AnomalyDetectionAlarm construct in the first place.) Is this approach acceptable?

(I agree this approach is correct. I initially expected the alarm to pick up the metric's period anyway. However, I do have some concerns about changing existing behaviour for current users.)

Contributor

@vishaalmehrishi vishaalmehrishi Oct 6, 2025

That is a good callout, thanks.

I understand the risk here; that said, there are two considerations which make me comfortable with fixing the bug directly instead of adding the extra period property:

  1. While there may be customers who realised that the default was incorrectly set to 5 minutes despite their configuration (i.e., the behaviour is a bug), there may also be customers who did not realise this (i.e., they still think the alarm is configured with the correct period). Fixing the period to match the metric will automatically fix the bug for them.
  2. The current (incorrect) default period of 5 minutes is not documented anywhere as intended behaviour (i.e., it is not a contract). It is purely a result of buggy behaviour, and users would have to read the CDK code to find the MathExpression where it is set.

Also, ultimately we want the correct long-term solution in place. A user who might start with AnomalyDetection next week might be confused by the extra period property - the direct fix helps them (and all future users) to understand expected behaviour better.

Let me know what you think.

Contributor Author

SGTM 👍, I'll do as you've suggested.

@vishaalmehrishi vishaalmehrishi self-assigned this Oct 1, 2025
@vishaalmehrishi
Contributor

vishaalmehrishi commented Oct 6, 2025

Let's update the commit + PR title to fix(cloudwatch)... so it appears as a fix in the release notes.

@vivekashok1221
Contributor Author

Sure, I'll do that.
By the way, does this repo have any policy against rebasing my commits locally (to squash them/amend commit message) and force pushing my branch?
Or shall I just make a new commit and request the person merging it to squash and merge?

@vishaalmehrishi
Contributor

Sure, I'll do that. By the way, does this repo have any policy against rebasing my commits locally (to squash them/amend commit message) and force pushing my branch? Or shall I just make a new commit and request the person merging it to squash and merge?

There's no specific policy. However, you do not have to rebase locally - you can just make your commit and Mergify will take care of squashing and merging (assuming no conflicts of course).

@vivekashok1221 vivekashok1221 force-pushed the vivek/add-period-prop-anomaly branch from 9bdb0f6 to 8388ee0 Compare October 7, 2025 17:59
@vivekashok1221 vivekashok1221 changed the title feat(cloudwatch): add period property to AnomalyDetectionAlarmProps fix(cloudwatch): respect metric period in AnomalyDetectionAlarm Oct 7, 2025
vishaalmehrishi referenced this pull request Oct 8, 2025
Fixes issue where anomaly detection band alarm's period always
defaulted to 300 seconds regardless of the metric's period:
Pass the metric's period to anomalyDetectionFor() to prevent
MathExpression from using its default period.
@vivekashok1221 vivekashok1221 changed the title fix(cloudwatch): respect metric period in AnomalyDetectionAlarm fix(cloudwatch): Metric period in AnomalyDetectionAlarm is not being respected Oct 8, 2025
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

(This review is outdated)

@vivekashok1221 vivekashok1221 changed the title fix(cloudwatch): Metric period in AnomalyDetectionAlarm is not being respected fix(cloudwatch): metric period in AnomalyDetectionAlarm is not being respected Oct 8, 2025
@mergify
Contributor

mergify bot commented Oct 9, 2025

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot added the queued label Oct 9, 2025
@mergify mergify bot had a problem deploying to deployment-integ-test October 9, 2025 08:24 Error
@mergify
Contributor

mergify bot commented Oct 9, 2025

This pull request has been removed from the queue for the following reason: checks failed.

The merge conditions cannot be satisfied due to failing checks:

You may have to fix your CI before adding the pull request to the queue again.
If you update this pull request, to fix the CI, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

@mergify mergify bot removed the queued label Oct 9, 2025
@vivekashok1221
Contributor Author

@Mergifyio requeue

@mergify
Contributor

mergify bot commented Oct 9, 2025

requeue

❌ Command disallowed due to command restrictions in the Mergify configuration.

  • sender-permission >= write

@vivekashok1221
Contributor Author

I think the build job is failing in CI due to an OOM error. Rerunning the job by requeuing my PR might fix it.

@mergify
Contributor

mergify bot commented Oct 10, 2025

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot added the queued label Oct 10, 2025
@mergify
Contributor

mergify bot commented Oct 10, 2025

This pull request has been removed from the queue for the following reason: pull request branch update failed.

The pull request can't be updated.

You should update or rebase your pull request manually. If you do, this pull request will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

@mergify mergify bot removed the queued label Oct 10, 2025
@mergify
Contributor

mergify bot commented Oct 10, 2025

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot added the queued label Oct 10, 2025
@mergify
Contributor

mergify bot commented Oct 10, 2025

This pull request has been removed from the queue for the following reason: pull request branch update failed.

The pull request can't be updated.

You should update or rebase your pull request manually. If you do, this pull request will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

@mergify mergify bot removed the queued label Oct 10, 2025
@mergify mergify bot dismissed vishaalmehrishi’s stale review October 10, 2025 09:52

Pull request has been modified.

@vishaalmehrishi
Contributor

Mergify was failing because the PR could not be updated. Rebasing from the UI wasn't working either, so I pushed a merge commit. Hopefully this will get it done.

@mergify
Contributor

mergify bot commented Oct 10, 2025

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot added the queued label Oct 10, 2025
@mergify
Contributor

mergify bot commented Oct 10, 2025

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit c7d8004 into aws:main Oct 10, 2025
20 of 21 checks passed
@mergify mergify bot removed the queued label Oct 10, 2025
@github-actions
Contributor

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 10, 2025

Labels

beginning-contributor, bug, p1

Development

Successfully merging this pull request may close these issues.

cloudwatch: AnomalyDetectionAlarm metric period is not applied properly

5 participants