aws_glue_crawler: unable to remove schedule from existing crawler #33194
Comments
I used TypeScript code similar to the following:
import * as cdk from 'aws-cdk-lib';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as glue from 'aws-cdk-lib/aws-glue';
enum GlueCrawlerRecrawlBehavior {
  CRAWL_NEW_FOLDERS_ONLY = 'CRAWL_NEW_FOLDERS_ONLY'
}
enum GlueCrawlerUpdateBehavior {
  LOG = 'LOG'
}
export class GlueClawlerStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    const bucket = new s3.Bucket(this, "example-bucket");
    const crawlerRole = new iam.Role(this, "example-crawler-role", {
      roleName: "example-crawler-role",
      assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'),
      inlinePolicies: {
        'crawler-policy': new iam.PolicyDocument({
          statements: [
            new iam.PolicyStatement({
              actions: [
                'logs:CreateLogGroup',
                'logs:CreateLogStream',
                'logs:PutLogEvents',
                'cloudwatch:*',
              ],
              effect: iam.Effect.ALLOW,
              resources: ['*'],
            }),
            new iam.PolicyStatement({
              actions: [
                'glue:*',
              ],
              effect: iam.Effect.ALLOW,
              resources: [
                'arn:aws:glue:region:account:catalog',
                'arn:aws:glue:region:account:database/example_database',
                'arn:aws:glue:region:account:table/example_database/*',
              ],
            }),
            new iam.PolicyStatement({
              actions: ['s3:GetObject', 's3:ListBucket', 's3:ListObjects'],
              effect: iam.Effect.ALLOW,
              resources: [
                bucket.bucketArn,
                `${bucket.bucketArn}/*`
              ],
            })
          ]
        })
      }
    });
    new glue.CfnCrawler(this, "example-crawler", {
      role: crawlerRole.roleArn,
      targets: {
        s3Targets: [
          {
            path: `s3://${bucket.bucketName}/example-data/`
          }
        ]
      },
      databaseName: "example_database",
      name: "example-crawler",
      schedule: {
        scheduleExpression: 'cron(20-50/10 0 * * ? *)'
      },
      recrawlPolicy: {
        recrawlBehavior: GlueCrawlerRecrawlBehavior.CRAWL_NEW_FOLDERS_ONLY
      },
      schemaChangePolicy: {
        deleteBehavior: 'LOG',
        updateBehavior: GlueCrawlerUpdateBehavior.LOG
      },
    });
  }
}
This synthesizes the following CFN template:
Resources:
  examplebucketC9DFA43E:
    Type: AWS::S3::Bucket
    UpdateReplacePolicy: Retain
    DeletionPolicy: Retain
    Metadata:
      aws:cdk:path: GlueClawlerStack/example-bucket/Resource
  examplecrawlerrole1B62B8EE:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: glue.amazonaws.com
        Version: "2012-10-17"
      Policies:
        - PolicyDocument:
            Statement:
              - Action:
                  - cloudwatch:*
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Effect: Allow
                Resource: "*"
              - Action: glue:*
                Effect: Allow
                Resource:
                  - arn:aws:glue:region:account:catalog
                  - arn:aws:glue:region:account:database/example_database
                  - arn:aws:glue:region:account:table/example_database/*
              - Action:
                  - s3:GetObject
                  - s3:ListBucket
                  - s3:ListObjects
                Effect: Allow
                Resource:
                  - Fn::GetAtt:
                      - examplebucketC9DFA43E
                      - Arn
                  - Fn::Join:
                      - ""
                      - - Fn::GetAtt:
                            - examplebucketC9DFA43E
                            - Arn
                        - /*
            Version: "2012-10-17"
          PolicyName: crawler-policy
      RoleName: example-crawler-role
    Metadata:
      aws:cdk:path: GlueClawlerStack/example-crawler-role/Resource
  examplecrawler:
    Type: AWS::Glue::Crawler
    Properties:
      DatabaseName: example_database
      Name: example-crawler
      RecrawlPolicy:
        RecrawlBehavior: CRAWL_NEW_FOLDERS_ONLY
      Role:
        Fn::GetAtt:
          - examplecrawlerrole1B62B8EE
          - Arn
      Schedule:
        ScheduleExpression: cron(20-50/10 0 * * ? *)
      SchemaChangePolicy:
        DeleteBehavior: LOG
        UpdateBehavior: LOG
      Targets:
        S3Targets:
          - Path:
              Fn::Join:
                - ""
                - - s3://
                  - Ref: examplebucketC9DFA43E
                  - /example-data/
    Metadata:
      aws:cdk:path: GlueClawlerStack/example-crawler
  CDKMetadata:
    Type: AWS::CDK::Metadata
    Properties:
      Analytics: v2:deflate64:H4sIAAAAAAAA/y2KQQ7CIBAA39I7rG01qWf5AT7AIN0aCoWEBTkQ/m4aPM1kMjNMywLjoApxvVruzBvqMyltmSr0qnSF+sjaYmJi890aM+qAKoPDM55s7OMygti8iKo4jI1JpJCj7svfG/NhRdjp8p3uMI9wG3YyhsfskzkQZOcPVnt/DZUAAAA=
    Metadata:
      aws:cdk:path: GlueClawlerStack/CDKMetadata/Default
Parameters:
  BootstrapVersion:
    Type: AWS::SSM::Parameter::Value<String>
    Default: /cdk-bootstrap/hnb659fds/version
    Description: Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]
This creates the Glue crawler in the AWS console when deployed. In the AWS console, we could select
Commenting out the code:
schedule: {
  scheduleExpression: 'cron(20-50/10 0 * * ? *)'
},
gives the below output for
It generates a CloudFormation stack without ...
  examplecrawler:
    Type: AWS::Glue::Crawler
    Properties:
      DatabaseName: example_database
      Name: example-crawler
      RecrawlPolicy:
        RecrawlBehavior: CRAWL_NEW_FOLDERS_ONLY
      Role:
        Fn::GetAtt:
          - examplecrawlerrole1B62B8EE
          - Arn
      SchemaChangePolicy:
        DeleteBehavior: LOG
        UpdateBehavior: LOG
      Targets:
        S3Targets:
          - Path:
              Fn::Join:
                - ""
                - - s3://
                  - Ref: examplebucketC9DFA43E
                  - /example-data/
    Metadata:
      aws:cdk:path: GlueClawlerStack/example-crawler
...
Deploying it appears to update the
Using Empty String for
@gergobig This appears to be a CloudFormation issue/limitation. Please use an empty string
Thanks,
Thanks @ashishdhingra! I find this issue pretty weird on the CloudFormation side; however, your suggestion does work. I will raise this issue on their side.
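The workaround discussed above can be sketched roughly as follows. This is a minimal sketch, not the maintainers' code: it assumes the empty string goes into `scheduleExpression` (the only field of `schedule` in the reproduction code), and `CrawlerSchedule` and `crawlerSchedule` are hypothetical names introduced here so the snippet runs without `aws-cdk-lib`.

```typescript
// Local stand-in for the shape of the `schedule` property passed to
// glue.CfnCrawler in the reproduction code (hypothetical type name).
interface CrawlerSchedule {
  scheduleExpression?: string;
}

// Hypothetical helper, not part of aws-cdk-lib.
// Assumption: the empty string belongs in `scheduleExpression`.
function crawlerSchedule(cronExpression?: string): CrawlerSchedule {
  // Always emit the `schedule` block. The thread reports that omitting it
  // leaves the previously deployed schedule in place, whereas an empty
  // string was confirmed to work.
  return { scheduleExpression: cronExpression ?? '' };
}

// Scheduled crawler, as in the original stack.
const scheduled = crawlerSchedule('cron(20-50/10 0 * * ? *)');

// "Run on demand": keep the property, but with an empty expression.
const onDemand = crawlerSchedule();
```

In the stack, the returned object would be passed as the `schedule` property of `glue.CfnCrawler`, in place of commenting the property out.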
Comments on closed issues and PRs are hard for our team to see.
Describe the bug
Let's say I have a crawler that triggers every 10 minutes between 12:20 AM and 12:50 AM:
cron(20-50/10 0 * * ? *)
When I try to remove this schedule and make it "Run on demand" by providing None as the schedule, the schedule is not removed.
When changing the schedule to None, the following message appears in cdk diff's output:
However, the schedule persists and has not been removed on the AWS side.
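To make the schedule in the description concrete, the sketch below expands the minutes field `20-50/10` of the cron expression into explicit run times. It assumes the usual cron reading of `start-end/step`, which matches the "every 10 minutes between 12:20 AM and 12:50 AM" wording above. `expandRangeWithStep` is a hypothetical helper written for this illustration and handles only that one field shape.

```typescript
// Expand a cron field of the form "start-end/step" into explicit values.
// Hypothetical helper for illustration; other cron field shapes are rejected.
function expandRangeWithStep(field: string): number[] {
  const match = /^(\d+)-(\d+)\/(\d+)$/.exec(field);
  if (match === null) {
    throw new Error('unsupported field: ' + field);
  }
  const start = Number(match[1]);
  const end = Number(match[2]);
  const step = Number(match[3]);
  if (step <= 0) {
    throw new Error('step must be positive');
  }
  const values: number[] = [];
  for (let v = start; v <= end; v += step) {
    values.push(v);
  }
  return values;
}

// Minutes field of cron(20-50/10 0 * * ? *); the hours field is 0.
const minutes = expandRangeWithStep('20-50/10');

// Format each minute as a time within hour 0.
const runTimes = minutes.map((m) => '00:' + (m < 10 ? '0' + m : String(m)));
```

Here `runTimes` holds the four times within hour 0 at which the crawler would start each day.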
Regression Issue
Last Known Working CDK Version
No response
Expected Behavior
Passing None as schedule erases existing rules and makes it "run on demand".
Current Behavior
Passing None as schedule leaves the previous configuration.
Reproduction Steps
Deploy this stack with a schedule and redeploy with schedule=None.
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.128.0
Framework Version
No response
Node.js Version
nodejs: 18
OS
alpine 3
Language
Python
Language Version
No response
Other information
No response