
Option to filter tags by service or resource type in aws_tagging_resource table#2466

Closed
thomasklemm wants to merge 1 commit into turbot:main from thomasklemm:feat/resource-tags-filters

Conversation


@thomasklemm thomasklemm commented Apr 8, 2025

Adds the option to filter the aws_tagging_resource table by resource types, e.g. ec2:instance, s3:bucket, or auditmanager, limiting the response to Amazon EC2 instances, Amazon S3 buckets, or any AWS Audit Manager resource.

Integration test logs

Logs
Add passing integration test logs here

Example queries

-- Filter for all tagged EC2 & RDS resources, plus S3 buckets
select arn from aws_tagging_resource where resource_types = '["ec2", "rds", "s3:bucket"]';
-- => Returns only tags for expected resources

-- Filter for EC2 instances, RDS database instances and S3 buckets
select arn from aws_tagging_resource where resource_types = '["ec2:instance", "rds:db", "s3:bucket"]';
-- => Returns only tags for expected resources

-- Filter gets ignored when empty
select arn from aws_tagging_resource where resource_types = '[]';
-- => Returns tags for all tagged resources

-- Filter for all resource types supported
select count(*) from aws_tagging_resource where resource_types = '["access-analyzer","acm","acm-pca","airflow","amplify","apigateway","app-integrations","appconfig","appflow","appmesh","apprunner","appstream","appsync","aps","athena","auditmanager","backup","batch","ce","cloud9","cloudformation","cloudfront","cloudtrail","cloudwatch","codeartifact","codebuild","codecommit","codeconnections","codedeploy","codeguru-profiler","codeguru-reviewer","codepipeline","codestar-connections","cognito-identity","cognito-idp","comprehend","connect","databrew","dataexchange","datapipeline","datasync","dax","detective","devicefarm","dms","ds","dynamodb","ec2","ecr","ecr-public","ecs","eks","elasticache","elasticbeanstalk","elasticfilesystem","elasticloadbalancing","elasticmapreduce","emr-containers","emr-serverless","es","events","evidently","finspace","firehose","fis","forecast","frauddetector","fsx","gamelift","geo","glacier","globalaccelerator","glue","grafana","greengrass","groundstation","guardduty","healthlake","iam","imagebuilder","inspector","iot","iotanalytics","iotdeviceadvisor","iotevents","iotfleetwise","iotsitewise","iottwinmaker","iotwireless","ivs","ivschat","kafka","kendra","kinesis","kinesisanalytics","kinesisvideo","kms","lambda","lex","logs","lookoutmetrics","lookoutvision","m2","managedblockchain","mediapackage","mediapackage-vod","mediatailor","memorydb","mobiletargeting","mq","network-firewall","networkmanager","oam","omics","outposts","panorama","personalize","pipes","proton","qldb","quicksight","ram","rds","redshift","refactor-spaces","rekognition","resiliencehub","resource-explorer-2","resource-groups","route53","route53-recovery-control","route53-recovery-readiness","route53resolver","rum","s3","sagemaker","scheduler","schemas","secretsmanager","servicecatalog","servicediscovery","ses","signer","sns","sqs","ssm","states","storagegateway","synthetics","transfer","wisdom","workspaces","chatbot","config","organizations","payments","securityhub"]'

@misraved misraved requested review from ParthaI and Copilot April 8, 2025 21:20

Copilot AI left a comment


Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

@thomasklemm thomasklemm force-pushed the feat/resource-tags-filters branch from c7120ff to 8e75ea2 Compare April 9, 2025 06:56
@thomasklemm thomasklemm requested a review from Copilot April 9, 2025 07:27

Copilot AI left a comment


Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


ParthaI commented Apr 11, 2025

Hi @thomasklemm, I was reviewing the code changes and had a few suggestions for improvement:

  • Would it make sense to use resource_types as the column name instead of resource_type_filter for better alignment with naming conventions?
  • Also, could we consider defining the column type as JSON instead of string?
  • This would allow us to use d.EqualsQuals["resource_types"].GetJsonbValue() and parse it directly as a slice of strings, rather than relying on comma-separated values.
  • For implementation reference, you might take a look at similar patterns used in the tables aws_cloudwatch_metric_data_point, aws_cloudwatch_metric, and aws_pricing_product.
  • Lastly, could you please include example queries in the table documentation demonstrating how to use the resource_types column as a filter?

Thanks!
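The JSON-column approach ParthaI suggests (parse the qualifier value with GetJsonbValue and unmarshal it into a slice of strings) can be sketched roughly as follows. Note that parseResourceTypes is a hypothetical standalone helper for illustration, not the plugin's actual code:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// parseResourceTypes unmarshals a raw JSONB qualifier value (a JSON
// array of strings such as `["ec2:instance", "s3:bucket"]`) into a
// string slice, rather than splitting comma-separated values.
func parseResourceTypes(raw string) ([]string, error) {
	if raw == "" {
		return nil, nil // no filter supplied
	}
	var types []string
	if err := json.Unmarshal([]byte(raw), &types); err != nil {
		return nil, errors.New("resource_types must be a JSON array of strings")
	}
	return types, nil
}

func main() {
	types, err := parseResourceTypes(`["ec2:instance", "s3:bucket", "rds"]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(types), types[0]) // 3 ec2:instance
}
```

In the plugin itself, the raw value would come from something like d.EqualsQuals["resource_types"].GetJsonbValue() as suggested above.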

@thomasklemm thomasklemm force-pushed the feat/resource-tags-filters branch from 8e75ea2 to 3c2a4eb Compare April 14, 2025 09:11

ParthaI commented Apr 17, 2025

Hi @thomasklemm, just checking in to see if you had a chance to review the suggestions and comments above.

@thomasklemm

Hi @ParthaI, thanks for the detailed suggestions! I've made the changes locally; I still need to test them in our cluster and will update the PR after that.


ParthaI commented May 7, 2025

Hi @thomasklemm, just checking in to see if you’ve had a chance to test it out and push the changes based on your findings?

@thomasklemm thomasklemm force-pushed the feat/resource-tags-filters branch from 3c2a4eb to 9779b47 Compare May 8, 2025 09:52

thomasklemm commented May 8, 2025

Hi @ParthaI, made the changes and adjusted the initial query examples to match the resource_types = '[...]' syntax. The changes work locally when I run the example queries. I've also added documentation for the feature, with example queries, in the docs.

Tried these queries and all looks correct:

select arn from aws_tagging_resource where resource_types = '["ec2", "rds", "s3:bucket"]' order by arn;
select count(arn) from aws_tagging_resource where resource_types = '["ec2", "rds", "s3:bucket"]';

select arn from aws_tagging_resource where resource_types = '["ec2:instance", "rds:db", "s3:bucket"]' order by arn;
select count(arn) from aws_tagging_resource where resource_types = '["ec2:instance", "rds:db", "s3:bucket"]';

select arn from aws_tagging_resource where resource_types = '[]' order by arn;
select arn from aws_tagging_resource order by arn;

select count(arn) from aws_tagging_resource where resource_types = '[]';
select count(arn) from aws_tagging_resource;

@thomasklemm thomasklemm requested a review from Copilot May 8, 2025 10:00

Copilot AI left a comment


Pull Request Overview

This PR introduces a new filter option for the aws_tagging_resource table that allows filtering resources by specific AWS service or resource types using a JSON array of strings.

  • Added documentation in aws_tagging_resource.md to explain the new filtering option with examples.
  • Modified aws/table_aws_tagging_resource.go to support a new key column "resource_types" and to parse the JSON array of resource types from query qualifiers.
  • Enhanced error handling for invalid JSON input in resource_types.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
docs/tables/aws_tagging_resource.md Updated documentation to describe resource type filters.
aws/table_aws_tagging_resource.go Added key column support, JSON parsing, and error handling for resource_types filter.
Comments suppressed due to low confidence (1)

aws/table_aws_tagging_resource.go:127

  • [nitpick] Consider renaming 'resource_types' to 'rawResourceTypes' to better distinguish it from the parsed slice 'resourceTypes' and to align with Go naming conventions.
resource_types := d.EqualsQuals["resource_types"].GetJsonbValue()

@thomasklemm

@ParthaI There are two length constraints mentioned in the API docs: 100 array items, and 256 characters in total for the string that gets sent to the API. Should we handle that here and raise an error to the user? If I see correctly, AWS just starts to ignore resource types past the character limit (but I need to verify this more thoroughly).



ParthaI commented May 8, 2025

Hello @thomasklemm, thank you for sharing the detailed information.

Approach 1: We can pass the resourceTypes input exactly as provided by the user in the query. The API will then handle it according to its default behavior. If the input exceeds the allowed limits and the API returns an error, we simply propagate that error back to the user—this helps them understand the limit and how the API behaves.

Approach 2: Alternatively, if the resourceTypes array exceeds the documented limits (more than 256 characters or more than 100 items), we can split it into smaller chunks and make multiple API calls accordingly. This ensures that no data is missed, even if the user requests a large number of resource types.

In my opinion, I’d prefer Approach 1, as it aligns with the default behavior of the AWS CLI.

Please let me know your thoughts. Thanks!
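The chunking in Approach 2 can be sketched as follows. batchResourceTypes is a hypothetical helper illustrating the idea (split the filter list into chunks of at most 100 items, one API call per chunk), not the plugin's actual implementation:

```go
package main

import "fmt"

// batchResourceTypes splits the resource type filters into chunks no
// larger than size, mirroring the API's 100-item limit on
// ResourceTypeFilters; each chunk would back one GetResources call.
func batchResourceTypes(types []string, size int) [][]string {
	var batches [][]string
	for len(types) > size {
		batches = append(batches, types[:size])
		types = types[size:]
	}
	if len(types) > 0 {
		batches = append(batches, types)
	}
	return batches
}

func main() {
	// 250 dummy filters should yield batches of 100, 100, and 50.
	types := make([]string, 250)
	for i := range types {
		types[i] = fmt.Sprintf("svc%d", i)
	}
	batches := batchResourceTypes(types, 100)
	fmt.Println(len(batches)) // 3
}
```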


thomasklemm commented May 8, 2025

@ParthaI I think the API is not returning an error in either case (array > 100 entries, complete string > 256 characters); based on what I observed earlier, it just silently drops the additional items. The string limit is actually quite easy to hit if you're querying for more than 20-25 services at the same time; with full resource types, far fewer are usable. I think it might make sense to raise an error in the code so the user can adjust their query, or even better do the chunking you describe in approach 2. Is there another place in the AWS plugin where this strategy is being used?

Another thing I just noticed: I think the caching isn't working when resource_types is provided right now; do you have an intuition why this might be happening? Based on query times, it's always fetching the data, never returning cached data.


ParthaI commented May 12, 2025

> @ParthaI I think the API is not returning an error in either case (array > 100 entries, complete string > 256 characters); based on what I observed earlier, it just silently drops the additional items. The string limit is actually quite easy to hit if you're querying for more than 20-25 services at the same time; with full resource types, far fewer are usable. I think it might make sense to raise an error in the code so the user can adjust their query, or even better do the chunking you describe in approach 2. Is there another place in the AWS plugin where this strategy is being used?

We have implemented a similar pattern (though not exactly the same) in the aws_codecommit_repository table, where we list resources using a batch process.

> Another thing I just noticed: I think the caching isn't working when resource_types is provided right now; do you have an intuition why this might be happening? Based on query times, it's always fetching the data, never returning cached data.

Since we are using CacheMatch: query_cache.CacheMatchExact for the resource_types key column, the cache will only be used if the query parameter exactly matches; otherwise, it will result in a cache miss. And I think this is expected.


ParthaI commented May 19, 2025

Hello @thomasklemm, did you get a chance to take a look at the above comment?

@thomasklemm thomasklemm force-pushed the feat/resource-tags-filters branch from 9779b47 to 9fdcdbf Compare May 24, 2025 20:12
@thomasklemm thomasklemm requested a review from Copilot May 25, 2025 18:41

Copilot AI left a comment


Pull Request Overview

This PR introduces filtering functionality by resource types for the aws_tagging_resource table. Key changes include:

  • Updating documentation to describe the new JSON array filter for resource types.
  • Adding a "resource_types" column and KeyColumns configuration to support filtering.
  • Implementing JSON parsing, batching of resource type qualifiers, and deduplication of results based on ARN.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
docs/tables/aws_tagging_resource.md Added documentation for filtering resources by resource types.
aws/table_aws_tagging_resource.go Implemented parsing, batching, and deduplication for the new filter.
Comments suppressed due to low confidence (2)

aws/table_aws_tagging_resource.go:122

  • [nitpick] Consider renaming 'resource_types' to 'resourceTypesJSON' to follow Go's camelCase naming conventions and to clearly differentiate the qualifier value from other variables.
resource_types := d.EqualsQuals["resource_types"].GetJsonbValue()

aws/table_aws_tagging_resource.go:222

  • [nitpick] Consider renaming 'currentItems' to 'batchCount' to improve clarity and align with idiomatic Go naming practices.
currentItems := 0

@thomasklemm thomasklemm requested a review from Copilot May 25, 2025 19:50

This comment was marked as outdated.

@thomasklemm

@ParthaI I have now confirmed that the 256-character limit the API docs mention doesn't exist in reality; not sure why it made it into the docs. However, the 100-item limit does exist, so I adjusted the implementation to do automatic batching if more than 100 services/resource types are provided to the resource types filter. Locally it's working very well, but I'd like to confirm it in our production environment too, with access to larger AWS organizations, to see if there are any issues.
Will report back and then craft nicer commits :)

Without the batching, this error would get returned: operation error Resource Groups Tagging API: GetResources, https response error StatusCode: 400, RequestID: ca55ac0c-6b4a-49f7-98b9-4334a0f8f8b2, InvalidParameterException: ResourceTypeFilters provided are more than allowed limit 100


ParthaI commented May 28, 2025

Thanks, @thomasklemm , for diving deeper into this. The implementation looks good. Please let me know once you've pushed the changes to this PR, and I’ll do a final review.

Thanks again for all your efforts!


ParthaI commented Jun 9, 2025

Hello @thomasklemm, Just checking, did you get any chance to push your latest changes in this PR?

@thomasklemm thomasklemm force-pushed the feat/resource-tags-filters branch 2 times, most recently from ae2f94e to e1a1f5e Compare June 9, 2025 11:57
@thomasklemm thomasklemm force-pushed the feat/resource-tags-filters branch from e1a1f5e to a3d7ff5 Compare June 9, 2025 12:39
Add support for filtering tagged resources by service or resource type through a new
`resource_types` column that accepts a JSON array of filter strings.

- Add `resource_types` column to filter resources by type (e.g., ["ec2:instance", "s3:bucket"])
- Implement automatic batching to handle AWS API's 100-item limit for resource type filters
- Add comprehensive documentation with examples for common resource type queries

- The column accepts service-wide filters (e.g., "lambda") or specific resource types (e.g., "lambda:function")
- Large filter lists are automatically split into multiple API requests (100 items per batch)
- Results are deduplicated by ARN to prevent duplicate entries across batches
- Full backward compatibility maintained - existing queries work unchanged

- Parse resource type filters from query qualifiers using GetJsonbValue()
- Split filters into batches respecting the 100-item API limit
- Process each batch with separate API calls
- Track seen resources by ARN to avoid duplicates
- Stream results as they're received for optimal performance

This feature enables more targeted queries for large AWS environments, reducing API calls
and improving query performance when working with specific resource types.

Co-authored-by: ParthaI <parthai@turbot.com>
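The ARN-based deduplication this commit message describes (track seen resources across batches so each ARN is streamed at most once) can be sketched roughly as follows. This is a simplified illustration operating on plain ARN strings, not the plugin's actual code:

```go
package main

import "fmt"

// dedupeByARN merges the results of several batched API calls,
// emitting each ARN at most once by tracking what has already been
// seen in earlier batches.
func dedupeByARN(batches [][]string) []string {
	seen := make(map[string]struct{})
	var out []string
	for _, batch := range batches {
		for _, arn := range batch {
			if _, ok := seen[arn]; ok {
				continue // already streamed from an earlier batch
			}
			seen[arn] = struct{}{}
			out = append(out, arn)
		}
	}
	return out
}

func main() {
	// The second batch repeats one ARN from the first.
	batches := [][]string{
		{"arn:aws:s3:::bucket-a", "arn:aws:s3:::bucket-b"},
		{"arn:aws:s3:::bucket-b", "arn:aws:ec2:eu-west-1:111:instance/i-1"},
	}
	fmt.Println(len(dedupeByARN(batches))) // 3
}
```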
@thomasklemm thomasklemm force-pushed the feat/resource-tags-filters branch from a3d7ff5 to 6b05500 Compare June 9, 2025 12:41
@thomasklemm thomasklemm requested a review from Copilot June 9, 2025 12:42

Copilot AI left a comment


Pull Request Overview

This PR introduces support for filtering the aws_tagging_resource table by resource types, allowing users to limit query results based on specific AWS service or resource type identifiers.

  • Updated documentation with examples demonstrating the valid JSON array syntax for resource type filters.
  • Implemented new functions for parsing filters, batching resource type parameters to comply with the API limit, and handling deduplication across multiple API calls.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
docs/tables/aws_tagging_resource.md Adds detailed examples and usage instructions for filtering resource types in the documentation.
aws/table_aws_tagging_resource.go Introduces parsing, batching, fetching, and deduplication logic for resource type filters in queries.
Comments suppressed due to low confidence (1)

aws/table_aws_tagging_resource.go:155

  • [nitpick] Consider updating the error message to match documented examples (e.g. using 'rds:db' if that is the expected format) for consistency between the error feedback and the documentation.
return nil, errors.New("failed to parse 'resource_types' qualifier: value must be a JSON array of strings, e.g. [\"ec2:instance\", \"s3:bucket\", \"rds\"]")

Name: "resource_types",
Require: plugin.Optional,
Operators: []string{"="},
CacheMatch: query_cache.CacheMatchExact,
@thomasklemm


@ParthaI As far as I can tell, there's no caching being applied for this query; the query times are almost the same when you rerun a query, so it behaves very differently than when another table returns cached data. Any idea how to debug this further? Or should I just remove this line?

@thomasklemm

Hi @ParthaI, I've had a chance to test this in production against AWS organizations of different sizes, and I've been getting good results for smaller organizations (< 100k tags returned). However, I'm stuck on a case where a small number of duplicates gets returned. I'm not sure yet why; it could be that the AWS API returns the same resource from multiple regions, which the code here doesn't catch, since it only deduplicates when the same ARN is returned across multiple requests in the same batch. Hooking into the underlying matrix of requests against different regions and deduplicating across them looks quite a bit more involved; I have the feeling it doesn't make sense to go that way, and that we should instead recommend users put a DISTINCT ON (arn) into their query if they need it. In general, this PR in its current state feels very ergonomic to use, including the automatic batching when more than 100 services/resource types are passed in. Could you give this PR a review? I've also tried to add good documentation.


ParthaI commented Jun 11, 2025

Thanks, @thomasklemm, for the detailed information about your testing. I'll take a look at the PR!


ParthaI commented Jun 12, 2025

Hello @thomasklemm,

I’ve reviewed the code changes, and everything looks great—thank you for the clean implementation! 🙌

I’ve been testing the changes from your PR branch, and so far, I haven’t encountered any duplicate rows. As you mentioned, this could be due to the relatively small number of resources (fewer than 500) in my environment.

I'd like to investigate this further. Could you please help me with the following details?

  1. Which resource types are showing duplicate rows? Are these global resources?

  2. What query did you use during testing? If you could share it, that would help me run a similar test across multiple resource types.

  3. Could you try filtering the specific resource types where duplicates appear by using the resource_types filter in the WHERE clause? For example, if you’re seeing duplicates for EC2, the query would look like:

    select distinct arn from aws_tagging_resource where resource_types = '["ec2"]';
  4. If the duplicates are no longer appearing, then it’s possible the issue was environmental or data-specific. Otherwise, we may need to revisit the code.

Appreciate your thorough testing and detailed observations—thank you again! Looking forward to your response so I can dive deeper into this. 🙏


ParthaI commented Jun 27, 2025

Hi @thomasklemm, have you had a chance to review my above comment?

@misraved

Hi @thomasklemm,

Thank you for your contribution and for taking the time to open this PR! Since we haven’t heard back in a while, we’ll go ahead and close this for now to keep things tidy.

If you’d like to revisit this work in the future, feel free to reopen the PR or open a new one - we’d be happy to review it when you’re ready. We really appreciate your effort and hope to collaborate again soon!

Thanks 👍!!

@misraved misraved closed this Jul 22, 2025