
Add lifecycle rules to all non-production buckets #3666

Open
erikamov opened this issue Jan 27, 2025 · 3 comments
erikamov commented Jan 27, 2025

User story / feature request

As a Developer,
When I create test data in GCS buckets designated for testing or development,
I want the test data deleted after 30 days.

As part of the Cal-ITP Cost Reduction Plan, we need to clear test buckets to reduce Google Cloud monthly costs.
Cal-ITP Test Buckets Cleanup.pdf

Acceptance Criteria

When I list GCS buckets designated for testing or development, they should not contain data older than 30 days.

Notes

Make sure Autoclass is disabled on those buckets.
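The 30-day deletion described above can be expressed as a GCS lifecycle rule. Below is a minimal sketch, assuming the google-cloud-storage Python client; the bucket name is a placeholder and the apply step is shown only in comments, since the actual rollout for these buckets hasn't been decided in this issue.

```python
# Sketch of a 30-day delete lifecycle rule for non-production GCS buckets.
# The rule payload matches the GCS lifecycle schema: a Delete action with
# an age condition in days.

def delete_after_days_rule(days=30):
    """Build a GCS lifecycle rule that deletes objects older than `days` days."""
    return {"action": {"type": "Delete"}, "condition": {"age": days}}

# Applying it would look roughly like this (requires google-cloud-storage,
# not run here; "some-test-bucket" is a hypothetical name):
#
#   from google.cloud import storage
#   bucket = storage.Client().get_bucket("some-test-bucket")
#   bucket.lifecycle_rules = [delete_after_days_rule(30)]
#   bucket.autoclass_enabled = False  # per the note above, Autoclass must be off
#   bucket.patch()

print(delete_after_days_rule())
```

The same rule can be applied from the CLI via `gcloud storage buckets update` with a lifecycle config file, if scripting it per-bucket is preferable.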

@erikamov (Contributor, Author) commented:
Posted a message about the cleanup with the attached PDF in the Slack channel calitp-data-infra. Should I post in any other channel?

@tiffanychu90 (Member) commented:
  • One thing I'm noticing for my set of BQ tables for dbt (sheet Pivot Table 1): I always follow the instructions here to run poetry run dbt run --full-refresh if you haven't been creating new tables for a couple of weeks.
    • I only really work in one section of the warehouse though (mart_gtfs), so a lot of the tables in the tiffany_* portions of cal-itp-data-infra-staging could probably be deleted. What would be the command I should run to just bring in mart_gtfs?
    • If the analysts can always bring in just the specific portion of the tables they need when adding new tables in dbt, which seems fairly infrequent nowadays except for me + Vivek, we can delete christian's, eric's, and mine (and I'll repopulate with the portion I need).
  • Let's clean up other users who aren't working on adding warehouse tables in the foreseeable future: soren, anyone else?

@erikamov erikamov changed the title Test Buckets Cleanup Add lifecycle rules to all non-production buckets Feb 18, 2025

erikamov commented Feb 18, 2025

> What would be the command I should run to just bring in mart_gtfs?

I usually run only the table that I need. You can use "+" before the path to also build any upstream models it depends on, for example: poetry run dbt run -s +models/mart/gtfs/dim_agency.sql. Or, if you use poetry run dbt run -s +models/mart/gtfs/, it will run all models under the models/mart/gtfs/ folder plus any other models needed to generate them. You can do the same for tests: poetry run dbt test -s +models/mart/gtfs/

See item ii:
[Screenshot of the referenced dbt instructions omitted]
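To make the "+" selector above concrete: selecting +model tells dbt to build that model plus everything upstream of it, transitively. A toy sketch of that ancestor walk, with hypothetical model names (not the real warehouse DAG):

```python
# Illustration of dbt's "+" graph operator: "+node" selects the node plus
# all of its upstream dependencies, found by walking the DAG transitively.

def upstream_selection(graph, node):
    """Return `node` plus every model it depends on, directly or indirectly."""
    seen = set()

    def walk(n):
        if n in seen:
            return
        seen.add(n)
        for parent in graph.get(n, []):
            walk(parent)

    walk(node)
    return seen

# model -> models it depends on (hypothetical names for illustration)
dag = {
    "dim_agency": ["stg_agency"],
    "stg_agency": ["raw_agency"],
    "raw_agency": [],
    "fct_trips": ["stg_trips"],
}

# Roughly what "poetry run dbt run -s +dim_agency" would build:
print(sorted(upstream_selection(dag, "dim_agency")))
# → ['dim_agency', 'raw_agency', 'stg_agency']
```

Note that unrelated models (fct_trips here) are left out, which is why the "+" selector is enough to repopulate just the mart_gtfs portion after a cleanup.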
