Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VCR test(s) fail during replay: Framework converted resources #14158

Open
roaks3 opened this issue Mar 31, 2023 · 10 comments · May be fixed by GoogleCloudPlatform/magic-modules#11903
Open

VCR test(s) fail during replay: Framework converted resources #14158

roaks3 opened this issue Mar 31, 2023 · 10 comments · May be fixed by GoogleCloudPlatform/magic-modules#11903

Comments

@roaks3
Copy link
Collaborator

roaks3 commented Mar 31, 2023

Failure rates

  • 100%

Impacted tests

  • TestAccFrameworkProviderBasePath_setBasePath
  • TestAccDataSourceGoogleFirebaseAppleAppConfig
  • TestAccDataSourceDNSKeys_noDnsSec
  • TestAccDataSourceDNSKeys_basic
  • TestAccDataSourceDnsManagedZone_basic
  • TestAccDataSourceDnsRecordSet_basic
  • TestAccFrameworkProviderMeta_setModuleName
  • TestAccFirebaseWebApp_firebaseWebAppFull
  • TestAccFirebaseWebApp_firebaseWebAppBasicExample

Affected Resource(s)

  • Marking google_* since this is related to the framework upgrades

List of resources migrated to the plugin framework

  • No resources have been migrated

List of data sources migrated to the plugin framework

  • data.google_client_config
  • data.google_client_openid_userinfo
  • data.google_dns_keys
  • data.google_dns_managed_zone
  • data.google_dns_record_set
  • data.google_firebase_android_app_config (Beta)
  • data.google_firebase_apple_app_config (Beta)
  • data.google_firebase_web_app_config (Beta)

Affected PR(s)

GoogleCloudPlatform/magic-modules#7580

Message(s)

2023-03-30T19:27:18.895Z [WARN]  sdk.helper_resource: Error running Terraform CLI command: test_name=TestAccFrameworkProviderBasePath_setBasePath test_step_number=3 test_terraform_path=/bin/terraform test_working_directory=/tmp/plugintest3335870807/work2314434417
  error=
  | exit status 1
  | 
  | Error: Error when reading or editing DNSManagedZone "projects/ci-test-project-188019/managedZones/tf-test-zone-n36lc0kstv": Get "https://www.googleapis.com/dns/v1beta2/projects/ci-test-project-188019/managedZones/tf-test-zone-n36lc0kstv?alt=json": Requested interaction not found
  | 
  |   with google_dns_managed_zone.foo,
  |   on terraform_plugin_test.tf line 7, in resource "google_dns_managed_zone" "foo":
  |    7: resource "google_dns_managed_zone" "foo" {
  | 
  

2023-03-30T19:27:22.187Z [WARN]  sdk.helper_resource: Error running Terraform CLI command: test_working_directory=/tmp/plugintest3335870807/work2314434417 test_name=TestAccFrameworkProviderBasePath_setBasePath test_step_number=3
  error=
  | exit status 1
  | 
  | Error: Error when reading or editing ManagedZone: Delete "https://www.googleapis.com/dns/v1beta2/projects/ci-test-project-188019/managedZones/tf-test-zone-n36lc0kstv?alt=json": Requested interaction not found
  |
@roaks3
Copy link
Collaborator Author

roaks3 commented Mar 31, 2023

It looked to me like none of the requests for these tests were being found during replay, so I wonder if the new framework is doing something different with requests, or possibly recording/replaying them differently.

@rileykarson rileykarson added this to the Near-Term Goals milestone Apr 3, 2023
mraouffouad added a commit to mraouffouad/magic-modules that referenced this issue Jul 28, 2023
slevenick pushed a commit to GoogleCloudPlatform/magic-modules that referenced this issue Jul 28, 2023
…o identityplatform config (#8402)

* Add Add BlockingFunctionsConfig, AuthorizedDomains and QuotaConfig fields to Config.yaml

* adding new fields to identity_platform_config_basic.tf.erb

* Update Config.yaml

Temporarily enable VCR to run the tests. Also, provide a more user's friendly desc for the quota field.

* Fix the failing test

* Update Config.yaml

Fix the quota start_time format.

* Attempt 2: Fix the failing test

* Update Config.yaml Enabling VCR.

* Update Config.yaml

Re-enable skip_vcr due to hashicorp/terraform-provider-google#14158.
modular-magician added a commit to modular-magician/terraform-provider-google-beta that referenced this issue Jul 28, 2023
…o identityplatform config (hashicorp#8402)

* Add Add BlockingFunctionsConfig, AuthorizedDomains and QuotaConfig fields to Config.yaml

* adding new fields to identity_platform_config_basic.tf.erb

* Update Config.yaml

Temporarily enable VCR to run the tests. Also, provide a more user's friendly desc for the quota field.

* Fix the failing test

* Update Config.yaml

Fix the quota start_time format.

* Attempt 2: Fix the failing test

* Update Config.yaml Enabling VCR.

* Update Config.yaml

Re-enable skip_vcr due to hashicorp/terraform-provider-google#14158.

Signed-off-by: Modular Magician <[email protected]>
modular-magician added a commit to modular-magician/terraform-provider-google that referenced this issue Jul 28, 2023
…o identityplatform config (hashicorp#8402)

* Add Add BlockingFunctionsConfig, AuthorizedDomains and QuotaConfig fields to Config.yaml

* adding new fields to identity_platform_config_basic.tf.erb

* Update Config.yaml

Temporarily enable VCR to run the tests. Also, provide a more user's friendly desc for the quota field.

* Fix the failing test

* Update Config.yaml

Fix the quota start_time format.

* Attempt 2: Fix the failing test

* Update Config.yaml Enabling VCR.

* Update Config.yaml

Re-enable skip_vcr due to hashicorp#14158.

Signed-off-by: Modular Magician <[email protected]>
modular-magician added a commit that referenced this issue Jul 28, 2023
…o identityplatform config (#8402) (#15325)

* Add Add BlockingFunctionsConfig, AuthorizedDomains and QuotaConfig fields to Config.yaml

* adding new fields to identity_platform_config_basic.tf.erb

* Update Config.yaml

Temporarily enable VCR to run the tests. Also, provide a more user's friendly desc for the quota field.

* Fix the failing test

* Update Config.yaml

Fix the quota start_time format.

* Attempt 2: Fix the failing test

* Update Config.yaml Enabling VCR.

* Update Config.yaml

Re-enable skip_vcr due to #14158.

Signed-off-by: Modular Magician <[email protected]>
modular-magician added a commit to hashicorp/terraform-provider-google-beta that referenced this issue Jul 28, 2023
…o identityplatform config (#8402) (#5964)

* Add Add BlockingFunctionsConfig, AuthorizedDomains and QuotaConfig fields to Config.yaml

* adding new fields to identity_platform_config_basic.tf.erb

* Update Config.yaml

Temporarily enable VCR to run the tests. Also, provide a more user's friendly desc for the quota field.

* Fix the failing test

* Update Config.yaml

Fix the quota start_time format.

* Attempt 2: Fix the failing test

* Update Config.yaml Enabling VCR.

* Update Config.yaml

Re-enable skip_vcr due to hashicorp/terraform-provider-google#14158.

Signed-off-by: Modular Magician <[email protected]>
modular-magician added a commit to modular-magician/terraform-google-conversion that referenced this issue Jul 28, 2023
…o identityplatform config (#8402)

* Add Add BlockingFunctionsConfig, AuthorizedDomains and QuotaConfig fields to Config.yaml

* adding new fields to identity_platform_config_basic.tf.erb

* Update Config.yaml

Temporarily enable VCR to run the tests. Also, provide a more user's friendly desc for the quota field.

* Fix the failing test

* Update Config.yaml

Fix the quota start_time format.

* Attempt 2: Fix the failing test

* Update Config.yaml Enabling VCR.

* Update Config.yaml

Re-enable skip_vcr due to hashicorp/terraform-provider-google#14158.

Signed-off-by: Modular Magician <[email protected]>
modular-magician added a commit to GoogleCloudPlatform/terraform-google-conversion that referenced this issue Jul 28, 2023
…o identityplatform config (#8402) (#1211)

* Add Add BlockingFunctionsConfig, AuthorizedDomains and QuotaConfig fields to Config.yaml

* adding new fields to identity_platform_config_basic.tf.erb

* Update Config.yaml

Temporarily enable VCR to run the tests. Also, provide a more user's friendly desc for the quota field.

* Fix the failing test

* Update Config.yaml

Fix the quota start_time format.

* Attempt 2: Fix the failing test

* Update Config.yaml Enabling VCR.

* Update Config.yaml

Re-enable skip_vcr due to hashicorp/terraform-provider-google#14158.

Signed-off-by: Modular Magician <[email protected]>
DanielRieske pushed a commit to bschaatsbergen/magic-modules that referenced this issue Aug 2, 2023
…o identityplatform config (GoogleCloudPlatform#8402)

* Add Add BlockingFunctionsConfig, AuthorizedDomains and QuotaConfig fields to Config.yaml

* adding new fields to identity_platform_config_basic.tf.erb

* Update Config.yaml

Temporarily enable VCR to run the tests. Also, provide a more user's friendly desc for the quota field.

* Fix the failing test

* Update Config.yaml

Fix the quota start_time format.

* Attempt 2: Fix the failing test

* Update Config.yaml Enabling VCR.

* Update Config.yaml

Re-enable skip_vcr due to hashicorp/terraform-provider-google#14158.
@melinath melinath assigned SarahFrench and unassigned megan07 Oct 6, 2023
@zachberger

This comment was marked as off-topic.

@SarahFrench
Copy link
Member

SarahFrench commented Nov 14, 2023

I'm going to update the issue description with a list of data sources that have been migrated to the plugin framework, and acceptance tests including those data sources are likely to be affected. It is possible for the VCR system to not have recorded interactions for other reasons than issues with the plugin framework migration.

Edit: Done, the failing acceptance tests listed on this issue map pretty well to the migrated data sources.

@SarahFrench

This comment was marked as off-topic.

@SarahFrench

This comment was marked as off-topic.

@SarahFrench
Copy link
Member

SarahFrench commented Feb 16, 2024

Investigation

Stuff from running TestAccDataSourceDnsManagedZone_basic in VCR_MODE=RECORDING locally

The test passes, there are files made but they're 'incomplete':

fixtures/TestAccDataSourceDnsManagedZone_basic.seed

2815386111364297813

fixtures/TestAccDataSourceDnsManagedZone_basic.yaml

---
version: 1
interactions:
- request:
    body: ""
    form: {}
    headers:
      Content-Type:
      - application/json
      User-Agent:
      - Terraform/1.8.0-dev (+https://www.terraform.io) Terraform-Plugin-SDK/terraform-plugin-framework
        terraform-provider-google/dev
    url: https://dns.googleapis.com/dns/v1/projects/PROJECT_ID/managedZones/tf-test-zone-0icnli2epb?alt=json
    method: GET
  response:
    body: |
      {
        "error": {
          "code": 404,
          "message": "The 'parameters.managedZone' resource named 'tf-test-zone-0icnli2epb' does not exist.",
          "errors": [
            {
              "message": "The 'parameters.managedZone' resource named 'tf-test-zone-0icnli2epb' does not exist.",
              "domain": "global",
              "reason": "notFound"
            }
          ]
        }
      }
    headers:
      Alt-Svc:
      - h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
      Cache-Control:
      - private
      Content-Type:
      - application/json; charset=UTF-8
      Date:
      - Fri, 16 Feb 2024 17:03:48 GMT
      Server:
      - ESF
      Vary:
      - Origin
      - X-Origin
      - Referer
      X-Content-Type-Options:
      - nosniff
      X-Frame-Options:
      - SAMEORIGIN
      X-Xss-Protection:
      - "0"
    status: 404 Not Found
    code: 404
    duration: ""

There should be more than 1 interaction listed for the test that provisions and deletes resources.

The single interaction recorded there is from the plugin framework provider, because "terraform-plugin-framework" is in the user agent. Where are the other API interactions for the PF provider?

Stuff from running the above with a debugger in the VCR-related code

After investigating this a bit I've found the problem shows up in the acctest package's closeRecorder function. Hopefully that's the only place that will need a fix.

That function has a block that's repeated once per SDK/PF provider (SDK provider block is first, PF provider block is second). Those blocks roughly perform these actions:

  • retrieves the config of the provider from a map, with one used per running test (provider config includes stuff like creds, client, default values etc)
  • stops the recorder in the client within the provider's config - this creates the YAML file with API interactions
  • writes VCR seed to file - this is implemented directly in our code and is not in the VCR dependency
  • deletes the provider config for that particular test from the map of configs
  • deletes the vcr source for that particular test from the map of vcr sources (vcr source contains a seed value)

When you run a test in VCR mode there will be an SDK and PF client present. When the first block (SDK) runs it has access to the vcr source and as a result is able to save VCR stuff to file. That block ends by deleting the vcr source from the global map variable. The PF block runs immediately after and cannot find the vcr source, as it was just deleted, and doesn't write anything to file. Edit: this only impacts saving the seed, not the YAML file.

This suggests that only interactions that the SDK client has are ever written to file, and the PF client stuff is never recorded. However in the example I show above there is 1 interaction recorded from the PF provider as "terraform-plugin-framework" is in the user agent. I'll need to look at this more.

Thinking forward, this suggests there are problems with tests that will use the 2 separate clients of the SDK and PF provider configs. Especially if you look at the comment above getCachedConfig:

// Returns a cached config if VCR testing is enabled. This enables us to use a single HTTP transport
// for a given test, allowing for recording of HTTP interactions.
// Why this exists: schema.Provider.ConfigureFunc is called multiple times for a given test
// ConfigureFunc on our provider creates a new HTTP client and sets base paths (config.go LoadAndValidate)
// VCR requires a single HTTP client to handle all interactions so it can record and replay responses so
// this caches HTTP clients per test by replacing ConfigureFunc

By having different clients for the SDK and PF code we break the assumption of "a single HTTP transport for a given test, allowing for recording of HTTP interactions"

@SarahFrench
Copy link
Member

SarahFrench commented Feb 16, 2024

More notes:

  • The act of stopping (via .Stop()) the recorder in the provider client causes the API interactions to be saved to YAML files by the package we use for VCR tests: https://github.com/dnaeon/go-vcr/blob/b3f5a17c396f1f45e232e36c6eed2577da52d22a/recorder/recorder.go#L188 . I.e. we cannot easily tweak logic to help address this problem.
    • Saving seed files is something we choose to do and is implemented in the TPG/TPGB codebases
  • The 2 blocks described above don't seem to conflict - the last block has final say on what the files contain and that's unchanged if you delete the first block.
  • The two sets of interactions you can save from VCR recording TestAccDataSourceDnsManagedZone_basic is either 6 API interactions via the SDK client or 1 API interaction via the PF client. This test provisions a resource via SDK provider and then uses the data source via the PF provider so the mixture makes sense

@SarahFrench
Copy link
Member

SarahFrench commented Feb 16, 2024

In addition to what I said before, there is another incompatibility between VCR and the PF-implemented data source:

When performing Read actions (no C/U/D on data source) a new client is used which is separate to the client within the provider's config. When you enable TF_LOG = DEBUG or higher you don't see the request/response from when the data source is being read into state because this new client is separate to that logging logic somehow. The 'central' client in the PF provider is referenced here via option.WithHTTPClient when setting up the new client though.

However when the 'check destroy' function for the acc test does an HTTP request using the plugin-framework provider's client, instead of a new client made in the Read function, that request is reflected in the debug logs and is the single API interaction shown in the VCR recordings from my investigation above.


Summary

  • The migrated data sources use client libraries to interact with APIs, versus directly client configured in the provider config, so the interactions with the APIs during Read actions for those data sources cannot be detected in VCR tests
  • Even if the above was fixed, VCR tests rely on a single client that all API requests go though, and at best we have 2 clients - one for SDK and one for PF

Solutions?

Looks like making the muxed provider use a single central client would be a good idea to ensure VCR compatibility during migrations in future. And to avoid using client libraries within migrated datasources. For now we should avoid any new PF-migrated data sources or resources until these problems are addressed.

I'm going to be working on planning proposals for re-starting the muxing/migration project this quarter so I won't be acting on any of the ideas above any time soon. But doing this investigation was very useful for figuring out previously unidentified risks with the migration project.

@SarahFrench
Copy link
Member

I'm currently writing up an RFC that includes discussion about this issue and possible solutions.

While I was taking a look in the code I noticed that the SDK provider has some caching, due to how the provider will be re-configured multiple times during a test:

// Returns a cached config if VCR testing is enabled. This enables us to use a single HTTP transport
// for a given test, allowing for recording of HTTP interactions.
// Why this exists: schema.Provider.ConfigureFunc is called multiple times for a given test
// ConfigureFunc on our provider creates a new HTTP client and sets base paths (config.go LoadAndValidate)
// VCR requires a single HTTP client to handle all interactions so it can record and replay responses so
// this caches HTTP clients per test by replacing ConfigureFunc
func getCachedConfig(ctx context.Context, d *schema.ResourceData, configureFunc schema.ConfigureContextFunc, testName string) (*transport_tpg.Config, diag.Diagnostics) {

It doesn't look like there's similar caching for the PF provider. Depending on what solution we go with we may need to look into that.

@SarahFrench
Copy link
Member

Update: I've been working on fixing issues with the muxing in the Google provider(s) and those changes fix the issue with VCR tests. I won't close this issue until those changes are merged, but this issue is 'solved' in that I know why it happens and I have an early version of a fix implemented

SarahFrench added a commit to SarahFrench/magic-modules that referenced this issue Oct 3, 2024
…ovider-google#14158

- Use of older TPG versions through ExternalProviders breaks VCR, so that is also removed from TestAccDataSourceGoogleFirebaseAppleAppConfig
SarahFrench added a commit to SarahFrench/magic-modules that referenced this issue Oct 15, 2024
…ovider-google#14158

- Use of older TPG versions through ExternalProviders breaks VCR, so that is also removed from TestAccDataSourceGoogleFirebaseAppleAppConfig
SarahFrench added a commit to SarahFrench/magic-modules that referenced this issue Oct 15, 2024
…ovider-google#14158

- Use of older TPG versions through ExternalProviders breaks VCR, so that is also removed from TestAccDataSourceGoogleFirebaseAppleAppConfig
@melinath melinath added the test-failure-100 100% fail rate label Oct 22, 2024
SarahFrench added a commit to SarahFrench/magic-modules that referenced this issue Oct 25, 2024
…ovider-google#14158

- Use of older TPG versions through ExternalProviders breaks VCR, so that is also removed from TestAccDataSourceGoogleFirebaseAppleAppConfig
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment