PSS: Migration from a backend to a state store #38048

Merged
radeksimko merged 7 commits into main from radek/migrate-backend-to-pss
Feb 10, 2026
Conversation

@radeksimko
Member

@radeksimko radeksimko commented Jan 8, 2026

Notes

Removing data after migration

There are reasons both for and against removing the data that was migrated. If we don't remove it, we create a split-brain situation. If we remove it and the migration somehow failed without us explicitly knowing, we cause data loss. I'm leaning towards keeping it.

Our different migration paths are somewhat pragmatic and usually avoid deleting anything remote, but we do remove local files after migrating them to a remote backend, because we generally consider locally stored state files bad practice and a risk. Many (first-time) users start without any backend and introduce one later.

AFAICT we do not remove local files after migrating to a state store, which I think we should:

```go
// Configuring a state_store for the first time.
func (m *Meta) stateStore_C_s(c *configs.StateStore, stateStoreHash int, backendSMgr *clistate.LocalState, opts *BackendOpts) (backend.Backend, tfdiags.Diagnostics) {
	var diags tfdiags.Diagnostics

	vt := arguments.ViewJSON
	// Set default viewtype if none was set as the StateLocker needs to know exactly
	// what viewType we want to have.
	if opts == nil || opts.ViewType != vt {
		vt = arguments.ViewHuman
	}

	// Grab a purely local backend to get the local state if it exists
	localB, localBDiags := m.Backend(&BackendOpts{ForceLocal: true, Init: true})
	if localBDiags.HasErrors() {
		diags = diags.Append(localBDiags)
		return nil, diags
	}

	workspaces, wDiags := localB.Workspaces()
	if wDiags.HasErrors() {
		diags = diags.Append(&errBackendLocalRead{wDiags.Err()})
		return nil, diags
	}

	var localStates []statemgr.Full
	for _, workspace := range workspaces {
		localState, sDiags := localB.StateMgr(workspace)
		if sDiags.HasErrors() {
			diags = diags.Append(&errBackendLocalRead{sDiags.Err()})
			return nil, diags
		}
		if err := localState.RefreshState(); err != nil {
			diags = diags.Append(&errBackendLocalRead{err})
			return nil, diags
		}

		// We only care about non-empty states.
		if localS := localState.State(); !localS.Empty() {
			log.Printf("[TRACE] Meta.Backend: will need to migrate workspace states because of existing %q workspace", workspace)
			localStates = append(localStates, localState)
		} else {
			log.Printf("[TRACE] Meta.Backend: ignoring local %q workspace because its state is empty", workspace)
		}
	}

	// Get the state store as an instance of backend.Backend
	b, storeConfigVal, providerConfigVal, moreDiags := m.stateStoreInitFromConfig(c, opts.Locks)
	diags = diags.Append(moreDiags)
	if diags.HasErrors() {
		return nil, diags
	}

	if len(localStates) > 0 {
		// Migrate any local states into the new state store
		err := m.backendMigrateState(&backendMigrateOpts{
			SourceType:      "local",
			DestinationType: c.Type,
			Source:          localB,
			Destination:     b,
			ViewType:        vt,
		})
		if err != nil {
			diags = diags.Append(err)
			return nil, diags
		}

		// We remove the local state after migration to prevent confusion.
		// As we're migrating to a state store we don't have insight into whether it stores
		// files locally at all, and whether those local files conflict with the location of
		// the old local state.
		log.Printf("[TRACE] Meta.Backend: removing old state snapshots from old backend")
		for _, localState := range localStates {
			// We always delete the local state, unless that was our new state too.
			if err := localState.WriteState(nil); err != nil {
				diags = diags.Append(&errBackendMigrateLocalDelete{err})
				return nil, diags
			}
			if err := localState.PersistState(nil); err != nil {
				diags = diags.Append(&errBackendMigrateLocalDelete{err})
				return nil, diags
			}
		}
	}

	if m.stateLock {
		view := views.NewStateLocker(vt, m.View)
		stateLocker := clistate.NewLocker(m.stateLockTimeout, view)
		if err := stateLocker.Lock(backendSMgr, "init is initializing state_store first time"); err != nil {
			diags = diags.Append(fmt.Errorf("Error locking state: %s", err))
			return nil, diags
		}
		defer stateLocker.Unlock()
	}

	// Store the state_store metadata in our saved state location
	s := backendSMgr.State()
	if s == nil {
		s = workdir.NewBackendStateFile()
	}

	var pVersion *version.Version // This will remain nil for builtin providers or unmanaged providers.
	if c.ProviderAddr.Hostname == addrs.BuiltInProviderHost {
		diags = diags.Append(&hcl.Diagnostic{
			Severity: hcl.DiagWarning,
			Summary:  "State storage is using a builtin provider",
			Detail:   "Terraform is using a builtin provider for initializing state storage. Terraform will be less able to detect when state migrations are required in future init commands.",
		})
	} else {
		isReattached, err := reattach.IsProviderReattached(c.ProviderAddr, os.Getenv("TF_REATTACH_PROVIDERS"))
		if err != nil {
			diags = diags.Append(fmt.Errorf("Unable to determine if state storage provider is reattached while initializing state store for the first time. This is a bug in Terraform and should be reported: %w", err))
			return nil, diags
		}
		if isReattached {
			diags = diags.Append(&hcl.Diagnostic{
				Severity: hcl.DiagWarning,
				Summary:  "State storage provider is not managed by Terraform",
				Detail:   "Terraform is using a provider supplied via TF_REATTACH_PROVIDERS for initializing state storage. Terraform will be less able to detect when state migrations are required in future init commands.",
			})
		} else {
			// The provider is not built in and is being managed by Terraform.
			// This is the most common scenario, by far.
			var vDiags tfdiags.Diagnostics
			pVersion, vDiags = getStateStorageProviderVersion(c, opts.Locks)
			diags = diags.Append(vDiags)
			if vDiags.HasErrors() {
				return nil, diags
			}
		}
	}

	s.StateStore = &workdir.StateStoreConfigState{
		Type: c.Type,
		Hash: uint64(stateStoreHash),
		Provider: &workdir.ProviderConfigState{
			Source:  &c.ProviderAddr,
			Version: pVersion,
		},
	}
	s.StateStore.SetConfig(storeConfigVal, b.ConfigSchema())

	// We need to briefly convert away from backend.Backend interface to use the method
	// for accessing the provider schema. In this method we _always_ expect the concrete value
	// to be backendPluggable.Pluggable.
	plug := b.(*backendPluggable.Pluggable)
	s.StateStore.Provider.SetConfig(providerConfigVal, plug.ProviderSchema())

	// Verify that selected workspace exists in the state store.
	if opts.Init && b != nil {
		err := m.selectWorkspace(b)
		if err != nil {
			if errors.Is(err, &errBackendNoExistingWorkspaces{}) {
				// If there are no workspaces, Terraform either needs to create the default workspace here
				// or instruct the user to run a `terraform workspace new` command.
				ws, err := m.Workspace()
				if err != nil {
					diags = diags.Append(fmt.Errorf("Failed to check current workspace: %w", err))
					return nil, diags
				}
				if ws == backend.DefaultStateName {
					// Users control if the default workspace is created through the -create-default-workspace flag (defaults to true)
					if opts.CreateDefaultWorkspace {
						diags = diags.Append(m.createDefaultWorkspace(c, b))
						if !diags.HasErrors() {
							// Report workspace creation to the view
							view := views.NewInit(vt, m.View)
							view.Output(views.DefaultWorkspaceCreatedMessage)
						}
					} else {
						diags = diags.Append(&hcl.Diagnostic{
							Severity: hcl.DiagWarning,
							Summary:  "The default workspace does not exist",
							Detail:   "Terraform has been configured to skip creation of the default workspace in the state store. To create it, either remove the `-create-default-workspace=false` flag and re-run the 'init' command, or create it using a 'workspace new' command",
						})
					}
				} else {
					// User needs to run a `terraform workspace new` command to create the missing custom workspace.
					diags = append(diags, tfdiags.Sourceless(
						tfdiags.Error,
						fmt.Sprintf("Workspace %q has not been created yet", ws),
						fmt.Sprintf("State store %q in provider %s (%q) reports that no workspaces currently exist. To create the custom workspace %q use the command `terraform workspace new %s`.",
							c.Type,
							c.Provider.Name,
							c.ProviderAddr,
							ws,
							ws,
						),
					))
					return nil, diags
				}
			} else {
				// For all other errors, report via diagnostics
				diags = diags.Append(fmt.Errorf("Failed to select a workspace: %w", err))
			}
		}
	}
	if diags.HasErrors() {
		return nil, diags
	}

	// Update backend state file
	if err := backendSMgr.WriteState(s); err != nil {
		diags = diags.Append(errBackendWriteSavedDiag(err))
		return nil, diags
	}
	if err := backendSMgr.PersistState(); err != nil {
		diags = diags.Append(errBackendWriteSavedDiag(err))
		return nil, diags
	}

	return b, diags
}
```

However, I think for the migration being implemented here in this PR (backend -> state store) we should keep it as is (i.e. NOT remove).

Multi-workspace migrations and transparency

I noticed that it can be difficult to tell what's actually happening during the migration purely from the UI messages. Our messages only acknowledge that a migration from X to Y is happening and whether it succeeded. I think ideally:

  1. we should print out the workspaces as they are migrated
  2. we should acknowledge in the UI when the destination already exists. This will be a common codepath for users migrating to state store versions of equivalent backends (e.g. s3 backend -> s3 state store).
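As a rough illustration of point 1 and 2 above, per-workspace progress messages could look something like the sketch below. The helper name `formatWorkspaceMigrationMessage` is hypothetical and not part of Terraform's codebase; this only demonstrates the kind of output being proposed.

```go
package main

import "fmt"

// formatWorkspaceMigrationMessage is a hypothetical helper illustrating
// per-workspace migration output. It is NOT an actual Terraform function.
// It reports each workspace as it is migrated, and calls out when the
// destination workspace already exists (the common case when moving
// between equivalent backends, e.g. s3 backend -> s3 state store).
func formatWorkspaceMigrationMessage(workspace string, destinationExists bool) string {
	if destinationExists {
		return fmt.Sprintf("Migrating workspace %q (destination workspace already exists and will be overwritten)", workspace)
	}
	return fmt.Sprintf("Migrating workspace %q", workspace)
}

func main() {
	// Pretend "default" already exists in the destination state store.
	for _, ws := range []string{"default", "staging"} {
		fmt.Println(formatWorkspaceMigrationMessage(ws, ws == "default"))
	}
}
```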

Selecting/creating workspace after migration

I noticed that in our local -> state store migration codepath we attempt to create the default workspace if it doesn't exist. That makes sense in that context, as zero workspaces may often exist if the "migration" init is the very first init ever and no apply has run yet.

I cannot think of a good reason to do the same for the backend -> state store migration, though. If the state is already stored in a backend, at least one successful init and apply must have run by then. If the default workspace is still missing, it may well be intentional, so I don't think we should intervene in any way there.

Target Release

1.15.x

Rollback Plan

  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

CHANGELOG entry

  • This change is user-facing and I added a changelog entry.
  • This change is not user-facing.

@radeksimko radeksimko force-pushed the radek/migrate-backend-to-pss branch 3 times, most recently from 9ce6880 to bdd2faa on January 16, 2026 16:02
@radeksimko radeksimko added the no-changelog-needed Add this to your PR if the change does not require a changelog entry label Jan 16, 2026
@radeksimko radeksimko force-pushed the radek/migrate-backend-to-pss branch 2 times, most recently from 4af2b3b to 40f3b3a on January 23, 2026 15:05
@radeksimko radeksimko force-pushed the radek/migrate-backend-to-pss branch 6 times, most recently from 124a22f to 3b7178b on February 2, 2026 13:49
@radeksimko radeksimko force-pushed the radek/migrate-backend-to-pss branch 2 times, most recently from 3fff10b to f0fbe0f on February 2, 2026 14:50
@radeksimko radeksimko marked this pull request as ready for review February 2, 2026 14:57
@radeksimko radeksimko requested review from a team as code owners February 2, 2026 14:57
@SarahFrench SarahFrench self-assigned this Feb 2, 2026
Member

@SarahFrench SarahFrench left a comment


Here are a few minor things from review.

Reading through your PR description:

> I noticed that it can be difficult to tell what's actually happening during the migration purely from the UI messages. All that our messages recognise is that migration from X to Y is happening and was successful or not. I think ideally:
>
>   • we should print out the workspaces as they are migrated
>   • we should acknowledge in the UI when the destination already exists. This will be a common codepath for users migrating to state store versions of equivalent backends (e.g. s3 backend -> s3 state store).

Agreed! I think when we've discussed 'quality of life' changes like these in the past it's been in the context of wider improvements as part of a major release, but I guess these could be added in the context of PSS as that's new. However, we'd be restricted: migrations use shared code across all types of migration, and we don't want to make breaking changes.

> I cannot think of a good reason to do the same for the backend -> state store migration though. If the state is already stored in a backend, at least one successful init and apply must have run by then. If the default workspace is still missing, it may well be intention and so I don't think we should be intervening in any way there.

I mildly disagree, in that a user could perform a first init with a backend, change the config to include a state_store, and run a second init, at which point they will be forced to choose between reconfigure/migrate. This can all happen while no state files exist, if default is the only workspace and no apply has happened. In that case the migration is pretty much equivalent to the user having had the state_store present in config during the first init, which is when the default workspace would be created as things are currently implemented. So I think the migration creating a default workspace is reasonable in that case?

I agree that if the default workspace is missing and other workspaces exist in the backend before migration to a state store then after the migration the default workspace shouldn't be created. But I believe that's current behaviour?

@radeksimko
Member Author

> I mildly disagree in that a user could perform a first init with a backend, change the config to include a state_store and run a second init when they will be forced to choose between reconfigure/migrate. This can happen all while no state files exist, if default is the only workspace and no apply has happened. In that case the migration is pretty much equivalent to the user having the state_store present in config during the first init, which is when the default workspace would be created as things are currently implemented. So, I think the migration making a default workspace is reasonable in that case?

You're right. While (IMO) rare, there is a real scenario we need to account for.

I have copied the same logic here but think we should revisit it both here, at the source and maybe related logic in a few other places too.

The part I'm specifically unsure about is whether respecting -create-default-workspace should be conditional on default workspace being pre-selected before the migration or on anything else.

Even if we are fine with the behaviour, this detail is missing from the help text here, where we imply that the default workspace is always created:

```
-create-default-workspace [EXPERIMENTAL]
    This flag must be used alongside the -enable-pluggable-state-storage-
    experiment flag with experiments enabled. This flag's value defaults
    to true, which allows the default workspace to be created if it does
    not exist. Use -create-default-workspace=false to disable this behavior.
```
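To make the open question concrete, the gating under discussion could be expressed as something like the sketch below. `shouldCreateDefaultWorkspace` and its parameters are hypothetical stand-ins, not Terraform's actual identifiers; the sketch assumes the flag only matters when the destination reports zero workspaces and the currently selected workspace is "default".

```go
package main

import "fmt"

// shouldCreateDefaultWorkspace sketches one possible gating of the
// -create-default-workspace flag (defaults to true). All names here are
// illustrative, not Terraform's real code.
func shouldCreateDefaultWorkspace(createFlag bool, destinationWorkspaces []string, selected string) bool {
	if len(destinationWorkspaces) > 0 {
		return false // workspaces already exist in the destination; nothing to create
	}
	if selected != "default" {
		return false // a custom workspace is selected; the user must create it explicitly
	}
	return createFlag
}

func main() {
	fmt.Println(shouldCreateDefaultWorkspace(true, nil, "default"))  // true
	fmt.Println(shouldCreateDefaultWorkspace(true, nil, "staging"))  // false
	fmt.Println(shouldCreateDefaultWorkspace(false, nil, "default")) // false
}
```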

@radeksimko radeksimko enabled auto-merge (squash) February 3, 2026 18:02
@radeksimko radeksimko disabled auto-merge February 5, 2026 10:36
@SarahFrench
Member

After discussion in a 1:1:

  • We discussed the importance of Terraform reporting reality instead of lying to users (as it does with backends, reporting the default workspace always existing)
  • Currently our code forces reality to match the old lie that Terraform would tell, by always creating a default workspace state file during init
  • In our discussion we found this handling of 'no workspaces' scenario in the remote backend.
  • We decided to similarly swallow the 'no workspaces' error in state_store related methods. No biggie if workspace list returns an empty list after this; it's reflecting reality accurately and that's what we want!

So for this PR the block handling workspaces can be removed, and in a following PR we'd remove -create-default-workspace etc.
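The error-swallowing approach agreed above could look roughly like the sketch below. `errNoWorkspaces` and `listWorkspaces` are hypothetical stand-ins for the real identifiers in Terraform's state_store code; the point is only that a "no workspaces" result becomes an empty list rather than an error.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoWorkspaces is a hypothetical sentinel standing in for the real
// "no existing workspaces" error in Terraform's state store code.
var errNoWorkspaces = errors.New("no existing workspaces")

// listWorkspaces wraps a raw workspace listing and swallows the
// "no workspaces" error, returning an empty slice instead. An empty
// list accurately reflects reality, which is the desired behaviour.
func listWorkspaces(raw func() ([]string, error)) ([]string, error) {
	workspaces, err := raw()
	if errors.Is(err, errNoWorkspaces) {
		return nil, nil
	}
	return workspaces, err
}

func main() {
	empty := func() ([]string, error) { return nil, errNoWorkspaces }
	ws, err := listWorkspaces(empty)
	fmt.Println(len(ws), err) // 0 <nil>
}
```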

@radeksimko radeksimko force-pushed the radek/migrate-backend-to-pss branch from 8e1d04a to fd6f40d on February 6, 2026 15:20
@radeksimko radeksimko force-pushed the radek/migrate-backend-to-pss branch from d323a3d to c47380c on February 6, 2026 17:12
@radeksimko radeksimko enabled auto-merge (squash) February 6, 2026 17:13
Comment on lines +5584 to +5586
```go
args := []string{
	"-enable-pluggable-state-storage-experiment=true",
}
```
Member


nit:

Suggested change:

```go
args := []string{
	"-enable-pluggable-state-storage-experiment=true",
}
```

```go
args := []string{}
```

@radeksimko radeksimko merged commit 27770ee into main Feb 10, 2026
7 checks passed
@radeksimko radeksimko deleted the radek/migrate-backend-to-pss branch February 10, 2026 11:39