PLC Replica (Take 2)#28

Merged
DavidBuchanan314 merged 30 commits into main from
DavidBuchanan314/plc-replica-2
Feb 13, 2026
Conversation

@DavidBuchanan314
Collaborator

@DavidBuchanan314 DavidBuchanan314 commented Feb 6, 2026

Continuation of #24, manually rebased on top of #22 after merge

TODO:

  • Standardise db url param PLC Replica #24 (comment)
  • Set ETag + Last-Modified headers in responses (does not solve read-after-write issues on its own but will probably still be useful) (Just done Last-Modified for now)
  • Log JSON marshalling errors inside writeJSONError
  • Add some more tests for relevant edge cases (operation replays, multiple ops for the same DID, operation re-ordering)
  • Don't do synchronous_commit off
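
As a rough illustration of two of the TODO items above (logging marshalling errors inside writeJSONError, and setting Last-Modified), here is a minimal stdlib-only sketch. The signature of writeJSONError, the `{"message": ...}` error body, and the helper names setLastModified and exampleResponse are assumptions made for illustration; none of them are taken from the PR's code.

```go
package main

import (
	"encoding/json"
	"log/slog"
	"net/http"
	"net/http/httptest"
	"time"
)

// writeJSONError is a hypothetical stand-in for the helper named in the TODO.
// The error body shape is an assumption, not the PR's actual format.
func writeJSONError(w http.ResponseWriter, status int, msg string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	if err := json.NewEncoder(w).Encode(map[string]string{"message": msg}); err != nil {
		// The status line is already sent, so all we can do is log the failure.
		slog.Error("failed to write JSON error response", "err", err, "status", status)
	}
}

// setLastModified (hypothetical) sets the header in the HTTP date format.
// It must be called before WriteHeader for the header to be sent.
func setLastModified(w http.ResponseWriter, t time.Time) {
	w.Header().Set("Last-Modified", t.UTC().Format(http.TimeFormat))
}

// exampleResponse runs both helpers against an in-memory recorder.
func exampleResponse() *httptest.ResponseRecorder {
	rec := httptest.NewRecorder()
	setLastModified(rec, time.Date(2026, 2, 6, 12, 0, 0, 0, time.UTC))
	writeJSONError(rec, http.StatusNotFound, "DID not registered")
	return rec
}
```

Which timestamp Last-Modified should carry (for example, the time of the most recent operation for a DID) is left open here.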

@DavidBuchanan314 DavidBuchanan314 mentioned this pull request Feb 6, 2026
Member

@bnewbold bnewbold left a comment

From your TODO list (in the PR description): I think getting the DATABASE_URL stuff sorted is worth it from the start (real pain to change deployment of that kind of thing). I think ETag we should hold off and discuss the semantics (there are a couple ways we could do it). Last-Modified would be straight-forward but not a "must".

Overall the factoring changes seem good. You hit many of my earlier review notes.

We'll need a Dockerfile and add CI, but can do that as a follow-up.

The postgresql test framework helpers got dropped? I don't think those are a "must" for this PR, but they were nice to have and good to be thinking that through.

Copying over one of my earlier review notes:

as an observation, it feels like we are dancing around adding sequencing to the core PLC semantics. I think that is probably the correct move for now: PLC and the library code should work without explicit sequence numbers for individual operations; sequencing is an abstraction on top.

Comment thread cmd/replica/main.go Outdated
},
&cli.Int64Flag{
Name: "cursor-override",
Usage: "Starting cursor (sequence number) for ingestion",
Member

this says "starting" which implies it only works when first creating the replica? should be more explicit in this usage string about the behavior.

if you ever need to change the upstream URL, you'd probably need to change the cursor as well; we did that recently with the relay rollout and had confusion.

Collaborator Author

The initial motivation for this to exist was so I could "skip ahead" and test the backfill/livetail cutover behaviour, so it can be used at any point in time, but it should work for the changing-upstream scenario too.

Comment thread replica/database.go
return (*didplc.OpEnum)(o).AsOperation()
}

func (o storedOp) Value() (driver.Value, error) {
Member

I don't super love using the database/sql/driver interface for doing JSON database serialization, though that might just be me. I think this is internal enough that it isn't a big deal.

Collaborator Author

I did the JSON serialisation "manually" to begin with, but it ended up being pretty verbose with all the error handling, and I thought using the database/sql/driver tidied things up. I don't have particularly strong opinions but I'm inclined to leave it as-is
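
For readers unfamiliar with the pattern under discussion, here is a minimal sketch of JSON column serialisation via the database/sql/driver interfaces. The type exampleOp and its fields are hypothetical stand-ins; the PR's storedOp wraps didplc.OpEnum, which is not reproduced here.

```go
package main

import (
	"database/sql/driver"
	"encoding/json"
	"fmt"
)

// exampleOp is a hypothetical stand-in for the PR's storedOp type.
type exampleOp struct {
	Type string  `json:"type"`
	Prev *string `json:"prev"`
}

// Value implements driver.Valuer: the struct is stored as JSON bytes.
func (o exampleOp) Value() (driver.Value, error) {
	b, err := json.Marshal(o)
	if err != nil {
		return nil, fmt.Errorf("marshalling op: %w", err)
	}
	return b, nil
}

// Scan implements sql.Scanner: drivers may hand back []byte or string
// for JSON columns, so both are accepted.
func (o *exampleOp) Scan(src any) error {
	switch v := src.(type) {
	case []byte:
		return json.Unmarshal(v, o)
	case string:
		return json.Unmarshal([]byte(v), o)
	default:
		return fmt.Errorf("unsupported column type %T", src)
	}
}
```

With both methods defined, the error handling lives in one place and call sites can pass the value straight to the database layer, which seems to be the tidiness argument made above.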

Comment thread replica/database.go
Comment thread replica/database.go
Comment thread replica/database.go Outdated
Comment on lines +148 to +150
if !q.Has("synchronous_commit") {
// Since we're a replica, if we lose data we can just re-fetch it from the origin.
q.Set("synchronous_commit", "off")
Member

I think it is worth flagging this and the SkipDefaultTransaction above in the README. And maybe making it configurable, eg with a CLI --unsafe-fast-db flag?

Member

@bnewbold bnewbold Feb 10, 2026

having reviewed and seen that transactions are added in the core places, I am less concerned about SkipDefaultTransaction; though I also suspect that flag might be a no-op because using a transaction would negate it?

I still feel like disabling synchronous_commit is cowboy. folks will definitely be running this on, eg, raspberry pi with sketchy disk/power, and I think having postgresql not get in a weird state by default is safer.

Collaborator Author

Oops, SkipDefaultTransaction is a leftover from early experiments, it is indeed a no-op now and can be removed.

My understanding of synchronous_commit=off is that while you may lose recently committed data on power loss, the db should still be in a non-broken state.

Collaborator Author

(added a note to the readme about synchronous_commit)
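
One way the opt-in variant suggested in this thread could look, as a sketch only: synchronous_commit=off is added to the connection URL only when the operator asks for it and has not already set a value. The function name withCommitMode and the unsafeFast parameter (mirroring the proposed --unsafe-fast-db flag) are assumptions, not code from the PR, and whether a driver honours this query parameter depends on the driver.

```go
package main

import "net/url"

// withCommitMode (hypothetical) returns dbURL with synchronous_commit=off
// appended only when unsafeFast is true and no value was already chosen.
func withCommitMode(dbURL string, unsafeFast bool) (string, error) {
	u, err := url.Parse(dbURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	if unsafeFast && !q.Has("synchronous_commit") {
		q.Set("synchronous_commit", "off")
		u.RawQuery = q.Encode()
	}
	return u.String(), nil
}
```

With unsafeFast false the URL passes through unchanged, so PostgreSQL's default durability applies unless the operator opts out.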

Comment thread replica/inflight.go
Comment thread replica/inflight.go Outdated
Comment thread replica/metrics.go

var meter = otel.Meter("github.com/did-method-plc/go-didplc/replica")

var (
Member

@bnewbold bnewbold Feb 10, 2026

a counter for overall ops would be helpful; broken down by success vs error.

prometheus can then use that for "ops per second".

Comment thread replica/server.go Outdated
Comment thread replica/server.go
mux.HandleFunc("GET /{did}", s.handleDIDDoc)
mux.HandleFunc("GET /{$}", s.handleIndex)

handler := otelhttp.NewHandler(mux, "")
Member

(if I was less lazy i'd check): does this aggregate calls to handlers, or by path? if by path, or if "by DID" in any way, the cardinality of metrics will explode

Collaborator Author

iiuc it does the right thing as of open-telemetry/opentelemetry-go-contrib#6905

@bnewbold
Member

I still have kind of mixed feelings about moving the library code to didplc/. but I guess if we are going to make that change, now is the time; and with multiple services in here we probably should do it.

@DavidBuchanan314 DavidBuchanan314 force-pushed the DavidBuchanan314/plc-replica-2 branch from 5f7d2d9 to f9930d5 Compare February 11, 2026 17:36
@DavidBuchanan314 DavidBuchanan314 force-pushed the DavidBuchanan314/plc-replica-2 branch from f9930d5 to d0c5251 Compare February 11, 2026 17:55
@DavidBuchanan314 DavidBuchanan314 marked this pull request as ready for review February 12, 2026 21:27
Comment thread extra/pg/with-test-db.sh
export PGDATABASE=postgres
export DATABASE_URL="postgresql://pg:password@localhost:5433/postgres"
sleep 2
until pg_isready -q; do sleep 0.1; done
Collaborator Author

Note, probably want to copy over this fix into the equivalent helper in the ts codebase

Comment thread Makefile
.PHONY: test-race
test-race: ## Run tests with race detector
go test -v -short -race ./...
./extra/pg/with-test-db.sh go test -v -short -race -run TestGormOpStore ./replica/...
Collaborator Author

the db tests are run under both pg and sqlite, and also with the race detector

@DavidBuchanan314 DavidBuchanan314 merged commit d8a482d into main Feb 13, 2026
2 checks passed