Conversation
bnewbold
left a comment
There was a problem hiding this comment.
From your TODO list (in the PR description): I think getting the DATABASE_URL stuff sorted it worth it from the start (real pain to change deployment of that kind of thing). I think ETag we should hold off and discuss the semantics (there are a couple ways we could do it). Last-Modified would be straight-forward but not a "must".
Overall the factoring changes seem good. You hit many of my earlier review notes.
We'll need a Dockerfile and add CI, but can do that as a follow-up.
The postgresql test framework helpers got dropped? I don't think those are a "must" for this PR, but they were nice to have and good to be thinking that through.
Copying over one of my earlier review notes:
as an observation, it feels like we are dancing around adding sequencing to the core PLC semantics. I think that is probably the correct move for now: PLC and the library code should work without explicit sequence numbers for individual operations; sequencing is an abstraction on top.
| }, | ||
| &cli.Int64Flag{ | ||
| Name: "cursor-override", | ||
| Usage: "Starting cursor (sequence number) for ingestion", |
There was a problem hiding this comment.
this says "starting" which implies it only works when first creating the replica? should be more explicit in this usage string about the behavior.
if you ever need to change the upstream URL, you'd probably need to change the cursor as well; we did that recently with the relay rollout and had confusion.
There was a problem hiding this comment.
The initial motivation for this to exist was so I could "skip ahead" and test the backfill/livetail cutover behaviour, so it can be used at any point in time, but it should work for the changing-upstream scenario too.
| return (*didplc.OpEnum)(o).AsOperation() | ||
| } | ||
|
|
||
| func (o storedOp) Value() (driver.Value, error) { |
There was a problem hiding this comment.
I don't super love using the database/sql/driver interface for doing JSON database serialization, though that might just be me. I think this is internal enough that it isn't a big deal.
There was a problem hiding this comment.
I did the JSON serialisation "manually" to begin with, but it ended up being pretty verbose with all the error handling, and I thought using the database/sql/driver tidied things up. I don't have particularly strong opinions but I'm inclined to leave it as-is
| if !q.Has("synchronous_commit") { | ||
| // Since we're a replica, if we lose data we can just re-fetch it from the origin. | ||
| q.Set("synchronous_commit", "off") |
There was a problem hiding this comment.
I think it is worth flagging this and the SkipDefaultTransaction above in the README. And maybe making it configurable, eg with a CLI --unsafe-fast-db flag?
There was a problem hiding this comment.
having reviewed and seen that transactions are added in the core places, I am less concerned about SkipDefaultTransaction; though I also suspect that flag might be a no-op because using a transaction would negate it?
I still feel like disabling synchronous_commit is cowboy. folks will definitely be running this on, eg, raspberry pi with sketchy disk/power, and I think having postgresql not get in a weird state by default is safer.
There was a problem hiding this comment.
Oops, SkipDefaultTransaction is a leftover from early experiments, it is indeed a no-op now and can be removed.
My understanding of synchronous_commit=off is that while you may lose recently committed data on power loss, the db should still be in non-broken state.
There was a problem hiding this comment.
(added a note to the readme about synchronous_commit)
|
|
||
| var meter = otel.Meter("github.com/did-method-plc/go-didplc/replica") | ||
|
|
||
| var ( |
There was a problem hiding this comment.
a counter for overall ops would be helpful; broken down by success vs error.
prometheus can then use that for "ops per second".
| mux.HandleFunc("GET /{did}", s.handleDIDDoc) | ||
| mux.HandleFunc("GET /{$}", s.handleIndex) | ||
|
|
||
| handler := otelhttp.NewHandler(mux, "") |
There was a problem hiding this comment.
(if I was less lazy i'd check): does this aggregate calls to handlers, or by path? if by path, or if "by DID" in any way, the cardinality of metrics will explode
There was a problem hiding this comment.
iiuc it does the right thing as of open-telemetry/opentelemetry-go-contrib#6905
|
I still have kind of mixed feelings about moving the library code to |
5f7d2d9 to
f9930d5
Compare
f9930d5 to
d0c5251
Compare
| export PGDATABASE=postgres | ||
| export DATABASE_URL="postgresql://pg:password@localhost:5433/postgres" | ||
| sleep 2 | ||
| until pg_isready -q; do sleep 0.1; done |
There was a problem hiding this comment.
Note, probably want to copy over this fix into the equivalent helper in the ts codebase
| .PHONY: test-race | ||
| test-race: ## Run tests with race detector | ||
| go test -v -short -race ./... | ||
| ./extra/pg/with-test-db.sh go test -v -short -race -run TestGormOpStore ./replica/... |
There was a problem hiding this comment.
the db tests are run under both pg and sqlite, and also with the race detector
Continuation of #24, manually rebased on top of #22 after merge
TODO:
ETag+ Last-Modified headers in responses (does not solve read-after-write issues on its own but will probably still be useful) (Just done Last-Modified for now)writeJSONError