Conversation
Improve plcli error handling
bump golang to v1.25
bump license year to 2026
Pull request overview
This PR implements a PLC directory replica service that synchronizes operation data from an upstream PLC directory server. It builds on PR #22's OpStore interface refactoring.
Changes:
- Adds a complete replica service with HTTP server, database persistence (SQLite/PostgreSQL), and ingestion from upstream directory
- Implements validation and commit workers with batching and concurrency control
- Adds database-backed OpStore implementation using GORM
- Reorganizes the didplc package (log.go, client.go, and diddoc.go split out of the main files)
- Adds OpenTelemetry metrics and PostgreSQL docker-compose support
Reviewed changes
Copilot reviewed 24 out of 28 changed files in this pull request and generated 11 comments.
Summary per file:
| File | Description |
|---|---|
| replica/validate.go | Validation and commit worker implementations with batching |
| replica/validate_test.go | Comprehensive unit tests for validation logic |
| replica/ingest.go | Ingestion state machine with WebSocket streaming and HTTP fallback |
| replica/inflight.go | Thread-safe in-flight operation tracking with resume cursor management |
| replica/inflight_test.go | Unit tests for inflight tracking logic |
| replica/database.go | GORM-based OpStore implementation with SQLite and PostgreSQL support |
| replica/server.go | HTTP server handlers for DID resolution and operation logs |
| replica/server_test.go | Integration tests for HTTP endpoints |
| replica/metrics.go | OpenTelemetry metrics definitions |
| cmd/replica/main.go | Main entry point with CLI configuration |
| cmd/replica/otel.go | OpenTelemetry setup (traces and metrics) |
| didplc/opstore.go | OpStore interface and in-memory implementation (from PR #22) |
| didplc/operation.go | Added database serialization (Value/Scan) and WrapOperation helper |
| didplc/log.go | Moved log verification from root package |
| didplc/diddoc.go | Extracted DID document types |
| didplc/client.go | HTTP client for PLC directory API |
| didplc/manual_test.go | Manual signature verification tests |
| didplc/operation_test.go | Updated test data paths (testdata/ → ../testdata/) |
| didplc/operation_export_test.go | Updated test data paths |
| pg/ | PostgreSQL docker-compose and helper scripts |
| go.mod/go.sum | Added dependencies for GORM, WebSockets, OpenTelemetry, etc. |
| Makefile | Updated build and test targets |
```go
case <-ctx.Done():
	commitBatch()
	return
```
When the context is cancelled (line 105), commitBatch() is called to flush remaining operations before returning. However, the commitBatch function uses the same cancelled context for database operations (line 78). This may cause the final batch commit to fail due to context cancellation, potentially losing validated operations. Consider using a background context with a timeout for the final flush on shutdown, or documenting that operations in the final batch may be lost on cancellation.
```go
func writeJSONError(w http.ResponseWriter, message string, status int) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	json.NewEncoder(w).Encode(map[string]string{"message": message})
```
The json.NewEncoder(w).Encode call on line 64 can fail, but the error is silently ignored. While this is in an error handling path where the original error message will be logged elsewhere, consider logging if the JSON encoding of the error response fails to aid in debugging issues with error responses.
Suggested change:

```go
if err := json.NewEncoder(w).Encode(map[string]string{"message": message}); err != nil {
	slog.Default().Error("failed to encode JSON error response", "status", status, "message", message, "error", err)
}
```
bnewbold left a comment:
This is a partial review; I'll try to wrap up the rest soon.
I could go either way on moving the core library stuff into a sub-directory. The name is good, and I guess sooner is better if we are going to do it.
```go
Usage: "PLC directory replica server",
Flags: []cli.Flag{
	&cli.StringFlag{
		Name: "postgres-url",
```
There is a convention to use `db-url` / `DATABASE_URL`, and to determine what kind of database to open based on the prefix (e.g., `sqlite://replica.sqlite`).
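A hedged sketch of that convention (the `dialectFor` helper and the accepted schemes are illustrative assumptions, not this repo's code):

```go
package main

import (
	"fmt"
	"strings"
)

// dialectFor picks a database driver from the URL scheme, so a single
// --db-url flag (or DATABASE_URL env var) can cover both backends.
func dialectFor(dbURL string) (string, error) {
	switch {
	case strings.HasPrefix(dbURL, "sqlite://"):
		return "sqlite", nil
	case strings.HasPrefix(dbURL, "postgres://"), strings.HasPrefix(dbURL, "postgresql://"):
		return "postgres", nil
	default:
		return "", fmt.Errorf("unsupported database URL: %q", dbURL)
	}
}

func main() {
	d, err := dialectFor("sqlite://replica.sqlite")
	fmt.Println(d, err) // sqlite <nil>
}
```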
```go
},
&cli.Int64Flag{
	Name: "cursor-override",
	Usage: "Starting cursor (sequence number) for ingestion",
```
This says "starting", which implies it only works when first creating the replica? It should be more explicit either way.
If you ever need to change the upstream URL, you'd probably need to change the cursor as well; we did that recently with the relay rollout.
### DID Document Format Differences

The reference implementation returns DID documents in `application/did+ld+json` format, whereas this replica returns them in `application/did+json` format. Both are described in the [DID specification](https://www.w3.org/TR/did-1.0/), but in practical terms the difference is that the `@context` field is missing.
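The practical difference can be illustrated by stripping the `@context` member (a sketch; `stripContext` is a hypothetical helper, not part of this PR):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stripContext converts a did+ld+json document into its did+json equivalent
// by deleting the "@context" member, which is the only practical difference
// between the two representations noted above.
func stripContext(didLDJSON []byte) ([]byte, error) {
	var doc map[string]any
	if err := json.Unmarshal(didLDJSON, &doc); err != nil {
		return nil, err
	}
	delete(doc, "@context")
	return json.Marshal(doc)
}

func main() {
	ld := []byte(`{"@context":["https://www.w3.org/ns/did/v1"],"id":"did:plc:example"}`)
	plain, _ := stripContext(ld)
	fmt.Println(string(plain)) // {"id":"did:plc:example"}
}
```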
```go
	OpData didplc.OpEnum `gorm:"column:op_data;not null"`
}

// Note: couldn't call the type Operation because that'd get confusing with didplc.Operation
```
Could call it OperationRow (and also HeadRow, CursorRow)?
```go
const batchSize = 1000

type ValidatedOp struct {
```
as an observation, it feels like we are dancing around adding sequencing to the core PLC semantics. I think that is probably the correct move for now: PLC and the library code should work without explicit sequence numbers for individual operations; sequencing is an abstraction on top.
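As a sketch of that layering (all names here are hypothetical stand-ins, not the library's types): the core operation carries no sequence number, and the replica wraps it with one.

```go
package main

import "fmt"

// coreOp stands in for a didplc operation, which has no sequence number of
// its own; PLC semantics work without one.
type coreOp struct{ DID, CID string }

// sequencedOp is the replica-layer wrapper: sequencing lives here, as an
// abstraction on top of the core type rather than inside it.
type sequencedOp struct {
	Seq int64 // replica-assigned; not part of core PLC semantics
	Op  coreOp
}

func main() {
	op := coreOp{DID: "did:plc:example", CID: "bafy..."}
	seq := sequencedOp{Seq: 42, Op: op}
	fmt.Println(seq.Seq, seq.Op.DID) // 42 did:plc:example
}
```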
CI: detect go-version from go.mod file
Continued at #28, which is manually rebased onto the latest OpStore changes.
Stacks on top of #22 (which contains the changes to the core didplc library).
In this PR I tweaked the directory structure of the repo, moving the `didplc` library into a subdirectory, with a `replica` module alongside it.

### Design
There are three main components to the implementation:

- `DBOpStore`, which implements the `OpStore` interface (introduced in Refactor, implement OpStore interface #22) via gorm, backed by either postgres or sqlite.
- `Ingestor`, which slurps operations from an upstream PLC directory (via either `/export` or `/export/stream`, switching automatically), validates them, and inserts them into the `OpStore`.
- `Server`, which implements the external HTTP APIs for resolving DIDs (which it reads out of the `OpStore`).

`Ingestor` is a single-process design but scales via concurrency/threading (configurable via `--num-workers`). It can process thousands of ops per second, which is more than enough for the foreseeable future - scaling beyond that might involve sharding based on DID strings.

In theory, `Server` can be scaled horizontally (across multiple processes/nodes), for as long as the underlying db can keep up. Although for simplicity, the current entrypoint in `cmd/replica/main.go` spawns one `Ingestor` and one `Server`, in a single process.
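The wiring described above can be sketched roughly as follows; the `OpStore` method signatures, the in-memory store, and the `Ingest`/`Resolve` helpers are all hypothetical stand-ins for illustration, not the interface actually defined in #22:

```go
package main

import "fmt"

// OpStore is a stand-in for the interface from PR #22.
type OpStore interface {
	Put(did string, op string) error
	Get(did string) ([]string, error)
}

// memStore is a toy in-memory OpStore; the real DBOpStore is gorm-backed.
type memStore struct{ ops map[string][]string }

func (m *memStore) Put(did, op string) error {
	m.ops[did] = append(m.ops[did], op)
	return nil
}

func (m *memStore) Get(did string) ([]string, error) {
	return m.ops[did], nil
}

// Ingestor writes validated upstream operations into the store.
type Ingestor struct{ store OpStore }

func (i *Ingestor) Ingest(did, op string) error { return i.store.Put(did, op) }

// Server reads operations back out of the store to answer resolution queries.
type Server struct{ store OpStore }

func (s *Server) Resolve(did string) ([]string, error) { return s.store.Get(did) }

func main() {
	// As in the description of cmd/replica/main.go: one Ingestor and one
	// Server sharing a single store, in a single process.
	store := &memStore{ops: map[string][]string{}}
	ing, srv := &Ingestor{store}, &Server{store}
	ing.Ingest("did:plc:example", "create")
	ops, _ := srv.Resolve("did:plc:example")
	fmt.Println(ops) // [create]
}
```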