
PLC Replica#24

Closed
DavidBuchanan314 wants to merge 30 commits into opstore-refactor from DavidBuchanan314/plc-replica

Conversation

@DavidBuchanan314 (Collaborator) commented Jan 30, 2026

Stacks on top of #22 (which contains the changes to the core didplc library).

In this PR I tweaked the directory structure of the repo, moving the didplc library into a subdirectory, with a replica module alongside it.

Design

There are three main components to the implementation:

  • DBOpStore, which implements the OpStore interface (introduced in Refactor, implement OpStore interface #22) via gorm, backed by either postgres or sqlite.
  • Ingestor, which slurps operations from an upstream PLC directory (via either /export or /export/stream, switching automatically), validates them, and inserts them into the OpStore.
  • Server, which implements the external HTTP APIs for resolving DIDs (which it reads out of the OpStore).

Ingestor is a single-process design but scales via concurrency/threading (configurable via --num-workers). It can process thousands of ops per second, which is more than enough for the foreseeable future - scaling beyond that might involve sharding based on DID strings.

In theory, Server can be scaled horizontally (across multiple processes/nodes) for as long as the underlying db can keep up. For simplicity, though, the current entrypoint in cmd/replica/main.go spawns one Ingestor and one Server in a single process.
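The fan-out described above can be sketched roughly as follows. This is an illustrative worker-pool shape only, not the PR's actual code: `op`, `validateAll`, and the field names are made up for the example, and real validation would verify each operation's signature rather than just count it.

```go
package main

import (
	"fmt"
	"sync"
)

// op stands in for a raw PLC operation pulled from /export.
type op struct{ did string }

// validateAll fans raw ops out to numWorkers goroutines and counts the
// results. A real ingestor would verify signatures in the worker body
// and hand validated ops to a committer; here we only count them.
func validateAll(ops []op, numWorkers int) int {
	jobs := make(chan op)
	var wg sync.WaitGroup
	var mu sync.Mutex
	validated := 0
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for o := range jobs {
				_ = o // placeholder for per-op signature verification
				mu.Lock()
				validated++
				mu.Unlock()
			}
		}()
	}
	for _, o := range ops {
		jobs <- o
	}
	close(jobs)
	wg.Wait()
	return validated
}

func main() {
	ops := make([]op, 5000)
	fmt.Println(validateAll(ops, 8)) // all ops processed across 8 workers
}
```

Because ops for different DIDs are independent, this shape is also what makes the later sharding-by-DID idea natural: the jobs channel would just become per-shard channels keyed by DID string.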

@DavidBuchanan314 DavidBuchanan314 force-pushed the DavidBuchanan314/plc-replica branch from 9d28adf to e449d8a Compare January 30, 2026 11:57
Comment thread didplc/operation.go Outdated

Copilot AI left a comment


Pull request overview

This PR implements a PLC directory replica service that synchronizes operation data from an upstream PLC directory server. It builds on PR #22's OpStore interface refactoring.

Changes:

  • Adds a complete replica service with HTTP server, database persistence (SQLite/PostgreSQL), and ingestion from upstream directory
  • Implements validation and commit workers with batching and concurrency control
  • Adds database-backed OpStore implementation using GORM
  • Reorganizes the didplc package code (log.go, client.go, diddoc.go separated out of the main files)
  • Adds OpenTelemetry metrics and PostgreSQL docker-compose support

Reviewed changes

Copilot reviewed 24 out of 28 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| replica/validate.go | Validation and commit worker implementations with batching |
| replica/validate_test.go | Comprehensive unit tests for validation logic |
| replica/ingest.go | Ingestion state machine with WebSocket streaming and HTTP fallback |
| replica/inflight.go | Thread-safe in-flight operation tracking with resume cursor management |
| replica/inflight_test.go | Unit tests for inflight tracking logic |
| replica/database.go | GORM-based OpStore implementation with SQLite and PostgreSQL support |
| replica/server.go | HTTP server handlers for DID resolution and operation logs |
| replica/server_test.go | Integration tests for HTTP endpoints |
| replica/metrics.go | OpenTelemetry metrics definitions |
| cmd/replica/main.go | Main entry point with CLI configuration |
| cmd/replica/otel.go | OpenTelemetry setup (traces and metrics) |
| didplc/opstore.go | OpStore interface and in-memory implementation (from PR #22) |
| didplc/operation.go | Added database serialization (Value/Scan) and WrapOperation helper |
| didplc/log.go | Moved log verification from root package |
| didplc/diddoc.go | Extracted DID document types |
| didplc/client.go | HTTP client for PLC directory API |
| didplc/manual_test.go | Manual signature verification tests |
| didplc/operation_test.go | Updated test data paths (testdata/ → ../testdata/) |
| didplc/operation_export_test.go | Updated test data paths |
| pg/ | PostgreSQL docker-compose and helper scripts |
| go.mod, go.sum | Added dependencies for GORM, WebSockets, OpenTelemetry, etc. |
| Makefile | Updated build and test targets |


Comment thread cmd/replica/main.go Outdated
Comment thread replica/database.go
Comment thread replica/ingest.go Outdated
Comment thread replica/validate.go
Comment on lines +105 to +107

```go
case <-ctx.Done():
	commitBatch()
	return
```

Copilot AI Jan 30, 2026


When the context is cancelled (line 105), commitBatch() is called to flush remaining operations before returning. However, the commitBatch function uses the same cancelled context for database operations (line 78). This may cause the final batch commit to fail due to context cancellation, potentially losing validated operations. Consider using a background context with a timeout for the final flush on shutdown, or documenting that operations in the final batch may be lost on cancellation.

Comment thread replica/server.go
Comment thread replica/server.go
Comment thread replica/server.go
Comment thread replica/server.go
```go
func writeJSONError(w http.ResponseWriter, message string, status int) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	json.NewEncoder(w).Encode(map[string]string{"message": message})
}
```

Copilot AI Jan 30, 2026


The json.NewEncoder(w).Encode call on line 64 can fail, but the error is silently ignored. While this is in an error handling path where the original error message will be logged elsewhere, consider logging if the JSON encoding of the error response fails to aid in debugging issues with error responses.

Suggested change:

```diff
-json.NewEncoder(w).Encode(map[string]string{"message": message})
+if err := json.NewEncoder(w).Encode(map[string]string{"message": message}); err != nil {
+	slog.Default().Error("failed to encode JSON error response", "status", status, "message", message, "error", err)
+}
```

Comment thread replica/database.go
Comment thread replica/server.go
Comment thread replica/database.go
@DavidBuchanan314 DavidBuchanan314 marked this pull request as ready for review January 30, 2026 18:23
Comment thread replica/validate.go
@bnewbold bnewbold changed the base branch from main to opstore-refactor February 3, 2026 01:40
@bnewbold (Member) left a comment


This is a partial review; I'll try to wrap up the rest soon.

I could go either way on moving the core library stuff into a sub-directory. The name is good, and I guess sooner is better if we are going to do it.

Comment thread cmd/replica/README.md
Comment thread cmd/replica/main.go
```go
Usage: "PLC directory replica server",
Flags: []cli.Flag{
	&cli.StringFlag{
		Name: "postgres-url",
```
Member


There is a convention to use db-url / DATABASE_URL, and determine what kind of database to open based on the URL prefix (e.g., sqlite://replica.sqlite).
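The convention described above could be implemented along these lines. This is a hedged sketch: `dbKind` and the accepted scheme names are assumptions for illustration, not an agreed spec for this repo.

```go
package main

import (
	"fmt"
	"strings"
)

// dbKind picks the database backend from the scheme prefix of a single
// --db-url / DATABASE_URL value, instead of separate per-backend flags.
func dbKind(dbURL string) (string, error) {
	switch {
	case strings.HasPrefix(dbURL, "sqlite://"):
		return "sqlite", nil
	case strings.HasPrefix(dbURL, "postgres://"), strings.HasPrefix(dbURL, "postgresql://"):
		return "postgres", nil
	default:
		return "", fmt.Errorf("unrecognized db-url scheme: %q", dbURL)
	}
}

func main() {
	for _, u := range []string{"sqlite://replica.sqlite", "postgres://localhost/plc"} {
		kind, _ := dbKind(u)
		fmt.Println(kind)
	}
}
```

The returned kind would then select the matching gorm dialector (sqlite vs postgres) at open time.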

Comment thread cmd/replica/main.go Outdated
Comment thread cmd/replica/main.go
```go
},
&cli.Int64Flag{
	Name:  "cursor-override",
	Usage: "Starting cursor (sequence number) for ingestion",
```
Member


This says "starting", which implies it only works when first creating the replica? It should be more explicit either way.

If you ever need to change the upstream URL, you'd probably need to change the cursor as well; we did that recently with the relay rollout.

Comment thread cmd/replica/README.md

### DID Document Format Differences

The reference implementation returns DID documents in `application/did+ld+json` format, whereas this replica returns them in `application/did+json` format. Both are described in the [DID specification](https://www.w3.org/TR/did-1.0/), but in practical terms the difference is that the `@context` field is missing.
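Concretely, the difference is only the top-level `@context` member. A minimal illustration (the DID and values below are made up, not taken from the PR) of what an `application/did+ld+json` response carries:

```json
{
  "@context": [
    "https://www.w3.org/ns/did/v1"
  ],
  "id": "did:plc:example",
  "alsoKnownAs": ["at://example.bsky.social"]
}
```

The corresponding `application/did+json` response from this replica would be the same document with the `@context` array omitted.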
Member


👍

Comment thread replica/database.go
```go
	OpData didplc.OpEnum `gorm:"column:op_data;not null"`
}

// Note: couldn't call the type Operation because that'd get confusing with didplc.Operation
```
Member


Could call it OperationRow (and also HeadRow, CursorRow)?

Comment thread replica/database.go Outdated
Comment thread replica/metrics.go Outdated
Comment thread cmd/replica/main.go Outdated
Comment thread replica/validate.go

```go
const batchSize = 1000

type ValidatedOp struct {
```
Member


as an observation, it feels like we are dancing around adding sequencing to the core PLC semantics. I think that is probably the correct move for now: PLC and the library code should work without explicit sequence numbers for individual operations; sequencing is an abstraction on top.

@DavidBuchanan314 DavidBuchanan314 force-pushed the DavidBuchanan314/plc-replica branch from eeab3d4 to 1422224 Compare February 5, 2026 14:40
@DavidBuchanan314 (Collaborator, Author)

Continued at #28 which is manually rebased onto the latest OpStore changes



4 participants