Merged
27 changes: 27 additions & 0 deletions packages/sdk/CHANGELOG.md
@@ -1,5 +1,32 @@
# Changelog

## [0.10.2]

📦 **NPM:** https://www.npmjs.com/package/@qvac/sdk/v/0.10.2

This is a hotfix release that restores delegated-inference connection performance to its v0.9.0 level. There are no API or model changes; it is a drop-in replacement for v0.10.1.

## Bug Fixes

### Delegated connect no longer waits for full DHT bootstrap

Consumers using `loadModel({ delegate: true, ... })` against a remote provider were spending ~2.5–3s longer per connection in v0.10.0/0.10.1 than in v0.9.0. Profiler traces from the Workbench team showed `loadModel.delegation.connection` regressing from ~2.5s (v0.9.0) to ~8.3s (v0.10.0) on the same machine and network.

The cause was a serial `await swarm.dht.fullyBootstrapped()` call added to the consumer's connect path in the v0.10.0 redesign. `dht.fullyBootstrapped()` resolves only once HyperDHT has populated its full routing table, which is slow on a cold swarm. On a hot swarm, `dht.connect(publicKey)` already drives the lookups it needs internally, so the explicit wait is redundant. Removing it lets the connection use the DHT in whatever state it is in at call time, exactly as it did in v0.9.0.
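The effect of a serial wait on the connect path can be illustrated with a toy model. This is only a sketch: `FakeDht`, `ConnectCosts`, and the millisecond figures are hypothetical stand-ins chosen for illustration, not the HyperDHT API or measured values, and the model assumes the on-demand lookup costs the same whether or not the table was bootstrapped first.

```typescript
// Toy model only: FakeDht and its costs are hypothetical, not the HyperDHT API.
interface ConnectCosts {
  bootstrapMs: number; // time for the routing table to populate fully
  lookupMs: number; // time for connect() to find the peer on its own
}

class FakeDht {
  elapsedMs = 0; // simulated wall-clock time spent so far
  private bootstrapped = false;

  constructor(private readonly costs: ConnectCosts) {}

  // Stand-in for `dht.fullyBootstrapped()`: pays the full bootstrap cost once.
  fullyBootstrapped(): void {
    if (!this.bootstrapped) {
      this.elapsedMs += this.costs.bootstrapMs;
      this.bootstrapped = true;
    }
  }

  // Stand-in for `dht.connect(publicKey)`: pays only for the lookups it needs.
  connect(): void {
    this.elapsedMs += this.costs.lookupMs;
  }
}

// v0.10.0/v0.10.1 shape: wait for the full bootstrap, then connect.
function connectWithBootstrapWait(costs: ConnectCosts): number {
  const dht = new FakeDht(costs);
  dht.fullyBootstrapped();
  dht.connect();
  return dht.elapsedMs;
}

// v0.9.0/v0.10.2 shape: connect straight away and let lookups run on demand.
function connectDirect(costs: ConnectCosts): number {
  const dht = new FakeDht(costs);
  dht.connect();
  return dht.elapsedMs;
}

// Illustrative costs, not benchmark data.
const exampleCosts: ConnectCosts = { bootstrapMs: 2500, lookupMs: 1000 };
const serialMs = connectWithBootstrapWait(exampleCosts);
const directMs = connectDirect(exampleCosts);
console.log(`with wait: ${serialMs}ms, direct: ${directMs}ms`);
```

In this model the bootstrap cost is added in full to every first connection, which is the shape of the regression described above.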

The DHT routing table is still populated lazily as `dht.connect()` issues lookups, so cold-swarm correctness is unchanged: `PEER_NOT_FOUND` for non-existent peers and connection timeouts for unreachable ones still fire on the same code path. The fallback-to-local behaviour for `loadModel` is unaffected; only the connection latency changes.

Local benchmarks (10 consumer↔provider runs each, against the published v0.10.1 baseline):

| Build | Mean connection time | p50 | p95 |
| ------------------ | -------------------- | ------ | ------ |
| v0.10.1 (baseline) | 3.82s | 3.71s | 4.94s |
| v0.10.2 (this fix) | 1.18s | 1.12s | 1.49s |

That's ~3.2× faster on average and brings cold delegated `loadModel` back below the v0.9.0 numbers.
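The speedup figure can be re-derived from the table. The snippet below is a small worked calculation using only the numbers quoted above; the `BenchRow` type and `speedup` helper are illustrative names, not part of the SDK.

```typescript
// Figures copied from the benchmark table above, in seconds.
interface BenchRow {
  build: string;
  mean: number;
  p50: number;
  p95: number;
}

const baseline: BenchRow = { build: "v0.10.1", mean: 3.82, p50: 3.71, p95: 4.94 };
const fixed: BenchRow = { build: "v0.10.2", mean: 1.18, p50: 1.12, p95: 1.49 };

// Speedup is the ratio of the old latency to the new latency.
function speedup(before: number, after: number): number {
  return before / after;
}

const meanSpeedup = speedup(baseline.mean, fixed.mean);
const p50Speedup = speedup(baseline.p50, fixed.p50);
const p95Speedup = speedup(baseline.p95, fixed.p95);
const meanSavedSeconds = baseline.mean - fixed.mean;

console.log(`mean: ${meanSpeedup.toFixed(2)}x, saved ${meanSavedSeconds.toFixed(2)}s`);
console.log(`p50: ${p50Speedup.toFixed(2)}x, p95: ${p95Speedup.toFixed(2)}x`);
```

The mean ratio comes out at roughly 3.2 and the mean saving at roughly 2.6s per connection; the p50 and p95 ratios are slightly higher, at about 3.3.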

If you were on v0.10.0 or v0.10.1 and had worked around the regression (custom timeouts, retry shims, falling back to local inference earlier than necessary), you can drop those workarounds.

## [0.10.1]

📦 **NPM:** https://www.npmjs.com/package/@qvac/sdk/v/0.10.1
8 changes: 8 additions & 0 deletions packages/sdk/changelog/0.10.2/CHANGELOG.md
@@ -0,0 +1,8 @@
# Changelog v0.10.2

Release Date: 2026-05-07

## 🐞 Fixes

- Drop blocking dht.fullyBootstrapped() wait from delegated connect. (see PR [#1934](https://github.com/tetherto/qvac/pull/1934))

26 changes: 26 additions & 0 deletions packages/sdk/changelog/0.10.2/CHANGELOG_LLM.md
@@ -0,0 +1,26 @@
# QVAC SDK v0.10.2 Release Notes

📦 **NPM:** https://www.npmjs.com/package/@qvac/sdk/v/0.10.2

This is a hotfix release that restores delegated-inference connection performance to its v0.9.0 level. There are no API or model changes; it is a drop-in replacement for v0.10.1.

## Bug Fixes

### Delegated connect no longer waits for full DHT bootstrap

Consumers using `loadModel({ delegate: true, ... })` against a remote provider were spending ~2.5–3s longer per connection in v0.10.0/0.10.1 than in v0.9.0. Profiler traces from the Workbench team showed `loadModel.delegation.connection` regressing from ~2.5s (v0.9.0) to ~8.3s (v0.10.0) on the same machine and network.

The cause was a serial `await swarm.dht.fullyBootstrapped()` call added to the consumer's connect path in the v0.10.0 redesign. `dht.fullyBootstrapped()` resolves only once HyperDHT has populated its full routing table, which is slow on a cold swarm. On a hot swarm, `dht.connect(publicKey)` already drives the lookups it needs internally, so the explicit wait is redundant. Removing it lets the connection use the DHT in whatever state it is in at call time, exactly as it did in v0.9.0.

The DHT routing table is still populated lazily as `dht.connect()` issues lookups, so cold-swarm correctness is unchanged: `PEER_NOT_FOUND` for non-existent peers and connection timeouts for unreachable ones still fire on the same code path. The fallback-to-local behaviour for `loadModel` is unaffected; only the connection latency changes.

Local benchmarks (10 consumer↔provider runs each, against the published v0.10.1 baseline):

| Build | Mean connection time | p50 | p95 |
| ------------------ | -------------------- | ------ | ------ |
| v0.10.1 (baseline) | 3.82s | 3.71s | 4.94s |
| v0.10.2 (this fix) | 1.18s | 1.12s | 1.49s |

That's ~3.2× faster on average and brings cold delegated `loadModel` back below the v0.9.0 numbers.

If you were on v0.10.0 or v0.10.1 and had worked around the regression (custom timeouts, retry shims, falling back to local inference earlier than necessary), you can drop those workarounds.
2 changes: 1 addition & 1 deletion packages/sdk/package.json
@@ -1,6 +1,6 @@
{
"name": "@qvac/sdk",
- "version": "0.10.1",
+ "version": "0.10.2",
"license": "Apache-2.0",
"repository": {
"type": "git",
20 changes: 10 additions & 10 deletions packages/sdk/server/bare/delegate-rpc-client.ts
Expand Up @@ -183,16 +183,16 @@ async function ensureRPCConnection(
`🔗 Establishing direct DHT connection to peer: ${publicKey}${timeout ? `, timeout: ${timeout}ms` : ""}`,
);

- // Ensure the DHT routing table is populated before attempting to look up
- // the peer. On a cold swarm the table is nearly empty and findPeer can
- // return PEER_NOT_FOUND immediately even when the provider is reachable.
- const bootstrapStart = nowMs();
- const swarm = getSwarm();
- logger.info(`⏳ Waiting for DHT to fully bootstrap before connect...`);
- await withTimeout(swarm.dht.fullyBootstrapped(), getRemainingTimeout());
- logger.info(
- `✅ DHT bootstrapped in ${(nowMs() - bootstrapStart).toFixed(0)}ms`,
- );
+ // We deliberately do NOT `await swarm.dht.fullyBootstrapped()` here. The
+ // earlier guard (added with #1729 to side-step a theoretical PEER_NOT_FOUND
+ // on a fully-cold swarm) added a serial 1-3s wait on every first delegated
+ // call — measurably regressing `loadModel.delegation.connection` vs 0.9.0
+ // (≈3.2× slower in local benches). `getSwarm()` is invoked early during
+ // SDK init (registry/runtime), so by the time the consumer reaches this
+ // path the routing table is already warm enough; `dht.connect()` also
+ // bootstraps on demand if it isn't, so we lose nothing by skipping the
+ // explicit await.
+ getSwarm();

conn = openDhtConnection(publicKey);
await waitForOpen(conn, getRemainingTimeout());
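A side effect of this hunk is worth spelling out: both the removed bootstrap wait and `waitForOpen` draw on `getRemainingTimeout()`, so any time spent bootstrapping was also time taken away from the connection attempt. The sketch below illustrates that with a fake clock. It assumes `getRemainingTimeout()` measures against a single shared deadline, which the hunk does not show; `createTimeoutBudget` and `budgetLeftForOpen` are hypothetical helpers, and the millisecond values are illustrative only.

```typescript
type Clock = () => number;

// Hypothetical helper: returns a function reporting how much of a fixed
// overall timeout is still available, measured against an injected clock.
function createTimeoutBudget(totalMs: number, now: Clock): () => number {
  const deadline = now() + totalMs;
  return () => Math.max(0, deadline - now());
}

// Simulates a connect path that spends `bootstrapWaitMs` before opening the
// connection, and returns the budget left for the open step.
function budgetLeftForOpen(totalMs: number, bootstrapWaitMs: number): number {
  let fakeNowMs = 0;
  const getRemainingTimeout = createTimeoutBudget(totalMs, () => fakeNowMs);
  fakeNowMs += bootstrapWaitMs; // time consumed before the open step starts
  return getRemainingTimeout();
}

// Illustrative values, not measurements.
const withWaitMs = budgetLeftForOpen(10000, 2500); // old path: serial wait first
const withoutWaitMs = budgetLeftForOpen(10000, 0); // new path: no wait
console.log(`open budget with wait: ${withWaitMs}ms, without: ${withoutWaitMs}ms`);
```

Under these assumptions, the open step gets the caller's whole timeout once the wait is gone, rather than whatever the bootstrap left over.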