Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add sessions to trustless gateways #459

Merged
merged 17 commits into from
Apr 4, 2024

Conversation

achingbrain
Copy link
Member

Implements blockstore sessions for trustless gateways.

  • Queries the Helia routing for block providers
  • Creates a set of trustless gateways from routing results
  • Uses only these gateways to fetch session blocks

achingbrain and others added 14 commits February 6, 2024 08:58
There are no implementations yet but the usage pattern will be something
like:

```javascript
// unixfs cat command
export async function * cat (cid: CID, blockstore: Blocks, options: Partial<CatOptions> = {}): AsyncIterable<Uint8Array> {
  // create a session for the CID if support is available
  const blocks = await (blockstore.createSession != null ? blockstore.createSession(cid) : blockstore)
  const opts: CatOptions = mergeOptions(defaultOptions, options)

  // resolve and export using the session, if created, otherwise fall back to regular blockstore access
  const resolved = await resolve(cid, opts.path, blocks, opts)
  const result = await exporter(resolved.cid, blocks, opts)

  if (result.type !== 'file' && result.type !== 'raw') {
    throw new NotAFileError()
  }

  if (result.content == null) {
    throw new NoContentError()
  }

  yield * result.content(opts)
}
```
Implements blockstore sessions for trustless gateways.

- Queries the Helia routing for block providers
- Creates a set of trustless gateways from routing results
- Uses only these gateways to fetch session blocks
@achingbrain achingbrain requested a review from a team as a code owner March 1, 2024 14:58
Comment on lines 105 to 107
for await (const provider of this.routing.findProviders(root, options)) {
if (provider.protocols == null || !provider.protocols.includes('transport-ipfs-gateway-http')) {
continue
Copy link
Member

@lidel lidel Mar 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 I worry that as things are today, filtering by transport-ipfs-gateway-http will be harmful for decentralization in the long term.

AFAIK it is something invented by IPNI at cid.contact for Rhea/Saturn last year to allow nft.storage to avoid bitswap bills. It only works with IPNI PUTs and was not designed with p2p in mind, Amino DHT peers who in the future will announce /https multiaddrs with trustless gateway as alternative to bitswap will not be discoverable this way.

@achingbrain It may be more future-proof to look at all results, and keep ones that have Addrs with multiaddr that has /https or /tls/http.
This will be compatible with https://github.com/libp2p/specs/blob/master/http/transport-component.md and will also filter-out useless gateways without TLS set up (can't use them in browser due to mixed-content error).

If we look at Addrs instead, when IPNI+DHT proxy at https://delegated-ipfs.dev/routing/v1/providers/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi starts returning DHT results for Kubo nodes with Schema: peer and Addrs that have /https multiaddr, it will "just work", and helia will be able to act on HTTP transport with these peers without having to change anything.

Copy link
Member Author

@achingbrain achingbrain Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite sad-face-making. I hadn't realised that transport-ipfs-gateway-http was added to the spec due to a commercial concern for nft.storage rather than being something you can rely on.

when IPNI+DHT proxy starts returning DHT results with /https multiaddr, it will "just work"

As I read the above, the HTTP transport being added to Kubo is what will cause HTTP multiaddrs to appear. Unless I'm missing something it doesn't mean the peer is running a path or subdomain gateway.

That is, the remote could have a HTTP transport, but also have the gateway config disabled, just the presence of the transport isn't enough to say one way or the other.

It seems like the only way is to make a request and see what you get back? This could be a recipe for DDoSing if you created nodes that respond to provider queries for certain CIDs?

I think a possible solution here might be for the peer to indicate it's capabilities in it's signed peer record? The HTTP routers could return the peer record in the GET /providers response?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean FWIW, if it is the case that it was added exclusively for nft.storage then I wasn't aware of that.

As far as I knew the trustless gateway was already a concept, and we implemented it for Rhea when we were asked. I think there was probably a cost saving for us there due to where we were hosting our bitswap peer at the time, but that's not the case now, and shouldn't have been a convincing argument then since infra costs and locations can easily be changed.

Copy link
Member

@lidel lidel Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alanshaw HTTP is good! Trustless gateways as HTTP transport are good, and .storage implementing and exposing it as alternative to bitswap is great for ecosystem, especially our browser work (ipfs-chromium, service worker, verified-fetch, Brave etc). We do want to keep it! ❤️

The only problem is that IPNI implementation informed how HTTP-only provider is represented on /routing/v1, and unfortunately, DHT peers were not included in the design.

The transport-ipfs-gateway-http codec was registered, afaik, only because IPNI already used codes for bitswap and graphsync and needed another one internally (inferring that from discussion at multiformats/multicodec#321 which I missed an arrived after it was merged already :(). It was a provisional solution, so there was no spec, no design doc how it is supposed to work outside IPNI. So we ended up with something that works for cid.contact and ignores the rest of public IPFS swarm.

Since then, I've been trying to find a way for /routing/v1 to represent Amino DHT peers that provide trustless gateway endpoint, ensuring these can be found (once we have them), not just IPNI ones.


@achingbrain yes, looking for /tls/http or /https and sending a probe and marking endpoint as gateway or not is the only reliable way I see right now.
A basic probe could be GET for small, inlined CID: send GET /ipfs/bafkqablimvwgy3y?format=raw to HTTP port and expect hello. Fast and reliable, works with legacy and future gateways too.

FYSA there are proposals of more declarative ways of discovering if HTTP port supports IPFS Gateway functionality, but they are no better than the above HTTP probe.
Things I am aware of:

  • libp2p/specs#508 proposes probing for manifest at .well-known/libp2p, but this is not implemented anywhere and in practice, no better test than GET /ipfs/bafkqablimvwgy3y?format=raw.
  • ipfs/specs#425 proposes returning headers on OPTIONS response, but it won't work on older gateways, and no better than GET /ipfs/bafkqablimvwgy3y?format=raw
  • There were discussion about publishing protocol list from libp2p identify with the peer record o Amino DHT, and then exposing this on /routing/v1 responses (but this requires network upgrade and time).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that if you send me a request, and I send you a response knowing it will trigger you to send a request to a specific third party, that is an attack vector.

If I send you a single response with 10x provider records, and you make 10x requests, that is a vector for an amplification attack because I have got you to do more work than the work it took me to get you to do the work, though hopefully it's not all bad because the provider hosts should have to be distinct.

As long as we recognise this and are ok with it.

Copy link
Member

@SgtPooki SgtPooki Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a great catch @achingbrain

We should absolutely have some way to know the returned providers are valid ipfs node.

I guess as long as we dont modify the delegated routers from a set of known good actors it shouldnt be an issue, but thats centralizing on known delegated providers.

I imagine we would want any peer to be able to provide routing responses in the future though (ambient peer discovery?) where this could become a huge problem. I believe the amplification attack was discussed there as well

If we get a list of providers from some delegated-router and all those providers turn out bad, we could at least mark that delegated-router as a bad actor and limit any future requests to it

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that if you send me a request, and I send you a response knowing it will trigger you to send a request to a specific third party, that is an attack vector.

True, although to some extent this is true regardless. There is currently no validation anywhere in use (e.g. DHT, IPNI, etc.) that a peer advertising /ip4/1.2.3.4/tcp/4001/p2p/<peerID> actually controls that IP address so there's already some potential vector there. This could be done by trying each advertised address, although this assumes the routing system supports all the protocols advertised (e.g. ipv4/ipv6, tcp/quic/webtransport/wss/...) which might not be true (or great for upgradability).

To be fair it is likely more work to setup HTTPS for a bad request than trying to establish just a libp2p connection and on the libp2p side we can be somewhat protected by the peerIDs/public keys (e.g. the public key can sign what's a valid advertisement with its key which will cause the security negotiation to fail otherwise ... although that will result in a wasted dial which is a separate issue in and of itself).

While we're here there's a slightly larger attack surface here for "webseeds-like" behavior (e.g. pointing at https://some-ubuntu-mirror.com/ubuntu-v.1.2.3.iso or (.car)) since more bytes can be transferred. This problem gets drastically worse if the incremental verifiability window gets larger.

If we get a list of providers from some delegated-router and all those providers turn out bad, we could at least mark that delegated-router as a bad actor and limit any future requests to it

Hopefully as explained above this isn't a reasonable thing to do unless there's actually responsibility on the underlying routing system to do these checks (or some delegated routers are allowed to advertise via something like HTTP OPTIONS or .well-known to do extra work you can punish them for not doing correctly). Mostly what you're left with to punish them on is do they give about the same results as other delegated-routers (for the same underlying routing systems) or as querying the underlying systems themselves (likely requires something similar to ipfs/specs#388).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filtering by transport-ipfs-gateway-http will be harmful for decentralization in the long term.

So this is true, just like filtering for bitswap is also harmful. These protocol advertisements are only supported by IPNI, for Amino (as well as any new protocol advertised by nodes without a new codec landing and all IPNI nodes supporting the code) using identify-like behavior (e.g. libp2p identify or leveraging .well-known) is required.

Since then, I've been trying to find a way for /routing/v1 to represent Amino DHT peers that provide trustless gateway endpoint, ensuring these can be found (once we have them), not just IPNI ones.

For trustless-gateway over libp2p .well-known can be used and it's fine. For plain trustless-gateway there's currently no way to advertise that in the Amino DHT because the mappings are multihash -> peerID. This is very similar (although the underlying use case wasn't exactly the same) as the problem expressed here libp2p/notes#11 and ipld/ipld#57 (comment) (note: that's from 2019 and my understanding of the situation has improved a bit since then 🙃 which is part of why there's more detail in the message above than there). AFAIK this would require a protocol change to support with Amino.

Copy link
Member

@SgtPooki SgtPooki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me, but based on prior comments it needs a few changes before merging

Comment on lines +17 to +19
* @default only-if-cached
*/
cacheControl?: string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about defaulting to asking gateways to only return content they have. will read more in the PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to stop them doing what gateways do, e.g. fetch content on your behalf. Otherwise it defeats the purpose of having a session.

Base automatically changed from feat/add-sessions-to-interface to main April 4, 2024 10:27
@achingbrain achingbrain merged commit 6ddefb0 into main Apr 4, 2024
18 checks passed
@achingbrain achingbrain deleted the feat/add-trustless-gateway-sessions branch April 4, 2024 10:54
@achingbrain achingbrain mentioned this pull request Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants