IPIP-342: Ambient Discovery of Content Routers #342

willscott · 2022-11-12T09:34:52Z

This follows the previously circulated proposal outline at https://hackmd.io/bh4-SCWfTBG2vfClG0NUFg

A basic motivation is included in the PR - but essentially this is the best path I've heard for reducing our dependence on hydras as a centrally operated choke point for moving the bulk of the IPFS network beyond sole reliance on the current KAD DHT.

IPIP/0000-content-router-discovery.md

Co-authored-by: Max Inden <[email protected]>

… into feat/content-router-discovery

ajnavarro

I'm missing a way to link content with the provider, because I seriously doubt that all parties will be eager to provide all the CIDs in the universe. They will focus on providing the content they care about.

We could link root CID/CIDs with the provider to let nodes know where they can find the DAG for specific content.

ajnavarro · 2022-11-15T15:49:46Z

IPIP/0342-content-router-discovery.md

+properties:
+* reliability - how many good vs bad responses has this router responded
+with. This statistic should be windowed, such that the client can calculate
+it in terms of the last week or month.


shall we be more specific here?

ajnavarro · 2022-11-15T15:50:29Z

IPIP/0342-content-router-discovery.md

+* reliability - how many good vs bad responses has this router responded
+with. This statistic should be windowed, such that the client can calculate
+it in terms of the last week or month.
+* performance - how quickly does this router respond.


Is this metric also windowed?

Yes, this would be windowed. I was imagining a window of ~ "last week" by default, but this seems like a good candidate to evaluate through simulation.
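
A minimal sketch of what a windowed reliability statistic could look like on the client side. The class name, the one-week default, and the "fraction of good responses" definition are all assumptions for illustration; the draft does not fix any of them.

```python
from collections import deque
import time


class WindowedReliability:
    """Tracks good vs bad responses from one router over a sliding time window."""

    def __init__(self, window_seconds=7 * 24 * 3600):
        # One week mirrors the "~ last week" default floated above; it is a guess.
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, was_good) pairs, oldest first

    def record(self, was_good, now=None):
        """Store one observation; `now` can be injected to make tests deterministic."""
        now = time.time() if now is None else now
        self._events.append((now, bool(was_good)))

    def score(self, now=None):
        """Return the fraction of good responses in the window, or None if empty."""
        now = time.time() if now is None else now
        cutoff = now - self.window_seconds
        # Drop observations that have aged out of the window.
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        if not self._events:
            return None
        good = sum(1 for _, ok in self._events if ok)
        return good / len(self._events)
```

The same structure would work for the performance metric by storing response times instead of a boolean and returning a mean or percentile.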

ajnavarro · 2022-11-15T15:55:45Z

IPIP/0342-content-router-discovery.md

+The protocol will follow a request-response model.
+A node will open a a stream on the protocol when it wants to discover new
+content routers it does not already know.
+It will send a bloom filter as it's query.


Could we specify here first the data that we want to share between nodes, and after that define the way to do it?

ajnavarro · 2022-11-15T16:02:39Z

IPIP/0342-content-router-discovery.md

+list of known content routers, hashing them against the bloom filter and
+selecting the top routers that are not already known to the client. It will
+return this list, along with it's reliability score for each. This response
+is structured as an IPLD list lists, conceptually:


Can we simplify here? do we really need an IPLD list? Reducing the number of new concepts needed to make this work will ramp up the development of different implementations.

a json or cbor array are both examples that would fulfill this, I'll leave it more generic but i have a hard time imagining we'd encode this in a way that wouldn't conform to being considered an IPLD list

ajnavarro · 2022-11-15T16:12:10Z

IPIP/0342-content-router-discovery.md

+The protocol will follow a request-response model.
+A node will open a a stream on the protocol when it wants to discover new
+content routers it does not already know.
+It will send a bloom filter as it's query.


Maybe interesting for this use case: IBLTs: https://arxiv.org/abs/1101.2245
Proposal for sharing bitcoin transactions between nodes faster using IBLTs: https://gist.github.com/gavinandresen/e20c3b5a1d4b97f79ac2

I think we don't want invertibility here - the use of the bloom filter is not only for performance but also to lose some data so as not to directly reveal what the client knows. We can consider using cuckoo tables or vacuum filters as more space-efficient alternatives to a classic bloom filter.
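
To make the query flow concrete, here is a rough sketch of a classic bloom filter and the server-side selection step from the draft (hash each known router against the client's filter, return the best-scoring ones the client probably does not know). The hash construction, the filter size, and the function names are assumptions; the draft leaves the filter size to the client and does not specify a hash.

```python
import hashlib


class BloomFilter:
    """Classic bloom filter; lossy by design, so it does not reveal the exact set."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item.encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def select_unknown_routers(client_filter, known_routers, limit=5):
    """known_routers maps router address -> reliability score (hypothetical shape).

    Returns up to `limit` (address, score) pairs, best score first, skipping
    routers that match the client's filter (including false positives).
    """
    candidates = [(score, addr) for addr, score in known_routers.items()
                  if addr not in client_filter]
    candidates.sort(reverse=True)
    return [(addr, score) for score, addr in candidates[:limit]]
```

Because a bloom filter has false positives, the server may occasionally withhold a router the client does not actually know; that is the price of not revealing the exact list.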

ajnavarro · 2022-11-16T11:21:09Z

IPIP/0342-content-router-discovery.md

+
+This design is self-contained - it does not require standing up additional
+infrastructure or making additional connections for discovery but rather
+gossips routers over existing peer connections.


this is the first time gossip is mentioned. Should we be more specific in the Detailed design section about the protocol and how nodes will be interconnected?

ajnavarro · 2022-11-16T11:26:43Z

IPIP/0342-content-router-discovery.md

+not be directly discovered. Instead, the gossip discovery protocol is
+ambiently discovered in much the same way as circuit relays.
+
+#### Advertisement in the DHT


The good things about advertising using the DHT are:

* The network is already there, so there is no need to create a new protocol to "provide" providers instead of CIDs.
* You can associate your provider with specific root CIDs when providing. I seriously doubt that all providers will be eager to provide all CIDs in the universe.

willscott · 2022-11-16T11:33:53Z

I'm missing a way to link content with the provider, because I seriously doubt that all parties will be eager to provide all the CIDs in the universe. They will focus on providing the content they care about.

We could link root CID/CIDs with the provider to let nodes know where they can find the DAG for specific content.

This is a philosophical disagreement about what a 'content router' is.
I'm going to refer you to the broad guidance that was presented over the summer for the evolution of content routing.

We currently have a couple examples of content routers that do have all the CIDs in the universe, and we do not have convincing examples of, or a definition for, sub-content-routers as you're proposing here.

Why are we trying to compromise to a much-harder-to-make-work complexity without trying for the thing that makes sense and the direction we're heading first?

ajnavarro · 2022-11-17T15:44:25Z

@willscott I think it is less a philosophical disagreement than a physical one.

Right now we are able to keep all the CIDs on the network in one provider for two reasons:

* The network is relatively small.
* Right now we have only two places where nodes are providing CIDs: the DHT and Bitswap, so it is easier to get that information because it lives in only two specific places.

When we start to have different ways of providing CIDs, it will be nearly impossible to have everything replicated by everyone.

Also, when the network scales, keeping all the information centralized in several places will be quite challenging and costly. On the other hand, allowing both approaches (providing everything vs providing a subset of the CIDs) will surely serve use cases for people who don't have a huge amount of money to maintain big providers.

willscott · 2022-11-17T15:52:18Z

We currently have providers stepping up to provide full replicas of a content routing database. That is what network indexers have been doing over the last year.
I don't see the physical issue here: indexers are handling trillions of records, vs the 100 billion in the DHT, and the scale we expect as we grow will still fit at the level of a single rack in a data center.

In designing delegated routing so far, the eye has been towards a design where delegated routers need to fall back and do the additional work of querying other routers in order to collect a full replica if they don't possess it themselves, rather than making that the end kubo node's responsibility, as that leads to an untenable performance and decision process for end user nodes that are not equipped to handle that.

I'm not entirely sure of your counter proposal here: I think there are very strong counter arguments against both #322 - which compromises trying to be a content addressed network, or limited DHT providing (e.g. to roots) - which still couldn't handle the current indexer database scale.

aschmahmann · 2022-11-17T17:12:41Z

IPIP/0342-content-router-discovery.md

+IPFS nodes will advertise and coordinate discover of content routers using a
+new libp2p protocol advertised as "/ipfs/content-router-discovery/1.0.0".


As things currently are the name, purpose, or formatting of this protocol seem off. This relates to some of the middle-ground in this discussion between @willscott and @ajnavarro
with #342 (comment) and #342 (comment) around content routing.

High level:

This currently seems to be specifically for IPNI routers so at best this is /ipfs/ipni-discovery/1.0.0

I can guarantee with 100% certainty that there will be people who want additional content routing systems than /ipfs/kad/1.0.0 and the IPNI protocol. However, the model of this discovery system works for any system that has a set of endpoints which are supposed to be able to locate all data within the system (e.g. delegated routing endpoints for /ipfs/kad/1.0.0, IPNI endpoints, delegated routing endpoints for BitTorrent's mainline DHT, etc.). If you want it to be generic enough to cover that then there needs to be some name/identifier for the system you want (e.g. asking for bloom filters or peers specific to a given routing system)

If we leave this as IPNI only, ok 🤷. However, almost the same logic is going to be needed for browser nodes trying to leverage multiple delegated routing endpoints, so they'll either end up defaulting back to one of the "rejected options" here (e.g. hard coding them or DHT discovery) or reimplementing this.

This currently seems to be specifically for IPNI routers so at best this is /ipfs/ipni-discovery/1.0.0

this is for discovery of content routers per the delegated content router API - what about this is IPNI specific?

The requirement (implied by the proposed reputation scoring) of keeping track of all CIDs in existence makes this sound like an IPNI-specific proposal. Who else would keep the whole index if not "InterPlanetary Network Indexer" (even if it is a composite/reverse proxy one)?

Due to this, renaming it to /ipni-discovery/ calls it what it is and avoids undesired feature creep.

Alternatives:

make this more generic, /router-discovery/: extend the lookup spec to include explicit type of router (for now all lookups will be IPNIs, but allows us to expand this in the future, as suggested). I see this being useful for gossiping/discovering things like IPNS, peer routers, or even DoH, DoT DNS resolvers.

having different router types allows us to have different reputation systems, which may be a way for future/separate support routers which have only a partial view of entire CID space

I'm confused as to how the current draft addresses the issues here:

will have knowledge of the entire CID space

This line (from the spec) seems problematic even in the IPNI case. IPNI != the entire CID space which starts to make implementations complicated. The idea that one routing system is going to cover every use case is IMO not the way to go (and also the reason why there's even discussion of a delegated content routing API rather than just an IPNI API).

An example issue is to say we have 4 routers:

1. cid.contact (IPNI)
2. FilSwan (IPNI)
3. routing.delegate.ipfs.io (proxies DHT + IPNI)
4. ipfsdht.delegate.ipfs.io (proxies DHT, this is the only one that doesn't exist today, but certainly could)

While router 3 provides strictly more information than routers 1 or 2, it's also likely to be slower than them. It seems optimal to either contact 3 or contact 1/2 + 4 in parallel. A node running its own DHT client would contact routers 1 or 2 and never contact 4. However, a naïve implementation may just result in all requests going through router 3, as it has the most CIDs covered, which is not good. Perhaps a classification algorithm would be able to tease out the optimizations here without further protocol adjustments, but that seems like a lot of complexity that could be alleviated by a small protocol adjustment.

This case also seems more problematic than the one that's been resolved by flagging router "type" like content-routing, peer-routing, etc. since support for a given API (content/peer routing) can be discovered with a single query to the endpoint whereas this requires a bunch of code complexity.

This seems like it'd be largely resolvable based on allowing users to query and return a set of named routing systems, or just calling this the "IPNI discovery system" so that routers like 3 + 4 know not to participate. I'd rather the former, but understand the latter.

While I wouldn't be surprised if down the road we also ended up requiring some of that ML-style classification code anyway I suspect walking down that path now is premature and likely to cause us problems.

It seems like there's a lot of complexity in expressing this 'composition complexity' either directly or through classification. we don't have this problem today so I would prefer to defer this sort of grammar to a subsequent IPIP - you worry that 3 would do better than the others, but I would argue that would be incentive for the IPNI team to build what i think you have previously called 'radar' to incorporate DHT results into IPNI such that 1,2,3 are all equal :)

As you say, "IPNI != the entire CID space". I think it's a mistake to limit our framing of this to an "IPNI discovery system" when it is simply discovering 'the most complete' available content routers. We're trying to be inclusive/general here - and I don't see huge harm in calling this content routers.

we don't have this problem today

I guess it depends what you mean by "today". I would like libraries like https://github.com/libp2p/js-libp2p-delegated-content-routing to switch to the latest content routing API (or have alternatives which have switched), at which point js-libp2p in browsers should be able to leverage both the DHT and IPNI to get data from any peer that speaks wss/webtransport/webrtc.

Ideally they could use this protocol for discovery rather than hardcoding a DHT resolution endpoint.

@lidel could probably speak more about desired timings here.

but I would argue that would be incentive for the IPNI team to build what i think you have previously called 'radar' to incorporate DHT results into IPNI such that 1,2,3 are all equal :)

That's cool and would certainly resolve at least this use case 😄.

We're trying to be inclusive/general here
❤️

I think it's a mistake to limit our framing of this to an "IPNI discovery system" when it is simply discovering 'the most complete' available content routers.

That's an interesting framing. By pushing for the "most complete set" it seems like you're essentially trying to get routers to compete for attention and content and make it so that only a single request to a single system needs to happen for clients to get what they need. If this is how the system evolves this is very nice to client machines.

However, if routers try and cut costs or code complexity by serving more-specific data (e.g. only data advertised over a specific pubsub channel, only data put in the IPFS Public DHT, only data from the BitTorrent network, ...) then the client code could start becoming problematic as it tries to figure out who to ask, without spamming all the routers.

I'm more cautious in advocating for the latter, but could see this go either way 😅. As long as the more immediate case around DHT + IPNI data being available to browser nodes is covered I'm happy 😄.

+1 that delegated DHT is the core use case for IPFS on Web Platform (JS in HTML in web browser), at least mid-term, because self-hosted user data is (at least for now) on DHT, and rarely on IPNI (which becomes the way for big paid providers to handle announcement of huge number of CIDs).

The gist of the https://github.com/libp2p/js-libp2p-delegated-content-routing story is that it uses /api/v0/dht from Kubo RPC and we want to move away from that model.

Switching to HTTP API at routing.delegate.ipfs.io (proxies DHT + IPNI) is an easy win, and we would want to do this ASAP.

Having ambient discovery via bootstrap nodes talking /wss and /webtransport will allow for basic resiliency / redundancy

aschmahmann · 2022-11-17T17:21:41Z

IPIP/0342-content-router-discovery.md

+IPFS nodes, and the other side are the bootstrap and core-infrastructural
+nodes with high connectivity in the network.
+
+### 1. content-routing as a libp2p protocol


What's the expected plan for this to work with browser-based nodes? Are they supposed to fallback to one of your rejected alternatives (e.g. hardcoded nodes, hardcoded bootstrap nodes, advertising in the DHT, advertising in the Indexers, ...)?

I suspect the idea is for /dnsaddr/bootstrap.libp2p.io (or any other bootstrapper set by the JS user, as long as it's /webtransport or /wss) to speak this new protocol, avoiding hardcoding anything new.

what prevents them from participating in this protocol as described?
browser nodes will need to connect to other existing nodes, as they do today. they would learn about the existence of content routers through those same channels via the new protocol, and could then make use of them.

what prevents them from participating in this protocol as described?

CORS. If the only type of router this protocol returns is HTTP URL, then by default JS-IPFS running on a website won't be able to read data via cross-origin requests to the discovered router due to CORS limitations.

We have two ways of solving the problem:

(easy spec fix) Add a paragraph that requires https:// servers returned by this discovery protocol to ALWAYS have Access-Control-Allow-Origin: * etc set up

cc @guseggert for visibility, as we should include note about CORS headers in IPIP-337: Delegated Content Routing HTTP API #337 too

(more involved) Create libp2p version of IPIP-337: Delegated Content Routing HTTP API #337 that browser peers could use over existing /wss or /webtransport listeners. Another argument Why IPFS needs Delegated Routing over libp2p.

aschmahmann · 2022-11-17T17:38:09Z

IPIP/0342-content-router-discovery.md

+Cons:
+* Nodes cannot drop use of the DHT / other content routing options always are 'second tier'.


This isn't really the case. Even with this proposal you still need a bootstrap node somewhere to get going (e.g. a /ipfs/kad/1.0.0 bootstrapper, or someone supporting this libp2p protocol). For IPNI you could advertise to IPNI as well and you'd be fine. Perhaps a more accurate con is that this gives less of that subjective information that may/may not come in handy.

the alternative of other content routers being found in the DHT, which is what this alternative is trying to describe, does mean that no ipfs node could be run without kad dht code for DHT lookups. Being a DHT participant is more complexity than just having libp2p code to be able to connect to other peers, and at least my understanding of what's being proposed in this alternative is that it is intertwined with the DHT and not equivalent to hardcoded bootstrap nodes through which content routers can be learned.

aschmahmann · 2022-11-17T17:47:19Z

IPIP/0342-content-router-discovery.md

+
+#### Static list of known routers distributed with IPFS clients
+
+This has worked for the current IPFS bootstrap node, but leads to the need for


Agreed that IPNI (and delegated routers in general) are different from /ipfs/kad/1.0.0 in that the DHT has a discovery mechanism built in once there is bootstrapping and currently IPNI does not. However, any implementation is going to still need some level of hard-coding to get going here and having additional discovery is needed.

aschmahmann · 2022-11-17T18:07:52Z

IPIP/0342-content-router-discovery.md

+support this prioritization without leaking the exact list of known content
+routers that the client already knows.
+
+* The size of the bloom filter is chosen by the client. It is sized such


Unfortunately GitHub doesn't allow threads not tied to a line, but wanted to add some thoughts to this discussion #342 (comment) in a way that responses would be easy to trace.

Per @ajnavarro's comment IPIP-342: Ambient Discovery of Content Routers #342 (comment) I too find it a little hard to believe that there will be so many routers each providing full replicas of all data tracked by IPNI that a bloom filter would be required given that running these servers is expensive and incentivization is IIUC mostly TBD (I think @guseggert had some napkin math here showing the large costs around storing 10^15 CIDs even if we exclude bandwidth costs). That being said this is #not-this-ipips-problem. If the IPNI team thinks thousands of nodes all over the world will spring up hosting PBs of data and that lack of consistency between replicas isn't going to cause problems with the evaluation criteria that clients use that problem seems to live elsewhere.

Whether or not IPIP-322: Content Routing Hints #322 is a good/bad idea is also #not-this-ipips-problem since this describes how to find routers for a given content routing system (i.e. IPNI) not whether it should be passable as a hint.

As an aside my 2c is that you've got to be careful here to not break IPLD properties if you go this route as I've flagged in IPIP-322: Content Routing Hints #322, however it's potentially useful to add hints as long as they're not mandatory.

lidel · 2022-11-18T01:25:31Z

IPIP/0342-content-router-discovery.md

+IPFS nodes, and the other side are the bootstrap and core-infrastructural
+nodes with high connectivity in the network.
+
+### 1. content-routing as a libp2p protocol


I suspect the idea is for /dnsaddr/bootstrap.libp2p.io (or any other bootstrapper set by the JS user, as long as it's /webtransport or /wss) to speak this new protocol, avoiding hardcoding anything new.

lidel · 2022-11-18T01:34:55Z

IPIP/0342-content-router-discovery.md

@@ -0,0 +1,267 @@
+# IPIP 0342: Content Router Ambient Discovery


Since I can't expose a partial index without being punished, would this be more correct?

Suggested change

# IPIP 0342: Content Router Ambient Discovery

# IPIP 0342: IPNI Content Router Ambient Discovery

lidel · 2022-11-18T18:06:33Z

IPIP/0342-content-router-discovery.md

+API. These routers currently are considered to directly support queries using
+the protocols specified by
+[IPIP-337](https://github.com/ipfs/specs/pulls)
+and/or
+[IPIP-327](https://github.com/ipfs/specs/pull/327).


We need to either decide which one is the future, or document how client decides which one to use for sending requests.
If we do the latter, including the content type along with the router URL is the way to go: the Reframe endpoint is application/vnd.ipfs.rpc[..]; version=n.

Clarify generality potential of protocol

willscott · 2022-11-20T16:19:19Z

I updated to hopefully address your review, @lidel

There's a more concrete description of what the proposed protocol will look like
I added a query tag to support discovery of more than just content routing through the mechanism
in terms of 'is this only for ipni' / are you 'punished' if you're incomplete - i think it's probably too early to try to predict the dynamics here without either pretty extensive modeling or subsequent user data. The ranking / propagation of routers would be impacted by what users are after - so if there's a community or application that's focused on a narrow subset that's well addressed by a different router, that router could also be very viable with this discovery mechanism.

lidel · 2022-11-21T17:13:18Z

IPIP/0342-content-router-discovery.md

+Protocol messages are encoded using *cbor*. The following protocol examples demonstrate
+the schemas of requests and responses if they were to be encoded with JSON.
+
+A query on the "/ipfs/router-discovery/1.0.0" protocol will look like:
+```json
+{
+  "router": "string",
+  "filter": "bytes of the bloom filter"
+}
+```


I wonder if this protocol for discovering routers could be useful for some libp2p users (we could have a separate type for discovering routers that support peer routing). Indexers already have the peer data (mapping from peerid to multiaddrs), could be useful for reducing peer routing on light clients (use DHT as fallback / only when necessary).

@mxinden @marten-seemann thoughts on the use case and the wire format here?

(we could have a separate type for discovering routers that support peer routing).

Thus they would serve the same use-case as a rendezvous server?

thoughts on the use case and the wire format here?

I cannot think of a project outside of the IPFS realm that is in need of this.

Thus they would serve the same use-case as a rendezvous server?

It extends the rendezvous protocol in two ways:

not requiring the need for a single hard-coded rendezvous point

adding reputational gossip in addition to just directory listing in the rendezvous protocol

guillaumemichel · 2022-11-23T10:32:25Z

IPIP/0342-content-router-discovery.md

+nodes do not have geographic locality. As a result, performance is
+separated in the tracking of content routers because it will not be
+effective as a ranking factor in the non-geographically-aware
+gossip system described here. As an optimization, nodes may choose to


Gossiping should be geographically aware and happen only between peers that are geographically close to each other. Otherwise, I may share a content router that is geographically close to me, but it will be too slow for you, and you won't use it at all.
So sharing content routers with geographically distant peers becomes irrelevant, as long as we have enough content routers and they are distributed around the globe.

do we believe ipfs nodes will generally have enough knowledge to ambiently identify which peers are geographically close to them?

A peer knows the RTT between itself and all of its directly connected peers. I would argue that a node cannot learn useful information about new content routers from a node that is 150+ms away from itself (except if it is in a desert). Hence nodes could gossip about content routers only with their closest nodes (in ping distance).

libp2p/specs#413 (GossipSub v1.2) would probably solve this, since it's all about minimising latency

we could use the same modeling / structure as gossip sub, but this is meant to be pull-based rather than push based. I have concerns about dropping in GossipSub directly.

+1, routers identified with DNS names (cid.contact) could use things like Anycast to ensure the client is routed to the closest one. (I believe we already do that for ipfs.io gateway..). Meaning, reports about "the same router" may be actually about different instance entirely. At the very least, spec should note that distance between peers should impact a router's score evaluation.
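
The RTT-based restriction suggested earlier in this thread could be sketched roughly as follows. The function name and the input shape are hypothetical, and the 150 ms cutoff is only the figure mentioned above, not a value from the draft.

```python
def gossip_candidates(peer_rtts_ms, max_rtt_ms=150.0):
    """Pick peers worth querying for content routers, nearest first.

    peer_rtts_ms maps a peer id to the measured round-trip time in
    milliseconds for a directly connected peer (hypothetical shape).
    """
    nearby = [(rtt, peer) for peer, rtt in peer_rtts_ms.items() if rtt < max_rtt_ms]
    nearby.sort()  # lowest RTT first
    return [peer for _, peer in nearby]
```

Note that this only approximates locality to the gossiping peer; as pointed out above, an anycast router name may resolve to different instances for different clients.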

Co-authored-by: Guillaume Michel - guissou <[email protected]>

guseggert · 2022-11-30T13:56:40Z

IPIP/0342-content-router-discovery.md

+In addition, this protocol expects that content routers that may be considered
+for auto-configuration/discovery by IPFS nodes will have knowledge of the
+entire CID space - in other words a delegation to such a router may be
+considered 'exhaustive'.


How does this happen? What kind of consistency SLAs should routers have, and how can they achieve it?

I'd like to say 'that's outside of this direct IPIP' - in that if routers fail to be consistent they would risk losing priority.

In practice:

indexers follow the list of providers from other indexers so that the constituents they follow are consistent

they gossip announcements they see to each other so new updates are propagated between them

[in progress] they can come to snapshot consensus periodically over a vector of providers & latest advertisements.

Co-authored-by: Gus Eggert <[email protected]>

BigLep · 2023-05-30T22:13:56Z

This came up in https://pl-strflt.notion.site/2023-05-30-Content-Routing-WG-12-b2ed74834fe44e359bbcdd02740e2084

There is going to be implementation need for this in the next quarter or two. As a result, want to get ahead before code gets written and decisions ossify.

Some next steps:

Update the IPIP to the newest template standard with lessons learned the last 6 months

I assume @willscott or someone from PL EngRes Bedrock team will do this.

Review from spec maintainers

I assume @lidel will take the first pass here on the updated doc when it's ready.

Implementation notes:
"Lassie currently does not depend on boxo. It may be hard for us to prototype ambient-discovery in boxo until the boxo/rest-of-world dependency conflicts are resolved"

lidel · 2023-06-01T11:34:59Z

IPIP/0342-content-router-discovery.md

+  has served useful content in the past.
+* Latency / ping time of the peer.
+
+### 3. selection of routers


Note to self: this section of the spec should be more specific about "bare minimum reputation system", and provide enough for implementer to do the right thing, and not say clients do "as they wish".

Expected probing behavior (or lack of it) on non-client services like bootstrappers should also be specified.

Somehow related (with custom vector):

https://www.notion.so/pl-strflt/Last-Known-Whereabouts-Protocol-00273dfe063546f1b1ee5c7c52916824

https://git.gnunet.org/bibliography.git/plain/docs/NetEcon%2706_-_Harvelaar.pdf

…iscovery

lidel · 2023-08-07T12:26:45Z

IPIP/0342-content-router-discovery.md

+A response is a list of entries, which looks like:
+
+```json
+[
+  {
+    "peer": "multiaddr.MultiAddr",
+    "score": float


single score may be too vague. based on Rhea/Saturn alone, we may want to track "lookup" score and "retrieval" score separately

if we want this API to be not limited to IPNI, then we could have type per result
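
For illustration only, a response entry with split scores and a per-result type might look like the sketch below. The field names `router`, `lookup_score`, and `retrieval_score` are hypothetical (the draft only has `peer` and `score`), the example address is made up, and JSON is used here just as the draft does for examples; the draft states the wire encoding is cbor.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RouterEntry:
    peer: str               # multiaddr of the router, as in the draft's "peer" field
    router: str             # hypothetical per-result type, e.g. "ipni"
    lookup_score: float     # hypothetical: how reliably lookups are answered
    retrieval_score: float  # hypothetical: how often returned providers yield content


def encode_response(entries):
    """Serialize a list of RouterEntry objects as a JSON array of objects."""
    return json.dumps([asdict(e) for e in entries], sort_keys=True)


# Example usage with an illustrative address:
example = encode_response([RouterEntry("/dns4/router.example/tcp/443/https", "ipni", 0.9, 0.7)])
```

Keeping the two scores separate lets a client rank routers differently depending on whether it cares about lookup success or end-to-end retrieval.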

lidel · 2023-09-19T15:10:35Z

src/ipips/ipip-0342.md

+IPFS nodes, and the other side are the bootstrap and core-infrastructural
+nodes with high connectivity in the network.
+
+### 1. content-routing as a libp2p protocol


I feel that sooner or later we will need a HTTP version of this,
better to address HTTP plan for this protocol from the start.

If this is a generic discovery protocol, http story could be as basic as a section that states the dag-json/dag-cbor wire format described here can be exposed as /routing/v1/routers or /routing/v1/discovery (making it part of the existing routing story for HTTP).

Since this is request-response protocol, perhaps leverage libp2p+http work from libp2p/specs#508?
It will enable us to describe protocol in terms of HTTP semantics, and expose the same socket over libp2p and HTTP (like we expose trustless gateway over libp2p in Kubo experiment at ipfs/kubo#10049)

This collapses complexity related to testing, and maximizes utility: HTTP client is enough to query public endpoint for useful routers.

Explicit response "router" field expressing the content router type, and explicit example of the "ipni" use case.

SgtPooki · 2023-09-19T23:41:29Z

src/ipips/ipip-0342.md

+[IPIP-337](https://github.com/ipfs/specs/pulls/337)
+and/or
+[IPIP-327](https://github.com/ipfs/specs/pull/327).


Would it be worth calling out these names as well, so users unfamiliar with the ipip-### don't have to click through?

SgtPooki · 2023-09-19T23:48:32Z

src/ipips/ipip-0342.md

+In addition, this protocol expects that content routers that may be considered
+for auto-configuration/discovery by IPFS nodes will have knowledge of the
+entire CID space - in other words a delegation to such a router may be
+considered 'exhaustive'.
+


Is it possible to have an exhaustive list of the entire CID space?

Also, the wording here is hard to follow. Maybe something like:

Suggested change

-In addition, this protocol expects that content routers that may be considered
-for auto-configuration/discovery by IPFS nodes will have knowledge of the
-entire CID space - in other words a delegation to such a router may be
-considered 'exhaustive'.
+In addition, implementers of this protocol may specify default content routers, but this protocol comes with a few expectations for default content-routers:
+
+Default content routers configured by those implementing this protocol:
+* MUST know of the entire CID space - in other words, a delegation to such a router may be considered 'exhaustive'

I also doubt that any single/federated entity will know the entire CID space. It may be possible now, but this certainly doesn't scale.

If we shard the CIDs across different instances, we basically get a DHT (with O(1) lookup if all shards fit in memory). However, it inherits the same weakness as the current DHT: when providing a fair number of CIDs, you need to open a connection to every single shard. To mitigate this, we might consider introducing intermediary services tasked with CID-to-shard allocation, streamlining access and reducing the number of direct connections needed.
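
To make the sharding concern concrete, here is a minimal sketch, assuming CIDs are assigned to shards by hashing. This is a hypothetical scheme, not something this IPIP specifies. `shard_for` and `shards_touched` are illustrative names, and the `cid-N` strings are placeholders rather than real CIDs.

```python
import hashlib
from typing import Iterable, Set


def shard_for(cid: str, num_shards: int) -> int:
    """Map a CID string to a shard index using a SHA-256 based hash."""
    digest = hashlib.sha256(cid.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_touched(cids: Iterable[str], num_shards: int) -> Set[int]:
    """Return the set of shards a provider must contact to announce all cids."""
    return {shard_for(c, num_shards) for c in cids}


# A provider announcing 1000 placeholder CIDs against 16 shards.
touched = shards_touched((f"cid-{i}" for i in range(1000)), 16)
```

With a roughly uniform hash, a provider announcing many more CIDs than there are shards ends up contacting essentially every shard, which is the connection fan-out described above.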

I am going to classify this as out of scope for this IPIP.

We have examples of content routers being used today that are exhaustive. This IPIP aims to solve the immediate federation problem there. We don't have existing ones federating shards of the space, so we cannot yet solve the general problem concretely.

Given nobody has implemented this for a year, I don't feel comfortable trying to increase scope at this point.

SgtPooki · 2023-09-19T23:51:12Z

src/ipips/ipip-0342.md

+Nodes will conceptually track a registry about known content routers.
+This registry will be able to understand for a given content router two
+properties:


Suggested change

-Nodes will conceptually track a registry about known content routers.
-This registry will be able to understand for a given content router two
-properties:
+Nodes will conceptually track a registry of known content routers.
+This registry will maintain two properties for a given content router:

SgtPooki · 2023-09-19T23:54:26Z

src/ipips/ipip-0342.md

+* reliability - how many good vs bad responses has this router responded
+with. This statistic should be windowed, such that the client can calculate
+it in terms of the last week or month. This will in practice be stored as
+daily buckets of successful and unsuccessful queries against a router, where
+success indicates that the router was queried, and the data was subsequently
+retrieved from a node returned as a provider by that router.


Is it possible for a content router that returns a peer which should have the content to know whether that peer successfully provided it?

1. A asks B for providers of bafy1.
2. B finds C and tells A about it.
3. A asks C for bafy1.

Is this spec requiring A to tell B the result of item 3 above? How is this tracking accomplished?
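
One reading of the quoted text is that the bookkeeping is purely local to A: A records, per router and per day, whether a query to B led to a successful retrieval, and B is never told the outcome. Below is a minimal sketch of such daily buckets under that assumption. `RouterStats`, `record`, and `reliability` are hypothetical names, and the 7-day default window follows the "last week" suggestion made earlier in this thread.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Optional


class RouterStats:
    """Per-router daily buckets of successful and unsuccessful queries."""

    def __init__(self) -> None:
        # Maps a day to a two-element list: [successes, failures].
        self.buckets = defaultdict(lambda: [0, 0])

    def record(self, day: date, success: bool) -> None:
        """Count one query outcome in the bucket for the given day."""
        self.buckets[day][0 if success else 1] += 1

    def reliability(self, today: date, window_days: int = 7) -> Optional[float]:
        """Fraction of successful queries in the window ending today, or None if empty."""
        start = today - timedelta(days=window_days - 1)
        ok = 0
        bad = 0
        for day, (successes, failures) in self.buckets.items():
            if start <= day <= today:
                ok += successes
                bad += failures
        total = ok + bad
        return ok / total if total else None
```

Keeping raw daily counts rather than a single running ratio lets the client recompute the statistic for a week or a month from the same data.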

guillaumemichel · 2023-09-20T08:36:56Z

src/ipips/ipip-0342.md

+2. When its AutoNAT status indicates it is eligible to be a DHT server, and
+it has not successfully performed a sync in over a day.


What is the reasoning here?

willscott requested a review from a team as a code owner November 12, 2022 09:34

willscott requested review from ajnavarro, aschmahmann, guseggert and lidel November 12, 2022 09:35

mxinden reviewed Nov 12, 2022

Update IPIP/0000-content-router-discovery.md

2585578

Co-authored-by: Max Inden <[email protected]>

lidel changed the title ~~Proposal for automatic discovery of content routers~~ IPIP-342: Automatic Discovery of Content Routers Nov 14, 2022

lidel changed the title ~~IPIP-342: Automatic Discovery of Content Routers~~ IPIP-342: Ambient Discovery of Content Routers Nov 14, 2022

willscott added 2 commits November 15, 2022 10:10

update with ipip number

1202cdf

Merge branch 'feat/content-router-discovery' of github.com:ipfs/specs into feat/content-router-discovery

92cf422

ajnavarro reviewed Nov 16, 2022

Some updates in response to @ajnavarro's review

07710d5

aschmahmann reviewed Nov 17, 2022

lidel reviewed Nov 18, 2022

Address code review comments

8ac57d9

lidel reviewed Nov 18, 2022

further elaboration of protocol.

e558e66

Clarify generality potential of protocol

lidel mentioned this pull request Nov 21, 2022

Create Routing Specs #347

Open

6 tasks

lidel reviewed Nov 21, 2022

guillaumemichel reviewed Nov 23, 2022

guillaumemichel reviewed Nov 23, 2022

guillaumemichel reviewed Nov 23, 2022

Apply suggestions from gui

d79bb40

Co-authored-by: Guillaume Michel - guissou <[email protected]>

guseggert reviewed Nov 30, 2022

Apply suggestions from code review

d99131a

Co-authored-by: Gus Eggert <[email protected]>

walkerlj0 mentioned this pull request Dec 19, 2022

Add a section about network indexer to IPFS or Filecoin protocol/launchpad#428

Open

lidel mentioned this pull request Jan 19, 2023

IPIP-359: Multi gateway client #359

Draft

aschmahmann mentioned this pull request Jan 27, 2023

Introduce specification for cascading lookup query parameter ipni/specs#9

Merged

lidel reviewed Jun 1, 2023

willscott added 3 commits June 2, 2023 13:09

Merge remote-tracking branch 'origin/main' into feat/content-router-discovery

b033bee

update format

b05960a

lint

b6a3611

lidel reviewed Aug 7, 2023

chore: editorial tweaks to enable HTML render

8174cea

lidel reviewed Sep 19, 2023

Update ipip-0342.md

6c837cc

Explicit response "router" field expressing the content router type, and explicit example of the "ipni" use case.

SgtPooki reviewed Sep 19, 2023

guillaumemichel reviewed Sep 20, 2023

guillaumemichel reviewed Sep 20, 2023

guillaumemichel reviewed Sep 20, 2023

guillaumemichel reviewed Sep 20, 2023

guillaumemichel reviewed Sep 20, 2023

guillaumemichel reviewed Sep 20, 2023

correcting typos

88dd4de

guillaumemichel reviewed Sep 20, 2023

		IPFS nodes will advertise and coordinate discover of content routers using a
		new libp2p protocol advertised as "/ipfs/content-router-discovery/1.0.0".

		Cons:
		* Nodes cannot drop use of the DHT / other content routing options always are 'second tier'.


		#### Static list of known routers distributed with IPFS clients

		This has worked for the current IPFS bootstrap node, but leads to the need for

		@@ -0,0 +1,267 @@
		# IPIP 0342: Content Router Ambient Discovery

	# IPIP 0342: Content Router Ambient Discovery
	# IPIP 0342: IPNI Content Router Ambient Discovery

		2. When its AutoNAT status indicates it is eligible to be a DHT server, and
		it has not successfully performed a sync in over a day.
