Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIP-77: Negentropy syncing #1494

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

NIP-77: Negentropy syncing #1494

wants to merge 3 commits into from

Conversation

fiatjaf
Copy link
Member

@fiatjaf fiatjaf commented Sep 12, 2024

@vitorpamplona
Copy link
Collaborator

vitorpamplona commented Sep 12, 2024

IMHO, In the long run, sync will have to become a mandatory part of the protocol. I just don't see Nostr taking over everything without robust syncing operations. Yes, we can sync via REQs but that is so bad that I don't actually consider it a viable solution.

That means that the sync protocol must be extremely well defined such that we can actually have 100s of different implementations in all possible languages, with all possible database options, working together. We need to simplify and clarify this NIP to the extreme. It needs to become so good that it is fine if we add it to 01.md

Libraries can help, but the ability to code it from near scratch is a MUST to me.


```jsonc
[
"NEG-ERR",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new message types must be added to readme as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not do it for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

@jb55
Copy link
Contributor

jb55 commented Sep 18, 2024

sync will have to become a mandatory part of the protocol

I hope so

@braydonf
Copy link
Contributor

A great one to add, this will be great for mobile clients and for relay to relay syncing. Lots of potential. Let's go! :)

@staab
Copy link
Member

staab commented Sep 27, 2024

It would be great to merge this so that we can start checking supported_nips for support rather than targeting negentropy software versions.

@braydonf
Copy link
Contributor

Something that could be helpful for these types of protocol extensions is a consistent machine readable response that indicates it isn't supported. Currently I'm getting several different NOTICE responses (not intended to be machine readable) from relays, such as:

  • "ERROR: bad msg: negentropy disabled"
  • "ERROR: negentropy error: negentropy query missing elements"
  • "could not parse command"
  • "ERROR: bad msg: invalid message: {'message_type': ['Invalid enum value NEG-OPEN']}
  • "error: bad message"
  • "could not parse command"
  • "bad message type"
  • "invalid: "value" does not match any of the allowed types
  • "Command unrecognized"

@fiatjaf
Copy link
Member Author

fiatjaf commented Oct 27, 2024

https://github.com/fiatjaf/khatru supports this as of now, https://github.com/nbd-wtf/go-nostr has supported it from the client side for some weeks.

fiatjaf/khatru@1dc12e5

You were interested in trying it, @gzuuus @barrydeen, so once you did let me know if it's broken.

@fiatjaf
Copy link
Member Author

fiatjaf commented Oct 27, 2024

@hoytech what do you think of getting rid of the CLOSED and TOO_MANY_RESULTS error messages and reusing the other format that is already common on Nostr of an arbitrary string with a machine-readable prefix, like closed: you took too long to respond!, blocked: this query is too big (just the part before the colon matters), then we could also use other common responses, like auth-required: _ for when a relay only accepts negentropying in these conditions, for example, and a generic error: _, for internal errors and other stuff.

@hoytech
Copy link
Contributor

hoytech commented Oct 29, 2024

@fiatjaf - On the error message format: Good idea, this would be better. I will push a change shortly.

@bezysoftware
Copy link
Contributor

As someone who just finished implementing the base negentropy protocol in C# (https://github.com/bezysoftware/negentropy.net) I have a few notes:

  • I think there is no point in attaching the base spec in this NIP (appendix)
    • Most relay implementations will use a dedicated negentropy package for their language
    • Devs wanting to implement it will have to look at @hoytech's repo anyway, as it's more descriptive, has more details and, more importantly, cross-implementation tests
    • We don't have sha256 or others described in NIP-01, it's just a basic assumption it's something standard. I think negentropy should be treated the same
  • I would appreciate more platform-independent tests (e.g. written in Gherkin)
    • The test suite in @hoytech's repo is great, but the items size is too huge to debug
    • But those should go to that repo, not here.
  • I agree with streamlining the error messages

@alltheseas
Copy link
Contributor

Added placeholder negentropy tracker nostrability/nostrability#125

@vitorpamplona
Copy link
Collaborator

vitorpamplona commented Nov 5, 2024

I am coding in Kotlin at the moment and I am not going to lie. I hate this VarInt and the Query encoding part. Yeah, it's very optimized, but there is so much byte crunching and type overflows that we are going to have bugs everywhere on this. I wish we just made it a regular JSON object when transmitting messages between nodes to keep the Nostr theme of "not really the most optimal, but really easy to code without bugs"

@bezysoftware
Copy link
Contributor

bezysoftware commented Nov 5, 2024

I am coding in Kotlin at the moment and I am not going to lie. I hate this VarInt and the Query encoding part. Yeah, it's very optimized, but there is so much byte crunching and type overflows that we are going to have bugs everywhere on this. I wish we just made it a regular JSON object when transmitting messages between nodes to keep the Nostr theme of "not really the most optimal, but really easy to code without bugs"

That was my thought too, all the VarInt and Bound encoding (and especially relative timestamps) is non trivial, but given the complexity of the algorithm overall, I think it's all or nothing. Either adopt Negentropy as it is, or not at all and use something like #1027

I still think it's worth it - invest in base libraries and you'll save a ton in bandwidth

@vitorpamplona
Copy link
Collaborator

invest in base libraries and you'll save a ton in bandwidth

I don't think varint is saving a lot. Similar to the delta timestamps in the same message, which turns the algo stateful. Maybe I am wrong, but I think @hoytech could have used longs for everything and this would be much simpler to implement at the expense of a few extra bytes in each message.

And yes, we desperately need static test cases.

@alltheseas
Copy link
Contributor

If a nostr app, or a nostr relay implements negentropy, is it backwards compatible?

@vitorpamplona
Copy link
Collaborator

Kotlin PR is up: hoytech/negentropy#15

@bezysoftware
Copy link
Contributor

invest in base libraries and you'll save a ton in bandwidth

I don't think varint is saving a lot. Similar to the delta timestamps in the same message, which turns the algo stateful. Maybe I am wrong, but I think @hoytech could have used longs for everything and this would be much simpler to implement at the expense of a few extra bytes in each message.

And yes, we desperately need static test cases.

That depends on the size of the diff. For fun I gutted one of @hoytech's tests with around 40k items on both sides (with 9k mismatches both ways); once with delta timestamps + varints and once without those (numbers+dates encoded as longs - 8 bytes). The non-optimized query was 2x as large in the first round.
The combination of delta timestamp+varint save space especially if the timestamps are close to each other when you can encode them using only 1 or 2 bytes,

@vitorpamplona
Copy link
Collaborator

vitorpamplona commented Nov 6, 2024

That's the issue with the test cases... They generate created_ats on the same day (1000s of events in the same day), which is not very realistic.

But it's ok. We can keep it like that.

@vitorpamplona
Copy link
Collaborator

vitorpamplona commented Nov 8, 2024

I finally got parity with the C++ and JavaScript implementations on hoytech/negentropy#15

Pros:

  • Extremely efficient in the number of messages and rounds to sync.
  • "Set and forget" protocol

Cons:

  • It's finicky. Requires absolute byte-parity on all implementations. If somebody misses an = in a bound filter, the whole thing goes crazy and the implementation might not realize it's talking oranges to a client speaking apples. This can be used to DDoS relays and clients.
  • It's stateful. Requires keeping the storage for Negentropy objects (which must be separate from the DB) live to complete sync and since Nostr is famous for delays, reconnections, and clients/relays straight out going away, there will be some memory costs in keeping so many filters in memory if this gets adopted for real.
  • No caching. The sync bands are dynamically defined which means no pre-caching of the hashes can be done. If clients and servers are syncing the same filters, caching on a static band protocol could significantly reduce computational costs.

Meanwhile, something like #826 is:

Pros:

  • Bugs are less problematic. It's harder to get different implementations because the bug surface is so much smaller. And if different implementations exist, a simple rule of syncing twice and giving up if it can't, can solve everything.
  • Cache. Hashes can be cached for syncing in repeat filters. Old data won't change so much on nostr, so caching could be an effective tool.
  • Stateless. Clients and relays might lose connection at any point and they will be fine. There is no added memory requirements. Each side can keep changing their DB in the middle of the sync.

Cons:

  • Requires 10x more rounds to complete a sync in a similar way Negentropy would do and thus it is way less efficient in time (not necessarily in processing power/data consumption).
  • Requires Nostr devs to make their own protocol for how to sync and what to ask for.

@hoytech
Copy link
Contributor

hoytech commented Nov 8, 2024

Thank you for this @vitorpamplona! I pretty much agree with your assessment, although I'll make a couple comments. I'm going to find a moment to merge your PR, hopefully this weekend.

Requires absolute byte-parity on all implementations

The protocol is actually relatively forgiving of implementation differences. Yes, the test-suite is much stricter and does verify that they exactly conform, but this is mostly just to help developers detect bugs, and I could relax this if needed.

It's stateful

It depends. In theory, you could throw away the set of IDs after generating each message and re-run the query for each received message. The protocol is tolerant of changes to the ID set while messages are in flight. This would make it stateless at the expense of much higher CPU usage. Personally I think keeping state is the right tradeoff in most cases.

For implementations with btree storage like strfry, queries for specific pre-selected filters (plus their variants with since/until) can be performed entirely statelessly. I think this is most useful for full-DB syncs (ie {}) or partial but common requests (ie {"kinds":[0,3]}).

And yes, we desperately need static test cases.

Yes, this would be wonderful. I will add this to my TODO list and will make some (eventually!).

@vitorpamplona
Copy link
Collaborator

The protocol is actually relatively forgiving of implementation differences.

I ran into many issues of "forever syncing" when fingerprints/bands didn't perfectly match (because of the bugs in my implementation). I do think a malicious actor can use the sync SPEC to keep relays busy, but my real concern is if there are too many slightly-off implementations of the protocol such that nothing would work and people just give up because of the lack of consistent support across the network. Libraries help, but the risk is still there.

Let me know if you need any help running the kotlin/java stuff.

@hoytech
Copy link
Contributor

hoytech commented Nov 12, 2024

@fiatjaf - I made a PR to update the error code format as per your suggestion: #1579

I'm updating strfry now. I will check to see if any other relays have their own implementation that needs updating (but I don't think so). AFAIK no clients depend on these exact errors at this time either.


Error reasons are the same format as in NIP-01. They should begin with a machine-readable single-word prefix, followed by a `:` and then a human-readable message with more information.

The current suggested error reasons are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add something like invalid: unknown subscription id? AFAIK there is no equivalent in NIP-01, since it's not needed. In case of CLOSE message the client is not expecting any reply, so it's fine the relay doesn't send back an error, but in case of MSG the client is actually waiting for a response, so it makes send to send appropriate error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Currently strfry sends closed: unknown subscription handle in this case. It doesn't bother trying to differentiate between a subscription ID that was previously open but got closed (by a NEG-CLOSE message, timeout, etc) and one that was never opened in the first place.

@bezysoftware
Copy link
Contributor

The relay portion of this is now implemented in Netstr running here: https://relay.netstr.io/
It's internally using the Negentropy.net package

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants