Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'since' meaning in filters #78

Open
mikedilger opened this issue Nov 26, 2022 · 25 comments
Open

'since' meaning in filters #78

mikedilger opened this issue Nov 26, 2022 · 25 comments

Comments

@mikedilger
Copy link
Contributor

I don't want a list of events with created_at after my 'since' filter. What I really want is all the events that the relay has become aware of since I last got events off of that relay. But we can only ask for one or the other (and I'm not clear which one relays are using).

Event 'created_at' stamps are created by the author at time A. Relays may become aware of these events quite a bit later on, at time C > A. If I last fetched events from the relay at time B < C, and I come back and ask for events since time B, will I get the event created at time A < B? My assumption is that I will not, and the only way to pick these up is to just have a huge overlap time in my request and hope I don't miss much.

If I'm correct and 'since' refers to 'created_at', I would offer to write a NIP that extends filters with another field. Relays would have to record when they became aware of the event (seems pretty easy, a timestamp default now field in a database).

@eskema
Copy link
Collaborator

eskema commented Nov 26, 2022

I have a really hard time understanding this issue. since and until refer to the created_at, so if you request since X, all the events created_at > x will be returned

ok, I think I understand what you're referring

@fiatjaf
Copy link
Member

fiatjaf commented Nov 26, 2022

How will the relay know you are you?

I think maybe we could use the same subscription ID and somehow the relay would just "continue" the subscription from where it stopped?

Doesn't this complicate the job of relays too much to be worth doing? I don't know.

@eskema
Copy link
Collaborator

eskema commented Nov 26, 2022

this work should be done by the client, not the relay. the client is the one with the burden to reconstruct it's prefered timeline, the relay should not care. you need to query multiple relays and expect the info to be scatterred. If you depend on your info to be correct everytime, then use your own relay and make sure it has every post like you want it to.
P.S. I also do think relays should have a received_at value, because we can leverage that and this would be solved without relays having to know who you are and what do you already have requested.. (and they probably already log that along with the ip)

@fiatjaf
Copy link
Member

fiatjaf commented Nov 26, 2022

Is it really that bad to lose some backdated events? If that backdated event matters it will receive replies, be referenced or quoted by some other event, then the client should try to fetch it by ID and then everything should be sorted out.

@unclebob
Copy link
Contributor

unclebob commented Nov 26, 2022 via email

@eskema
Copy link
Collaborator

eskema commented Nov 26, 2022

The solution I use in more-speech is to find the latest created_at date in the messages that I’ve saved locally. Then I ask the relays for all messages created after that date. I presume the relays are using the created_at field in the messages for that comparison.
On Nov 25, 2022, at 22:24 , Michael Dilger @.> wrote: I don't want a list of events with created_at after my 'since' filter. What I really want is all the events that the relay has become aware of since I last got events off of that relay. But we can only ask for one or the other (and I'm not clear which one relays are using). Event 'created_at' stamps are created by the author at time A. Relays may become aware of these events quite a bit later on, at time C > A. If I last fetched events from the relay at time B < C, and I come back and ask for events since time B, will I get the event created at time A < B? My assumption is that I will not, and the only way to pick these up is to just have a huge overlap time in my request and hope I don't miss much. If I'm correct and 'since' refers to 'created_at', I would offer to write a NIP that extends filters with another field. Relays would have to record when they became aware of the event (seems pretty easy, a timestamp default now field in a database). — Reply to this email directly, view it on GitHub <#78>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAJAJKLJTHL7R3Q7YOTZY3WKGGHLANCNFSM6AAAAAASLZGBZQ. You are receiving this because you are subscribed to this thread.
Robert C. Martin (Uncle Bob) | @.
Uncle Bob Consulting LLC. | @unclebobmartin 847.922.0563 | cleancoder.com

The issue raised here is not very typical and only happens on certain conditions..
Alice subscribes to Bob's posts from some relay (or multiple, also applies). The client stores the last created_at it received so it can use that next time it needs to query. meanwhile, there was an event from Bob older than this last created_at stored value that for some reason, the relays didn't have before, but they do now. When Alice checks back on her feed, that new event will not come if she uses the since value because the event is older than that, thus, making the timeline incomplete.

If we had a received_at value to query from relays, then we could query every event received since our last saved date, and we would also receive backdated events that came after.

@mikedilger
Copy link
Contributor Author

mikedilger commented Nov 26, 2022

To be clear, the relays do not need to know who you are. All I'm asking is that the relays use THEIR time clock, rather then the event signer's clock. Each message on a relay would know when it was created by the signer, according to the signer, but also when the relay first became aware of it. Then I want to ask "give me the messages you became aware of after time X".

Like I say, this is really easy if relays store events in databases. Just add a "timestamp default now" field that automatically sets when the relay created the record, and filter on that field.

@monlovesmango
Copy link
Member

monlovesmango commented Nov 26, 2022

are you proposing to add new filter fields for the relay timestamp or changing the current since and until fields to only query the relay timestamp?

@eskema
Copy link
Collaborator

eskema commented Nov 26, 2022

this should be a new filter field, this should only extend, not replace existing behaviour

@mikedilger
Copy link
Contributor Author

are you proposing to add new filter fields for the relay timestamp or changing the current since and until fields to only query the relay timestamp?

New field, maybe called 'received_at'

@eskema
Copy link
Collaborator

eskema commented Nov 26, 2022

great, now someone needs to NIP it so some relay / client can implement. would love to hear what relay maintainers think. @cameri @scsibug

@cameri
Copy link
Member

cameri commented Nov 26, 2022

Both nostr-rs-relay and nostr-ts-relay store the timestamp when an event is seen for the first time so implementing this would be trivial.
I'd argue you'll eventually want to fetch ranges so we should implement the subscription filter as such.
I propose seen_since and seen_until where seen_until is exclusive.

@eskema
Copy link
Collaborator

eskema commented Nov 26, 2022

do we want this only for retrieving, so we don't actually need to know the value, or, do we also want relays to return this value with every event?

@mikedilger
Copy link
Contributor Author

I only care to have it in the filter. If relays returned it with events, it would have to be outside of the event, augmenting this: ["EVENT", <subscription_id>, <event JSON as defined above>] with another field. But I vote to not do that.

@cameri what do you mean by "seen_until is exclusive" I have no problem with adding seen_until. I won't be using it. I always want events right up until now.

@cameri
Copy link
Member

cameri commented Nov 27, 2022

I only care to have it in the filter. If relays returned it with events, it would have to be outside of the event, augmenting this: ["EVENT", <subscription_id>, <event JSON as defined above>] with another field. But I vote to not do that.

@cameri what do you mean by "seen_until is exclusive" I have no problem with adding seen_until. I won't be using it. I always want events right up until now.

I mean like an open-closed interval [a, b), aka seen_since <= first_seen < seen_until so seen_until is not included in the result.
If you don't pass seen_until you'll get events up until now.

@scsibug
Copy link
Collaborator

scsibug commented Nov 27, 2022

I think this feature exposes too much of the underlying relay state and complicates the subscriptions for clients (exposing two different timestamps). I'd much rather stick with a simpler model for querying events, even if we don't have a perfect way of knowing when a particular relay has received old events.

If one wants to ensure any old events get picked up, I'd just re-query without since/until filters, and get everything again. That is my preference at least.

It's pretty easy to implement for nostr-rs-relay though, pretty much just need another index and add an SQL clause, and some extra logic for the non-stored events.

@mikedilger
Copy link
Contributor Author

I disagree with @scsibug but there is another way to solve this. Events could include an optional reference to the previous event by that author. I don't like that solution, but it would allow my client to know it was all caught up. Of course the status quo isn't too terrible either as events are not likely to be missed, or they aren't terribly important if they do get missed.

@scsibug
Copy link
Collaborator

scsibug commented Nov 27, 2022

I only have a mild dislike, mostly because the simplicity of nostr has a huge appeal for me. That said, we can implement this with no changes for existing clients, so that is very nice. I will work on implementing if/when the NIP is numbered and merged.

@ok300
Copy link
Contributor

ok300 commented Dec 10, 2022

I think seen_since and seen_until assume the relay is a sort of "append-only log".

AFAIK that's not the case, there are:

If a client queries the relay in a seen_since / seen_until paradigm, it could lead to inconsistencies, because the same query with the same seen_since can have or not have certain events, depending when it's called.

It's not really seen_[since|until], it's more like seen_[since|until]_and_not_yet_deleted_or_replaced_by_newer.

I suppose that's fine for a client that only filters by seen_since incrementally, but the moment seen_until is used, and it's a value in the past, the query becomes unreliable.

One way to go around this is to only add seen_since and always imply seen_until is now. That way, even if a newer event caused deletion or replacement of an older one, at least it's included in the response and the client can reconstruct what happened.


Another side-effect is it adds more responsibility to the relay.

Before, a misconfigured NTP setting or a temporary outage only meant some relay log timestamps are out of order.

With this new filter, it can lead to certain clients missing certain events. It's not just a relay anymore, it's a "relay with an accurate and always well-synchronized local time". An NTP issue on a relay can lead to clients missing events, and it would be very hard to even notice there's an issue.


It looks deceptively simple and straightforward, but IMO complicates the lives of both clients and relays.

Maybe seen_since and seen_until, instead of timestamps, could be relay-specific incremental UUIDs? Would achieve the same thing, but get rid of the risks above.

@mikedilger
Copy link
Contributor Author

As I understand it, replaceable/deletable events are replaced/deleted by other events. If replacement or deletion events have shown up at the relay after I last checked, I want those.

If a relay interprets a replacement event as replacing an event in the past, and uses that past date of the original event in filters, but supplies the replaced event instead of the original event, then yes my logic won't get all events I have missed. I thought relays were dumb and clients were doing the interpretation.

I guess I really just don't understand what people are doing in regard to this issue. I've deleted the pull request because the majority of responses to it were negative. But I still don't know what my client is supposed to do -- ask for every event from the relay every time it is started? Or every event within the last N days (where users set N to whatever they like), even if they just closed and reopened the client?

@ok300
Copy link
Contributor

ok300 commented Dec 10, 2022

I've also been looking for an efficient way to stay in sync with a relay.

Solutions I'm aware of:

  • always query all events
  • always query all events with a relatively recent since (e.g. 1d, 1w, etc) and only go further back in time on-demand (client clicks "load more")
  • on first start query all events (until EOSE), then store timestamp, then query events with since = timestamp - 1h (or some amount of time to account for client time variance)
  • same as above, but instead of timestamp use the highest created_at of received events
    • assumes relay uses NIP-22 to reject events with an unreasonable created_at

Also curious to hear how other apps handle this (cc @jb55 @unclebob @monlovesmango @vishalxl @KoalaSat @brilliancebitcoin @balas @Giszmo)


As to seen_since, I see two variations that look promising:

  • your original idea above (seen_since filter that uses relay local time when events were recorded), but without seen_until
    • would support calling it with incremental seen_since
    • client gets all event deletions, replacements, etc and has a consistent view
    • no complications due to seen_until in the past and events getting replaced or deleted after
  • seen_since / seen_until, but where they point to incremental event UUID instead of timestamp
    • relay NTP issues no longer have an effect on query results

What do you think?

@mikedilger
Copy link
Contributor Author

Gossip code currently loads events back to either the last timestamp plus a small overlap period, and at most by one 'chunk size' which is configurable, so very much as you describe.

That last point with an incremental UUID is interesting. I know some feedback was that seen_since complicates things. I would think adding an incremental UUID also would get the same feedback. But if people liked it, this suits.

@KoalaSat
Copy link
Member

@ok300 it depends of the scenario, but when loading the main timeline, nostros applies the 3rd option, it checks the last created_at, as side note, careful mixing this with limits, you might loose some events if the gap from your last createt_at is bigger

@monlovesmango
Copy link
Member

on first start query all events (until EOSE), then store timestamp, then query events with since = timestamp - 1h (or some amount of time to account for client time variance)

I use this approximately. nostr-tools npm package doesn't yet have EOSE so I just store the current timestamp 3 min after queries have opened for a user (I am making an assumption that after 3 min everything is synced). once nostr-tools does have EOSE I will switch to using that instead. I also go back 7 days from sync timestamp. seems like overkill but would rather not miss things that took a couple days to propagate.

this isn't related to query completeness but once nostr-tools has EOSE I will use a different strategy for feed. I will query in intervals, starting with the latest interval and with each EOSE I will progress to the next interval until I hit the earliest interval (determined by last sync time). will likely write something for nostr-tools relay pool to handle this kind of stepped query. seen_since and seen_until would probably be more apt for this kind of query than since and until

@weex
Copy link

weex commented May 17, 2023

Is this issue satisfied by either of the two merged PRs above?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants