Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow options to be passed for fetch and getOps #215

Merged
merged 1 commit into from
Jul 4, 2019
Merged

Allow options to be passed for fetch and getOps #215

merged 1 commit into from
Jul 4, 2019

Conversation

alecgibson
Copy link
Collaborator

@alecgibson alecgibson commented Jun 19, 2018

The database adapters allow an options object to be passed through to
them for enabling the return of metadata with a snapshot or ops.

Consumers could query the database directly, or even use the database
adapters, but this may give inconsistent results when comparing ops with
those fetched through getOps. For example, the Mongo adapter makes
sure that a valid set of ops with unique versions are returned, which
may not be the case when querying the database directly. Fetching ops
and snapshots through Backend methods also ensures that we call the
appropriate "sanitize" methods, and trigger the corresponding
middleware.

However, we don't expose this on Backend.getOps or Backend.fetch.
This change adds an optional options argument to these methods, which
can then be used to ask for metadata.

Note that an options argument has been added to Backend.subscribe, but
using it will return an error. This is to keep the signature consistent
with fetch and getOps. However, the implementation is beyond the
scope of this change, because we'd need to add some way to configure
SubmitRequest.commit to optionally pass metadata to the appropriate
clients, who provided that given option on subscribe.

@coveralls
Copy link

coveralls commented Jun 19, 2018

Coverage Status

Coverage increased (+0.03%) to 95.948% when pulling 95ae394 on alecgibson:getops-options into d210133 on share:master.

@gkubisa
Copy link
Contributor

gkubisa commented Jun 25, 2018

Personally I'd prefer not to merge this PR because getOps is not currently documented, so I'd consider it private, and as options are not required internally, I don't see a need to support that extra parameter.

Perhaps we could consider adding extra functions to the public server API, however, I'd like to know first some specific use cases where such functions would be necessary.

@alecgibson
Copy link
Collaborator Author

@gkubisa thanks for getting back to me. I agree that it's not a documented API, so maybe it would be nice to expose it properly?

We're currently trying to regenerate documents using the rich-text OT type. Specifically, we're trying to generate them at a specific point of time, which we're trying to do by fetching the ops, and filtering them by their timestamp, and then reconstructing the document from those ops.

@curran
Copy link
Contributor

curran commented Jun 26, 2018

@alecgibson That's a really cool feature! It would be great to document and expose all parts of the ShareDB API required to implement that. Perhaps this PR can add that documentation, and then we can consider it a "public" API?

@alecgibson
Copy link
Collaborator Author

Okay, so given that Backend.prototype.getOps is not preceded by an underscore, like Backend.prototype._triggerQuery (for example), that's why I assumed this function was "public".

If people are happy to already consider it public, I'm happy to document it (although having only dabbled in the codebase, I'm still unclear on what things like agent are...?).

If you're worried about the maintenance burden of supporting particular features, I'm also happy to expose a limited subset of options that can be provided (although I think I've only seen metadata floating around anyway...?).

@gkubisa
Copy link
Contributor

gkubisa commented Jun 26, 2018

Ok, so as I understand, you need a document at a specific version and the specific version you need is determined based on the operation timestamp.

Getting Document at a Specific Version

The rich-text type is not invertible, so you'd need the initial snapshot, which ShareDB does not store. Unless you know the initial snapshot for each document (eg it's always empty or stored outside ShareDB), you can't regenerate past snapshots. If your type were invertible, you could start with the latest snapshot and work your way backwards until you reach the snapshot you need.

ShareDB could be extended to support retrieving snapshots at specific versions for any type. In my opinion that functionality should be exposed through the client API, not the server API. In order to make it work for any type, not only invertible types, ShareDB would need to also store an initial snapshot for each document. The clients could then request a specific snapshot - possibly by specifying a version number as a parameter to the fetch function.

I'll also need this functionality, so I might implement it at some point, unless you or someone else wants to do it.

Getting a Version/Snapshot by a Timestamp

I'm not sure how to expose it in the public API in a nice way... the timestamps seem to be an implementation detail to me. Do you have any ideas? There's also always the option of querying the database directly.

@alecgibson
Copy link
Collaborator Author

so you'd need the initial snapshot, which ShareDB does not store

Unless I'm missing something, I always get the create op back when I ask for the ops, which lets us build it up from scratch. (If it matters, we're using sharedb-mongo).

We then apply all ops on top of that creation op, because - as you say - rich-text is (sadly) not invertible.

I'm not sure how to expose [getting a snapshot by timestamp] in the public API in a nice way

Depending on how nice you want to be to people, you could expose it as you suggested above, through the fetch API - perhaps you could specify a version by either version number, or by Date? ShareDb could then use its internal knowledge of the timestamps without exposing that directly.

There's also always the option of querying the database directly

I have thought about it (especially because sharedb-mongo could do with a performance boost), but eg sharedb-mongo performs some sanitization which we'd also have to implement - and at this point, shouldn't this be what the framework cares about, not me?

@gkubisa
Copy link
Contributor

gkubisa commented Jun 26, 2018

@alecgibson , thanks for a great reply. :-)

I always get the create op back when I ask for the ops

You're right, sorry I missed that.

you could expose it as you suggested above, through the fetch API - perhaps you could specify a version by either version number, or by Date?

Sounds good to me.

@alecgibson
Copy link
Collaborator Author

@gkubisa okay, so what we're saying is you'd prefer a PR that actually implements the version-fetching, rather than exposing this options API? If so, makes sense to me - it's always felt a bit weird that all this functionality theoretically exists, but is left to the consumer to implement.

Where would be the best place to discuss planning and implementation? Should I raise an issue to discuss further?

@gkubisa
Copy link
Contributor

gkubisa commented Jun 26, 2018

okay, so what we're saying is you'd prefer a PR that actually implements the version-fetching, rather than exposing this options API?

Yes, I'd prefer a client API for version-fetching because it'd be much easier to use - just 1 param for fetch and you get your result in Doc.data. With the server API you'd explicitly need to load all ops, apply them (3 types of operations: create, del and op), maybe handle projections (I don't know but they might complicate the algorithm in some cases) and finally find a way to deliver the result to the client, most likely through a separate connection.

Another advantage of the simple client API is that it could be optimized internally, without affecting client code. For example:

  • instead of applying all operations from the very beginning, start applying them from the latest create operation preceding the version you want to load.
  • if a type supports invert, the required snapshot could be generated faster by applying inverted operations.
  • we could store snapshots with some operations, which could be used as starting points when generating a snapshot for a specific version.

I don't think we need any of those optimizations for the initial implementation but it's good to know that they can be implemented in a way that's transparent to the client code.

it's always felt a bit weird that all this functionality theoretically exists, but is left to the consumer to implement.

Yes, I completely agree. I'd say that the reason is the lack of time needed to expose that functionality rather than anything else.

Where would be the best place to discuss planning and implementation? Should I raise an issue to discuss further?

Yes, I think a new Issue would be great.

@alecgibson
Copy link
Collaborator Author

I'm closing this and moving discussion to this issue: #218

@alecgibson
Copy link
Collaborator Author

After a bit more work with ShareDB, I think that this change is a valid one to make. Even if we ignore the desire to regenerate snapshots, since being told to store information in op metadata, we now have a legitimate need to access this metadata, and I think it's a fair thing to ask for.

Given that getOps is not preceded by an underscore, I think it's fair for consumers to assume that it's a stable part of the API, especially given that it's very useful for actually getting the ops for a document.

@alecgibson alecgibson reopened this Aug 9, 2018
@nateps
Copy link
Contributor

nateps commented May 8, 2019

If we're going to add options to the backend methods, I'd recommend we do this consistently and add this to all the methods that call into DB methods. Otherwise, no major objections. Seems useful

@alecgibson
Copy link
Collaborator Author

@nateps I've had a look, and I've reminded myself that we do indeed want this functionality - we sometimes want to fetch some ops and check the metadata on them (very helpful for displaying document history). I've had a look at just bypassing Backend.getOps (ie calling shareDb.db.getOps), but obviously this then skips the middleware (and projections), which isn't exactly ideal.

I'm all for also adding options to other methods, but I think we potentially need to discuss if we should add it to all the methods. For example, we could allow options on fetch, but then we should probably also allow them on subscribe, but then it's slightly confusing, because you'd potentially have the same options object being passed to fetch and getOps, which doesn't seem that intuitive to me.

@nateps
Copy link
Contributor

nateps commented May 22, 2019

Let's be explicit and in each case nest the options for the db menthod within an appropriate property. We could use snapshotOptions &opsOptions properties on the options object for example. Let's do this consistently on all methods even if they only need one in case we want to expand functionality in the future.

@alecgibson
Copy link
Collaborator Author

I've added options to getOps and fetch, but subscribe still seems tricky - I'm not sure I can pass options through to the pubsub?

The database adapters allow an `options` object to be passed through to
them for enabling the return of metadata with a snapshot or ops.

Consumers could query the database directly, or even use the database
adapters, but this may give inconsistent results when comparing ops with
those fetched through `getOps`. For example, the Mongo adapter makes
sure that a valid set of ops with unique versions are returned, which
may not be the case when querying the database directly. Fetching ops
and snapshots through `Backend` methods also ensures that we call the
appropriate "sanitize" methods, and trigger the corresponding
middleware.

However, we don't expose this on `Backend.getOps` or `Backend.fetch`.
This change adds an optional `options` argument to these methods, which
can then be used to ask for metadata.

Note that an options argument has been added to `Backend.subscribe`, but
using it will return an error. This is to keep the signature consistent
with `fetch` and `getOps`. However, the implementation is beyond the
scope of this change, because we'd need to add some way to configure
`SubmitRequest.commit` to optionally pass metadata to the appropriate
clients, who provided that given option on `subscribe`.
@alecgibson alecgibson changed the title Allow options to be passed into Backend.getOps Allow options to be passed for fetch and getOps Jun 20, 2019
@alecgibson
Copy link
Collaborator Author

As discussed, I've added the options argument to the signature for Backend.subscribe, but using it will return an error.

Copy link
Contributor

@nateps nateps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'd recommend adding documentation of the Backend class and these methods to the readme.

Thanks!

@alecgibson alecgibson merged commit 0b65164 into share:master Jul 4, 2019
@alecgibson alecgibson deleted the getops-options branch July 4, 2019 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants