Skip to content

Latest commit

 

History

History
155 lines (94 loc) · 9.46 KB

README.md

File metadata and controls

155 lines (94 loc) · 9.46 KB

Grove

Build Status

Grove is a generic document store layered on top of an actual database such as PostgreSQL. It can store and index structured documents like comments, blog posts, events etc. and organize these documents for easy retrieval later.

Data model

In Grove, a document is any dictionary-type object that can be represented with a JSON hash, including nested hashes. Grove organizes these documents inside posts; a post represents the "posting" of a document. All Grove APIs deal with posts, not documents directly.

A post collects the following information:

  • UID — a unique ID (see later section on UIDs).
  • Class — a class which indicates the type of document, typically being application-specific. Grove classes always have the prefix post., eg. post.comment.
  • Paths — paths that the document is associated with (there must be at least one).
  • Tags — a list of simple string tags.
  • Timestamps — timestamps for retrieving documents on a timeline.
  • Document — the document proper.
  • Sensitive data — data only available to the identity that has edit access, or to "god" identities.

Document class

The document class is an application-specific period-delimited string of identifiers. This can be used to filter queries, and can be used to distinguish documents of different types from each other.

Examples of typical class names:

  • post.blog
  • post.event
  • post.user_profile

Note that the first identifier must always be post. This signals to other applications that this specific object is handled by Grove.

Paths

Grove's document database can be viewed as a hierarchy of folders. Every document must be associated with at least one path. Paths with wildcards are typically used to query Grove for content.

A path is a period-delimited list of folder names. The first name must be the realm of the document (see Checkpoint for more on realms). The second name is by convention an application identifier, while the rest of the path is application-specific.

Examples of paths:

  • acmecorp.calendarapp.events.facebook
  • acmecorp.blogs.postings
  • acmecorp.blogs.football.postings
  • acmecorp.users

A document has one canonical path, which is where the "original" document is stored. If you need the document to appear in multiple places in the folder hierarchy you may post it to multiple paths, which will act like "symlinks" to the document, and enable the document to appear in query results as if it were stored in all the provided paths; in reality, the original document is always returned. If the underlying document is updated, it will be updated for all paths.

Folders are created automatically whenever a document is posted; you don't have to manually create them. Any path you postulate is acceptable as long as it is within the realm of your application.

Child objects

It is a convention in Pebble applications to put children of an object in a "subfolder" of its canonical path. A subpath is (conventionally) generated by appending the numeric ID of the parent object to the path and storing the children there.

For example, an article in the football blog:

post.article:acmecorp.blogs.football$323

The numeric ID here is 323. Comments on that article would be inside the path:

post.comment:acmecorp.blogs.football.323

A comment posted to that path would get an UID such as this:

post.comment:acmecorp.blogs.football.323$534

Tags

A set og tags may be applied to any document and subsequently be used to constrain results in queries. A tag is an opaque identifier that may contain letters, digits and underscores (but not spaces).

Occurrences

A document may also be organized on a timeline. A document may have any number of timestamps (occurrences) attached to it. Each timestamp is labeled. This can be used to model start-/end-times for events, or due dates for tasks.

When querying Grove, the result set can be constrained to documents with a specific labeled occurrence and optionally only documents with such an occurrence within a specified time window. This would typically be used to retrieve events that occur on a specific date, or tasks that are overdue.

Synchronization from external sources

When synchronizing data from external sources, you should give the document an external ID (external_id). The external ID may be any string, it may e.g. be the URL or database ID of the source object. The important thing is that it is invariant for the given source object, and that it is unique within the realm of your application. This ensures that updates written by multiple concurrent workers never results in duplicates.

Additionally, Grove supports external documents. If the content of the source document is synchronized to Grove as an external_document (not document) and local edits are written to the document field, Grove ensures that consecutive synchronization operations will not overwrite local edits, while fields that do not have local edits will still be updated from source. An example:

  • An event is synchronized from facebook to Grove. The fields are written to the external_document, document is blank.
  • An editor determines that the title of the event is unhelpful ("Big Launch!!!") and creates a local edit writing {"title": "Launch of the new Wagner Niebelung Ring Lego Kits!!!"}
  • The document now contains the key title while the rest of the content is in external_document.
  • A client requesting the document will see the merged content of external_document and document.
  • An updated event is synchronized from facebook. The updated document is written to external_document. The body and title of the source document has been updated from the source.
  • A client requesting the document sees the updated body, while the title is overridden by the content of document.
  • Since the external_document is newer than the document and an updated field is overridden the document is now marked as "conflicted" in Grove. An application may provide an interface to the user to resolve this conflict and update the document.

UIDs

Across all pebbles Grove documents are identified by their UIDs. The UID of a Grove document always has base class post. UIDs have the format:

<klass>:<canonical path>$<id>

Typical UIDs will look like this:

  • post.event:acmecorp.calendarapp.events.facebook$121
  • post.comment:acmecorp.blogs.fotball.postings.121$453211

Custom permission policies

Grove supports Checkpoint callbacks. You may override Grove's internal rules about who has permissions to create, update and delete what by implementing callbacks.

See the Checkpoint documentation for details on how to do this.

Sensitive and protected data

Every post can have data set in the sensitive field. This data is only readable (and writeable) by an identity which has edit access to the post.

Similarly, the protected column is only readable (and writerable) by an identity which has the god flag set.

Tags

Every post can have a list of tags. A tag is a basic keyword containing only non-space characters. Tags can be used to query documents.

Published status

A post is either visible or invisible. When published is true, it is always visible; when false, it is only visible if a query provides the parameter unpublished=include or unpublished=only.

Deleted

A post can be marked as deleted. When deleted is true, it is invisible to queries unless the query provides the parameter deleted=include.

API

Querying

To query for one or more documents, one performs as GET to:

/api/grove/v1/posts/<UID>?...

For example:

/api/grove/v1/posts/post:acmecorp.*?tags=unpaid

The UID part is the class, path and ID to search, and forms the basic query. All parts of this UID can contain wildcards, eg. *:acmecorp.invoices.*.

Additional parameters:

  • external_id: Filter by external ID.
  • tags: Constrain query by tags. Either a comma separated list of required tags or a boolean expression like 'paris & !texas' or 'closed & (failed | pending)'.
  • created_by: Filter by documents created by a Checkpoint identity (specified by UID).
  • created_after: Filter by documents created after this date (ISO 8601 date).
  • created_before: Filter by documents created before this date (ISO 8601 date).
  • unpublished: Either include (accessible unpublished posts will be included with the result) or only (only accessible unpublished posts will be included with the result). The default is to only include published documents.
  • deleted: If include, accessible deleted posts will be included with the result. The default is to exclude deleted documents.
  • occurrence[label]: Require that the post have an occurrence with the label specified in this parameter.
  • occurrence[from]: The occurrences must be later than this timestamp (ISO 8601 timestamp).
  • occurrence[to]: The occurrences must be earlier than this timestamp (ISO 8601 timestamp).
  • occurrence[order]: Order either by asc (ascending) or desc (descending).
  • limit: The maximum amount of posts to return.
  • offset: The index of the first result to return (for pagination).
  • sort_by: Field to sort by. Defaults to created_at.
  • direction: Direction of sort, either desc (descending; default) or asc (ascending).