Update OPDS #112

gbnewby · 2023-11-06T22:20:47Z

Per an email exchange between Eric and Greg, we would like to update OPDS to version 2.0.

Our current OPDS is 0.9 and is not necessarily working properly.

This will yield the IA/OpenLibrary API, which is stable and has Python wrappers available.

The goal is for OPDS to serve as the main public-facing API offered by Project Gutenberg.

eshellman · 2023-11-07T05:43:07Z

At a meeting last week, I discussed this with the product lead at Palace Project. They indicated that 2024Q1 would be a likely timeframe to look at a new PG feed.


gbnewby · 2024-04-13T16:32:40Z

Now that 2024Q1 is complete, would you please follow up with the developers for a status update? @eshellman

ddaws · 2024-12-08T02:29:52Z

Hey @gbnewby @eshellman, I've been reading about OPDS 2.0 and I wanted to propose an API structure for feedback before starting implementation. Please let me know what you think 🙏

Endpoints

The OPDS 2.0 API is read-only, so please assume all requests are HTTP GET requests to the endpoint. The following proposes a set of initial endpoints, but in the future we could extend this to expose more collections and publications based on language, author, series, etc.

`/opds/2`

This is the base URL and would return an OPDS navigation collection referencing other routes. For example, the following feed endpoints could eventually support the newest and top 100 pages on the Project Gutenberg website.

{
  "metadata": {
    "title": "Project Gutenberg OPDS 2.0 API"
  },
  "links": [
    {"rel": "self", "href": "https://gutenberg.org/opds/2", "type": "application/opds+json"}
  ],
  "groups": [
    {
      "metadata": {"title": "Newest"},
      "navigation": [
        {
          "href": "/opds/2/new/1d", 
          "title": "Newest last 24 hours", 
          "type": "application/opds+json", 
          "rel": "http://opds-spec.org/sort/new"
        },
        {
          "href": "/opds/2/new/7d", 
          "title": "Newest last 7 days", 
          "type": "application/opds+json", 
          "rel": "http://opds-spec.org/sort/new"
        },
        {
          "href": "/opds/2/new/30d",
          "title": "Newest last 30 days",
          "type": "application/opds+json",
          "rel": "http://opds-spec.org/sort/new"
        }
      ]
    },
    {
      "metadata": {"title": "Top 100"},
      "navigation": [
        {
          "href": "/opds/2/top100/books/1d", 
          "title": "Top 100 books in the last 24 hours", 
          "type": "application/opds+json", 
          "rel": "http://opds-spec.org/sort/popular"
        },
        {
          "href": "/opds/2/top100/books/7d", 
          "title": "Top 100 books in the last 7 days", 
          "type": "application/opds+json", 
          "rel": "http://opds-spec.org/sort/popular"
        },
        {
          "href": "/opds/2/top100/books/30d", 
          "title": "Top 100 books in the last 30 days", 
          "type": "application/opds+json", 
          "rel": "http://opds-spec.org/sort/popular"
        },
        {
          "href": "/opds/2/top100/authors/1d", 
          "title": "Top 100 authors in the last 24 hours", 
          "type": "application/opds+json", 
          "rel": "http://opds-spec.org/sort/popular"
        },
        {
          "href": "/opds/2/top100/authors/7d", 
          "title": "Top 100 authors in the last 7 days", 
          "type": "application/opds+json", 
          "rel": "http://opds-spec.org/sort/popular"
        },
        {
          "href": "/opds/2/top100/authors/30d", 
          "title": "Top 100 authors in the last 30 days", 
          "type": "application/opds+json", 
          "rel": "http://opds-spec.org/sort/popular"
        }
      ]
    }
  ]
}
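
The navigation feed above is regular enough to generate from a small table of periods rather than writing it by hand. A minimal Python sketch, assuming the route layout proposed here; `build_root_feed`, `nav_link`, `PERIODS`, and `BASE_URL` are hypothetical names for illustration, not existing Project Gutenberg code:

```python
BASE_URL = "https://gutenberg.org/opds/2"  # assumption: base URL from this proposal
OPDS_TYPE = "application/opds+json"
# (path suffix, human-readable label) for each proposed time window
PERIODS = [("1d", "24 hours"), ("7d", "7 days"), ("30d", "30 days")]


def nav_link(path: str, title: str, rel: str) -> dict:
    """Return one navigation link object in the shape used by the example feed."""
    return {"href": path, "title": title, "type": OPDS_TYPE, "rel": rel}


def build_root_feed() -> dict:
    """Assemble the root navigation feed with 'Newest' and 'Top 100' groups."""
    newest = [
        nav_link(f"/opds/2/new/{period}",
                 f"Newest in the last {label}",
                 "http://opds-spec.org/sort/new")
        for period, label in PERIODS
    ]
    top100 = [
        nav_link(f"/opds/2/top100/{kind}/{period}",
                 f"Top 100 {kind} in the last {label}",
                 "http://opds-spec.org/sort/popular")
        for kind in ("books", "authors")
        for period, label in PERIODS
    ]
    return {
        "metadata": {"title": "Project Gutenberg OPDS 2.0 API"},
        "links": [{"rel": "self", "href": BASE_URL, "type": OPDS_TYPE}],
        "groups": [
            {"metadata": {"title": "Newest"}, "navigation": newest},
            {"metadata": {"title": "Top 100"}, "navigation": top100},
        ],
    }
```

Adding a new window (say `90d`) would then be a one-line change to `PERIODS`.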

For the newest and top 100 feeds, endpoints would be suffixed with the time period (1d, 7d, 30d) because the OPDS spec doesn't include search query parameters for time-windowed ranges. If we are comfortable deviating slightly from the spec, we could implement these more concisely as search endpoints like

{
  "metadata": {
    "title": "Project Gutenberg OPDS 2.0 API"
  },
  "links": [
    {"rel": "self", "href": "https://gutenberg.org/opds/2", "type": "application/opds+json"}
  ],
  "navigation": [
    {
      "href": "/opds/2/search/authors{?query,from,to,sort,sortOrder,limit,etc...}", 
      "title": "Authors search endpoint", 
      "type": "application/opds+json", 
      "rel": "search"
    },
    {
      "href": "/opds/2/search/books{?query,from,to,sort,sortOrder,limit,etc...}", 
      "title": "Books search endpoint", 
      "type": "application/opds+json", 
      "rel": "search"
    }
  ]
}

In this case we could query the top 100 authors in the last 7 days with the query parameters

GET /opds/2/search/authors?from=now-7d&to=now&sort=downloads&sortOrder=desc&limit=100

Some things to note

- from and to could be expressed as an ISO 8601 date-time (e.g., 2024-12-08T14:30:00Z) or as a relative date marker like today, 1d, 1w, etc. This provides readable query strings and makes testing easier. A list of proposed relative date markers is included at the end of this post
- sort and sortOrder determine the way collections are sorted. A list of proposed search parameters is included at the end of this post

The nice thing about this is that it is extensible. We could expose the newest additions to Project Gutenberg via

GET /opds/2/search/books?sort=createdAt&sortOrder=desc # Assuming we track the created at time in the DB

We could expand this to provide feeds for the top Russian authors in the past 7 days by adding support for a language query parameter and using the endpoint

GET /opds/2/search/authors?from=today-7d&to=today&sort=downloads&sortOrder=desc&language=ru
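
From a client's point of view, the search URLs above are just a path plus an encoded query string. A small Python sketch of how a wrapper might build them, assuming the proposed `/opds/2/search/{collection}` routes; `search_url` and `BASE_URL` are hypothetical names, and the parameters are passed as a dict because `from` is a reserved word in Python:

```python
from urllib.parse import urlencode

BASE_URL = "https://gutenberg.org/opds/2"  # assumption: base URL from this proposal
COLLECTIONS = ("authors", "books")  # the two search endpoints proposed above


def search_url(collection: str, params: dict) -> str:
    """Build a search URL for the proposed endpoints, skipping unset parameters."""
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection!r}")
    # Drop parameters whose value is None so optional filters can be passed freely.
    clean = {key: value for key, value in params.items() if value is not None}
    url = f"{BASE_URL}/search/{collection}"
    query = urlencode(clean)
    return f"{url}?{query}" if query else url


# Top Russian authors in the past 7 days, as in the example above.
russian_authors = search_url("authors", {
    "from": "today-7d",
    "to": "today",
    "sort": "downloads",
    "sortOrder": "desc",
    "language": "ru",
})
```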

I propose supporting both the search endpoints and the more verbose, fully OPDS 2 compliant endpoints. We could expose an /opds/2/top100/authors/7d endpoint and have it effectively alias (call through to the search controller class for the author OPDS feed) /opds/2/search/authors?from=now-7d&to=now&sort=downloads&sortOrder=desc&limit=100. This way we expose a 100% OPDS 2.0 compliant endpoint (/opds/2/top100/authors/7d), and also a more extensible search endpoint that we can use to build collections dynamically.

Note: The "aliasing" would happen in the code by having an endpoint call the controller method for another endpoint with prepopulated query parameters. I can show an example of this in my PR when we're aligned on the structure of the API and I do not think it will require duplicating any code.
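
To make the aliasing idea concrete before the PR, here is a framework-free Python sketch. It assumes the search controller is a plain function taking a dict of query parameters; `search_authors`, `top100_authors`, and `ALLOWED_PERIODS` are hypothetical names, and the search itself is stubbed out:

```python
from typing import Callable, Dict

ALLOWED_PERIODS = {"1d", "7d", "30d"}  # the periods proposed for the fixed routes


def search_authors(params: Dict[str, str]) -> dict:
    """Hypothetical search controller; a real one would query the database."""
    return {"metadata": {"title": "Authors search"}, "navigation": []}


def top100_authors(
    period: str,
    controller: Callable[[Dict[str, str]], dict] = search_authors,
) -> dict:
    """Handle /opds/2/top100/authors/{period} by delegating to the search controller."""
    if period not in ALLOWED_PERIODS:
        raise ValueError(f"unsupported period: {period!r}")
    # Prepopulated query parameters: the alias adds no search logic of its own.
    params = {
        "from": f"now-{period}",
        "to": "now",
        "sort": "downloads",
        "sortOrder": "desc",
        "limit": "100",
    }
    feed = controller(params)
    # Only the presentation differs from a raw search response.
    feed["metadata"]["title"] = f"Top 100 authors in the last {period}"
    return feed
```

Because the fixed route only fills in parameters and retitles the feed, any filtering or sorting fix made in the search controller applies to both endpoints.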

`/opds/2/new/{period}`

This endpoint would effectively alias the /opds/2/search/books{?query,from,to,sort,sortOrder,etc...} endpoint.

For example, the /opds/2/new/7d endpoint would resolve to

GET /opds/2/search/books?from=today-7d&to=today&sort=createdAt&sortOrder=desc

This would return something like the following

{
  "metadata": {
    "title": "Newest additions to Project Gutenberg"
  },
  "links": [
    {"rel": "self", "href": "https://gutenberg.org/opds/2/new/7d", "type": "application/opds+json"}
  ],
  "publications": [
    {
      "metadata": {
        "@type": "http://schema.org/EBook",
        "title": "Moby-Dick",
        "author": "Herman Melville",
        "identifier": "urn:isbn:978031600000X",
        "language": "en",
        "modified": "2015-09-29T17:00:00Z"
      },
      "links": [
        {"rel": "self", "href": "https://gutenberg.org/opds/2/books/by/id/12345", "type": "application/opds-publication+json"},
        { "rel": "http://opds-spec.org/acquisition/open-access", "href": "https://www.gutenberg.org/ebooks/12345.epub.noimages", "type": "application/epub+zip"}
        // ...
      ],
      "images": [
        {"href": "http://example.org/cover.jpg", "type": "image/jpeg", "height": 1400, "width": 800},
        // ...
      ]
    }
    // More books listed...
  ]
}

For any result set that exceeds maxPageSize items, pagination would be included based on the pagination parameters defined in the OPDS 2 spec (https://drafts.opds.io/opds-2.0.html#4-pagination).
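
As a rough illustration of what that pagination could look like, the Python sketch below builds the `numberOfItems`, `itemsPerPage`, and `currentPage` metadata and the `first`/`previous`/`next`/`last` links described in that section of the spec. The `page` query parameter and the `paginate` helper are assumptions made for this example, not something the spec or this proposal fixes:

```python
import math
from urllib.parse import urlencode

OPDS_TYPE = "application/opds+json"


def paginate(base_path: str, query: dict, total: int, page: int, page_size: int):
    """Return (metadata, links) describing one page of a result set."""
    last_page = max(1, math.ceil(total / page_size))
    if not 1 <= page <= last_page:
        raise ValueError(f"page {page} is outside 1..{last_page}")

    def href(number: int) -> str:
        # Assumption: pages are addressed with a hypothetical `page` parameter.
        return f"{base_path}?{urlencode({**query, 'page': number})}"

    metadata = {
        "numberOfItems": total,
        "itemsPerPage": page_size,
        "currentPage": page,
    }
    links = [
        {"rel": "self", "href": href(page), "type": OPDS_TYPE},
        {"rel": "first", "href": href(1), "type": OPDS_TYPE},
        {"rel": "last", "href": href(last_page), "type": OPDS_TYPE},
    ]
    if page > 1:
        links.append({"rel": "previous", "href": href(page - 1), "type": OPDS_TYPE})
    if page < last_page:
        links.append({"rel": "next", "href": href(page + 1), "type": OPDS_TYPE})
    return metadata, links
```

The returned metadata would be merged into the feed's `metadata` object and the links appended to its `links` array.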

`/opds/2/books/by/id/{bookId}`

This endpoint is linked to from collection endpoints (like our search, new, and top 100 endpoints) and returns the information for a specific publication. The endpoint is structured as /opds/2/books/by/id/{bookId} to give us the flexibility to support retrieving publications by other types of identifiers in the future. For example, we could later add support for /opds/2/books/by/isbn/{bookISBN}. This might be useful for applications that integrate our APIs and aren't aware of our internal IDs, but have an ISBN and want to quickly look up publication information against Project Gutenberg.

This endpoint would return OPDS publication information. For example

{
  "metadata": {
    "@type": "http://schema.org/EBook",
    "title": "Moby-Dick",
    "author": "Herman Melville",
    "identifier": "urn:isbn:978031600000X",
    "language": "en",
    "modified": "2015-09-29T17:00:00Z"
  },
  "links": [
    {"rel": "self", "href": "https://gutenberg.org/opds/2/books/by/id/12345", "type": "application/opds-publication+json"},
    { "rel": "http://opds-spec.org/acquisition/open-access", "href": "https://www.gutenberg.org/ebooks/12345.epub.noimages", "type": "application/epub+zip"}
    // ...
  ],
  "images": [
    {"href": "http://example.org/cover.jpg", "type": "image/jpeg", "height": 1400, "width": 800},
    // ...
  ]
}

Initially we will return basic information about the book and the files associated with it, and in the future we can add support for

- Linking to series collections
- Linking to translations
- Providing acquisition links
- Etc...

The main differences between the information returned by the search endpoint and the publication endpoint are

- The results of a search endpoint may change over time (number of downloads changes, new books, etc.)
- The publication results in the search endpoint will always be concise to keep the response small
- The publication results in the publication endpoint can layer in more information and data as we support it
- The publication endpoint is stable, i.e., it always provides information on the same publication

Summary

I think that we should implement fully OPDS 2.0 compliant endpoints to ensure we support applications that strictly implement the spec, and we should also implement search endpoints that implement a superset of the spec to support querying and building collections dynamically. I do not think this will significantly increase the complexity or create code duplication, and it should make it easier to implement more facets and collections in the future.

I would really appreciate feedback, and when we are aligned I can propose an implementation plan to break this into multiple PRs so we can merge endpoints one at a time and see incremental progress 😃

Appendix

Proposed relative date markers

- now --> the current time
- today --> the start of day in server time, to align with the interval periods that update the top 100 collections
- Nd --> N days, used as now-3d or today-7d

We could add support for weeks (w) and minutes (m), and optionally yesterday (yesterday, aka today-1d), but I don't know if we have a real use case for this.
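
The markers above are simple enough to resolve with one regular expression. A minimal Python sketch, assuming server time is UTC and that only the `now`, `today`, `now-Nd`, and `today-Nd` forms are accepted alongside ISO 8601 values; `resolve_date` is a hypothetical helper name:

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Matches "now", "today", "now-3d", "today-7d"; other units are not handled here.
_MARKER = re.compile(r"(now|today)(?:-(\d+)d)?")


def resolve_date(value: str, now: Optional[datetime] = None) -> datetime:
    """Resolve a relative marker or an ISO 8601 date-time to an aware datetime."""
    now = now or datetime.now(timezone.utc)  # assumption: server time is UTC
    match = _MARKER.fullmatch(value)
    if match is None:
        # Not a marker, so treat it as ISO 8601. Older Python versions reject
        # a trailing "Z" in fromisoformat(), so normalise it to an offset.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed
    anchor, days = match.groups()
    if anchor == "today":
        # "today" means the start of the current day.
        base = now.replace(hour=0, minute=0, second=0, microsecond=0)
    else:
        base = now
    return base - timedelta(days=int(days or 0))
```

Passing `now` explicitly keeps the function deterministic, which is what makes the readable markers easy to test.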

Proposed query parameters

- All of the query parameters defined in the Readium Default Context (https://github.com/readium/webpub-manifest/tree/master/contexts/default) as part of the OPDS 2.0 spec
- from --> The starting date-time in a time-windowed search
- to --> The ending date-time in a time-windowed search
- sort --> The property to sort on
  - downloads --> The number of downloads
  - createdAt --> The date-time the publication (ebook) was added to Project Gutenberg
- sortOrder --> asc or desc
- limit --> The maximum number of results
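
On the server side, these parameters would need validating before they reach a query. A Python sketch of one way to do that, where the defaults (`downloads`, `desc`) and the cap of 100 results are assumptions chosen for illustration, and `SearchParams` and `parse_search_params` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Mapping, Optional

SORT_FIELDS = {"downloads", "createdAt"}
SORT_ORDERS = {"asc", "desc"}
MAX_LIMIT = 100  # assumption: a server-side cap, not part of the proposal


@dataclass(frozen=True)
class SearchParams:
    query: Optional[str] = None
    date_from: Optional[str] = None  # raw value; resolved to a date-time later
    date_to: Optional[str] = None
    sort: str = "downloads"
    sort_order: str = "desc"
    limit: int = MAX_LIMIT


def parse_search_params(raw: Mapping[str, str]) -> SearchParams:
    """Validate raw query-string values and return a typed parameter object."""
    sort = raw.get("sort", "downloads")
    if sort not in SORT_FIELDS:
        raise ValueError(f"sort must be one of {sorted(SORT_FIELDS)}")
    sort_order = raw.get("sortOrder", "desc")
    if sort_order not in SORT_ORDERS:
        raise ValueError("sortOrder must be 'asc' or 'desc'")
    try:
        limit = int(raw.get("limit", MAX_LIMIT))
    except ValueError:
        raise ValueError("limit must be an integer") from None
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return SearchParams(
        query=raw.get("query"),
        date_from=raw.get("from"),
        date_to=raw.get("to"),
        sort=sort,
        sort_order=sort_order,
        limit=min(limit, MAX_LIMIT),  # clamp oversized requests to the cap
    )
```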

Edits

Fixed spelling of OPDS and opds (I had written ODPS and odps and then copy-pasted everywhere 😅 )
Fixed url (http://projectgutenberg.org => https://gutenberg.org)

gbnewby · 2024-12-08T17:06:16Z

Thanks for this careful analysis. I'll rely on Eric to provide input on these details. My curiosity is about the output these queries will generate. How might the output be used to, for example, build web pages?


ddaws · 2024-12-09T01:13:26Z

> My curiosity is about the output these queries will generate

The output format is strictly defined by the OPDS 2.0 spec, but how we structure our API, i.e. the endpoints/routes we expose, is up to us. The OPDS 2.0 spec is pretty good, and because it relies on JSON-LD it is easy to "discover" endpoints by following linked data.

> How might the output be used to [...] build web pages?

For example, the "Frequently Downloaded" page that lists yesterday's top 100 books could be populated by sending an HTTP GET request to /opds/2/top100/books/1d. This would return a result like

{
  "metadata": {
    "title": "Top 100 books yesterday"
  },
  "links": [
    {"rel": "self", "href": "https://gutenberg.org/opds/2/top100/books/1d", "type": "application/opds+json"}
  ],
  "publications": [
    {
      "metadata": {
        "@type": "http://schema.org/EBook",
        "title": "Moby-Dick",
        "author": "Herman Melville",
        "identifier": "urn:isbn:978031600000X",
        "language": "en",
        "modified": "2015-09-29T17:00:00Z"
      },
      "links": [
        {"rel": "self", "href": "https://gutenberg.org/opds/2/books/by/id/12345", "type": "application/opds-publication+json"},
        { "rel": "http://opds-spec.org/acquisition/open-access", "href": "https://gutenberg.org/ebooks/12345.epub.noimages", "type": "application/epub+zip"}
        // ...
      ],
      "images": [
        {"href": "http://example.org/cover.jpg", "type": "image/jpeg", "height": 1400, "width": 800},
        // ...
      ]
    }
    // More books listed...
  ]
}

This includes all of the information required to populate the "Top 100 EBooks yesterday" list, and more, and the OPDS spec supports additional metadata if we want to improve the listing (to include images, download links, alternate languages, etc.).

The page could be server-side rendered by having the page controller call through to the OPDS controller, or client-side rendered by having the client browser call the /opds/2/top100/books/1d endpoint.

Similarly, the feed of latest books on the landing page could be populated by sending an HTTP GET request to /opds/2/new/1d. This endpoint would return the exact same JSON structure (defined in the OPDS spec) with different publications. The response could also include paging parameters so the client could scroll through the latest additions.
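
To show the server-side rendering path end to end, here is a small Python sketch that turns a feed shaped like the example above into an HTML list. It assumes `author` is a plain string, as in the example (the spec also allows richer contributor values), and `render_book_list` and `find_link` are hypothetical helper names:

```python
from html import escape
from typing import List, Optional

OPEN_ACCESS = "http://opds-spec.org/acquisition/open-access"


def find_link(links: List[dict], rel: str) -> Optional[str]:
    """Return the href of the first link with the given rel, or None."""
    for link in links:
        if link.get("rel") == rel:
            return link.get("href")
    return None


def render_book_list(feed: dict) -> str:
    """Render an OPDS publications feed as an HTML heading and ordered list."""
    items = []
    for publication in feed.get("publications", []):
        metadata = publication.get("metadata", {})
        title = escape(str(metadata.get("title", "Untitled")))
        author = escape(str(metadata.get("author", "Unknown")))
        href = find_link(publication.get("links", []), OPEN_ACCESS)
        # Link the title to the download when an open-access link is present.
        label = f'<a href="{escape(href)}">{title}</a>' if href else title
        items.append(f"<li>{label} by {author}</li>")
    heading = escape(str(feed.get("metadata", {}).get("title", "")))
    return f"<h2>{heading}</h2>\n<ol>\n" + "\n".join(items) + "\n</ol>"
```

The same function would serve the "latest books" listing unchanged, since both feeds share one JSON structure.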

eshellman · 2024-12-09T17:29:52Z

Have you downloaded an OPDS client? OPDS was designed for phone-based apps to show "lanes" of books, like Netflix. So the most natural channels for PG (along with the top feeds) would probably be the "bookshelves".

Our current implementation is triggered by adding ".opds" to a URL, which is not a common implementation. So, for example, https://gutenberg.org/ebooks/25344.opds instead of https://gutenberg.org/ebooks/25344, or https://gutenberg.org/ebooks/bookshelf/435.opds instead of https://gutenberg.org/ebooks/bookshelf/435.

We should probably leave these endpoints in place to avoid breaking existing clients. The endpoints you suggest are not in use, so they can be implemented for testing without breaking existing use. Or, where you want to overlay functionality, add a version param.

I would focus first on updating existing functions rather than inventing new ones, then work with API consumers to address their specific needs. I strongly suggest avoiding custom parameters unless there is a popular API client where they are in wide use.

…

On Dec 7, 2024, at 9:30 PM, Dawson ***@***.***> wrote: Hey @gbnewby <https://github.com/gbnewby> @eshellman <https://github.com/eshellman>, I've been reading about OPDS 2.0 and I wanted to propose an API structure for feedback before starting implementation. Please let me know what you think 🙏 Endpoints The ODPS 2.0 API is read only, so please assume all request are HTTP GET requests to the endpoint. The following proposes a set of initial endpoints, but in the future we could extend this to support exposing more collections and publications based on language, author, series, etc. /odps/2 This is the base URL and would return an ODPS navigation collection referencing other routes. For example, the following feed endpoints could eventually support the newest and top 100 pages on the Project Gutenberg website. { "metadata": { "title": "Project Gutenberg ODPS 2.0 API" }, "links": [ {"rel": "self", "href": "http://projectgutenberg.org/opds/2", "type": "application/opds+json"} ], "groups": [ { "metadata": {"title": "Newest"}, "navigation": [ { "href": "/odps/2/new/1d", "title": "Newest last 24 hours", "type": "application/opds+json", "rel": "http://opds-spec.org/sort/new" }, { "href": "/odps/2/new/7d", "title": "Newest last 7 days", "type": "application/opds+json", "rel": "http://opds-spec.org/sort/new" }, { "href": "/odps/2/new/30d", "title": "Newest", "type": "application/opds+json", "rel": "http://opds-spec.org/sort/new" } ] }, { "metadata": {"title": "Top 100"}, "navigation": [ { "href": "/odps/2/top100/books/1d", "title": "Top 100 books in the last 24 hours", "type": "application/opds+json", "rel": "http://opds-spec.org/sort/popular" }, { "href": "/odps/2/top100/books/7d", "title": "Top 100 books in the last 7 days", "type": "application/opds+json", "rel": "http://opds-spec.org/sort/popular" }, { "href": "/odps/2/top100/books/30d", "title": "Top 100 books in the last 30 days", "type": "application/opds+json", "rel": "http://opds-spec.org/sort/popular" }, { "href": 
"/odps/2/top100/authors/1d", "title": "Top 100 authors in the last 24 hours", "type": "application/opds+json", "rel": "http://opds-spec.org/sort/popular" }, { "href": "/odps/2/top100/authors/7d", "title": "Top 100 authors in the last 7 days", "type": "application/opds+json", "rel": "http://opds-spec.org/sort/popular" }, { "href": "/odps/2/top100/authors/30d", "title": "Top 100 authors in the last 30 days", "type": "application/opds+json", "rel": "http://opds-spec.org/sort/popular" } ] } ] } For the newest and top 100 feeds endpoints would be suffixed with the time period (1d, 7d, 30d) because the OPDS spec doesn't include search query parameters for time windowed ranges. If we are comfortable deviating slightly from the spec we could implement these more concisely as search endpoints like { "metadata": { "title": "Project Gutenberg ODPS 2.0 API" }, "links": [ {"rel": "self", "href": "http://projectgutenberg.org/opds/2", "type": "application/opds+json"} ], "navigation": [ { "href": "/odps/2/search/authors{?query,from,to,sort,sortOrder,limit,etc...}", "title": "Authors search endpoint", "type": "application/opds+json", "rel": "search" }, { "href": "/odps/2/search/books{?query,from,to,sort,sortOrder,limit,etc...}", "title": "Books search endpoint", "type": "application/opds+json", "rel": "search" } ] } In this case we could query the top 100 authors in the last 7 days with the query parameters GET /odps/2/search/authors?from=now-7d&to=now&sort=downloads&sortOrder=desc&limit=100 Some things to note from and to could be expressed as an ISO 8601 date time (eg, 2024-12-08T14:30:00Z) or as a relative date marker like today, 1d, 1w, etc. This provides readable query strings and makes testing easier. A list of proposed relative date markers are included at the end of this post sort and sortOrder determine the way collections are sorted. A list of proposed search parameters are included at the end of this post The nice thing about this is that it is extensible. 
We could expose the newest additions to Project Gutenberg via GET /odps/2/search/books?sort=createdAt&sortOrder=desc # Assuming we track the created at time in the DB We could expand this to provide feeds for the top Russian authors in the past 7 days by adding support for a language query parameter and using the endpoint GET /odps/2/search/authors?from=today-7d&to=today&sort=downloads&sortOrder=desc&language=ru I propose supporting both the search endpoints, and the more verbose fully ODPS 2 compliant endpoints. We could expose an /odps/2/top100/authors/7d endpoint and have this endpoint effectively alias (call through to the search controller class for the author ODPS feed) the /odps/2/search/authors?from=now-7d&to=now&sort=downloads&sortOrder=desc. This way we expose a 100% ODPS 2.0 compliant endpoint (/odps/2/top100/authors/7d), and expose a more extensible search endpoint that we can use to dynamically build collections on. Note: The "aliasing" would happen in the code by having an endpoint call the controller method for another endpoint with prepopulated query parameters. I can show an example of this in my PR when we're aligned on the structure of the API and I do not think it will require duplicating any code. /odps/2/new/{period} This endpoint would effectively alias the endpoint /odps/2/search/books{?query,from,to,sort,sortOrder,etc...} endpoint. 
For example, the /opds/2/new/7d endpoint would resolve to

    GET /opds/2/search/books?from=today-7d&to=today&sort=createdAt&sortOrder=desc

This would return something like the following

    {
      "metadata": { "title": "Newest additions to Project Gutenberg" },
      "links": [
        {"rel": "self", "href": "http://projectgutenberg.org/opds/2/new/7d", "type": "application/opds+json"}
      ],
      "publications": [
        {
          "metadata": {
            "@type": "http://schema.org/EBook",
            "title": "Moby-Dick",
            "author": "Herman Melville",
            "identifier": "urn:isbn:978031600000X",
            "language": "en",
            "modified": "2015-09-29T17:00:00Z"
          },
          "links": [
            {"rel": "self", "href": "http://projectgutenberg.org/opds/2/books/by/id/12345", "type": "application/opds-publication+json"},
            {"rel": "http://opds-spec.org/acquisition/open-access", "href": "https://www.gutenberg.org/ebooks/12345.epub.noimages", "type": "application/epub+zip"}
            // ...
          ],
          "images": [
            {"href": "http://example.org/cover.jpg", "type": "image/jpeg", "height": 1400, "width": 800},
            // ...
          ]
        }
        // More books listed...
      ]
    }

For any result set that exceeds maxPageSize items, pagination would be included based on the pagination parameters defined in the OPDS 2 spec here <https://drafts.opds.io/opds-2.0.html#4-pagination>.

`/opds/2/books/by/id/{bookId}`

This endpoint is linked to from collection endpoints (like our search, new, and top 100 endpoints) and returns the information for a specific publication. The endpoint is structured as /opds/2/books/by/id/{bookId} to give us the flexibility to support retrieving publications by other types of identifiers in the future. For example, we could later add support for /opds/2/books/by/isbn/{bookISBN}. This might be useful for applications that integrate our APIs and aren't aware of our internal IDs but have an ISBN and want to quickly look up publication information against Project Gutenberg.

This endpoint would return OPDS publication information.
For example

    {
      "metadata": {
        "@type": "http://schema.org/EBook",
        "title": "Moby-Dick",
        "author": "Herman Melville",
        "identifier": "urn:isbn:978031600000X",
        "language": "en",
        "modified": "2015-09-29T17:00:00Z"
      },
      "links": [
        {"rel": "self", "href": "http://projectgutenberg.org/opds/2/books/by/id/12345", "type": "application/opds-publication+json"},
        {"rel": "http://opds-spec.org/acquisition/open-access", "href": "https://www.gutenberg.org/ebooks/12345.epub.noimages", "type": "application/epub+zip"}
        // ...
      ],
      "images": [
        {"href": "http://example.org/cover.jpg", "type": "image/jpeg", "height": 1400, "width": 800},
        // ...
      ]
    }

Initially we will return basic information about the book and the files associated with it, and in the future we can add support for

- Linking to series collections
- Linking to translations
- Providing acquisition links
- Etc...

The main differences between the information returned in the search endpoint and the publication endpoint are

- The results of a search endpoint may change over time (number of downloads change, new books, etc)
- The publication results in the search endpoint will always be concise to keep the response small
- The publication results in the publication endpoint can layer in more information and data as we support it
- The publication endpoint is stable, i.e. it always provides information on the same publication

Summary

I think that we should implement fully OPDS 2.0 compliant endpoints to ensure we support applications that strictly implement the spec, and we should implement search endpoints that implement a superset of the spec to support querying and building collections dynamically. I do not think this will significantly increase the complexity or create code duplication, and it should make it easier to implement more facets and collections in the future.
I would really appreciate feedback, and when we are aligned I can propose an implementation plan to break this into multiple PRs so we can merge endpoints one at a time and see incremental progress 😃

Appendix

Proposed relative date markers

- now --> the current time
- today --> the start of day in server time, to align with the interval periods that update top 100 collections
- Nd --> N days, used as now-3d or today-7d

We could add support for weeks (w) and minutes (m), and optionally yesterday (aka today-1d), but I don't know if we have a real use case for this.

Proposed query parameters

- All of the query parameters defined in the Readium Default Context <https://github.com/readium/webpub-manifest/tree/master/contexts/default> as part of the OPDS 2.0 spec
- from --> The starting date time in a time-windowed search
- to --> The ending date time in a time-windowed search
- sort --> The property to sort on
  - downloads --> The number of downloads
  - createdAt --> The date time the publication (ebook) was added to Project Gutenberg
- sortOrder --> asc or desc
- limit --> The maximum number of results
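The relative date markers proposed in the appendix could be resolved along these lines. This is only a sketch under stated assumptions: the function name `resolve_date_marker` is hypothetical, "server time" is taken to be UTC, and only the `now`, `today`, `now-Nd`, `today-Nd`, and ISO 8601 forms are handled (weeks and minutes are left out, as the proposal itself is undecided on them).

```python
import re
from datetime import datetime, timedelta, timezone

# "now" or "today", optionally followed by "-Nd" (N whole days back).
_RELATIVE = re.compile(r"^(now|today)(?:-(\d+)d)?$")


def resolve_date_marker(marker, now=None):
    """Turn a from/to value into an aware datetime.

    Accepts a relative marker (now, today, now-3d, today-7d) or an
    ISO 8601 date time such as 2024-12-08T14:30:00Z.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    match = _RELATIVE.match(marker)
    if match is None:
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        text = marker[:-1] + "+00:00" if marker.endswith("Z") else marker
        return datetime.fromisoformat(text)
    anchor, days = match.groups()
    if anchor == "today":
        base = now.replace(hour=0, minute=0, second=0, microsecond=0)
    else:
        base = now
    return base - timedelta(days=int(days or 0))


if __name__ == "__main__":
    fixed_now = datetime(2024, 12, 8, 14, 30, tzinfo=timezone.utc)
    print(resolve_date_marker("now-7d", now=fixed_now))    # 2024-12-01 14:30:00+00:00
    print(resolve_date_marker("today-7d", now=fixed_now))  # 2024-12-01 00:00:00+00:00
```

Passing `now` explicitly keeps the function deterministic in tests, which fits the goal of making the query strings easy to test.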

ddaws · 2024-12-10T01:35:31Z

Have you downloaded an OPDS client?

I haven't 😅 I will do this today.

So the most natural channels for PG (along with the top feeds) would probably be the "bookshelves".

Exposing endpoints for bookshelves makes a lot of sense. My initial goal would be to expose feeds for top and new, and then add support for bookshelves. I think this is okay because the base path, /opds/2, would return an OPDS 2.0 group of navigation links. This uses JSON-LD to effectively tell consumers which endpoints to hit to get specific publication feeds.

So we could start by just exposing the top and new feeds because these are an easier first implementation, and then we could add in bookshelves soon thereafter.

Our current implementation is triggered by adding ".opds" to a url. This is not a common implementation. So for example, https://gutenberg.org/ebooks/25344.opds instead of https://gutenberg.org/ebooks/25344 or https://gutenberg.org/ebooks/bookshelf/435.opds instead of https://gutenberg.org/ebooks/bookshelf/435

This makes sense. I don't want to change the current OPDS 1.x implementation, so that we don't break any consumers I am not aware of. I think that we should implement an entirely different set of paths (base path = /opds/2/) because:

- It gives us flexibility to support different path patterns in the future
  - For example, /opds/2/books/by/id/{id} or /opds/2/books/by/isbn/{isbn}. This allows us to resolve the same information using different identifiers which could be useful to integrators. We don't need to do this unless there is a use case, but mounting this API under a different base path gives us this flexibility in the future.
- It simplifies Apache routing rules
  - We could run two autocat3 processes. We could route all non /opds/2 requests to process 1, and all /opds/2/* to process 2. This would allow us to assign different resources to each process (via Linux cgroups) to ensure the API doesn't starve out the main autocat3 process and vice versa.

We can return the Content-Type: application/opds+json header to tell consumers these routes return OPDS 2.0 in the response.
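To illustrate how a separate base path keeps the new API apart from the existing ".opds" suffix convention, here is a small Python sketch. The function name `classify_catalog_path` and the labels it returns are made up for this example; the real routing would live in Apache and autocat3.

```python
def classify_catalog_path(path):
    """Decide which handler family a request path belongs to.

    Returns "opds2" for the proposed /opds/2 base path, "opds1" for the
    existing ".opds" suffix convention, and "html" for everything else.
    """
    if path == "/opds/2" or path.startswith("/opds/2/"):
        return "opds2"
    if path.endswith(".opds"):
        return "opds1"
    return "html"


if __name__ == "__main__":
    for example in ("/opds/2/new/7d", "/ebooks/25344.opds", "/ebooks/25344"):
        print(example, "->", classify_catalog_path(example))
```

Only responses in the "opds2" family would carry the application/opds+json content type, so existing consumers of the ".opds" URLs see no change.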

Then work with api consumers to address their specific needs

Yup, makes a lot of sense 👍

I am going to go use some OPDS clients to get a better first-hand understanding and will follow up. It probably also doesn't hurt to start work on a PoC, and we can change the paths around as we get aligned 🙂

eshellman · 2024-12-10T16:05:56Z

On Dec 9, 2024, at 8:35 PM, Dawson ***@***.***> wrote:

This makes sense. I want to avoid changing the current OPDS 1.x implementation to avoid breaking any consumers that I am not aware of. I think that we should implement an entirely different set of paths (base path = /opds/2/) because

- It gives us flexibility to support different path patterns in the future
  - For example, /opds/2/books/by/id/{id} or /opds/2/books/by/isbn/{isbn}. This allows us to resolve the same information using different identifiers which could be useful to integrators. We don't need to do this unless there is a use case, but mounting this API under a different base path gives us this flexibility in the future.
- It simplifies Apache routing rules
  - We could run two autocat3 processes. We could route all non /opds/2 requests to process 1, and all /opds/2/* to process 2. This would allow us to assign different resources to each process (via Linux cgroups) to ensure the API doesn't starve out the main autocat3 process and vice versa.

We'd need a single Apache rule to route /opds/* to autocat3. Autocat3 routing is in CherryPyApp.py. No need for separate processes. The bottleneck will always be the database.

eshellman · 2024-12-10T16:38:42Z

I've created an opds branch you can target
