@imalsogreg
Contributor

imalsogreg commented Feb 22, 2019

Replacement for #399

Add two endpoints, which reuse existing URLs but use the requested content type to choose a JSON response instead of HTML. The new endpoints provide basic information about a package (author, description, license), and a listing of available versions for a package, along with each version's deprecation status.

For example:

/package/avro-0.4.1.2

{
    "author": "Thomas M. DuBuisson", 
    "description": "Avro serialization and deserialization support for Haskell", 
    "license": "BSD-3-Clause", 
    "metadata_revision": 0
}

/package/avro

{
    "0.4.1.1": "normal", 
    "0.4.1.2": "deprecated", 
    "0.4.1.4": "normal"
}

For the first (single-version) view, the URL can specify a metadata revision:
/package/avro-0.4.1.1/revision/1

{
    "author": "Thomas M. DuBuisson", 
    "description": "Avro serialization and deserialization support!", 
    "license": "BSD-3-Clause", 
    "metadata_revision": 1
}

(When no revision is specified, the most recent one is used.)

NOTE: There is no caching implemented for these endpoints. Is it Ok to assume that the computations being done (parsing several cabal files per request) are cheap enough to justify not adding some AcidState caching?

There is no dependency data in this PR. I'm thinking of adding that in a second PR, after talking with @hvr about his ideas for formatting there.

/cc @hvr @alexcmcdaniel @gbaz

@imalsogreg
Contributor Author

There's at least one error: package/avro returns the JSON result even when we aren't requesting JSON.

@alexcmcdaniel

Would it be possible to include homepage as well?

@imalsogreg
Contributor Author

Fixed and cleaned up. Ready for review :)

@imalsogreg mentioned this pull request Feb 23, 2019
@hvr
Member

hvr commented Feb 25, 2019

> there is no caching implemented for these endpoints. Is it Ok to assume that the computations being done (parsing several cabal files per request) are cheap enough to justify not adding some AcidState caching?

Depends... there are some packages with 150+ releases, and there are also .cabal files that are not cheap to parse and take significant time and space. But I don't think you need to parse all .cabal files for any single service request here?

Btw, when there's a description field, I'd also expect the synopsis field to be present. And if you extract the license information, then you also ought to extract the copyright: property, imo.

@imalsogreg
Contributor Author

@hvr re: synopsis and copyright, agreed! Done.

re: caching, yes, requests for the version listing only force parsing of the cabal files for that package. Requests for a particular version only parse that one version. Would it be too optimistic to assume that when we are only using these top-level fields, laziness saves us from having to parse the whole file?

@alexcmcdaniel

alexcmcdaniel commented Feb 25, 2019

I noticed that the sample endpoints do not have .json, is that intentional?

@imalsogreg
Contributor Author

@alexcmcdaniel Thanks, yep. If you just type those example URLs into a browser you'll get HTML. I implicitly meant for an Accept: application/json header to be attached to the requests.
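As an illustration of the content negotiation described above, here is a minimal sketch of choosing between HTML and JSON from an Accept header. It is not the PR's code: `Format`, `chooseFormat`, and the helpers are hypothetical names, it uses only `base`, and it ignores q-values and wildcards for brevity.

```haskell
import Data.Char (isSpace, toLower)

-- | Hypothetical response formats served from one URL.
data Format = Html | Json
  deriving (Eq, Show)

-- | Split a header value on commas.
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnComma rest

-- | Normalise one media range: drop parameters such as ";q=0.9",
-- trim surrounding whitespace, and lower-case the result.
mediaType :: String -> String
mediaType = map toLower . trim . takeWhile (/= ';')
  where
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- | Pick JSON only when the client lists application/json explicitly.
-- Anything else, including a missing header, falls back to HTML.
chooseFormat :: Maybe String -> Format
chooseFormat Nothing = Html
chooseFormat (Just accept)
  | "application/json" `elem` map mediaType (splitOnComma accept) = Json
  | otherwise = Html
```

With this sketch, a browser request that sends `text/html` (or no Accept header at all) keeps getting the existing HTML page, while a client sending `Accept: application/json` gets the JSON view of the same URL.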

@alexbiehl
Member

> Would it be too optimistic to assume that when we are only using these top-level fields, laziness saves us from having to parse the whole file?

Yes. To claim a successful parse the whole file needs to be examined. Usually lazy parsing only works well if your format supports laziness explicitly.
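A toy illustration of that point, assuming a stand-in "parser" over a list of fields rather than the real Cabal parser: because the result is an `Either`, deciding between `Left` and `Right` requires looking at every field, even if the caller only wants the first one. The names `parseFields` and `firstField` are hypothetical.

```haskell
import Data.Maybe (listToMaybe)
import Text.Read (readMaybe)

-- | Parse every field or fail. The result is 'Right' only if all
-- fields parse, so the whole input must be examined to decide.
parseFields :: [String] -> Either String [Int]
parseFields = traverse parseField
  where
    parseField s = maybe (Left ("bad field: " ++ s)) Right (readMaybe s)

-- | Ask for only the first parsed field. Laziness does not help:
-- a bad field at the end still turns the whole result into 'Left'.
firstField :: [String] -> Either String (Maybe Int)
firstField = fmap listToMaybe . parseFields
```

Here `firstField ["1", "2", "oops"]` yields `Left "bad field: oops"` even though only the first field was requested.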

@alexcmcdaniel

@imalsogreg any updates?

@imalsogreg
Contributor Author

@alexcmcdaniel I'm slowly working on the caching part.

@alexcmcdaniel

@imalsogreg how is the caching coming?

@imalsogreg force-pushed the jsonPackageReport2 branch from 8e4ac86 to a73f029 on April 6, 2019 12:41
@imalsogreg
Contributor Author

I added AcidState actions for the API endpoints. It does read-through caching, with a hook on package change to delete the cache lines for a package.

I did not add any backup logic, since all the data that would be backed up is a redundant view of other already-backed-up data. And our API is cheap to call. So the backup would cost us more in maintenance and versioning than it would help us with site integrity. Think so?

I've rebased to clean up the commit history, added a couple small tests and gone through a cleanup pass. Is there anything else we should do before final review and merge? Any more fields to add to the package description API?
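The read-through caching with an invalidation hook mentioned in this comment can be sketched roughly as follows. This is a pure model over `Data.Map.Strict`, not the AcidState actions from the PR; `readThrough` and `invalidateWhere` are hypothetical names, and in the real code the map would live inside an acid-state component.

```haskell
import qualified Data.Map.Strict as Map

-- | Read-through lookup: return the cached value if present,
-- otherwise compute it, store it, and return it with the new cache.
readThrough :: Ord k => (k -> v) -> k -> Map.Map k v -> (v, Map.Map k v)
readThrough compute key cache =
  case Map.lookup key cache of
    Just v  -> (v, cache)
    Nothing ->
      let v = compute key
      in (v, Map.insert key v cache)

-- | Invalidation hook: drop every cache line whose key matches the
-- predicate, for example every entry belonging to a changed package.
invalidateWhere :: (k -> Bool) -> Map.Map k v -> Map.Map k v
invalidateWhere stale = Map.filterWithKey (\k _ -> not (stale k))
```

Because the cache only holds values that can be recomputed from already-stored data, dropping entries is always safe, which is the same reasoning given above for skipping backup logic.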

-- | Basic information about a package. These values are
-- used in the `/package/:packagename` JSON endpoint
data PackageBasicDescription = PackageBasicDescription
  { pbd_license :: License
Member

Can we make these fields strict? We put that into a map in memory, it would be a shame to introduce a leak.
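For readers unfamiliar with the suggestion, a small sketch of what strict fields change. The two records below are hypothetical stand-ins, not the PR's `PackageBasicDescription`, and their field names and types are assumptions made for illustration.

```haskell
-- | Lazy field: the constructor stores whatever thunk it is given,
-- so a long-lived map of these can retain unevaluated computations.
data LazySketch = LazySketch
  { ls_license :: String
  }

-- | Strict fields (note the bangs): each field is forced to weak head
-- normal form whenever the constructor itself is evaluated.
data StrictSketch = StrictSketch
  { ss_license  :: !String
  , ss_synopsis :: !String
  } deriving (Eq, Show)
```

Note that a bang only forces the outermost constructor of the field, so for a lazy `String` this evaluates the first cell rather than the whole text; fully evaluated fields would need a strict type or explicit deep forcing.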

import Data.Aeson ((.=), (.:))
import Data.Acid (Query, Update, makeAcidic)
import qualified Data.HashMap.Strict as HashMap
import qualified Data.Map as Map
Member

Also it might make sense to use Data.Map.Strict here.

Contributor Author

> Note: You should use Data.Map.Strict instead of this module if:
>
> - You will eventually need all the values stored.
> - The stored values don't represent large virtual data structures to be lazily computed.

Sounds like me! 👍

@imalsogreg
Contributor Author

Anyone available to approve/request changes? I'd love to get this in to hackage :)

setDescriptionFor pkgId descr = State.modify $ \p ->
  case descr of
    Just d -> p {descriptions = Map.alter (const (Just d)) pkgId (descriptions p)}
    Nothing -> p {descriptions = Map.filterWithKey (\pkgId' _ -> fst pkgId' /= fst pkgId) (descriptions p)}
Member

@imalsogreg can you explain why we wouldn't use Map.delete here?

Contributor Author

Yah - since descriptions is keyed on (PackageIdentifier, Maybe Int), if I use delete, I would be deleting just a particular package/metadata combination. But what I want is for a change to the package to delete every package entry for some fixed package, at every metadata revision.

If there were a deleteWithKey function that transforms the key before deciding what to delete, that could be nicer than the filterWithKey I use.
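The difference between the two approaches can be seen in a small self-contained sketch. The `Key` type here uses a plain `String` as a stand-in for `PackageIdentifier`, and `dropOne`, `dropAllRevisions`, and `example` are hypothetical names used only for illustration.

```haskell
import qualified Data.Map.Strict as Map

-- | Stand-in for the cache key: (package id, metadata revision).
type Key = (String, Maybe Int)

-- | Map.delete removes exactly one (package, revision) entry.
dropOne :: Key -> Map.Map Key v -> Map.Map Key v
dropOne = Map.delete

-- | filterWithKey compares only the first component, so it removes
-- the package's entries at every metadata revision.
dropAllRevisions :: Key -> Map.Map Key v -> Map.Map Key v
dropAllRevisions key = Map.filterWithKey (\k _ -> fst k /= fst key)

-- | Two revisions of one package version plus an unrelated version.
example :: Map.Map Key String
example = Map.fromList
  [ (("avro-0.4.1.1", Nothing), "latest")
  , (("avro-0.4.1.1", Just 1), "revision 1")
  , (("avro-0.4.1.2", Nothing), "other version")
  ]
```

Applied to `example` with the key `("avro-0.4.1.1", Nothing)`, `dropOne` leaves the `Just 1` entry behind, while `dropAllRevisions` keeps only the `avro-0.4.1.2` entry.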

@alexbiehl
Member

Ah makes sense, I didn't see both fst!

@alexcmcdaniel

@imalsogreg hey sorry to make a last second request but would it be possible to add the homepage as well?

@alexcmcdaniel

alexcmcdaniel commented Apr 22, 2019

oh never mind I see that you added it in the last commit, any update on the timing of the release? @imalsogreg @hvr

@aj-arena

aj-arena commented May 6, 2019

I am also interested in this pull request, any update?

@gbaz

gbaz added a commit that referenced this pull request Feb 24, 2022
Package JSON API (replacement of #810)
@gbaz
Contributor

gbaz commented Mar 28, 2022

obsoleted by #996

@gbaz closed this Mar 28, 2022