The Yahoo Knowledge COVID-19 API provides JSON-API and GraphQL interfaces to access COVID-19 public data sourced and consolidated by the Yahoo Knowledge Graph Team. Raw data is available here.
This API powers a sample dashboard that visualizes the spread of COVID-19. The source code for the visualization is available here.
The API is powered by Elide.io, an open source framework for building model-driven APIs in JSON-API and GraphQL.
The API to power the dashboard is publicly hosted by Verizon Media, but is restricted to only the query shapes needed by the dashboard.
The Application landing page supports both Swagger documentation and the GraphiQL IDE to get familiar with the APIs.
The default landing page is /api/index.html, which can switch between Swagger and GraphiQL. Specific endpoints for Swagger and GraphiQL can be found at /api/swagger/index.html and /api/graphiql/index.html, respectively.
The data model is evolving as more data becomes available worldwide. The current model includes pre-aggregated health record data broken out by geography. LatestHealthRecords is a snapshot of the latest records available for the current day. HealthRecords includes records dated by publication date (there can be gaps if data was unavailable).
Place represents a geographical region on the map. Place has a many-to-many relationship with itself to enable a parent-child hierarchy. This allows Cities to have multiple Zip Codes (multiple parents in the hierarchy).
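The self-referencing hierarchy described above can be sketched as a small in-memory model. This is only an illustration, not the service's actual entity classes: the Python class, the parents/children field names, the ancestor_ids helper, and the example place IDs are all assumptions. Only the Place name and the many-to-many parent-child idea come from the description.

```python
from dataclasses import dataclass, field
from typing import List, Set


# eq=False keeps identity-based equality, which avoids recursive
# comparisons across the parent/child cycle.
@dataclass(eq=False)
class Place:
    id: str
    label: str
    # Relationship fields are left out of repr to keep output short.
    parents: List["Place"] = field(default_factory=list, repr=False)
    children: List["Place"] = field(default_factory=list, repr=False)

    def add_child(self, child: "Place") -> None:
        """Link both sides of the many-to-many relationship."""
        self.children.append(child)
        child.parents.append(self)


def ancestor_ids(place: Place) -> Set[str]:
    """Collect the IDs of every place reachable by following parent links."""
    seen: Set[str] = set()
    stack = list(place.parents)
    while stack:
        current = stack.pop()
        if current.id not in seen:
            seen.add(current.id)
            stack.extend(current.parents)
    return seen


# Hypothetical places: one zip code that sits under two different cities.
earth = Place("Earth", "Earth")
city_a = Place("city-a", "City A")
city_b = Place("city-b", "City B")
zip_code = Place("zip-1", "Zip 1")
earth.add_child(city_a)
earth.add_child(city_b)
city_a.add_child(zip_code)
city_b.add_child(zip_code)
```

Because a place can have more than one parent, walking upward needs the visited set shown in ancestor_ids; otherwise shared ancestors such as Earth would be visited once per path.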
To build and run:
- mvn clean install
- java -jar webservice/target/covid-19-api.jar
- Open http://localhost:8080/api
The server is available on localhost:8080.
Logs will be written to /tmp/log/covid-19-api/access.log and /tmp/log/covid-19-api/server.log.
Both JSON-API and GraphQL APIs support:
- Complex filtering predicates
- Sorting by model attributes
- Pagination and page totals
- Compound/complex model retrieval in a single payload.
Collection retrieval is capped at 3,000 records per request. By default (with no pagination parameters defined), retrieval is set to 50 records per request.
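As a rough client-side illustration of these limits, the sketch below works out how many requests are needed to read a collection and which pagination parameters each request would carry. The helper names are hypothetical, and rejecting oversized page sizes locally is a choice made for this sketch, not a statement about how the server reacts. The page[number] and page[size] parameter names are the ones used in the JSON-API example in this document.

```python
import math
from typing import Dict, Iterator, Optional

MAX_PAGE_SIZE = 3000     # per-request cap stated above
DEFAULT_PAGE_SIZE = 50   # size used when no pagination parameters are sent


def page_count(total_records: int, page_size: Optional[int] = None) -> int:
    """Number of requests needed to read total_records at the given page size."""
    size = DEFAULT_PAGE_SIZE if page_size is None else page_size
    if not 1 <= size <= MAX_PAGE_SIZE:
        raise ValueError(f"page size must be between 1 and {MAX_PAGE_SIZE}")
    return math.ceil(total_records / size)


def page_params(total_records: int,
                page_size: Optional[int] = None) -> Iterator[Dict[str, int]]:
    """Yield the pagination parameters for each request, starting at page 1."""
    size = DEFAULT_PAGE_SIZE if page_size is None else page_size
    for number in range(1, page_count(total_records, size) + 1):
        yield {"page[number]": number, "page[size]": size}
```

For example, page_count(65144, 3000) gives 22 requests at the maximum page size, while the default size of 50 would need 1303.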
To get familiar with JSON-API, check out jsonapi.org and Elide documentation on JSON-API.
Here is an example request with filtering, sorting, and pagination:
curl -g "localhost:8080/api/json/v1/healthRecords?filter=totalDeaths=gt=0&sort=-totalDeaths&page[totals]&page[number]=1&page[size]=1"
Here is the server response:
{
"data": [
{
"type": "healthRecords",
"id": "58c73ef0-7547-3e38-b1e7-8d95354070c4",
"attributes": {
"dataSource": "https://github.com/yahoo/covid-19-data/blob/master/data-sources.md",
"label": "Earth",
"latitude": 0,
"longitude": 0,
"numActiveCases": null,
"numDeaths": null,
"numRecoveredCases": null,
"numTested": null,
"referenceDate": "2020-05-13T00:00Z",
"totalConfirmedCases": 4205883,
"totalDeaths": 290296,
"totalRecoveredCases": null,
"totalTestedCases": null,
"wikiId": "Earth"
},
"relationships": {
"place": {
"data": {
"type": "places",
"id": "Earth"
}
}
}
}
],
"meta": {
"page": {
"number": 1,
"totalRecords": 65144,
"limit": 1,
"totalPages": 65144
}
}
}
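A client might read this response as follows. This is a minimal sketch using only the Python standard library: the summarize helper and its output keys are hypothetical, and the embedded document is a trimmed copy of the response above rather than a live server reply.

```python
import json

# Trimmed copy of the response shown above (only the fields used below).
RESPONSE = """
{
  "data": [
    {
      "type": "healthRecords",
      "id": "58c73ef0-7547-3e38-b1e7-8d95354070c4",
      "attributes": {
        "label": "Earth",
        "numDeaths": null,
        "totalConfirmedCases": 4205883,
        "totalDeaths": 290296
      },
      "relationships": {
        "place": {"data": {"type": "places", "id": "Earth"}}
      }
    }
  ],
  "meta": {
    "page": {"number": 1, "totalRecords": 65144, "limit": 1, "totalPages": 65144}
  }
}
"""


def summarize(document: dict) -> list:
    """Flatten each JSON-API resource into a small dict of the fields we need."""
    rows = []
    for resource in document["data"]:
        attributes = resource["attributes"]
        place = resource["relationships"]["place"]["data"]
        rows.append({
            "id": resource["id"],
            "place_id": place["id"],
            "label": attributes["label"],
            "total_deaths": attributes["totalDeaths"],
        })
    return rows


document = json.loads(RESPONSE)
rows = summarize(document)
page = document["meta"]["page"]   # present because page[totals] was requested
```

Attributes that the server reports as null, such as numDeaths here, arrive as None after json.loads, so callers should check for that before doing arithmetic on them.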
The GraphQL API semantics are documented in detail here. Elide accepts GraphQL queries embedded in HTTP POST requests, following the convention defined by GraphQL for serving over HTTP. Namely, every GraphQL query is wrapped in a JSON envelope object with one required attribute (query):
curl -g -X POST -H"Content-Type: application/json" -H"Accept: application/json" \
"http://localhost:8080/api/graphql/v1" \
-d'{
"query" : "{ healthRecords(filter: \"totalDeaths=gt=0\", sort: \"-totalDeaths\", first: \"1\", after: \"0\") { pageInfo { totalRecords }, edges { node { id totalDeaths }}}}"
}'
Here is the same query, formatted for readability, with its filtering, sorting, and pagination arguments:
{
healthRecords(filter: "totalDeaths=gt=0", sort: "-totalDeaths", first: "1", after: "0")
{
pageInfo
{
totalRecords
},
edges
{
node
{
id
totalDeaths
}
}
}
}
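Escaping the inner quotes by hand, as the curl command does, is easy to get wrong. The sketch below lets a JSON serializer build the envelope from the readable query instead. It is an illustration using Python's standard library; build_graphql_request is a hypothetical helper, and the URL is the local endpoint from the curl example.

```python
import json
import urllib.request

# Local endpoint taken from the curl example in this document.
GRAPHQL_URL = "http://localhost:8080/api/graphql/v1"

QUERY = """
{
  healthRecords(filter: "totalDeaths=gt=0", sort: "-totalDeaths", first: "1", after: "0") {
    pageInfo { totalRecords }
    edges { node { id totalDeaths } }
  }
}
"""


def build_graphql_request(query: str, url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Wrap the query in the JSON envelope and prepare (but do not send) a POST."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_graphql_request(QUERY)
```

Passing request to urllib.request.urlopen would send it; that step is left out here because it needs a running server.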
The server returns the following response:
{
"data": {
"healthRecords": {
"pageInfo": {
"totalRecords": 2
},
"edges": [
{
"node": {
"id": "6aa1fc99-a988-33f5-93ac-1d8fc859dc8d",
"totalDeaths": 3873
}
}
]
}
}
}
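The records sit two levels down, under edges and node, so a client usually unwraps them before use. A minimal sketch, assuming the response has already been decoded into a Python dict; unwrap is a hypothetical helper and the literal below copies the response shown above.

```python
from typing import List, Tuple

# Decoded copy of the response shown above.
RESPONSE = {
    "data": {
        "healthRecords": {
            "pageInfo": {"totalRecords": 2},
            "edges": [
                {
                    "node": {
                        "id": "6aa1fc99-a988-33f5-93ac-1d8fc859dc8d",
                        "totalDeaths": 3873,
                    }
                }
            ],
        }
    }
}


def unwrap(response: dict, collection: str) -> Tuple[int, List[dict]]:
    """Return (totalRecords, list of nodes) for one top-level collection."""
    payload = response["data"][collection]
    nodes = [edge["node"] for edge in payload["edges"]]
    return payload["pageInfo"]["totalRecords"], nodes


total_records, nodes = unwrap(RESPONSE, "healthRecords")
```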
The API repo is broken into three sub-modules:
- db-builder - A command-line utility that downloads the COVID-19 data from public GitHub and produces an H2 database snapshot.
- db - A resource-only jar that packages the latest H2 database snapshot of the COVID-19 data.
- webservice - An Elide web service that ships with the db resource-only jar.
The Elide web service is a shared-nothing, read-only, scalable architecture in which each instance of the service is packaged with a snapshot of the latest COVID-19 data. Because the API is unauthenticated, its surface is heavily restricted to what the dashboard/visualization needs. This is done by disabling the GraphQL endpoints and restricting the shapes of the JSON-API requests that are allowed in production.
Please contact [email protected] with any questions.
Please refer to the contributing.md file for information about how to get involved. We welcome issues, questions, and pull requests.
This project is licensed under the terms of the Apache 2.0 open source license. Please refer to LICENSE for the full terms.