feat(gatsby-source-drupal): Use the collection count from JSON:API extras to enable parallel API requests for cold builds #32883

Merged (8 commits) on Aug 26, 2021
packages/gatsby-source-drupal/README.md: 6 changes (6 additions, 0 deletions)
@@ -34,6 +34,12 @@ module.exports = {
 }
 ```
 
+On the Drupal side, we highly recommend installing [JSON:API
+Extras](https://www.drupal.org/project/jsonapi_extras) and enabling "Include
+count in collection queries" at `/admin/config/services/jsonapi/extras`, as that
+[speeds up fetching data from Drupal by around
+4x](https://github.com/gatsbyjs/gatsby/pull/32883).
+
 ### Filters
 
 You can use the `filters` option to limit the data that is retrieved from Drupal. Filters are applied per JSON API collection. You can use any [valid JSON API filter query](https://www.drupal.org/docs/8/modules/jsonapi/filtering). For large data sets this can reduce the build time of your application by allowing Gatsby to skip content you'll never use.
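Why queueing pages up front helps: without a count, each page's `links.next` URL is only known once the previous response arrives, so the pages of a single type are fetched one after another. With `meta.count`, every remaining page can be queued at once and the request queue can run them concurrently. The sketch below illustrates only that difference. `createQueue`, `makeFakeFetch`, `sequential`, and `parallel` are hypothetical helpers, a simulated 5 ms delay stands in for real HTTP calls, and the concurrency limit of 4 is arbitrary, not the plugin's setting. It reports the largest number of requests in flight at once rather than wall-clock time, so it says nothing about the 4x figure itself.

```javascript
// Hypothetical sketch, not the plugin's code: a minimal concurrency-limited
// queue standing in for the plugin's requestQueue.
function createQueue(worker, concurrency) {
  let running = 0
  const pending = []
  const drain = () => {
    while (running < concurrency && pending.length > 0) {
      const { task, resolve, reject } = pending.shift()
      running += 1
      worker(task)
        .then(resolve, reject)
        .finally(() => {
          running -= 1
          drain() // start the next waiting task, if any
        })
    }
  }
  return {
    push(task) {
      return new Promise((resolve, reject) => {
        pending.push({ task, resolve, reject })
        drain()
      })
    },
  }
}

// Simulated request that records how many requests are in flight at once.
function makeFakeFetch(stats) {
  return async page => {
    stats.inFlight += 1
    stats.max = Math.max(stats.max, stats.inFlight)
    await new Promise(resolve => setTimeout(resolve, 5)) // simulated latency
    stats.inFlight -= 1
    return page
  }
}

// Without a count: the next URL is only known after each response arrives.
async function sequential(pages, concurrency) {
  const stats = { inFlight: 0, max: 0 }
  const queue = createQueue(makeFakeFetch(stats), concurrency)
  for (let page = 0; page < pages; page++) {
    await queue.push(page)
  }
  return stats.max
}

// With a count: every page can be pushed onto the queue immediately.
async function parallel(pages, concurrency) {
  const stats = { inFlight: 0, max: 0 }
  const queue = createQueue(makeFakeFetch(stats), concurrency)
  await Promise.all(Array.from({ length: pages }, (_, page) => queue.push(page)))
  return stats.max
}

Promise.all([sequential(8, 4), parallel(8, 4)]).then(([seq, par]) => {
  // prints "max in flight: sequential=1, parallel=4"
  console.log(`max in flight: sequential=${seq}, parallel=${par}`)
})
```

Even with the same queue and the same limit, the sequential version never has more than one request for the type in flight, which is the bottleneck the count removes.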
packages/gatsby-source-drupal/src/gatsby-node.js: 60 changes (59 additions, 1 deletion)
@@ -22,7 +22,28 @@ const agent = {
   // http2: new http2wrapper.Agent(),
 }
 
+let start
+let apiRequestCount = 0
+let initialSourcing = true
+let globalReporter
 async function worker([url, options]) {
+  // Log out some progress indicators during the initial sourcing
+  if (initialSourcing) {
+    apiRequestCount += 1
+    if (!start) {
+      start = Date.now()
+    }
+    const queueLength = requestQueue.length()
+    if (apiRequestCount % 50 === 0) {
+      globalReporter.verbose(
+        `gatsby-source-drupal has ${queueLength} API requests queued and the current request rate is ${(
+          apiRequestCount /
+          ((Date.now() - start) / 1000)
+        ).toFixed(2)} requests / second`
+      )
+    }
+  }
+
   return got(url, {
     agent,
     cache: false,
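The progress logging added to `worker` comes down to a counter, a start timestamp, and a rate computed as requests divided by elapsed seconds, reported on every 50th request. A minimal sketch of the same arithmetic, assuming a hypothetical `createProgressLogger` factory with an injectable clock (`now`) so the output is deterministic; it is not part of the plugin's API:

```javascript
// Hypothetical sketch, not the plugin's code: count requests, remember when
// the first one started, and every `every` requests log the queue length and
// the average rate since that first request.
function createProgressLogger({ log, now = Date.now, every = 50 }) {
  let start
  let count = 0
  return function onRequest(queueLength) {
    count += 1
    if (start === undefined) {
      start = now()
    }
    if (count % every === 0) {
      const seconds = (now() - start) / 1000
      const rate = (count / seconds).toFixed(2)
      log(`${queueLength} API requests queued, ${rate} requests / second`)
    }
  }
}

// Usage with a fake clock that advances 100 ms per request.
let fakeTime = 0
const messages = []
const onRequest = createProgressLogger({
  log: message => messages.push(message),
  now: () => fakeTime,
})
for (let i = 0; i < 100; i++) {
  fakeTime += 100
  onRequest(100 - i) // pretend the queue drains by one per request
}
// Two messages: at request 50 (10.20 requests / second, 51 queued)
// and at request 100 (10.10 requests / second, 1 queued).
console.log(messages)
```

Checking `start === undefined` instead of `!start` keeps a fake clock that returns 0 from resetting the start time; with `Date.now()`, as in the plugin, either check works.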
@@ -72,6 +93,7 @@ exports.sourceNodes = async (
   },
   pluginOptions
 ) => {
+  globalReporter = reporter
   const {
     baseUrl,
     apiBase = `jsonapi`,
@@ -293,6 +315,7 @@ exports.sourceNodes = async (
   drupalFetchActivity.start()
 
   let allData
+  const typeRequestsQueued = new Set()
   try {
     const res = await requestQueue.push([
       urlJoin(baseUrl, apiBase),
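`typeRequestsQueued` is needed because the `d.body.meta?.count` check in `getNext` runs for every response, including the later pages fetched through the queued URLs; without the set, each of those pages would try to queue the whole range again. The sketch below mirrors that branching in a side-effect-free form. `planNextStep` and its return labels are hypothetical, the URLs are placeholders on `example.com`, and it uses `body.links?.next?.href`, which is slightly more defensive than the hunk's `d.body.links.next?.href`.

```javascript
// Hypothetical sketch mirroring the count / next-link branching in getNext.
// It returns a label instead of issuing requests so it can run on its own.
function planNextStep(type, body, queued) {
  if (body.meta?.count) {
    // Count available: queue every remaining page, but only once per type.
    if (body.links?.next?.href && !queued.has(type)) {
      queued.add(type)
      return `queue-all`
    }
    // Remaining pages were already queued by an earlier response (or none remain).
    return `none`
  }
  // No count: fall back to following the next link one page at a time.
  return body.links?.next ? `follow-next` : `none`
}

const queued = new Set()
const base = `https://example.com/jsonapi/node/article`
const firstPage = {
  meta: { count: 120 },
  links: { next: { href: `${base}?page%5Boffset%5D=50&page%5Blimit%5D=50` } },
}
const secondPage = {
  meta: { count: 120 },
  links: { next: { href: `${base}?page%5Boffset%5D=100&page%5Blimit%5D=50` } },
}
const noCountPage = { links: { next: { href: `${base}?page%5Boffset%5D=50` } } }

const steps = [
  planNextStep(`node--article`, firstPage, queued),
  planNextStep(`node--article`, secondPage, queued),
  planNextStep(`node--page`, noCountPage, queued),
]
console.log(steps) // [ 'queue-all', 'none', 'follow-next' ]
```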
@@ -370,7 +393,39 @@ exports.sourceNodes = async (
   if (d.body.included) {
     dataArray.push(...d.body.included)
   }
-  if (d.body.links && d.body.links.next) {
+
+  // If JSON:API Extras is configured to add the resource count, we can queue
+  // all API requests immediately instead of waiting for each response to return
+  // the next URL. This lets us request resources in parallel rather than
+  // sequentially, which is much faster.
+  if (d.body.meta?.count) {
+    // Only queue the remaining URLs for this type once.
+    if (d.body.links.next?.href && !typeRequestsQueued.has(type)) {
+      typeRequestsQueued.add(type)
+
+      // Number of remaining API requests. We round down because the first page
+      // has already been fetched at this point.
+      const pageSize = new URL(d.body.links.next.href).searchParams.get(
+        `page[limit]`
+      )
+      const requestsCount = Math.floor(d.body.meta.count / pageSize)
+
+      reporter.verbose(
+        `queueing ${requestsCount} API requests for type ${type} which has ${d.body.meta.count} entities.`
+      )
+
+      const newUrl = new URL(d.body.links.next.href)
+      await Promise.all(
+        _.range(requestsCount).map(pageOffset => {
+          // Start one page ahead: the first page has already been fetched.
+          pageOffset += 1
+          // Construct the URL with the new page offset.
+          newUrl.searchParams.set(`page[offset]`, pageOffset * pageSize)
+          return getNext(newUrl.toString())
+        })
+      )
+    }
+  } else if (d.body.links?.next) {
     await getNext(d.body.links.next)
   }
 }
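The offset arithmetic in this hunk can be checked in isolation. The hypothetical `remainingPageUrls` helper below is a sketch of the same steps: read the page size from `page[limit]` on the first `next` link, then rewrite `page[offset]` for each remaining page. It converts the page size with `Number()` explicitly (the hunk relies on JavaScript coercing the string returned by `searchParams.get`) and uses `Math.ceil(count / pageSize) - 1` instead of the hunk's `Math.floor(count / pageSize)`. The two agree except when the count is an exact multiple of the page size, where the floor version queues one extra request whose offset equals the count, i.e. an empty page.

```javascript
// Hypothetical sketch, not the plugin's code: build the URLs for every page
// after the first one, given the first `next` link and the total count.
function remainingPageUrls(nextHref, count) {
  // URLSearchParams decodes `page%5Blimit%5D` to the key `page[limit]`.
  const pageSize = Number(new URL(nextHref).searchParams.get(`page[limit]`))
  // Pages still to fetch; the first page (offset 0) is already in hand.
  const requestsCount = Math.ceil(count / pageSize) - 1
  const url = new URL(nextHref)
  const urls = []
  for (let page = 1; page <= requestsCount; page++) {
    // set() replaces the existing page[offset] value in place.
    url.searchParams.set(`page[offset]`, String(page * pageSize))
    urls.push(url.toString())
  }
  return urls
}

const firstNext = `https://example.com/jsonapi/node/article?page%5Boffset%5D=50&page%5Blimit%5D=50`
const urls = remainingPageUrls(firstNext, 120)
console.log(urls.map(u => new URL(u).searchParams.get(`page[offset]`))) // [ '50', '100' ]
```

With 100 entities and a page size of 50, this sketch queues one request (offset 50), whereas `Math.floor(100 / 50)` would queue two (offsets 50 and 100).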
@@ -480,6 +535,9 @@ exports.sourceNodes = async (
     createNode(node)
   }
 
+  // We're now done with the initial sourcing.
+  initialSourcing = false
+
   return
 }
 