Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flowctl: Batch draft spec upserts, rather than incorrectly attempting to paginate the response #1629

Merged
merged 2 commits into from
Sep 17, 2024

Conversation

jshearer
Copy link
Contributor

@jshearer jshearer commented Sep 16, 2024

Description:

When pagination was originally introduced, this code-path was incorrectly switched to use it, despite it actually being an upsert which doesn't support pagination

let rows: Vec<SpecSummaryItem> = api_exec_paginated(
client
.from("draft_specs")
.select("catalog_name,spec_type")
.upsert(String::from_utf8(body).expect("serialized JSON is always UTF-8"))
.on_conflict("draft_id,catalog_name"),
)

Requests that don't support pagination just straight up ignore the range header. That wasn't a problem because the pagination logic only tried to fetch the next page if it got back exactly the number of rows in its requested range:

if resp.len() == request.page_size

So, this was an extremely rare edge case that would only get hit if you tried to publish exactly 1000 specs. But, I noticed that for requests that do respect pagination, sometimes postgrest will actually return one more row than was requested, so I updated this to a >= in #1516:

// Sometimes, it seems, we can get back more than the requested page size.
// So far I've only seen this on a request of 1,000 and a response of 1,001.
if resp.len() >= request.page_size

Before that change is committed, any request that returns more rows than were requested will incorrectly cause pagination to think it has reached the end and result in an incomplete result set being returned. After that change is committed, any attempt to publish >= 1000 specs will sit in an infinite loop attempting to load the next page of a request that doesn't support pagination.

This PR fixes both cases:

  • It switches pagination to use >= check to correctly load paginated data where PostgREST returns more rows than the requested range
  • It switches catalog publishes to batch their inputs and not use pagination when making upsert queries.

Testing:

Figuring out what was going on here was a bit complicated as the reason I thought my original PR fixed the issue was that I was working ontop of my branch in #1516 that had the pagination fix applied, but not the fix for flowctl catalog publish. So, after confirming that my locally-built flowctl is just ontop of this branch jshearer/batch_drafts, which is ontop of master:

$ flowctl catalog pull-specs --captures=true --collections=true --flat --target flow.yaml --prefix test/many_collections
Wrote 1734 specifications under file:///Users/js/Documents/estuary/test/flow.yaml.

$ flowctl catalog publish --source flow.yaml
Created draft: 0e...
Will publish the following 1734 specs:
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────────────╮
│ Name                                                                                                                            │ Type            │
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╪═════════════════╡
│ test/many_collections/...                                                                                                       │ capture         │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────────┤

Also reverts --name back to its previous behavior since pagination doesn't make sense there.


This change is Reviewable

@jshearer jshearer added the change:unplanned This change is unplanned, useful for things like docs label Sep 16, 2024
@jshearer jshearer marked this pull request as draft September 16, 2024 16:02
@jshearer jshearer force-pushed the jshearer/batch_drafts branch from b40fdfa to 1df9e8f Compare September 16, 2024 16:36
@jshearer jshearer marked this pull request as ready for review September 16, 2024 16:41
@jshearer jshearer force-pushed the jshearer/batch_drafts branch from 1df9e8f to e0a6959 Compare September 16, 2024 16:41
@jshearer
Copy link
Contributor Author

I had wanted to use buffered_unordered() to limit the request concurrency, but got some compile errors that I haven't seen before and wasn't sure what to do with them, so I used FuturesUnordered like we do elsewhere, presumably for the same reason:

error: implementation of `Send` is not general enough
  --> crates/flowctl/src/main.rs:22:18
   |
22 |     let handle = runtime.spawn(async move { cli.run().await });
   |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ implementation of `Send` is not general enough
   |
   = note: `Send` would have to be implemented for the type `&flowctl::auth::Auth`
   = note: ...but `Send` is actually implemented for the type `&'0 flowctl::auth::Auth`, for some specific lifetime `'0`

error: implementation of `Send` is not general enough
  --> crates/flowctl/src/main.rs:22:18
   |
22 |     let handle = runtime.spawn(async move { cli.run().await });
   |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ implementation of `Send` is not general enough
   |
   = note: `Send` would have to be implemented for the type `&Cli`
   = note: ...but `Send` is actually implemented for the type `&'0 Cli`, for some specific lifetime `'0`

@jshearer jshearer force-pushed the jshearer/batch_drafts branch from e0a6959 to afacaea Compare September 16, 2024 16:58
Copy link
Member

@jgraettinger jgraettinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % a couple comments

crates/flowctl/src/draft/author.rs Outdated Show resolved Hide resolved
crates/flowctl/src/draft/author.rs Show resolved Hide resolved
… to paginate the response

Also revert `--name` back to its previous behavior since pagination doesn't make sense there.
… page size, as PostgREST appears to be sometimes returning more rows than was requested.
@jshearer jshearer force-pushed the jshearer/batch_drafts branch from ce6ed29 to bae42ad Compare September 17, 2024 12:38
@jshearer jshearer merged commit 9e50e5f into master Sep 17, 2024
4 checks passed
@jshearer jshearer deleted the jshearer/batch_drafts branch September 18, 2024 21:10
github-actions bot pushed a commit to estuary/homebrew-flowctl that referenced this pull request Sep 19, 2024
## What's Changed
* Batch draft spec upserts, rather than attempting to paginate the response by @jshearer in estuary/flow#1629
* use max build_id for validating local specs by @psFried in estuary/flow#1636
* Update pagination to use 0-indexed inclusive ranges, and page by response length rather than fixed page size. by @jshearer in estuary/flow#1637
github-actions bot pushed a commit to estuary/homebrew-flowctl that referenced this pull request Sep 19, 2024
## What's Changed
* Batch draft spec upserts, rather than attempting to paginate the response by @jshearer in estuary/flow#1629
* use max build_id for validating local specs by @psFried in estuary/flow#1636
* Update pagination to use 0-indexed inclusive ranges, and page by response length rather than fixed page size. by @jshearer in estuary/flow#1637
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change:unplanned This change is unplanned, useful for things like docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants