Skip to content

Comments

[docs] api versioning#1958

Merged
fisx merged 12 commits intodevelopfrom
doc-api-versions
Dec 10, 2021
Merged

[docs] api versioning#1958
fisx merged 12 commits intodevelopfrom
doc-api-versions

Conversation

@fisx
Copy link
Contributor

@fisx fisx commented Dec 2, 2021

This is a draft of a proposal for a system for negotiating dialects between old servers and new clients and vice versa.

https://wearezeta.atlassian.net/browse/SQSERVICES-1101

Checklist

  • The PR Title explains the impact of the change.
  • The PR description provides context as to why the change should occur and what the code contributes to that effect. This could also be a link to a JIRA ticket or a Github issue, if there is one.
  • changelog.d contains the following bits of information (details):
    • A file with the changelog entry in one or more suitable sub-sections. The sub-sections are marked by directories inside changelog.d.

@fisx fisx force-pushed the doc-api-versions branch from 50a6c4a to fec43f5 Compare December 3, 2021 23:29
@fisx fisx force-pushed the doc-api-versions branch from fec43f5 to 5bf1c9b Compare December 7, 2021 09:53
@fisx fisx changed the title [docs] api versioning [skip ci] [docs] api versioning Dec 7, 2021
@fisx fisx marked this pull request as ready for review December 7, 2021 10:01
@fisx fisx requested a review from akshaymankar December 7, 2021 10:40
@fisx
Copy link
Contributor Author

fisx commented Dec 8, 2021

note to self:

  • i forgot do discuss semantic versioning in the alternatives section
  • should i mention https://hackage.haskell.org/package/safecopy in the miration section? (even if we don't want to depend on it, we may still benefit from knowing of it.)

@fisx
Copy link
Contributor Author

fisx commented Dec 8, 2021

@pcapriotti I have skimmed through the confluence docs you wrote on the topic. If there is anything I should adopt from those please point it out!

Copy link
Contributor

@pcapriotti pcapriotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice exposition. Thanks for writing this up.

I like the overall approach. At first glance, it differs quite significantly from my own proposal, which was based on the idea of having a single version per node, and then defining a notion of "compatibility" between different versions.

However, I think that one can actually map this proposal to my original one. What I was referring to as "version" is now a set of versions. The compatibility matrix is defined by saying that two "versions" are compatible if they have a non-empty intersection. And in fact, in the more general setting based on compatibility being a lattice, we can just say that the compatibility of two "versions" is their intersection.

Therefore, your proposal solves the issue of having to define what "compatibility" means, but in exchange you pay the price of having to maintain a set of possibly very different versions of possibly very large pieces of the codebase around.

And this is really the crux of the discussion: is it possible to maintain multiple versions sanely in the same codebase? I think your arguments are pretty good, but I'm worried by the amount of boilerplate that is going to be generated, and by compile times potentially skyrocketing. I'm in favour of giving this a go, but I have a few reservations.

For example, here are a few questions that haven't been addressed:

  • How does version negotiation work for server-to-server interactions? Does it happen on every RPC? Does it require some kind of persistence? Note that at the moment all server-to-server calls are stateless.
  • What do we do when another backend has no compatible version?
  • How to structure (federation) client code? The discussion in this document focuses on the server implementation. Do similar ideas apply the client side of the equation?
  • Relatedly: is the set of versions supported on the client side necessarily the same as the ones supported on the server side?
  • How do we change endpoint structure between versions when using the record-based routing table? Should we just switch to the plain style (API types concatenated by :<>). That would also have the benefit of improving compile times.

Copy link
Member

@jschaul jschaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good plan!

I suppose this requires having all existing endpoints servantified as otherwise clients would try to call /v17/something but so far we only have /something defined in wai-routes in a service that doesn't have any middlewareing yet?

@fisx
Copy link
Contributor Author

fisx commented Dec 9, 2021

I suppose this requires having all existing endpoints servantified as otherwise clients would try to call /v17/something but so far we only have /something defined in wai-routes in a service that doesn't have any middlewareing yet?

that is correct. we could servantify just the end-points that change between v0 and v1, and then hack something into the middleware that only sticks version numbers in front of the already-servantified end-points. that would allow us to move forward faster, but the approach as outlined in this PR only works with servant.

Co-authored-by: Paolo Capriotti <paolo@capriotti.io>
@fisx
Copy link
Contributor Author

fisx commented Dec 9, 2021

  • How does version negotiation work for server-to-server interactions? Does it happen on every RPC? Does it require some kind of persistence? Note that at the moment all server-to-server calls are stateless.

https://github.com/wireapp/wire-server/blob/d7feb3ff8a97f6215f53509bf084253a36d9f73b/docs/developer/api-versioning.md#update-detection-and-version-re-negotation

this was clear to me when i wrote it, but maybe it can be made clearer. do you have a suggestion?

* What do we do when another backend has no compatible version?

https://github.com/wireapp/wire-server/blob/d7feb3ff8a97f6215f53509bf084253a36d9f73b/docs/developer/api-versioning.md#no-shared-api-version

* How to structure (federation) client code? The discussion in this document focuses on the server implementation. Do similar ideas apply the client side of the equation?

https://github.com/wireapp/wire-server/blob/d7feb3ff8a97f6215f53509bf084253a36d9f73b/docs/developer/api-versioning.md#data-migration-aka-data-marshalling, but probably more importantly: https://github.com/wireapp/wire-server/blob/d7feb3ff8a97f6215f53509bf084253a36d9f73b/docs/developer/api-versioning.md#writing-client-code (starting with the second paragraph.

again, the fact that you ask this question probably means i should be clearer there, ideally without being any less concise. any ideas?

* Relatedly: is the set of versions supported on the client side necessarily the same as the ones supported on the server side?

probably? if they are both up-to-date backends, the version sets should be identical. not sure i understand the question?

* How do we change endpoint structure between versions when using the record-based routing table? Should we just switch to the plain style (API types concatenated by `:<>`). That would also have the benefit of improving compile times.

i think you can freely mix :<|> and records, but i don't have much experience with that. my idea was to include all end-points in the record that appear in at least one supported version, and add a new verb NotInThisVersion:

https://github.com/wireapp/wire-server/blob/d7feb3ff8a97f6215f53509bf084253a36d9f73b/docs/developer/api-versioning.md#changing-structure

I think both approaches are valid and can be mixed.

@fisx
Copy link
Contributor Author

fisx commented Dec 9, 2021

Added some answers and some follow-up questions (sometimes disguised as statements. :))

I may merge at some point to get this out of the way, but I think it's productive to keep the discussion going a little longer.

Note to self: consider incorporating some of the discussion in this PR into the markdown. Or at least link the PR.

@pcapriotti
Copy link
Contributor

again, the fact that you ask this question probably means i should be clearer there, ideally without being any less concise. any ideas?

I'm asking these questions about the client-side of things because I can't quite make sense of what you wrote in the document unless I interpret it as being about the clients of the public API. Let me just use the word "clients" to refer to those, and not also the client code in the backend, to avoid confusion.

I'm not sure the two use cases can be so easily conflated. As I mentioned in my question, server-to-server RPCs are stateless. So what to do when something fails due to version incompatibility? Of course the simple answer is return with some error and forget about it, but then the error is likely to occur over and over, and a more sensible answer would be to come up with some system that marks a backend as "too old" (or "too new") and acts accordingly in the future. How to do this is unclear to me, due the simple way things are organised at the moment.

For clients, this is different, because if they detect an incompatible backend, they can simply stop operating until the user retries.

A similar difference exists for negotiation. Should it be done every single time? The document seems to assume some kind of stateful notion of version agreement, which is persistent, and talks about how to refresh it. Again, I assume this is about clients. In the backend, we have the opposite problem, in fact.

As for how to write client code, the discussion in the document is focused on clients (it refers to typescript, for example), and also somehow accepts that functions have to be duplicated. But for the server API version, we are talking about a system to avoid duplication, and I don't see a discussion of how to use the client side of that system. Maybe it's obvious to everyone, and I'm just being slow?

probably? if they are both up-to-date backends, the version sets should be identical. not sure i understand the question?

Let me try to explain the question more clearly. The (latest) backend defines a set of API versions S, but, when it acts as a client of the federation API, it needs to implement some logic that selects a version out of those advertised by a peer backend. What is this logic? Does it consider any version in S to be ok? Does it make sense to allow more (or fewer) versions?

The answer to this depends on the answer to the previous questions, like how to actually structure client code, and whether the client versioning logic is somehow tied to the server one. I don't have a clear picture of how this is going to work.

i think you can freely mix :<|> and records, but i don't have much experience with that. my idea was to include all end-points in the record that appear in at least one supported version, and add a new verb NotInThisVersion:

https://github.com/wireapp/wire-server/blob/d7feb3ff8a97f6215f53509bf084253a36d9f73b/docs/developer/api-versioning.md#changing-structure

I think both approaches are valid and can be mixed.

I don't understand how NotInThisVersion is supposed to be used. Can you maybe write down an example of an endpoint that exists for version 2, but not in version 1 and 3?

It is quite common that behavior of end-points changes together with
the syntax, or even without a change in the syntax.

This is not a fundamental problem: since the handler can be called
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens to old frontend clients if we force a change of version when the behavior changes, but the data doesn't?

E.g. there's an endpoint to upload assets, that are stored in a somehow generic bucket (V1).
A new implementation is introduced so that the image is stored on a per-user bucket (V2).
The endpoint otherwise has the same data between V1 and V2.

Old clients that only talk V1 could, in theory, still use the endpoint to send images. The fact that they are not aware of the new behavior does not prevent them from sending images; both V1 and V2-aware backends and clients can process those images successfully. However, since the client doesn't speak V2, it will be prevented from talking to V2 backends. This sounds to me like an artificial limitation.

Probably in cases like this we should consider changing the behavior of the V1 endpoint to behave like V2, keep the V1, and at the same time introduce V2. Basically backporting the V2 behavior to V1, since they are compatible data-wise. This might increase complexity on the backend implementation (does it?), but has the advantage of not forcing a client update for every user.

Would this be a valid strategy?

Copy link
Contributor

@pcapriotti pcapriotti Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend against retroactively changing APIs of old versions. That somewhat defeats the purpose of using versioning, breaks the whole model and can introduce hard-to-diagnose issues.

I don't quite understand your example here. If V2 is the same as V1, but just implemented differently, it should just not cause a version bump, so V1=V2, and clients have no compatibility issues. If V2 has strictly more features than V1, the backend can support both, so V1-clients can still talk to {V1,V2}-backends.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @pcapriotti, I agree! @marcoconti83 let me know if this just raises further questions. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My example was a bit convoluted, but you @pcapriotti got what I meant when you said:

If V2 has strictly more features than V1, the backend can support both, so V1-clients can still talk to {V1,V2}-backends.

... (adding) thanks to the backend still honoring the V1 API, even if the internals of the backend behavior is changed, if that change is not relevant for the API.

I'll rewrite the example:

With V1, we store assets without any user information (who uploaded them).
The clients use POST /v1/assets.

With V2, we introduce a new feature to keep track of which user uploaded which asset, and allow users to delete them. We somehow run a migration script once to assign all previously unassigned assets to a user (doesn't matter how) and from now on, we assume all assets are assigned to a user.
The clients use POST /v2/assets, GET /v2/assets and DELETE /v2/assets/<id>

We now have two options:

  • Have the backend support only V2
    • Clean cut, but it also means all clients need to update NOW or they will stop working as they still post to V1
  • Have the backend support V1 and V2
    • When clients use POST /v1/assets, the backend assigns a user ID to the asset. This was not the behavior before V2 was introduced, BUT the V1 API is still honored (this is what I meant by "backporting" the behavior). The new behavior is not relevant for the API contract, so why deprecate it? We just map /v1/assets/ to /v2/assets/. We can remove the extra v2 bits from the response to look like v1, but that's not even strictly necessary if they were only additions.
    • V2 clients can still do the extra bits: delete assets, even on assets uploaded by V1 clients
    • This means we can release the backend version that supports V2 and wait for all clients to update to V2 whenever they are ready

I'm bringing up this example because we had to do such hybrid systems in the past to support a transition between one behavior and another, and I don't want the versioning system to preclude this possibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @marcoconti83 for clarifying the example, now I understand better what you mean. I think option two should work without problems. In practice, what I imagine would happen in the server is that V1 would be unmodified at the level of the API and the corresponding types, but its implementation would share code with V2, implementing the new user-tracking functionality.

But I see your point now: in some sense, we are indeed retroactively changing V1 when we add V2, since V1 is now implemented the same as V2. However, this is only in the implementation, and the API contract of V1 is still identical to before.

How to make sure that these contracts are indeed preserved over time is another matter altogether, and not discussed in this proposal. We've had some previous discussions around a compatibility testing framework, but it was focused on the server-to-server case, and based on a slightly different approach to versioning: https://wearezeta.atlassian.net/wiki/spaces/CORE/pages/529170506/Compatibility+testing+for+different+backend+versions+in+federation. I imagine many of the ideas still apply, though.

=> <swagger for this particular version>
```

The client developer can pull the swagger docs of a new version and
Copy link

@tmspzz tmspzz Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean: there is an automated diffing too that the backend provides. 😉

Even better: there is a compare function similar to https://github.com/octocat/linguist/compare/master...octocat:master


The client developer can pull the swagger docs of a new version and
diff it against the one they already support, and work their way
through the changes (see below).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean: And the changes automatically generate a new stub client

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, yes.

`unsupported-version`. The versioning middleware can do that (see
above).

Client should install a catch-all that will handle this specific
Copy link

@tmspzz tmspzz Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might cause a data loss on the client if a data format changes.

Client can't renegotiate mid operation the api version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In those cases the operation has to fail. Anyway this only happens if our upgrade TOS have been violated, so it's ok to fail, we just want to fail gracefully so the user knows what's happening.


If a mandatory attribute is added in a newer version, there may be a
plausible default value that can be used in the forward migration
(backward migration would still remove the field).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Older clients won't be able to read the field because they are unaware of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not about an additional field that clients can not yet read, but an additional fiend that the backend expects FROM the client. If the client did not specify that field, the backend will assume a default value.


If you generate, say, typescript or kotlin or swift from swagger:

0. have a generated source module `Gen.ts`, plus a source module with
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will version a library per version of the API achieving something similar to what you describe I believe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That may end you up with more dead code in the executable, but that may not be a problem. Implementation detail!

@fisx fisx changed the title [docs] api versioning [docs] api versioning [skip ci] Dec 10, 2021
@fisx
Copy link
Contributor Author

fisx commented Dec 10, 2021

I don't understand how NotInThisVersion is supposed to be used. Can you maybe write down an example of an endpoint that exists for version 2, but not in version 1 and 3?

76e718c (may or may not be the answer. :))

@fisx fisx changed the title [docs] api versioning [skip ci] [docs] api versioning Dec 10, 2021
@fisx fisx merged commit ecadf71 into develop Dec 10, 2021
@fisx fisx deleted the doc-api-versions branch December 10, 2021 13:43
@sysvinit sysvinit mentioned this pull request Dec 10, 2021
@fisx
Copy link
Contributor Author

fisx commented Dec 13, 2021

note to self: more thoughts on routes that only exist in some versions:

data NotInThisVersion = NotInThisVersion

data V = V0 | V1 | V2 | V3
  deriving (Eq, Ord, Show)

class SomeVersionsOnly (v :: V) (is :: '[V]) (es :: '[V]) (a :: *) where
  type NotInVersionT v is es :: * -- = either NotInThisVersion or a

instance v `Member` is => SomeVersionsOnly (v :: V) (is :: '[V]) (es :: '[V]) (a :: *) where
  type NotInVersionT v is es = a

instance v `Member` es => SomeVersionsOnly (v :: V) (is :: '[V]) (es :: '[V]) (a :: *) where
  type NotInVersionT v is es = NotInThisVersion

type API (v :: V) =
  v :> "users" :> Get '[JSON] ()
    :<|> v :> SomeVersionsOnly v '[V1] '[V2, V3] ("oonvs" :> Get '[JSON] ())

-- test that collects all calls to SomeVersionsOnly from a routing table and makes sure all existing versions are mentioned.
class Test a where
  test :: Spec ()

instance Test (a :<|> as) where
  test = do
    test a
    test as

instance {-# OVERLAPPING #-} Test (a :> SomeVersionsOnly v is es) where
  test = do
    sort (is <> es) `shouldBe` [minBound ..]

suite :: Spec ()
suite = test @(API V0)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants