Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: design suggestion: Make this a "3-tier library" #392

Open
weissi opened this issue Jul 7, 2021 · 11 comments
Open

RFC: design suggestion: Make this a "3-tier library" #392

weissi opened this issue Jul 7, 2021 · 11 comments

Comments

@weissi
Copy link
Contributor

weissi commented Jul 7, 2021

At the moment, we have a pretty gnarly issue with AHC's API: Settings can either be supplied globally or per request. So if you want to set certain configuration for a certain component in your app (say TLS config, redirect config, ...) then you either have to specify it for every single request (tedious) or globally (expensive as you'll get a new connection pool, need to maintain lifecycle etc).

What I suggest here is making AHC a "3-tier library":

  • tier 1: What is currently HTTPClient: the highly stateful, expensive, and lifecycle-managed object that owns the TLSContext, the connection pool, and if not .shared also the ELG
  • tier 2: (doesn't exist right now) A very lightweight (probably struct) and not lifecycle managed client object which is essentially a reference to the tier 1 object and a bag of configuration
  • tier 3: the individual requests

An API sketch could look like this:

// tier 1
let http = HTTPService(eventLoopGroupProvider: .shared(myELG)) // This is currently called `HTTPClient`
defer {
    try! http.syncShutdown()
}

// tier 2 (creating this is ~free, doesn't need to be shut down)
let httpClient = http.client(tlsConfiguration: specialConfig, redirectConfig: ...)

// tier 3
let result = httpClient.get("https://...").wait()

Benefits

  • Allows wider sharing of Tier 1 because it doesn't take configuration
  • Makes the API nicer to use because we can supply configuration to Tier 2 so we don't need to pass it to every request
  • Would allow Tier 2 to grow a "middleware" concept which would allow doing things like authenticating requests or controlling redirects etc without hard-wiring all that into the core HTTP client.

Important requirements per tier

Tier 1

  • do not take any configuration apart from the ELG

Tier 2

  • doesn't need lifecycle management
  • holds bags of configuration
  • ~free to create

Tier 3

  • can override most(all?) of the configuration set in Tier 2
@fabianfett
Copy link
Member

I agree with most of the things you mention. And I'm +1 on the general proposal. However I don't get why we should be able to override most settings in tier3. If tier2 is basically free to create, I don't see any benefit by overriding the settings. In my point of view the client should be a struct with value semantics.

@weissi
Copy link
Contributor Author

weissi commented Jul 7, 2021

@fabianfett good point! I don't know either :D

@Lukasa
Copy link
Collaborator

Lukasa commented Jul 7, 2021

Different TLS configurations interact somewhat poorly with the connection pool, as they produce new keys. So this isn't entirely a panacea for the connection pool re-use. Otherwise agreed.

@weissi
Copy link
Contributor Author

weissi commented Jul 7, 2021

Different TLS configurations interact somewhat poorly with the connection pool, as they produce new keys. So this isn't entirely a panacea for the connection pool re-use. Otherwise agreed.

Agreed, but we already have that issue/feature today. And we added it because people very commonly requested it, usually because the need a different trust roots/client certs to talk to different services.

@fabianfett
Copy link
Member

fabianfett commented Jul 7, 2021

And I would suggest a rename HTTPService is in my point of view a connection pool manager (and all connection pool settings need to be defined on this level). Since HTTPConnectionPoolManager is a mouthful and most users won't care about the exact semantics, I think I would prefer to call this HTTPConnectionPool.

However this leads us to the point where we have a HTTPConnectionPool which is actually a ConnectionPoolManager. Naming is hard. What are your thoughts?

@weissi
Copy link
Contributor Author

weissi commented Jul 7, 2021

And I would suggest a rename HTTPService is in my point of view a connection pool manager (and all connection pool settings need to be defined on this level). Since HTTPConnectionPoolManager is a mouthful and most users won't care about the exact semantics, I think I would prefer to call this HTTPConnectionPool.

I don't like giving it such a specific name because it can be more than a connection pool. And also, most users have no idea what a connection pool is.

@weissi
Copy link
Contributor Author

weissi commented Jul 7, 2021

also CC @adam-fowler , I think https://github.com/soto-project/soto has a similar-ish design?

@adam-fowler
Copy link
Member

adam-fowler commented Jul 7, 2021

Soto has two tiers, in that you have

  1. AWSClient where every public function takes a AWSServiceConfig parameter and then
  2. Every service object (S3, DynamoDB etc) has an instance of a AWSServiceConfig which it passes to AWSClient when it calls AWSClient methods.

The AWSServiceConfig is used to define endpoints, protocol used, server region, signing name etc).

In theory you could call the AWSClient public functions directly but in reality nobody does and will always do it via a service object. This design came from @fabianfett. Previously in Soto v4 we had one object which combined the service and client.

I guess there are similarities to what is described above in that AWSClient would be your tier 1 object and the service objects would be your tier 2 objects. In situations where I have had config options I wanted to easily change on a per call basis (what appears to be your tier 3) I have added a .with function which creates a copy of the service object with edited attributes. For instance

let s3 = S3(client: awsClient, region: .euwest1)
s3.getObject(bucket: "my-bucket", key: "my-file")
s3.with(options: .s3UseDualStackEndpoint).getObject(bucket: "my-bucket", key: "my-file")
s3.with(timeout: .minutes(15)).getObject(bucket: "my-bucket", key: "my-file")

@fabianfett
Copy link
Member

More previous art… Go‘s http client has three levels as well:

  1. Transport - which is an equivalent to a connection pool
  2. Client
  3. Request

A client can be created with a transport, but it doesn’t require a transport.

@weissi
Copy link
Contributor Author

weissi commented Aug 11, 2021

@fabianfett Thanks! I kinda like HTTPTransport, HTTPClient, HTTPRequest. Don't think "transport" really makes sense but it's not too terrible either :).

@weissi
Copy link
Contributor Author

weissi commented Jun 1, 2022

@fabianfett / @Lukasa / @adam-fowler I think we need to make progress on this issue. For a user of AsyncHTTPClient I see the following problem over and over again:

Let's say I have a library that internally does some HTTP. Currently, I have two options for it:

  1. Have the library internally create & own a HTTPClient instance which it'll share with nothing else. That forces the library's API to have full-blown lifecycle management (so it can shut down the HTTPClient). Also anything TLS/proxy config related becomes a pain (because now the library will need to accept a lot of config that it can then pass down to HTTPClient on creation).
  2. Accept a HTTPClient instance to share. From a resource perspective that's nice because the library can now share the HTTPClient with others. Proxy & TLS config also becomes easier because the user can pre-configure the HTTPClient to do the right thing. Additionally, the library might now get away w/o lifecycle management (because it doesn't own the HTTPClient, it just uses it). But unfortunately now I can't control anything w.r.t. timeouts or redirect configuration (because that's a global config on HTTPClient). So it's actually possible for the library to not work because say it needs to follow redirects or the connect timeout is too short or something else.

Both options feel bad but I run into them all the time :(.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants