
Exception raised when running too many get requests #266

Open · binarytemple opened this issue Jul 31, 2023 · 6 comments

@binarytemple

Guessing that k8s is spawning the gke-gcloud-auth-plugin command every time a get request is issued.

I'm running something like this:

with_services_added =
  ingress_list
  |> Enum.map(fn i ->
    Task.async(fn ->
      operation =
        K8s.Client.get("v1", "Service",
          namespace: i.namespace,
          name: i.service_name
        )

      {:ok, service} = K8s.Client.run(conn, operation)

      i
      |> Map.put(:selector, "|#{inspect(service["spec"]["selector"])}|")
      |> Map.put(:project, service["metadata"]["annotations"]["gitlab.binarytemple.com/projectname"])
    end)
  end)
  |> Enum.map(&Task.await/1)

CSV.encode(with_services_added, headers: true, delimiter: "") |> Enum.each(&IO.puts/1)

And receiving the following error message:

10:40:12.830 [error] GenServer #PID<27964.2401.0> terminating
** (stop) :emfile
    :erlang.open_port({:spawn_executable, '/Users/bryanh/brew/share/google-cloud-sdk/bin/gke-gcloud-auth-plugin'}, [{:env, []}, :use_stdio, :exit_status, :binary, :hide, {:args, []}])
    (elixir 1.14.3) lib/system.ex:1071: System.do_cmd/3
    lib/k8s/conn/auth/exec.ex:87: K8s.Conn.Auth.Exec.generate_token/1
    lib/k8s/conn/auth/exec.ex:69: K8s.Conn.RequestOptions.K8s.Conn.Auth.Exec.generate/1
    lib/k8s/conn.ex:361: K8s.Conn.RequestOptions.K8s.Conn.generate/1
    lib/k8s/discovery/driver/http.ex:58: K8s.Discovery.Driver.HTTP.http_get/2
    lib/k8s/discovery/driver/http.ex:17: K8s.Discovery.Driver.HTTP.resources/3
    #cell:bq4txyso2sm2jmyzennytxcbnv4rdczy:10: (file)
Last message: {:DOWN, #Reference<27964.1503083199.3897032710.249364>, :process, #PID<27964.2400.0>, {:emfile, [{:erlang, :open_port, [erl_child_setup: failed with error 32 on line 281
{:spawn_executable, '/Users/bryanh/brew/share/google-cloud-sdk/bin/gke-gcloud-auth-plugin'}, [{:env, []}, :use_stdio, :exit_status, :binary, :hide, {:args, []}]], [error_info: %{module: :erl_erts_errors}]}, {System, :do_cmd, 3, [file: 'lib/system.ex', line: 1071]}, {K8s.Conn.Auth.Exec, :generate_token, 1, [file: 'lib/k8s/conn/auth/exec.ex', line: 87]}, {K8s.Conn.RequestOptions.K8s.Conn.Auth.Exec, :generate, 1, [file: 'lib/k8s/conn/auth/exec.ex', line: 69]}, {K8s.Conn.RequestOptions.K8s.Conn, :generate, 1, [file: 'lib/k8s/conn.ex', line: 361]}, {K8s.Discovery.Driver.HTTP, :http_get, 2, [file: 'lib/k8s/discovery/driver/http.ex', line: 58]}, {K8s.Discovery.Driver.HTTP, :resources, 3, [file: 'lib/k8s/discovery/driver/http.ex', line: 17]}, {:elixir_eval, :__FILE__, 1, [file: '#cell:bq4txyso2sm2jmyzennytxcbnv4rdczy', line: 10]}]}}

There are 104 requests being fired off in quick succession/concurrently there, which I might expect to cause HTTP client failures, but what I suspect is happening is that 104 instances of the gke-gcloud-auth-plugin executable are being spawned at once, exhausting the open file/port limit and raising :emfile. A sketch of one workaround is below.
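One way to work around it from the caller's side (a minimal, untested sketch, assuming the same conn and ingress_list as above) is to cap concurrency with Task.async_stream so only a handful of auth-plugin processes exist at any one time:

with_services_added =
  ingress_list
  |> Task.async_stream(
    fn i ->
      operation = K8s.Client.get("v1", "Service", namespace: i.namespace, name: i.service_name)
      {:ok, service} = K8s.Client.run(conn, operation)
      Map.put(i, :selector, "|#{inspect(service["spec"]["selector"])}|")
    end,
    # max_concurrency bounds how many requests (and spawned plugin
    # executables) are in flight at once; 8 is an arbitrary example value.
    max_concurrency: 8,
    timeout: 30_000
  )
  |> Enum.map(fn {:ok, result} -> result end)

This only bounds the number of concurrent exec spawns; it doesn't address the underlying lack of token caching.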

@JTarasovic
Contributor

Yeah, the Exec auth strategy doesn't do any caching, so it ends up spawning an OS process to generate a token for every request. We're running into a similar situation generating tokens for EKS clusters.

FWIW, both the v1beta1 and v1 response have an (optional) expiration timestamp. However, it doesn't look like there's a really ergonomic way to maintain that state between calls.
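For reference, the exec plugin writes an ExecCredential document to stdout; decoded, it looks roughly like this (the token value and timestamp here are made up):

%{
  "apiVersion" => "client.authentication.k8s.io/v1",
  "kind" => "ExecCredential",
  "status" => %{
    "token" => "eyJhbGciOi...",
    # expirationTimestamp is optional; when it's absent, a cache would
    # need some fallback policy for how long to keep the token.
    "expirationTimestamp" => "2023-07-31T11:40:12Z"
  }
}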

I'm happy to add that functionality but would want a bit of guidance on how we'd want to do that and what would be acceptable before getting too far (e.g. ETS, GenServer state, etc).

cc: @coryodaniel @elliottneilclark

@binarytemple
Author

Hey, thanks for the reply. I'm a long-time Erlang/Elixir dev. People get hung up on pure functions and the like, but what I typically see used for this kind of thing is an ETS table. Yes, it's global state, impure, etc., but Erlang is a pragmatic platform first and functional second. You might be tempted to take a point-cut approach and pull in a caching library, but honestly, for something like this, a public ETS table that stores the tokens and is consulted every time one is needed or invalidated would do a fine job, something like the sketch below. If you pointed me at the appropriate place in the code path, I could whip up a PR for you next weekend if you liked.
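A minimal sketch of that idea (module, table, and function names are hypothetical, not part of the k8s library):

defmodule TokenCache do
  # Hypothetical public ETS cache for exec-generated tokens.
  @table :k8s_exec_token_cache

  def init do
    :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])
  end

  # Return the cached token for `key` if it hasn't expired; otherwise run
  # `generate` (a fun returning {:ok, token, expires_at}) and cache the result.
  def fetch(key, generate) do
    case :ets.lookup(@table, key) do
      [{^key, token, expires_at}] ->
        if DateTime.compare(DateTime.utc_now(), expires_at) == :lt do
          {:ok, token}
        else
          refresh(key, generate)
        end

      [] ->
        refresh(key, generate)
    end
  end

  defp refresh(key, generate) do
    with {:ok, token, expires_at} <- generate.() do
      :ets.insert(@table, {key, token, expires_at})
      {:ok, token}
    end
  end
end

One caveat: with a plain public table, two processes that see an expired token at the same moment will both shell out; a single-writer process avoids that.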

@elliottneilclark
Contributor

elliottneilclark commented Jan 22, 2024

I would like to suggest a GenServer per K8s.Connection with a DynamicSupervisor for this first. We can always add an ETS table if the token read throughput is too slow.

Agents are best used when there's a single piece of state that hardly ever changes and we don't need to specialize access. That doesn't fit here, since we will have token expiry to deal with at the very least.

GenServer is the default to reach for when we have state and need more than that. Additionally, k8s already spins up a dynamic supervisor and a GenServer per connection for adapters. Following that, we can add a DynamicSupervisor for auth to use and start a process during AuthProvider.create/2. Going through a process gives us a single writer even when the K8s.Connection is shared across processes. Additionally, we can schedule an early refresh of the token (with jitter) in a Task that sends the result back. Assuming the command being called is quick (within some configurable expected shell/exec time), no requests to Kubernetes should see the latency hit of shelling out. Roughly, the token server could look like the sketch below.
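A rough sketch of that design (hypothetical names; generate_token/1 stands in for the existing shell-out in K8s.Conn.Auth.Exec):

defmodule TokenServer do
  # Hypothetical per-connection token server; not the library's actual API.
  use GenServer

  def start_link(exec_config), do: GenServer.start_link(__MODULE__, exec_config)

  def get_token(pid), do: GenServer.call(pid, :get_token)

  @impl true
  def init(exec_config) do
    {token, expires_at} = generate_token(exec_config)
    schedule_refresh(expires_at)
    {:ok, %{config: exec_config, token: token, expires_at: expires_at}}
  end

  @impl true
  def handle_call(:get_token, _from, state), do: {:reply, {:ok, state.token}, state}

  # The scheduled early refresh: callers never pay the exec latency.
  @impl true
  def handle_info(:refresh, state) do
    {token, expires_at} = generate_token(state.config)
    schedule_refresh(expires_at)
    {:noreply, %{state | token: token, expires_at: expires_at}}
  end

  defp schedule_refresh(expires_at) do
    # Refresh 60s before expiry, minus up to 10s of jitter, never in the past.
    ms = DateTime.diff(expires_at, DateTime.utc_now(), :millisecond)
    Process.send_after(self(), :refresh, max(ms - 60_000 - :rand.uniform(10_000), 0))
  end

  defp generate_token(_config) do
    # Placeholder: shell out to the exec plugin and parse the ExecCredential;
    # here we fake a token that is valid for five minutes.
    {"<token>", DateTime.add(DateTime.utc_now(), 300, :second)}
  end
end

(For brevity this refreshes inline in handle_info; spawning the refresh in a Task that sends the result back, as described above, would keep get_token responsive while the plugin runs.)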

Looking at the code, K8s.Conn.Auth.Azure and K8s.Conn.Auth.Certificate would also benefit from following this pattern, using HTTP and file IO respectively. K8s.Conn.Auth.Certificate can use this improvement the most, since certificates do expire and many cloud providers are issuing shorter and shorter certificate lifetimes. For high-security clusters, CA expiry times can get pretty extreme.

@mruoss
Collaborator

mruoss commented Jan 22, 2024

I have noticed this as well. I fully agree with @elliottneilclark's approach. We can refresh the token "automatically" in the GenServer some time before it expires. IMO it might also be enough to just check the expiry timestamp when using the token: if it has expired, renew it then. Yes, that one request will take longer, but I wonder if we ever even reach that point, as these auth providers are mostly used locally. But you tell me. A sketch of that lazy variant is below.
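The lazy variant would replace the scheduled refresh with a check at call time, e.g. (same hypothetical names as the sketch above):

@impl true
def handle_call(:get_token, _from, state) do
  if DateTime.compare(DateTime.utc_now(), state.expires_at) == :lt do
    {:reply, {:ok, state.token}, state}
  else
    # Token expired: this one caller pays the exec latency.
    {token, expires_at} = generate_token(state.config)
    {:reply, {:ok, token}, %{state | token: token, expires_at: expires_at}}
  end
end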

That being said, is anybody willing to implement this?

@elliottneilclark
Contributor

That being said, is anybody willing to implement this?

We will. I'll start with the exec and put it up for review.

@elliottneilclark
Contributor

#302 is my start at this. @JTarasovic or I still need to do a lot before the entire feature is complete.
