Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plugin stuck in broken state waiting for RPC address #306

Open
agaudreault opened this issue Apr 18, 2024 · 6 comments
Open

Plugin stuck in broken state waiting for RPC address #306

agaudreault opened this issue Apr 18, 2024 · 6 comments

Comments

@agaudreault
Copy link

agaudreault commented Apr 18, 2024

When trying to start a plugin, it sometimes ends up in a broken state and the only way to unblock is to restart the host.

Main function trying to start the plugin

		t.client[pluginName] = goPlugin.NewClient(&goPlugin.ClientConfig{
			HandshakeConfig: handshakeConfig,
			Plugins:         pluginMap,
			Cmd:             exec.Command(pluginPath, args...),
			Managed:         true,
		})
		rpcClient, err := t.client[pluginName].Client()
		if err != nil {
			// getting log: waiting for RPC address
			// then error: timeout while waiting for plugin to start
			return nil, fmt.Errorf("unable to get plugin client (%s): %w", pluginName, err)
		}

Function trying to get the plugin if it exist, and somewhat recover on error

	client, err := t.client[pluginName].Client()
	if err != nil {
		// Getting error: exec: Stdout already set
		// cannot run kill() because it requires a valid client
		return nil, fmt.Errorf("unable to get plugin client (%s) for ping: %w", pluginName, err)
	}
	if err := client.Ping(); err != nil {
		t.client[pluginName].Kill()
		t.client[pluginName] = nil
		return nil, fmt.Errorf("could not ping plugin will cleanup process so we can restart it next reconcile (%w)", err)
	}

Everytime we create a new client (restart the program), there is a new process that is created in the following state

USER               PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
agaudreault      99885   0.0  0.0 410089056     48   ??  UE    1:58pm   0:00.00 /Users/agaudreault/plugin-bin/my-plugin
agaudreault      98102   0.0  0.0 410089056     48   ??  UE    1:56pm   0:00.00 /Users/agaudreault/plugin-bin/my-plugin
agaudreault      96380   0.0  0.0 410089056     48   ??  UE    1:55pm   0:00.00 /Users/agaudreault/plugin-bin/my-plugin
agaudreault      94682   0.0  0.0 410089056     48   ??  UE    1:54pm   0:00.00 /Users/agaudreault/plugin-bin/my-plugin
agaudreault      92807   0.0  0.0 410089056     48   ??  UE    1:52pm   0:00.00 /Users/agaudreault/plugin-bin/my-plugin
agaudreault      91001   0.0  0.0 410089056     48   ??  UE    1:51pm   0:00.00 /Users/agaudreault/plugin-bin/my-plugin
agaudreault      89169   0.0  0.0 410089056     48   ??  UE    1:50pm   0:00.00 /Users/agaudreault/plugin-bin/my-plugin
agaudreault      87411   0.0  0.0 410089056     48   ??  UE    1:48pm   0:00.00 /Users/agaudreault/plugin-bin/my-plugin

The process is running locally on Mac

    System Software Overview:
      System Version: macOS 14.4.1 (23E224)
      Kernel Version: Darwin 23.4.0
      Chip: Apple M1 Max

Running on version

github.com/hashicorp/go-plugin v1.6.0
@joshwizzy
Copy link
Contributor

The default value of StartTimeout is set 1 minute when not explicitly set it ClientConfig
What does your plugin do before calling goplugin.Serve?

@agaudreault
Copy link
Author

@joshwizzy not much to be honest. The plugin works, when it starts correctly. You can find the code here: https://github.com/agaudreault/argo-rollouts/blob/13e8829a6691e27743f00a4ac1ec5163e51251c1/test/cmd/step-plugin-e2e/main.go.

I found out that deleting the executed binary solved the issue as well. Easier than restarting when running locally!

@joshwizzy
Copy link
Contributor

@agaudreault
Would you mind adding an issue to the argo-rollouts repo with instructions on how to reproduce the broken state.
It might be easier for folks who are interested/familiar with the argo-rollouts plugin integration to investigate.
I am unfamiliar with argo-rollouts but this seems to be part of an initiative to add Step plugins to the project.
Most people here may not have the required context to help out.

@agaudreault
Copy link
Author

agaudreault commented Apr 23, 2024

@joshwizzy I am the one implementing the step plugin with other maintainers hence why I'm reaching out about the behavior. This is not an argo rollout issue, but you can see the code and perhaps find out if something is wrong with how the plugin is started. It's quite simple so building a repro app should be quite easy with access to existing code.

The steps are

  1. Start an application
  2. Let the application start the plugin
  3. Kill the application process
  4. Start the application
  5. Plugin cannot start

It seems like the go-plugin does not support to be re-executed if it is not shutdown gracefully

@joshwizzy
Copy link
Contributor

@agaudreault
I am unable to reproduce this using the basic greeter example.
I followed these steps:

  1. comment out this line so the plugin is not killed when the client exits
    defer client.Kill()

  2. build and trun the example app

 ./basic   
2024-04-23T15:02:57.225+0300 [DEBUG] plugin: starting plugin: path=./plugin/greeter args=[./plugin/greeter]
2024-04-23T15:02:57.227+0300 [DEBUG] plugin: plugin started: path=./plugin/greeter pid=86088
2024-04-23T15:02:57.227+0300 [DEBUG] plugin: waiting for RPC address: plugin=./plugin/greeter
2024-04-23T15:02:57.237+0300 [DEBUG] plugin.greeter: message from plugin: foo=bar timestamp=2024-04-23T15:02:57.237+0300
2024-04-23T15:02:57.237+0300 [DEBUG] plugin.greeter: plugin address: address=/var/folders/sl/h6pntx3j05l0fk99605hzxy40000gn/T/plugin918908959 network=unix timestamp=2024-04-23T15:02:57.237+0300
2024-04-23T15:02:57.237+0300 [DEBUG] plugin: using plugin: version=1
2024-04-23T15:02:57.239+0300 [DEBUG] plugin.greeter: message from GreeterHello.Greet: timestamp=2024-04-23T15:02:57.239+0300
Hello!

The plugin process stays running after the client exits

ps ax |grep greeter
86088 s000  S      0:00.01 ./plugin/greeter
  1. Run the client again successfully
./basic            
2024-04-23T15:04:38.176+0300 [DEBUG] plugin: starting plugin: path=./plugin/greeter args=[./plugin/greeter]
2024-04-23T15:04:38.178+0300 [DEBUG] plugin: plugin started: path=./plugin/greeter pid=86897
2024-04-23T15:04:38.178+0300 [DEBUG] plugin: waiting for RPC address: plugin=./plugin/greeter
2024-04-23T15:04:38.191+0300 [DEBUG] plugin.greeter: message from plugin: foo=bar timestamp=2024-04-23T15:04:38.191+0300
2024-04-23T15:04:38.191+0300 [DEBUG] plugin.greeter: plugin address: address=/var/folders/sl/h6pntx3j05l0fk99605hzxy40000gn/T/plugin523748303 network=unix timestamp=2024-04-23T15:04:38.191+0300
2024-04-23T15:04:38.191+0300 [DEBUG] plugin: using plugin: version=1
2024-04-23T15:04:38.193+0300 [DEBUG] plugin.greeter: message from GreeterHello.Greet: timestamp=2024-04-23T15:04:38.193+0300
Hello!

Two instances of the plugin are now running

ps ax |grep greeter
86088 s000  S      0:00.01 ./plugin/greeter
86897 s000  S      0:00.02 ./plugin/greeter

@joshwizzy
Copy link
Contributor

@agaudreault
Seeing as you are running on an M1 mac
Could you try running the app with this setting and see if it makes a difference
export GODEBUG=asyncpreemptoff=1

hashicorp/terraform-provider-aws#20274
hashicorp/terraform-provider-aws#16073
https://yaleman.org/post/2021/2021-01-01-apple-m1-terraform-and-golang/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants