Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: --inspect-store flag #298

Closed
alexkozy opened this issue May 8, 2019 · 29 comments
Closed

proposal: --inspect-store flag #298

alexkozy opened this issue May 8, 2019 · 29 comments

Comments

@alexkozy
Copy link
Member

alexkozy commented May 8, 2019

Hi,

I'd like to propose a solution for following problem: how to discover inspector web socket URL for locally running node instances. Imagine we have a tool that inspects all running node instances and capture some information using inspector protocol. It might be process and its child processes or set of independent node processes.

When we have more then one process to inspect they can not use the same port, so there is no way to define this port ahead of time So we need to pass 0 as port to allow node find free port for inspector by itself. We can pass it using environment and then all processes started with this flag will get own port.

In case of child process we still can parse stderr and try to find magic line that contains full WebSocket URL. It works only when we control process that we would like to start. In a lot of cases it is not convenient or not possible, e.g. we'd like to debug npm run test command as is without passing it to some magic runner script.

So I would like to propose new node flag --inspect-store which takes a folder as argument. When this flag is passed directly or using NODE_OPTIONS, every time when inspector server started in node it will put special random-id file to passed folder. File will contain dump of /json/list endpoint (it contains some information about node including process title and inspector web socket URL). When inspector server finished node will try to remove this file without strict contract - tools that use this flag should be ready for stale files inside and should cleanup this folder after it is finished.

I prefer this flag to be implemented in native node code to avoid any pollution of node runtime with some user land code, e.g. we can use inspector module to actually start inspector, get its url and save to file but it will force loading of inspector and fs modules that can change program behavior with and without inspection.

As alternative we might consider implementing some kind of inspector port that can be predicted but in this case tool will need to constantly ping different ports and waste CPU time.

Another alternative: we can add flag that forces node to send inspector web socket URL back using some channel, e.g. by sending get request to given URL. It requires daemon that should always be available in background. This solution sound more complicated to me than --inspector-store flag.

PoC: nodejs/node#27605

@addaleax
Copy link
Member

addaleax commented May 8, 2019

So … my first intuitive preference would be to “let userland handle this” – it’s not too complicated to set NODE_OPTIONS="--require /path/to/script.js", where the script reads inspector.url() and maybe some process metadata and stores that where it’s best fit… the difference to --inspector-store being that it’s much more flexible, but doesn’t come out-of-the-box?

@eugeneo
Copy link

eugeneo commented May 8, 2019

I feel bit queasy about accessing the filesystem on Node.js startup - so much can go wrong. No access to a writable FS might also be a common problem...

It works only when we control process that we would like to start.

Command line flag also requires controlling the process.

File will contain dump of /json/list endpoint (it contains some information about node including process title and inspector web socket URL)

Is anything other then URL (or even port) required? Given the port one can always access the /json/list endpoint for that process.

e.g. we can use inspector module to actually start inspector, get its url and save to file but it will force loading of inspector and fs modules that can change program behavior with and without inspection

  1. Does this have to be in core Node and not an ecosystem module?
  2. Are we certain those modules are not always loaded anyways.

Another alternative: we can add flag that forces node to send inspector web socket URL back using some channel, e.g. by sending get request to given URL. It requires daemon that should always be available in background. This solution sound more complicated to me than --inspector-store flag.

Would it be possible to solve this by having "parent" instance in fork and cluster scenarios cover this role? Then parent instances could publish child instances in their /json/list and/or pass them up the hierarchy. I am assuming inspector URL discovery for the "root" instance is a solved problem.

One problem I see is if the "hierarchy chain" is broken on some level because some process terminated.

@alexkozy
Copy link
Member Author

alexkozy commented May 8, 2019

So … my first intuitive preference would be to “let userland handle this” – it’s not too complicated to set NODE_OPTIONS="--require /path/to/script.js", where the script reads inspector.url() and maybe some process metadata and stores that where it’s best fit… the difference to --inspector-store being that it’s much more flexible, but doesn’t come out-of-the-box?

I have experience using this approach inside ndb and it comes with following disadvantages:

  • we can not profile / debug node startup with this approach that might be significant with short running programs or for node.js collaborators who works on different node parts,
  • we need to expose something like inspector.waitForConnection() in userland that completely blocks current event loop to avoid any code from being executed, otherwise it might broke some assumptions, e.g. for channel between parent and channel process, I do not like idea of exposing any blocking methods,
  • any userland code that we put there a little bit change program behavior under inspection and without inspection, e.g. we need to require inspector and fs to get url and store it somewhere.
  • minor thing, --require requires full path to js script it might be trickier to get it works nice on windows, since this file should be located in some folder without spaces in path, it is much easier to create and pass tmp folder name as argument, than copy script to safe location - most likely it means that we need to fix --require.
  • some fun starts when when we pass regular --require, e.g. with some module loader, who runs first, how to guarantee that our require is first.

@alexkozy
Copy link
Member Author

alexkozy commented May 8, 2019

I feel bit queasy about accessing the filesystem on Node.js startup - so much can go wrong. No access to a writable FS might also be a common problem...

I am writing tool so I should create something writable before passing it to node. If node can not write to file then it won't and tool won't detect processes and should fix it on its side. We already have some flags that dumps information to file e.g. --cpu-prof-name.

Command line flag also requires controlling the process.

NODE_OPTIONS=--inspect-store=/tmp/store node script.js
As well as you can define this environment variable using any other way.

Is anything other then URL (or even port) required? Given the port one can always access the /json/list endpoint for that process.

Only url with port are required but I believe that if we are dumping something - it is nice to dump something already available to not create new format for almost the same information.

  1. Does this have to be in core Node and not an ecosystem module?
  2. Are we certain those modules are not always loaded anyways.

In general if we run anything - it means that we run something differently in comparison with normal run and most implementations should be as simple as possible to avoid pollution of inspecting process so they should get url as fast as possible and dump it using as less code as possible. As soon as you get more complicated logic there user will be closer and closer to a point when they inspect/debug this code than actual node scripts.

Would it be possible to solve this by having "parent" instance in fork and cluster scenarios cover this role? Then parent instances could publish child instances in their /json/list and/or pass them up the hierarchy. I am assuming inspector URL discovery for the "root" instance is a solved problem.

One problem I see is if the "hierarchy chain" is broken on some level because some process terminated.

It can be worse, inspected processes might not know about each other, imagine that there are multiple processes running on cron and you'd like to build a tool for them. With my solution you create folder, setup NODE_OPTIONS=--inspect-store=/tmp/store in environment and it works. Any central orchestration is not needed.

@jkrems
Copy link
Contributor

jkrems commented May 8, 2019

we can not profile / debug node startup with this approach that might be significant with short running programs or for node.js collaborators who works on different node parts

Not sure I follow this part. Wouldn't writing to a file, picking up the change, and then trying to connect also realistically miss the startup of node?

@alexkozy
Copy link
Member Author

alexkozy commented May 8, 2019

Not sure I follow this part. Wouldn't writing to a file, picking up the change, and then trying to connect also realistically miss the startup of node?

If in addition to --inspect-store you pass --inspect-brk it will block in native code waiting until someone connected and send Runtime.runIfWaitingForDebugger. So tool have time to send any other preparation command before run, e.g. Profiler.startPreciseCoverage. Node won't stop at breakpoint in this case since tool won't pass Debugger.enable to enable debugger if it does not need it.

So writing files and waiting for Runtime.runIfWaitingForDebugger happens in native code before any node boostrap or user land code.

@joyeecheung
Copy link
Member

joyeecheung commented May 9, 2019

I am writing tool so I should create something writable before passing it to node. If node can not write to file then it won't and tool won't detect processes and should fix it on its side. We already have some flags that dumps information to file e.g. --cpu-prof-name.

The file specified by --cpu-prof-name is written on process exit. There are plenty of other options that specify files to be accessed during startup, though, like --experimental-policy. If environment variables count (since you'd also need control of the process) then there are files accessed even earlier like NODE_REPL_HISTORY or NODE_EXTRA_CA_CERTS. (I actually think the feature proposed here is more akin to NODE_REPL_HISTORY?)

What is the use case of this flag? Is the consumer looking for the endpoint given a PID, or is it trying to find all Node.js process with such endpoints open and these endpoints?

@alexkozy
Copy link
Member Author

alexkozy commented May 9, 2019

The file specified by --cpu-prof-name is written on process exit. There are plenty of other options that specify files to be accessed during startup, though, like --experimental-policy. If environment variables count (since you'd also need control of the process) then there are files accessed even earlier like NODE_REPL_HISTORY or NODE_EXTRA_CA_CERTS. (I actually think the feature proposed here is more akin to NODE_REPL_HISTORY?)

It looks like we use a lot of files in Node, I definitely can follow all best practice in this area. Thanks for these examples.

What is the use case of this flag? Is the consumer looking for the endpoint given a PID, or is it trying to find all Node.js process with such endpoints open and these endpoints?

Consumer is trying to find all Node.js processes started inside current process environment with such endpoints. Process environment has nice property that it is inherited by default by child processes.

There are two use cases:

  1. ndb requires reliable way to detect child processes, as well it needs another feature when you can start this tool, go to your terminal, run something like export NODE_OPTIONS='--inspect-store=/tmp/store --inspect-brk=0' there and later any process started from that terminal session will be automatically detected by ndb. E.g. I have ndb with pause on exception checkbox enabled in background, start npm run test, ndb automatically detects npm process and any child test process, pause on exception and bring itself to front.
  2. any other tool can create temporary folder and ask user to add the same magic NODE_OPTIONS to environment where target node processes are executed, this tool can watch for folder and every time when new node process appear it can capture some information, e.g. DevTools heap sampling profile or cpu profile, using protocol.

@eugeneo
Copy link

eugeneo commented May 9, 2019

Can there be just a single file that gets appended?

@alexkozy
Copy link
Member Author

alexkozy commented May 9, 2019

Can there be just a single file that gets appended?

In theory yes but in this case different node processes will need to implement some kind of synchronization between them to not messed up each other additions.
At the same time I believe that inspector should remove this file at the end when it is possible. Removing lines from the middle of the file definitely requires synchronization.

If node has native support for some file based database - it can be used here, but I believe that all file based databases are implemented in user land.

@alexkozy alexkozy changed the title proposal: --inspector-store flag proposal: --inspect-store flag May 9, 2019
@joyeecheung
Copy link
Member

joyeecheung commented May 10, 2019

There are two use cases...

Thanks for the explanation.

Consumer is trying to find all Node.js processes started inside current process environment with such endpoints. Process environment has nice property that it is inherited by default by child processes.

As I understand, the consumer does not need to discover and attach to child processes after they are already started? I wonder whether we could do the reverse: instead of having the consumer reading something from a folder, we actively notify the consumer from the Node.js instances - e.g. through a pipe or a domain socket, i.e. something like our PipeWrap - when the inspector server starts listening. It may be less prone to race conditions that way. IIUC, that's also how we notify child processes from the master that the inspector is started in the cluster implementation.

@alexkozy
Copy link
Member Author

As I understand, the consumer does not need to discover and attach to child processes after they are already started?

I do not want to limit consumer in this way. If consumer would like to have, scheduled using cron, script that starts from time to time, detect all running node instances and captures sampling heap profile using protocol to aggregate this data later - it sounds like pretty valid use case. With proposed solution consumer needs to pass NODE_OPTIONS='--inspect-store=/tmp/store --inspect' in environment to get this behavior.

I wonder whether we could do the reverse: instead of having the consumer reading something from a folder, we actively notify the consumer from the Node.js instances - e.g. through a pipe or a domain socket, i.e. something like our PipeWrap - when the inspector server starts listening. It may be less prone to race conditions that way. IIUC, that's also how we notify child processes from the master that the inspector is started in the cluster implementation.

Reverse connection might give us a lot of more flexibility over how information about discovered node processes is stored. I like it but at the same time it adds requirement for some demon that is running in background and requires some cpu cycles. I actually tried reverse connection with ndb but it feels more complicated to me and less reliable.
I am not sure about what race we are talking about, could you describe potential race in more details? With --inspect-brk node will wait connection and Runtime.runIfWaitingForDebugger command over protocol, so consumer has control over when node instance will start and do any preparation work.

@alexkozy
Copy link
Member Author

I was thinking about current available capabilities a little bit more and I can say that parsing stderr to get WebSocket to debug child processes has one crucial issue. If you pass --inspect-brk using NODE_OPTIONS and parent process ignores its child's stderr stream then child will wait for connection forever since there is no any way to detect child's websocket port.

--inspect-store solves this problem.

@joyeecheung
Copy link
Member

joyeecheung commented May 11, 2019

I do not want to limit consumer in this way. If consumer would like to have, scheduled using cron, script that starts from time to time, detect all running node instances and captures sampling heap profile using protocol to aggregate this data later - it sounds like pretty valid use case.

I see, that's fair.

I am not sure about what race we are talking about, could you describe potential race in more details?

What happens if process.debugPort is mutated? There are also process._debugProcess(), process._debugEnd() (unfortunately not deprecated), and SIGUSR1, I imagine coordination with these may be prone to races. Though these should be edge cases that come from how the user uses the inspector so they are the one to fix it.

I like it but at the same time it adds requirement for some demon that is running in background and requires some cpu cycles. I actually tried reverse connection with ndb but it feels more complicated to me and less reliable.

I think there are some trade-offs here. Using the file system means we are adding another layer of states to manage, which may be out of sync with the actual states in the processes (since we are not managing those in-memory). For instance, what if the process crashes without having a chance to clean up the file?

@alexkozy
Copy link
Member Author

What happens if process.debugPort is mutated? There are also process._debugProcess(), process._debugEnd() (unfortunately not deprecated), and SIGUSR1, I imagine coordination with these may be prone to races. Though these should be edge cases that come from how the user uses the inspector so they are the one to fix it.

Good point. If it is possible then it would be nice to prohibit all these methods and signal if --inspect flag already passed to node. At the same time I believe that it is user choice: if they would like to use --inspect-store, they should be prepared for some super rare packages which do crazy stuff using mentioned methods.

I think there are some trade-offs here. Using the file system means we are adding another layer of states to manage, which may be out of sync with the actual states in the processes (since we are not managing those in-memory). For instance, what if the process crashes without having a chance to clean up the file?

External tool that uses --inspect-store flag is responsible for maintaining folder which it passed to node processes. Having stale files is not big issue since tool detects that file exists, will get WebSocket url from this file and when it can not connect to Node using this URL it can remove this file or implement any other required logic.

@alexkozy
Copy link
Member Author

alexkozy commented May 15, 2019

Some action items after diagnostics meeting:

  • consider using diagnostics report to report WebSocket URL: https://nodejs.org/api/report.html
  • research reverse connection option more,
  • decide should it be possible to inspect/debug bootstrap code or not.

@alexkozy
Copy link
Member Author

alexkozy commented May 15, 2019

@cjihrig please take a look based on git blame it looks like you implemented big part of --experimental-report. I am considering adding inspector WebSocket URL to report and adding flag that pushes Node to dump report at start.

Based on my analysis it looks like we need created and initialized isolate only for PrintGCStatistics and for current JavaScript stack trace so it looks like we easily can add --report-on-start flag. It should dump report exactly after inspector started and before inspector blocks execution waiting for connection if inspector enabled.

I find that all report code is behind NODE_REPORT define, is it because this feature is experimental or it is part of design and by default node will be build without NODE_REPORT?

If you think that the idea makes some sense, I will be happy to try to implement it.

@cjihrig
Copy link

cjihrig commented May 16, 2019

I find that all report code is behind NODE_REPORT define, is it because this feature is experimental or it is part of design and by default node will be build without NODE_REPORT?

I anticipate that being removed once the feature is out of experimental. Even if it does remain, it's already enabled by default. That said, it means that some users might have it disabled.

Some action items after diagnostics meeting:
consider using diagnostics report to report WebSocket URL

I'm not sure what was said at the meeting. Can you comment on that?

I do think we should add inspector information to the diagnostic report output, regardless of the outcome of this thread. I'm not sure how I feel about the idea of a --report-on-start flag, TBH. I wouldn't block it though.

@gireeshpunathil
Copy link
Member

IMO:

  • NODE_REPORT can stay even after exiting experimental, to provide flexibility for users who build from source to exclude report if need be, following the modular examples for inspector, ICU, SSL etc.

  • As I said in the meeting, the bootstrap sequence has been on the move recently, and may change again too. @joyeecheung may be able to state what is the final / desired state.

  • --report-on-start / --report-on-init looks great to me. However, I also feel the regular report (generated on the fly on a fully warmed up Node) could also record the inspector URL in it - catering to may be a different type of use case?

@alexkozy
Copy link
Member Author

alexkozy commented May 16, 2019

There is another important consideration: when we are running couple node processes in the same environment, it should not be possible by default for one process to find inspector websocket url of another process and connect to it. We consider inspector websocket url as unguessable one.

We have one privileged node process which runs our tool and any number of other node processes which our tool needs to inspect. On tool side it should be possible to start all other node instances (including child processes) in a way when only tool can get access to their inspector websocket url.

There are two ways to get inspector websocket url: parse stderr and access http://<ip>:<port>/json/list endpoint. I have separate proposal to disable json endpoint (#303).

With this test in mind I am not comfortable with putting inspector websocket url to report by default. Based on sensitivity of this information I prefer to have separate flag that enable and control how we store inspector websocket url, if we put this information to report, any further report related work requires having mentioned consideration in mind.

Any user land solution in this case might be risky, if user land code can get instructions where to store information, it means that any other part of current node process can get this instructions and try to find there information about other node processes.

With --inspect-store: we can utilize OS capabilities and give only write access for unprivileged node processes (chmod 333 /tmp/store), the tool process gets read access as well and can list all running nodes. This solution sounds good to me since it is based on OS primitives. If we'd like to make it safer, inspector can put file with its inspector websocket url if and only if it can not read from the given folder.

Reverse connection passed the test, there is only risk that someone can somehow kill server that listens for reverse connection and starts own server instead, it adds another requirement for demon to be run all the time.

alexkozy added a commit to alexkozy/node that referenced this issue Jun 14, 2019
The argument takes the IPC path as an argument when it is passed,
the inspector web socket server will send its full web socket url
to IPC server with the given path at the start.

This flag is required to solve inspector web socket url discovery
the problem for child processes.

Only one discovery option is available right now; it is searching
for ws:// URLs in stderr of node process. This approach does not
work when the parent process ignores the stderr of the child process,
e.g. update-notifier uses this technique.

Discussion about using files instead can be found here:
nodejs/diagnostics#298
@alexkozy
Copy link
Member Author

Since this problem still exists and ndb (and any other tool that needs to debug child processes) suffers without proposed flag.

I would like to propose alternative solution. In addition to files Node.js implements IPC socket and server for IPC communications so my new pull request implements --inspect-publish-uid-ipc-path, when this flag is there, inspector will try to send web socket url to given named socket.

Any thoughts?

@hediet
Copy link

hediet commented Jun 20, 2019

So … my first intuitive preference would be to “let userland handle this” – it’s not too complicated to set NODE_OPTIONS="--require /path/to/script.js", where the script reads inspector.url() and maybe some process metadata and stores that where it’s best fit… the difference to --inspector-store being that it’s much more flexible, but doesn’t come out-of-the-box?

This is exactly what easy-attach is doing. It does not use inspector.url() though, but rather connects to the debugger itself using websockets. It then sends this url to VS Code, again using websockets. It works pretty well.

@alexkozy
Copy link
Member Author

This is exactly what easy-attach is doing. It does not use inspector.url() though, but rather connects to the debugger itself using websockets. It then sends this url to VS Code, again using websockets. It works pretty well.

easy-attach helps with a different use case when user should inject some JS in the target process. I believe that you can use inspector module to make code a little bit cleaner and avoid using undocumented process._debugProcess.
This issue is about discovering inspector port without any change in user code.

@hediet
Copy link

hediet commented Jun 20, 2019

@ak239 You can still inject easy-attach with NODE_OPTIONS="--require ..." without any change in user code.

@alexkozy
Copy link
Member Author

alexkozy commented Jun 20, 2019

@ak239 You can still inject easy-attach with NODE_OPTIONS="--require ..." without any change in user code.

@hediet ndb uses this approach nowadays - it injects preload.js. It is definitely fine for debugging user code but when we need it for cpu or memory profiling - I'd like to get clear picture without any injected code. At the same time with this approach it is not possible to debug/profile bootstrap code.
At the same time I found injecting module from global package using --require a little bit painful, e.g. it was not possible to use spaces in a path on windows at least in Node 8.

If we already publish inspector.url() to stderr and using /json/list endpoint, I believe we can expose it using another channel as well. Channel that is friendly for tools that do not want to think a lot about how and what code should be injected.

In general we can workaround a lot of things, the question here how friendly platform should be for tools.

@alexkozy
Copy link
Member Author

@hediet I thought about your solution a little bit more and I think that there is no proper way to emulate sync waiting for debugger to connect except inspector.open with true as last flag. The problem here that any debugger or tool internally do some variation of:

  1. connect to discovered node.
  2. send some preparation commands using protocol, e.g. Debugger.enable.
  3. send Runtime.runIfWaitingForDebugger to start node.

This contract is crucial to avoid possible race conditions. Node is not processing any inspector related activity when it is blocked on any sync call. I believe that you added waitSomeCycles function to address this problem.

inspector.open runs nested message loop, waits for connection, process all protocol commands and only when Runtime.runIfWaitingForDebugger is called - it allows node to continue.

@hediet
Copy link

hediet commented Jun 21, 2019

@ak239 That sounds nice, thanks man :) I will definitely explore the inspector module and see what I can do with it.
waitSomeCycles is only one part to wait for the debugger - the other part is the debugger proxy that is launched and inspects the messages sent to the debugger and waits for certain events.

@alexkozy
Copy link
Member Author

With the recent inspector.waitForDebugger addition, userland solution works fine to me. I will close this one for now.

@hediet
Copy link

hediet commented Sep 24, 2019

@ak239 thanks, I use inspector.open now and it works great! I also use chrome-remote-interface to step out and continue if no one wants to attach. Its quite funny to have a program that launches another that debugs the first :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants