-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add exec/2: posix_spawn #3133
base: master
Are you sure you want to change the base?
Add exec/2: posix_spawn #3133
Conversation
There are some cases of prior-art, but from what I can tell no one's implemented I would appreciate testing on additional platforms (I've verified functionality + asan + ubsan + leaksan on amd64 linux and aarch64 macos). I don't have any amd64 macos or aarch64 linux machines that I'd be comfortable testing on. Windows is not yet implemented as I do not have any Windows machines. If there are any issues with this PR as it is, please let me know! |
Forgot I should probably tag this. 1614: this is a more robust implementation of the request .releases |= map(.date = (exec("date"; ["-jf%F", .date, "+%s"]) | trim)) |
There has historically been strong support for a "shell-out" feature for jq, To avoid confusion about this, I would recommend adding the ability to Since that PR has not been accepted, and since there has been an Btw, from these brief notes, I'm not sure whether the intent is for the output of |
As of currently, there is no sandbox. I can't depend on an unmerged PR… as it's not merged yet.
The output of exec is a single JSON string. This is a limitation, since now, the entire output must fit in memory, but once this becomes a use-case, it feels like jq is no longer the correct tool for the job (why are you shelling in gigabytes upon gigabytes of data?). Another design I had considered was returning an object like |
Yes, there is a bit of a chicken-and-egg problem, but if you were (for example) to add --sandbox first, then the onus of integration would fall on the other side. But perhaps you missed my main point, which is that without some way to toggle the feature easily, there is (in my estimation) no chance of your hard work being merged. This would probably mean your PR will languish, or worse. I've seen it happen far too often. Thanks for explaining the rationale for your handling of STDOUT. I think it would be a (fairly grievous) mistake not to provide a convenient way for the output to be presented as a stream (of JSON strings). jq is, after all, strongly stream-oriented (much like *nix :-), and the stream-oriented approach has the big advantage of being economical with memory, whereas the "slurping" approach has no real advantage as it's always easy to "slurp". Perhaps the resolution here is to pick one of the three main options, and then allow provide exec/3, where the third argument would allow the required variant to be specified. |
"Onus"? Again, it's possible to merge this before
Frankly, this comes off as extremely condescending. This isn't my first open source project contribution, and far from the biggest. I'd maybe be more receptive to this from a maintainer. In the meanwhile, the sandbox feature you claim to be a requirement for getting the PR merged is not receiving much interest from actual maintainers in that thread.
I disagree. Picking an arbitrary character to separate a stream (in your case, a newline) would make the interface asymmetric. To make it symmetric, I'd also have to accept a stream of tokens. This, however, makes the interface much clunkier to use (you now need a To wit, this is another subject where I'd be much more receptive to feedback from a maintainer. I think there is value in implementing feature requests that have been outstanding for over 10 years, and I think that the security implications of running untrusted code (i.e. downloaded .jq files, which is the risk here) are the same for all PLs. |
@CosmicToast - I'm sorry my comments came across as condescending and pushy. However, my comments were offered not from the perspective of a As for the specific issues, I still think you may have misunderstood Anyway, please don't feel obliged to respond to this post. |
I still feel like this muddies up the PR. Instead of this: static jv f_exec(jq_state *jq, jv input, jv path, jv args) {
int ret = 0;
// …
} You would have this: static jv f_exec(jq_state *jq, jv input, jv path, jv args) {
if (jq_is_sandbox(jq)) {
jv_free(input), jv_free(path), jv_free(args);
return jv_invalid_with_msg(jv_string("exec/2 is unavailable when jq is sandboxed"));
}
int ret = 0;
// …
} (Pardon the tabs, I tend to do style checkups as the last steps before de-drafting.) I'm working on error reporting / tests / docs right now, so within a couple of hours you should have that be available here. |
I have initial docs and tests up, including valgrind. Tested are inputs, outputs, args. |
I took a look at outputting a stream of tokens, but I'm not seeing how it would be done from the C side. Additionally, as expected, splitting the tokens on any input means it would become required to actually look for it in the input rather than doing buffering, and avoiding buffering (buf size of 1) to avoid consuming any input that's intended for the next token. I.e. unless I'm mistaken, with the current design of jq, it doesn't seem to be possible. |
Alright, exec now returns an object like: I also added At this point, I think there's more or less nothing left to improve in a way that would be reasonable, so once this gets reviewed I'll do a code style pass, maybe add more tests, add myself to AUTHORS as a contributor, and recommend a squash commit. |
Thanks. Just to be clear, I don't think that not having the option to stream the output will or should be a "show-stopper"; the fate of PR #1005 corroborates your observations about the difficulty of doing so. And for the record, I'd like to mention a use-case for such an option: random sampling from very large populations (e.g. reservoir sampling). Since jq does not currently have a PRNG, this is at best tricky in jq at present. The addition of a PRNG would certainly address the issue, so I find it intriguing that previous attempts to add a PRNG as a "built-in" have also foundered. |
I could probably take a look at adding a PRNG (namely PCG32) in a separate PR if there's interest. |
If you use |
This is off-topic for this PR. We can discuss it later in the PR in question if you'd like. |
This is an initial MVP with quite a few things still missing (such as: better error messages, documentation, tests). Despite this, it is already feature-complete on POSIX platforms (on Windows it currently reports an "unsupported on this platform" error). The signature is `anything | exec(path; [args…])`. `path` and all `args` must be strings. `anything` will be converted to a string if it isn't a string in memory, then piped into the process' stdin. The output is all stdout of the process. The exit code is not reported. Technically, "path" can be a simple name and `$PATH` will be searched. This is because the underlying function is `posix_spawnp`. This can bec hanged easily. The process does not have access to environment variables. This can be changed as well. Piping between programs works. Here's an example to try it out: `tostring | exec("seq"; [.]) | exec("wc"; "-l")` Expected output when inputting numbers is that number, but it notably goes through seq, then line-counting.
This is an initial MVP with quite a few things still missing (such as: better error messages, documentation, tests).
Despite this, it is already feature-complete on POSIX platforms (on Windows it currently reports an "unsupported on this platform" error).
The signature is
anything | exec(path; [args…])
.path
and allargs
must be strings.anything
will be converted to a string if it isn't a string in memory, then piped into the process' stdin.The output is all stdout of the process.
The exit code is not reported.
Technically, "path" can be a simple name and
$PATH
will be searched. This is because the underlying function isposix_spawnp
. This can bec hanged easily.The process does not have access to environment variables. This can be changed as well.
Piping between programs works. Here's an example to try it out:
tostring | exec("seq"; [.]) | exec("wc"; "-l")
Expected output when inputting numbers is that number, but it notably goes through seq, then line-counting.