Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overflow stack on recursive assignment #139

Closed
itchyny opened this issue Nov 30, 2023 · 8 comments
Closed

Overflow stack on recursive assignment #139

itchyny opened this issue Nov 30, 2023 · 8 comments

Comments

@itchyny
Copy link
Contributor

itchyny commented Nov 30, 2023

I'm not sure how jaq's correctness answers to this query, but it shouldn't panic.

 $ jq -nc '[[]] | .. = ..'
[[[]]]
[[]]
 $ jaq -nc '[[]] | .. = ..'
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
zsh: abort      jaq -nc '[[]] | .. = ..'

I also found a small discrepancy.

 $ jq -nc '[[]] | .. = .[]'
[[]]
 $ jaq -nc '[[]] | .. = .[]'
[]
@01mf02
Copy link
Owner

01mf02 commented Sep 2, 2024

Hi @itchyny, sorry for my late response on this!
The stack overflow in jaq is expected and comes from the different interpretation of updates.
In particular, in jaq, .. |= f can be shown to be equivalent to rec_upd(f), where def rec_upd(f): f | .[]? |= rec_upd(f);.
Using this definition, jq yields the same results as jaq; that is:

$ jq -nc 'def rec_upd(f): f | .[]? |= rec_upd(f); [[]] | rec_upd([[]])'
[... (does not terminate)]

To see an explanation of how the update works in jaq, see chapter 6 of https://github.com/01mf02/jq-lang-spec.
By the way, this chapter is currently a bit unsatisfactory, because it only describes the jaq way of performing updates.
This is because I do not understand to full detail the jq way of performing updates. However, you seem to understand it, as demonstrated by jqlang/jq@371abc7. If you would like to contribute a section to my jq language specification that explains how updates are performed exactly in jq, then this would be very welcome.

@itchyny
Copy link
Contributor Author

itchyny commented Sep 2, 2024

This report is about assignment operator (=), not updating assignment operator (|=). In jq, the semantics of f = g is g as $v | reduce path(f) as $p (.; setpath($p; $v)).

 $ jq -nc '[[]] | .. as $v | reduce path(..) as $p (.; setpath($p; $v))'
[[[]]]
[[]]

@01mf02
Copy link
Owner

01mf02 commented Sep 3, 2024

@itchyny, yes, I understand that your report is about = and not about |=. However, = is defined in jaq via |= (using that f = g is equivalent to g as $v | f |= $v), so I wanted to explain you the behaviour of = by explaining |=, because |= is more general than = (at least in jaq).

I would be interested in the precise mechanics that are involved in, for example, making [1, 2, 3] | .[] |= empty return []. How did you make this work?

@itchyny
Copy link
Contributor Author

itchyny commented Sep 3, 2024

In jq (≧1.7), deletion by |= is done at last, to avoid the issue of jqlang/jq#2051. This breaks the property (f, g) |= h is (f |= h) | (g |= h) (consider (.[0],.[2]) |= empty is not (.[0] |= empty) | (.[2] |= empty) when the input value is [1,2,3] (I also noticed that jaq does not work well with del/1 with array indices). Semantically, consider introducing satisfying (⊥ | f) = ⊥ for any f, f |= g firstly folds the iteration of updating each path of f by the first value of g ( if not exist), and then deletes all paths where the value is .

@01mf02
Copy link
Owner

01mf02 commented Oct 7, 2024

Thanks for your explanation! If I understand it correctly, that means that f |= g does something like the following:

  1. Collect all paths p that match f
  2. For every path p, run p |= g, and if g returns at least one output, then update the input, else add p to pe (list of paths to be deleted later)
  3. Finally, for every path p in pe, delete p in input

This would be enough to explain the example

[[0]] | (.[0][0], .[0]) |= if . >= [] then {} else empty end

which fails with an error, because first .[0] is updated (because it did not produce empty), yielding [{}], and then .[0][0] is updated, because it originally yielded empty, but at that point, this path does not point to valid data anymore.

However, this alone cannot be what jq currently does. For example, if that would be all, then [1, 2, 3] | .[] |= empty would first delete index 0 (yielding [2, 3]), then index 1 (yielding [2]), then index 2 (which does not exist anymore at this point).
So there seems to happen something to pe inbetween step 2 and 3.

Could it be that pe is simply reversed before step 3? This would have explained why the following example

[0, [{a: 1}]] | (.[1].a, .[0]) |= empty

yields an error (Cannot index array with string "a"). Here, the paths pe to be deleted are [[1, "a"], [0]]. If we would delete them from left to right, this would yield [{}]. However, this first seems to delete [0] (yielding [{a: 1}]), then [1, "a"], which is why the error occurs.

However, if pe was simply reversed, then

[0, [{a: 1}]] | (.[0], .[1].a) |= empty

should not yield an error. But it also yields one, again: Cannot index array with string "a".
This greatly confuses me.

Can you explain to me in which order paths for which g yields empty are deleted at the end?

@itchyny
Copy link
Contributor Author

itchyny commented Oct 7, 2024

In both order it works.

 $ jq -n '[0, {a: 1}] | (.[1].a, .[0]) |= empty'
[
  {}
]

 $ jq -n '[0, {a: 1}] | (.[0], .[1].a) |= empty'
[
  {}
]

@01mf02
Copy link
Owner

01mf02 commented Oct 7, 2024

Thank you for spotting my error. I accidentally put this object into an array.

Anyway, I understand now. I found in jq_aux.c that the paths to delete (delpaths) are sorted before deletion. That explains why both filters you mentioned return the same results.
Thanks again for clearing this up for me.

@01mf02
Copy link
Owner

01mf02 commented Oct 9, 2024

Perhaps to address your original question: Assignments with .. can yield unexpected results both in jq and jaq.
The general problem is that .. returns "larger" parts of the input before "smaller" parts.
Updating the larger parts before the smaller parts, however, can (a) make existing smaller parts disappear or (b) make new smaller parts appear.
For example:

$ jq -n '[0] | .. = {a: []}'
jq: error (at <unknown>): Cannot index object with number
$ jaq -n '[0] | .. = {a: []}'
fatal runtime error: stack overflow

Here, jq tries to update . and .[0], but after updating . to {a: []}, the value .[0] is not valid anymore, so it stumbles over (a).
jaq does not consider disappeared smaller parts (a), but it considers the new smaller parts (b), which makes it run into an infinite recursion, thus a stack overflow. This is the same problem as what you described in your original post.

To avoid such problems and the one that you originally posted, I would argue that it's generally more predictable in assignments to update smaller parts before larger parts.
We can define a filter recurse_up that does that:

$ jq -cn 'def recurse_up: (.[]? | recurse_up), .; [{a: 1}] | recurse_up'
1
{"a":1}
[{"a":1}]

Compare its output with recurse:

$ jq -cn '[{a: 1}] | recurse'
[{"a":1}]
{"a":1}
1

We can see that recurse_up first returns the smallest elements, whereas recurse first returns the largest elements.

We can use recurse_up in an update:

$ jq -cn 'def recurse_up: (.[]? | recurse_up), .; [0] | recurse_up = {a: []}'
{"a":[]}

Here, jaq yields the same result as jq.

To summarise: .. on the left-hand side of assignments can be quite confusing; using something like recurse_up helps.

@01mf02 01mf02 closed this as completed Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants