Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve performance of join by make it internal #125

Merged
merged 1 commit into from
Sep 22, 2021

Conversation

wader
Copy link
Contributor

@wader wader commented Sep 20, 2021

Avoids exponential string append

Before:
BenchmarkJoinShort
BenchmarkJoinShort-4 26947 44399 ns/op
BenchmarkJoinLong
BenchmarkJoinLong-4 145 7295904 ns/op

After:
BenchmarkJoinShort
BenchmarkJoinShort-4 81789 14456 ns/op
BenchmarkJoinLong
BenchmarkJoinLong-4 22002 53791 ns/op

@pkoppstein
Copy link

This is just a question: does this version behave properly when given a stream as argument?

$ gojq -nr '["a","b"] | join(",", "\n")'
a,b
a
b
gojq -nr '["a","b"] | join(empty)'

@wader
Copy link
Contributor Author

wader commented Sep 20, 2021

Good question, yes should work, internal functions does not know about generators, will only get values. But i'm not sure how it get compiled to make it work in these cases, now i got curious. So i guess a call to a function that has any argument that results in no outputs will itself result in no output? seems resonable and the only way it could be? :)

Tried your examples with the change:

$ go run cmd/gojq/main.go -nr '["a","b"] | join(",", "\n")'
a,b
a
b
$ go run cmd/gojq/main.go -nr '["a","b"] | join(empty)'

@@ -1777,7 +1777,7 @@
- 'try join(",") catch .'
input: '["1","2",[3,4,5]]'
expected: |
"cannot add: string (\"1,2,\") and array ([3,4,5])"
"cannot join: array ([3,4,5])"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to change the error if we do it this way or?

@wader
Copy link
Contributor Author

wader commented Sep 20, 2021

Maybe it's possible to have a more generic internal help function for improving performance of reduce + binop builtin functions? like keep map over array in jq and then pass result to a internal function with argument for how it should aggregate?

@itchyny
Copy link
Owner

itchyny commented Sep 21, 2021

You don't need to worry about stream as argument of internal functions because they are processed before called.
Don't break {x:1,y:2} | join(",").

func.go Outdated
}
}
return strings.Join(ss, sep)
case map[string]interface{}:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The object case got a bit messy. It does the same thing as .[], each in sorted key order which is different from jq.

@itchyny
Copy link
Owner

itchyny commented Sep 21, 2021

This growing complexity is the reason I stopped making it internal. I don't like the duplicates with the executor and pathValue to be used by internal functions. So how about def join($x): [.[]] | _join($x); and making _join internal? Or if type != "array" then [.[]] end | _join($x) if we avoid cloning arrays.

@wader
Copy link
Contributor Author

wader commented Sep 21, 2021

I agree, i've revised the implementation to your second suggestion. By avoid cloning the array do you mean implement our own version of strings.Join so that we don't have to copy it?

@wader
Copy link
Contributor Author

wader commented Sep 21, 2021

Also i noticed that the benchmarks were all wrong, i accidentally used []string as input so it was just measuring the functions throwing error, I was a bit surprised how fast the current one was with long arrays.

func.go Outdated

if len(ss) == 1 {
return ss[0]
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to allow [1] | join(1), but would be nice to move this check earlier somehow

@wader
Copy link
Contributor Author

wader commented Sep 21, 2021

Updated to use own string builder, long when from 57242 to 48389

func.go Outdated
sl += 5
}
case int, float64, *big.Int:
sl += 10
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guessed length, should be better?

@itchyny
Copy link
Owner

itchyny commented Sep 21, 2021

By avoid cloning the array do you mean implement our own version of strings.Join so that we don't have to copy it?

No, use strings.Join. I just mean [.[]] | _join($x) clones the input array.

@wader
Copy link
Contributor Author

wader commented Sep 21, 2021

Ok switching back to strings.Join

Avoids exponential string append

Before:
BenchmarkJoinShort
BenchmarkJoinShort-4   	   26947	     44399 ns/op
BenchmarkJoinLong
BenchmarkJoinLong-4    	     145	   7295904 ns/op

After:
BenchmarkJoinShort
BenchmarkJoinShort-4   	   81789	     14456 ns/op
BenchmarkJoinLong
BenchmarkJoinLong-4    	   22002	     53791 ns/op
Copy link
Owner

@itchyny itchyny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

@itchyny itchyny merged commit a85cce8 into itchyny:main Sep 22, 2021
@wader
Copy link
Contributor Author

wader commented Sep 22, 2021

👍 appreciate your comments and suggestions

@wader wader deleted the internal-join branch September 22, 2021 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants