Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-38328: Speed up the creation time of constant list and set literals. #17114

Merged
merged 12 commits into from
Nov 26, 2019

Conversation

brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Nov 12, 2019

At @methane's request, here is an alternate PR that takes place in the compiler, not the peephole optimizer. It would replace both #16498 and #16878.

I think I like this one better.

https://bugs.python.org/issue38328

@brandtbucher
Copy link
Member Author

@benjaminp, care to take a look at this when you have a chance?

@methane
Copy link
Member

methane commented Nov 25, 2019

Would you merge master and make regen-all?

@brandtbucher
Copy link
Member Author

Thanks @methane. This branch is no longer stale.

Python/compile.c Outdated
@@ -3655,6 +3656,27 @@ starunpack_helper(struct compiler *c, asdl_seq *elts,
{
Py_ssize_t n = asdl_seq_LEN(elts);
Py_ssize_t i, nsubitems = 0, nseen = 0;
if (n > 1 && are_all_items_const(elts, 0, n)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is n > 1 really good threshold? How about n > 2?

$ python3 -m timeit -s "a=frozenset((1,2))" -- "{*a}"
5000000 loops, best of 5: 63.4 nsec per loop
$ python3 -m timeit "{1,2}"
5000000 loops, best of 5: 61.5 nsec per loop

$ python3 -m timeit "[*(1,2)]"
5000000 loops, best of 5: 46.5 nsec per loop
$ python3 -m timeit "[1,2]"
5000000 loops, best of 5: 46.6 nsec per loop

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only chose n > 1 because it always emits fewer opcodes than before the change. If you think n > 2 is a better heuristic, I'm fine with that.

Copy link
Member Author

@brandtbucher brandtbucher Nov 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your first example is only slower because it pays the price of a LOAD_NAME instead of a LOAD_CONST (just a guess, though). It's not a direct comparison, unfortunately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fewer opcodes is not always more efficient.
BUILD_LIST_UNPACK calls PyList_New(0) and PyList_Extend(iterable). It causes over-allocation.
On the other hand, BUILD_LIST just calls PyList_New(n).

So I think we should be conservative about choosing the threshold.

Copy link
Member Author

@brandtbucher brandtbucher Nov 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @methane. I think this makes sense.

I've bumped the threshold to your recommended value of n > 2. Anything else?

@methane methane merged commit 6dd9b64 into python:master Nov 26, 2019
@brandtbucher
Copy link
Member Author

Thanks for the guidance and merge @methane!

jacobneiltaylor pushed a commit to jacobneiltaylor/cpython that referenced this pull request Dec 5, 2019
shihai1991 pushed a commit to shihai1991/cpython that referenced this pull request Jan 31, 2020
@brandtbucher brandtbucher deleted the compile-constant-collections branch July 21, 2022 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants