Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update PythonZipper action to use CommandLineItem.CapturingMapFn #15037

Conversation

alex-torok
Copy link
Contributor

@alex-torok alex-torok commented Mar 14, 2022

Summary

This PR attempts to address #14890.

This updates the PythonZipper action to use a CommandLineItem.CapturingMapFn to defer expanding arguments until after analysis. I used CapturingMapFn, as it seemed like the only way to go about implementing the arg filtering that is done by the current code to exclude the executable and zipFile artifacts.

Test Plan

Use the example from #14890 of looking at memory usage in tensorflow before/after this patch is applied on top of version 5.0.0

Initial Setup

git clone [email protected]:tensorflow/tensorflow.git
cd tensorflow
python3 -m venv venv
source venv/bin/activate
pip install numpy

View memory usage at 5.0.0:

# STARTUP_FLAGS from https://bazel.build/rules/performance#enabling-memory-tracking
$ bazel ${STARTUP_FLAGS[@]} build --nobuild //tensorflow/python/...
$ bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390    413,338,632      241,154
py_library                           1,102           0      2,097,152        1,903
py_binary                               33         198      8,914,840      270,146
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
635MB

View memory usage at 5.0.0 with this patch applied:

$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390     65,323,312       38,111
py_library                           1,102           0      2,359,576        2,141
py_binary                               33         198        524,400       15,890
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
283MB

Ensure that the generated actions have not changed:

$ bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_out
$ ~/code/bazel/bazel-bin/src/bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_patched_out
$ diff /tmp/bazel_5_out /tmp/bazel_5_patched_out
# note: no diff output

@google-cla
Copy link

google-cla bot commented Mar 14, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

For more information, open the CLA check for this pull request.

@alex-torok alex-torok changed the title Update PythonZiper action to use CommandLineItem.CapturingMapFn Update PythonZipper action to use CommandLineItem.CapturingMapFn Mar 14, 2022
@ckolli5 ckolli5 added team-Rules-Python Native rules for Python team-Build-Language and removed team-Rules-Python Native rules for Python labels Mar 22, 2022
@alex-torok
Copy link
Contributor Author

alex-torok commented Apr 11, 2022

Apologies for the review request spam. It appears that force-pushing and updating the base branch caused some CODEOWNERS auto-requests to go through, and I don't have permission to remove the requests.

edit: If anyone sees this and has permissions to remove review requests, please remove all of them.

@thomasvl thomasvl removed their request for review April 11, 2022 18:26
@coeuvre coeuvre removed the request for review from a team April 11, 2022 20:15
@aiuto
Copy link
Contributor

aiuto commented Apr 11, 2022

@rickeylev WDYT?

@rickeylev
Copy link
Contributor

LGTM, I guess! I'm not familiar with the Java APIs to do this, but it looks like it does the deferring of expansion.

@bazel-io bazel-io closed this in 9610ae8 Apr 12, 2022
philsc pushed a commit to philsc/bazel that referenced this pull request May 12, 2022
### Summary

This PR attempts to address bazelbuild#14890.

This updates the PythonZipper action to use a `CommandLineItem.CapturingMapFn` to defer expanding arguments until after analysis. I used `CapturingMapFn`, as it seemed like the only way to go about implementing the arg filtering that is done by the current code to exclude the `executable` and `zipFile` artifacts.

### Test Plan

Use the example from bazelbuild#14890 of looking at memory usage in tensorflow before/after this patch is applied on top of version `5.0.0`

Initial Setup
```
git clone [email protected]:tensorflow/tensorflow.git
cd tensorflow
python3 -m venv venv
source venv/bin/activate
pip install numpy
```

View memory usage at 5.0.0:
```bash
# STARTUP_FLAGS from https://bazel.build/rules/performance#enabling-memory-tracking
$ bazel ${STARTUP_FLAGS[@]} build --nobuild //tensorflow/python/...
$ bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390    413,338,632      241,154
py_library                           1,102           0      2,097,152        1,903
py_binary                               33         198      8,914,840      270,146
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
635MB
```

View memory usage at 5.0.0 with this patch applied:
```bash
$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390     65,323,312       38,111
py_library                           1,102           0      2,359,576        2,141
py_binary                               33         198        524,400       15,890
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
283MB
```

Ensure that the generated actions have not changed:
```bash
$ bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_out
$ ~/code/bazel/bazel-bin/src/bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_patched_out
$ diff /tmp/bazel_5_out /tmp/bazel_5_patched_out
# note: no diff output
```

Closes bazelbuild#15037.

PiperOrigin-RevId: 441257695
(cherry picked from commit 9610ae8)
philsc pushed a commit to philsc/bazel that referenced this pull request May 18, 2022
### Summary

This PR attempts to address bazelbuild#14890.

This updates the PythonZipper action to use a `CommandLineItem.CapturingMapFn` to defer expanding arguments until after analysis. I used `CapturingMapFn`, as it seemed like the only way to go about implementing the arg filtering that is done by the current code to exclude the `executable` and `zipFile` artifacts.

### Test Plan

Use the example from bazelbuild#14890 of looking at memory usage in tensorflow before/after this patch is applied on top of version `5.0.0`

Initial Setup
```
git clone [email protected]:tensorflow/tensorflow.git
cd tensorflow
python3 -m venv venv
source venv/bin/activate
pip install numpy
```

View memory usage at 5.0.0:
```bash
# STARTUP_FLAGS from https://bazel.build/rules/performance#enabling-memory-tracking
$ bazel ${STARTUP_FLAGS[@]} build --nobuild //tensorflow/python/...
$ bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390    413,338,632      241,154
py_library                           1,102           0      2,097,152        1,903
py_binary                               33         198      8,914,840      270,146
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
635MB
```

View memory usage at 5.0.0 with this patch applied:
```bash
$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390     65,323,312       38,111
py_library                           1,102           0      2,359,576        2,141
py_binary                               33         198        524,400       15,890
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
283MB
```

Ensure that the generated actions have not changed:
```bash
$ bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_out
$ ~/code/bazel/bazel-bin/src/bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_patched_out
$ diff /tmp/bazel_5_out /tmp/bazel_5_patched_out
# note: no diff output
```

Closes bazelbuild#15037.

PiperOrigin-RevId: 441257695
(cherry picked from commit 9610ae8)
ckolli5 pushed a commit that referenced this pull request May 19, 2022
)

### Summary

This PR attempts to address #14890.

This updates the PythonZipper action to use a `CommandLineItem.CapturingMapFn` to defer expanding arguments until after analysis. I used `CapturingMapFn`, as it seemed like the only way to go about implementing the arg filtering that is done by the current code to exclude the `executable` and `zipFile` artifacts.

### Test Plan

Use the example from #14890 of looking at memory usage in tensorflow before/after this patch is applied on top of version `5.0.0`

Initial Setup
```
git clone [email protected]:tensorflow/tensorflow.git
cd tensorflow
python3 -m venv venv
source venv/bin/activate
pip install numpy
```

View memory usage at 5.0.0:
```bash
# STARTUP_FLAGS from https://bazel.build/rules/performance#enabling-memory-tracking
$ bazel ${STARTUP_FLAGS[@]} build --nobuild //tensorflow/python/...
$ bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390    413,338,632      241,154
py_library                           1,102           0      2,097,152        1,903
py_binary                               33         198      8,914,840      270,146
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
635MB
```

View memory usage at 5.0.0 with this patch applied:
```bash
$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390     65,323,312       38,111
py_library                           1,102           0      2,359,576        2,141
py_binary                               33         198        524,400       15,890
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
283MB
```

Ensure that the generated actions have not changed:
```bash
$ bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_out
$ ~/code/bazel/bazel-bin/src/bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_patched_out
$ diff /tmp/bazel_5_out /tmp/bazel_5_patched_out
# note: no diff output
```

Closes #15037.

PiperOrigin-RevId: 441257695
(cherry picked from commit 9610ae8)

Co-authored-by: Alex Torok <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants