Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update PythonZipper action to use CommandLineItem.CapturingMapFn #15472

Conversation

philsc
Copy link
Contributor

@philsc philsc commented May 12, 2022

Summary

This PR attempts to address #14890.

This updates the PythonZipper action to use a CommandLineItem.CapturingMapFn to defer expanding arguments until after analysis. I used CapturingMapFn, as it seemed like the only way to go about implementing the arg filtering that is done by the current code to exclude the executable and zipFile artifacts.

Test Plan

Use the example from #14890 of looking at memory usage in tensorflow before/after this patch is applied on top of version 5.0.0

Initial Setup

git clone [email protected]:tensorflow/tensorflow.git
cd tensorflow
python3 -m venv venv
source venv/bin/activate
pip install numpy

View memory usage at 5.0.0:

# STARTUP_FLAGS from https://bazel.build/rules/performance#enabling-memory-tracking
$ bazel ${STARTUP_FLAGS[@]} build --nobuild //tensorflow/python/...
$ bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390    413,338,632      241,154
py_library                           1,102           0      2,097,152        1,903
py_binary                               33         198      8,914,840      270,146
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
635MB

View memory usage at 5.0.0 with this patch applied:

$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390     65,323,312       38,111
py_library                           1,102           0      2,359,576        2,141
py_binary                               33         198        524,400       15,890
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
283MB

Ensure that the generated actions have not changed:

$ bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_out
$ ~/code/bazel/bazel-bin/src/bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_patched_out
$ diff /tmp/bazel_5_out /tmp/bazel_5_patched_out
# note: no diff output

(cherry picked from commit 9610ae8)

@philsc philsc requested a review from ckolli5 as a code owner May 12, 2022 02:48
@philsc
Copy link
Contributor Author

philsc commented May 12, 2022

I have no clue what that windows failure is about. The error message is not meaningful to me.

@sgowroji sgowroji added the team-Rules-Python Native rules for Python label May 13, 2022
@ckolli5
Copy link

ckolli5 commented May 18, 2022

@philsc Could you please rebase your branch with the latest release-5.2.0 branch? Thanks!

### Summary

This PR attempts to address bazelbuild#14890.

This updates the PythonZipper action to use a `CommandLineItem.CapturingMapFn` to defer expanding arguments until after analysis. I used `CapturingMapFn`, as it seemed like the only way to go about implementing the arg filtering that is done by the current code to exclude the `executable` and `zipFile` artifacts.

### Test Plan

Use the example from bazelbuild#14890 of looking at memory usage in tensorflow before/after this patch is applied on top of version `5.0.0`

Initial Setup
```
git clone [email protected]:tensorflow/tensorflow.git
cd tensorflow
python3 -m venv venv
source venv/bin/activate
pip install numpy
```

View memory usage at 5.0.0:
```bash
# STARTUP_FLAGS from https://bazel.build/rules/performance#enabling-memory-tracking
$ bazel ${STARTUP_FLAGS[@]} build --nobuild //tensorflow/python/...
$ bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390    413,338,632      241,154
py_library                           1,102           0      2,097,152        1,903
py_binary                               33         198      8,914,840      270,146
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
635MB
```

View memory usage at 5.0.0 with this patch applied:
```bash
$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390     65,323,312       38,111
py_library                           1,102           0      2,359,576        2,141
py_binary                               33         198        524,400       15,890
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
283MB
```

Ensure that the generated actions have not changed:
```bash
$ bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_out
$ ~/code/bazel/bazel-bin/src/bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_patched_out
$ diff /tmp/bazel_5_out /tmp/bazel_5_patched_out
# note: no diff output
```

Closes bazelbuild#15037.

PiperOrigin-RevId: 441257695
(cherry picked from commit 9610ae8)
@philsc philsc force-pushed the cherry-pick-5.2-reduce-python-memory-usage branch from d74cdd8 to 4ccbaba Compare May 18, 2022 21:11
@philsc
Copy link
Contributor Author

philsc commented May 18, 2022

Done

@ckolli5 ckolli5 merged commit 41df581 into bazelbuild:release-5.2.0 May 19, 2022
@philsc philsc deleted the cherry-pick-5.2-reduce-python-memory-usage branch May 19, 2022 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Rules-Python Native rules for Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants