Add log upload support for build target failed using teatime.py script#648
Add log upload support for build target failed using teatime.py script#648erman-gurses wants to merge 1 commit into
Conversation
2ff3bc3 to
26f7875
Compare
|
Is this ready to review or do you just want initial feedback as this is still marked as a draft. Regarding the TODO in the PR description: I would rather replace |
It is ready to review - I just make it ready PR from WIP - I can test with boto3. |
da48855 to
920bed8
Compare
@marbre @ScottTodd boto3 gives an error on Windows side: |
|
Is the workflow installing the package? It could be either explicitly installed with |
Sounds good! I can install it on the workflow. |
Taking a quick look: Let's not solve too many problems at once! For now I would replace uploading files via AWS cli by using boto3. The AWS Lambda is a different thing and will replace |
Sounds good to me! I can raise another PR for AWS Lambda. So I am using the indexer as it was and just updating AWS cli by using boto3. |
8ddb393 to
0d2ed8a
Compare
The last tests using boto3: |
00b136b to
4ef02aa
Compare
|
@erman-gurses if you want a code review, please batch your changes and comments more or make the PR smaller. I can't follow the stream of commits and multiple review threads. If you are still working on this, please keep the changes local or convert to a draft PR. |
@ScottTodd you are so right - will be more careful for that. My changes are done for now to address Marius's comments. |
|
(Where by "If you are still working on this," I mean "if this is not yet ready for review") |
Tried to say I addressed all comments from Marius and would like to take additional comments if it is needed. |
bda1285 to
d53aa90
Compare
d53aa90 to
06ebfad
Compare
547d2da to
c590e63
Compare
79f8994 to
e2934dc
Compare
|
@ScottTodd, regarding this PR - any other comments that I can address? |
It's in my queue. Patience :) |
Thanks @ScottTodd - forgot how busy you are |
ScottTodd
left a comment
There was a problem hiding this comment.
I'm still not sure about the architecture here. Probably want more feedback from Marius when he gets back too.
| file=sys.stderr, | ||
| ) | ||
|
|
||
| self.log_dir = os.getenv("LOG_DIR", "build/logs") |
There was a problem hiding this comment.
There should be a single source of truth for each configuration option. The self.log_path and self.log_file values are already defined above and set via the file argument. There should be no need to also specify an environment variable.
Where this gets tricky is if the logs are nested and self.log_path.parent is not actually the root that should be indexed and uploaded.
| self.s3_bucket = os.getenv("TEATIME_S3_BUCKET") | ||
| if not self.s3_bucket and self.upload_to_s3: | ||
| print( | ||
| "warning: TEATIME_S3_BUCKET is not set (S3 upload will likely fail)", | ||
| file=sys.stderr, | ||
| ) | ||
|
|
||
| self.s3_subdir = os.getenv("TEATIME_S3_SUBDIR") | ||
| if not self.s3_subdir and self.upload_to_s3: | ||
| print( | ||
| "warning: TEATIME_S3_SUBDIR is not set (S3 upload will likely fail)", | ||
| file=sys.stderr, | ||
| ) |
There was a problem hiding this comment.
These should be errors (sys.exit), not warnings.
| # Call coordinate_index_and_logs if indexing is enabled. | ||
| if ( | ||
| not self.skip_index | ||
| and self.upload_to_s3 | ||
| and self.s3_bucket | ||
| and self.log_dir | ||
| ): | ||
| self.coordinate_index_and_logs() | ||
|
|
||
| def coordinate_index_and_logs(self): | ||
| # Determine paths | ||
| log_dir = self.log_dir | ||
| amdgpu_family = os.getenv("AMDGPU_FAMILIES") |
There was a problem hiding this comment.
I'm not totally convinced that this indexing code should be in teatime.py. Also, as mentioned before, the logging script should not need to know low level details about individual build options like AMDGPU_FAMILIES.
There are two scenarios to design for:
- CI usage, where we run a controlled sequence of commands (install deps, configure credentials, load from cache, build, report statistics, upload results, trigger other jobs)
- Developer usage, where a developer just runs a CMake build.
The goal with this sort of integration work is to make it easier for both scenarios to upload logs to the cloud. At the moment, teatime.py is only responsible for creating individual log files. The cloud logging also has the wrinkle of having a generated static index page for the logs directory. What we really should have IMO is a server with dynamic index pages or some server-side processing of the uploaded logs, instead of pushing that responsibility on the client here.
We could look into having a CMake POST_BUILD (https://cmake.org/cmake/help/latest/command/add_custom_command.html) command that generates and uploads the index page after building therock-archives... or we could add a new helper target like therock-logs-upload. Those would integrate into the build system itself at some "finalization" point so developers could get their cmake build command to automatically upload in a similar way to how Bazel/Blaze can upload to resultstore/sponge, but that would still be only working around the fact that these index pages are statically generated.
There was a problem hiding this comment.
Chatting with @marbre . We can move the index generation to an AWS lambda so whenever logs are uploaded to the bucket the script will run server-side. There are other ways to solve that it in the cloud, but the general approach is that we shouldn't need to generate index pages explicitly on the client side at all.
There was a problem hiding this comment.
Sounds good, so no index generation in the workflow.
| p.add_argument( | ||
| "--skip-index", action="store_true", help="Skip indexing and uploading logs" | ||
| ) |
There was a problem hiding this comment.
I don't see value in having this argument in teatime.py. It's a big mode switch to have cmake/therock_subproject.cmake run one command and then .github/workflows/build_*_packages.yml run another command. At that point just have the workflow files run the index generator and upload script directly.
If keeping it, flipping the condition from opt-out to opt-in would make more sense to me. That way, running the script on a single file would not generate an index page.
| indexing/upload scripts. | ||
| * AMDGPU_FAMILIES: Required by `create_log_index.py` to organize logs. | ||
|
|
||
| CI systems can set `TEATIME_LABEL_GH_GROUP=1` in the environment, which will |
There was a problem hiding this comment.
All environment variables should be explained in the same section
| def get_indexer_path() -> Path: | ||
| """ | ||
| Resolve to the local third-party/indexer/indexer.py copy. | ||
| """ | ||
| indexer_path = ( | ||
| Path(__file__).resolve().parent.parent / "third-party/indexer/indexer.py" | ||
| ) | ||
| if indexer_path.is_file(): | ||
| log(f"[INFO] Using bundled indexer.py: {indexer_path}") | ||
| return indexer_path | ||
| else: | ||
| log(f"[ERROR] Bundled indexer.py not found at: {indexer_path}") | ||
| sys.exit(2) |
There was a problem hiding this comment.
The else clause should never be hit. I would drop the error handling here.
| - name: Configure AWS Credentials | ||
| if: always() | ||
| uses: aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 # v4.1.0 |
There was a problem hiding this comment.
@geomin12 we're getting a lot of these if: always() and if: ${{ !cancelled() }} conditions in these workflows. Let's see if we can restructure the flow a bit through migrating to more scripts such that the flow is easier to understand.
Here in particular, if fetching sources fails, we'd still configure AWS credentials and try to upload logs, but we wouldn't try to build. In that case I would prefer for the workflow to fail fast and not continue to try cleanup steps that will fail.
| # Step 2: S3 upload using --bucket and --subdir | ||
| try: | ||
| upload_script = ( | ||
| Path(__file__).resolve().parent.parent | ||
| / "build_tools" | ||
| / "upload_logs_to_s3.py" | ||
| ) | ||
| print(f"[TEATIME] Uploading logs to s3://{self.s3_bucket}/{self.s3_subdir}") | ||
| subprocess.run( | ||
| [ | ||
| sys.executable, | ||
| str(upload_script), | ||
| "--log-dir", | ||
| str(log_dir), | ||
| "--bucket", | ||
| self.s3_bucket, | ||
| "--subdir", | ||
| self.s3_subdir, | ||
| ], | ||
| check=True, | ||
| ) | ||
| except subprocess.CalledProcessError as e: | ||
| print(f"[WARN] Log upload failed: {e}", file=sys.stderr) | ||
| if os.getenv("TEATIME_FAIL_ON_UPLOAD_ERROR") == "1": | ||
| raise | ||
| except Exception as e: | ||
| print(f"[WARN] Unexpected error during log upload: {e}", file=sys.stderr) | ||
| if os.getenv("TEATIME_FAIL_ON_UPLOAD_ERROR") == "1": | ||
| raise |
There was a problem hiding this comment.
We don't need to subprocess.run Python scripts we control from other Python scripts we control. Could switch to a Python library model where we import the file and call a function directly.
|
Closing this since outdated - will be open if there is any remaining development is needed. |
This PR integrates logging mechanism with
teatime.pyscript.Also handles the TODO below:
# TODO: Fork indexer.py locally to avoid relying on an external GitHub source at runtime.