Add log upload support for build target failed using teatime.py script by erman-gurses · Pull Request #648 · ROCm/TheRock

erman-gurses · 2025-05-17T00:38:48Z

This PR integrates logging mechanism with teatime.py script.
Also handles the TODO below:
# TODO: Fork indexer.py locally to avoid relying on an external GitHub source at runtime.

erman-gurses · 2025-05-21T17:48:57Z

Last Tests:
Linux: https://github.com/ROCm/TheRock/actions/runs/15155118087
Windows: https://github.com/ROCm/TheRock/actions/runs/15155125988

marbre · 2025-05-21T17:53:34Z

Is this ready to review or do you just want initial feedback as this is still marked as a draft. Regarding the TODO in the PR description: I would rather replace indexer.py with an AWS Lambda and use boto3.

erman-gurses · 2025-05-21T17:55:26Z

Is this ready to review or do you just want initial feedback as this is still marked as a draft. Regarding the TODO in the PR description: I would rather replace indexer.py with an AWS Lambda and use boto3.

It is ready to review - I just make it ready PR from WIP - I can test with boto3.

erman-gurses · 2025-05-21T20:10:45Z

Is this ready to review or do you just want initial feedback as this is still marked as a draft. Regarding the TODO in the PR description: I would rather replace indexer.py with an AWS Lambda and use boto3.

It is ready to review - I just make it ready PR from WIP - I can test with boto3.

@marbre @ScottTodd boto3 gives an error on Windows side:

Traceback (most recent call last):
  File "C:\home\runner\_work\TheRock\TheRock\build_tools\create_log_index.py", line 6, in <module>
    import boto3
ModuleNotFoundError: No module named 'boto3'
[WARN] create_log_index.py failed: Command '['C:\\home\\runner\\_work\\_tool\\Python\\3.12.10\\x64\\python3.exe', 'C:\\home\\runner\\_work\\TheRock\\TheRock\\build_tools\\create_log_index.py', '--build-dir', 'B:\\tmpbuild', '--amdgpu-family', 'gfx110X-dgpu']' returned non-zero exit status 1.

ScottTodd · 2025-05-21T20:24:24Z

Is the workflow installing the package? It could be either explicitly installed with pip install boto3 or we could include it in a requirements.txt file.

erman-gurses · 2025-05-21T20:26:25Z

Is the workflow installing the package? It could be either explicitly installed with pip install boto3 or we could include it in a requirements.txt file.

Sounds good! I can install it on the workflow.

marbre · 2025-05-21T23:38:47Z

Is this ready to review or do you just want initial feedback as this is still marked as a draft. Regarding the TODO in the PR description: I would rather replace indexer.py with an AWS Lambda and use boto3.

It is ready to review - I just make it ready PR from WIP - I can test with boto3.

Taking a quick look: Let's not solve too many problems at once! For now I would replace uploading files via AWS cli by using boto3. The AWS Lambda is a different thing and will replace indexer.py but that should be done in a separate PR. What I was saying is that the TODO note is probably not what we aim for but as said that's something for a follow up.

erman-gurses · 2025-05-21T23:44:15Z

Is this ready to review or do you just want initial feedback as this is still marked as a draft. Regarding the TODO in the PR description: I would rather replace indexer.py with an AWS Lambda and use boto3.

It is ready to review - I just make it ready PR from WIP - I can test with boto3.

Taking a quick look: Let's not solve too many problems at once! For now I would replace uploading files via AWS cli by using boto3. The AWS Lambda is a different thing and will replace indexer.py but that should be done in a separate PR. What I was saying is that the TODO note is probably not what we aim for but as said that's something for a follow up.

Sounds good to me! I can raise another PR for AWS Lambda. So I am using the indexer as it was and just updating AWS cli by using boto3.

erman-gurses · 2025-05-22T03:26:20Z

Is this ready to review or do you just want initial feedback as this is still marked as a draft. Regarding the TODO in the PR description: I would rather replace indexer.py with an AWS Lambda and use boto3.

It is ready to review - I just make it ready PR from WIP - I can test with boto3.

Taking a quick look: Let's not solve too many problems at once! For now I would replace uploading files via AWS cli by using boto3. The AWS Lambda is a different thing and will replace indexer.py but that should be done in a separate PR. What I was saying is that the TODO note is probably not what we aim for but as said that's something for a follow up.

Sounds good to me! I can raise another PR for AWS Lambda. So I am using the indexer as it was and just updating AWS cli by using boto3.

The last tests using boto3:
Linux: https://github.com/ROCm/TheRock/actions/runs/15177259483
Windows: https://github.com/ROCm/TheRock/actions/runs/15177261545

ScottTodd · 2025-05-22T20:17:57Z

@erman-gurses if you want a code review, please batch your changes and comments more or make the PR smaller. I can't follow the stream of commits and multiple review threads.

If you are still working on this, please keep the changes local or convert to a draft PR.

erman-gurses · 2025-05-22T20:21:40Z

@erman-gurses if you want a code review, please batch your changes and comments more or make the PR smaller. I can't follow the stream of commits and multiple review threads.

If you are still working on this, please keep the changes local or convert to a draft PR.

@ScottTodd you are so right - will be more careful for that. My changes are done for now to address Marius's comments.

ScottTodd · 2025-05-22T20:25:00Z

(Where by "If you are still working on this," I mean "if this is not yet ready for review")

erman-gurses · 2025-05-22T20:27:34Z

(Where by "If you are still working on this," I mean "if this is not yet ready for review")

Tried to say I addressed all comments from Marius and would like to take additional comments if it is needed.
(Needed to fix one more thing - INDEXER_PATH)

erman-gurses · 2025-05-27T07:19:32Z

Windows: https://github.com/ROCm/TheRock/actions/runs/15268948168
Linux: https://github.com/ROCm/TheRock/actions/runs/15269107625
Latest passed tests.

erman-gurses · 2025-05-28T23:08:13Z

@ScottTodd, regarding this PR - any other comments that I can address?

ScottTodd · 2025-05-28T23:12:14Z

@ScottTodd, regarding this PR - any other comments that I can address?

It's in my queue. Patience :)

erman-gurses · 2025-05-28T23:13:13Z

@ScottTodd, regarding this PR - any other comments that I can address?

It's in my queue. Patience :)

Thanks @ScottTodd - forgot how busy you are

ScottTodd

I'm still not sure about the architecture here. Probably want more feedback from Marius when he gets back too.

ScottTodd · 2025-05-29T16:59:09Z

+                file=sys.stderr,
+            )
+
+        self.log_dir = os.getenv("LOG_DIR", "build/logs")


There should be a single source of truth for each configuration option. The self.log_path and self.log_file values are already defined above and set via the file argument. There should be no need to also specify an environment variable.

Where this gets tricky is if the logs are nested and self.log_path.parent is not actually the root that should be indexed and uploaded.

ScottTodd · 2025-05-29T17:03:36Z

+        self.s3_bucket = os.getenv("TEATIME_S3_BUCKET")
+        if not self.s3_bucket and self.upload_to_s3:
+            print(
+                "warning: TEATIME_S3_BUCKET is not set (S3 upload will likely fail)",
+                file=sys.stderr,
+            )
+
+        self.s3_subdir = os.getenv("TEATIME_S3_SUBDIR")
+        if not self.s3_subdir and self.upload_to_s3:
+            print(
+                "warning: TEATIME_S3_SUBDIR is not set (S3 upload will likely fail)",
+                file=sys.stderr,
+            )


These should be errors (sys.exit), not warnings.

ScottTodd · 2025-05-29T18:34:48Z

+        # Call coordinate_index_and_logs if indexing is enabled.
+        if (
+            not self.skip_index
+            and self.upload_to_s3
+            and self.s3_bucket
+            and self.log_dir
+        ):
+            self.coordinate_index_and_logs()
+
+    def coordinate_index_and_logs(self):
+        # Determine paths
+        log_dir = self.log_dir
+        amdgpu_family = os.getenv("AMDGPU_FAMILIES")


I'm not totally convinced that this indexing code should be in teatime.py. Also, as mentioned before, the logging script should not need to know low level details about individual build options like AMDGPU_FAMILIES.

There are two scenarios to design for:

CI usage, where we run a controlled sequence of commands (install deps, configure credentials, load from cache, build, report statistics, upload results, trigger other jobs)

Developer usage, where a developer just runs a CMake build.

The goal with this sort of integration work is to make it easier for both scenarios to upload logs to the cloud. At the moment, teatime.py is only responsible for creating individual log files. The cloud logging also has the wrinkle of having a generated static index page for the logs directory. What we really should have IMO is a server with dynamic index pages or some server-side processing of the uploaded logs, instead of pushing that responsibility on the client here.

We could look into having a CMake POST_BUILD (https://cmake.org/cmake/help/latest/command/add_custom_command.html) command that generates and uploads the index page after building therock-archives... or we could add a new helper target like therock-logs-upload. Those would integrate into the build system itself at some "finalization" point so developers could get their cmake build command to automatically upload in a similar way to how Bazel/Blaze can upload to resultstore/sponge, but that would still be only working around the fact that these index pages are statically generated.

Chatting with @marbre . We can move the index generation to an AWS lambda so whenever logs are uploaded to the bucket the script will run server-side. There are other ways to solve that it in the cloud, but the general approach is that we shouldn't need to generate index pages explicitly on the client side at all.

Sounds good, so no index generation in the workflow.

ScottTodd · 2025-05-29T18:36:57Z

+    p.add_argument(
+        "--skip-index", action="store_true", help="Skip indexing and uploading logs"
+    )


I don't see value in having this argument in teatime.py. It's a big mode switch to have cmake/therock_subproject.cmake run one command and then .github/workflows/build_*_packages.yml run another command. At that point just have the workflow files run the index generator and upload script directly.

If keeping it, flipping the condition from opt-out to opt-in would make more sense to me. That way, running the script on a single file would not generate an index page.

ScottTodd · 2025-05-29T18:43:00Z

+  indexing/upload scripts.
+* AMDGPU_FAMILIES: Required by `create_log_index.py` to organize logs.

 CI systems can set `TEATIME_LABEL_GH_GROUP=1` in the environment, which will


All environment variables should be explained in the same section

ScottTodd · 2025-05-29T18:44:13Z

+def get_indexer_path() -> Path:
+    """
+    Resolve to the local third-party/indexer/indexer.py copy.
+    """
+    indexer_path = (
+        Path(__file__).resolve().parent.parent / "third-party/indexer/indexer.py"
+    )
+    if indexer_path.is_file():
+        log(f"[INFO] Using bundled indexer.py: {indexer_path}")
+        return indexer_path
+    else:
+        log(f"[ERROR] Bundled indexer.py not found at: {indexer_path}")
+        sys.exit(2)


The else clause should never be hit. I would drop the error handling here.

ScottTodd · 2025-05-29T18:47:22Z

+      - name: Configure AWS Credentials
+        if: always()
+        uses: aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 # v4.1.0


@geomin12 we're getting a lot of these if: always() and if: ${{ !cancelled() }} conditions in these workflows. Let's see if we can restructure the flow a bit through migrating to more scripts such that the flow is easier to understand.

Here in particular, if fetching sources fails, we'd still configure AWS credentials and try to upload logs, but we wouldn't try to build. In that case I would prefer for the workflow to fail fast and not continue to try cleanup steps that will fail.

ScottTodd · 2025-05-29T18:57:45Z

+        # Step 2: S3 upload using --bucket and --subdir
+        try:
+            upload_script = (
+                Path(__file__).resolve().parent.parent
+                / "build_tools"
+                / "upload_logs_to_s3.py"
+            )
+            print(f"[TEATIME] Uploading logs to s3://{self.s3_bucket}/{self.s3_subdir}")
+            subprocess.run(
+                [
+                    sys.executable,
+                    str(upload_script),
+                    "--log-dir",
+                    str(log_dir),
+                    "--bucket",
+                    self.s3_bucket,
+                    "--subdir",
+                    self.s3_subdir,
+                ],
+                check=True,
+            )
+        except subprocess.CalledProcessError as e:
+            print(f"[WARN] Log upload failed: {e}", file=sys.stderr)
+            if os.getenv("TEATIME_FAIL_ON_UPLOAD_ERROR") == "1":
+                raise
+        except Exception as e:
+            print(f"[WARN] Unexpected error during log upload: {e}", file=sys.stderr)
+            if os.getenv("TEATIME_FAIL_ON_UPLOAD_ERROR") == "1":
+                raise


We don't need to subprocess.run Python scripts we control from other Python scripts we control. Could switch to a Python library model where we import the file and call a function directly.

erman-gurses · 2026-03-18T00:17:16Z

Closing this since outdated - will be open if there is any remaining development is needed.

github-project-automation Bot added this to TheRock Triage May 17, 2025

github-project-automation Bot moved this to TODO in TheRock Triage May 17, 2025

erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch 3 times, most recently from 2ff3bc3 to 26f7875 Compare May 20, 2025 17:53

erman-gurses requested review from ScottTodd and marbre May 21, 2025 17:49

erman-gurses marked this pull request as ready for review May 21, 2025 17:52

erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch 2 times, most recently from da48855 to 920bed8 Compare May 21, 2025 19:59

erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch from 8ddb393 to 0d2ed8a Compare May 22, 2025 00:34

marbre requested changes May 22, 2025

View reviewed changes

erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch from 00b136b to 4ef02aa Compare May 22, 2025 16:42

erman-gurses requested a review from marbre May 22, 2025 21:38

erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch 3 times, most recently from bda1285 to d53aa90 Compare May 27, 2025 07:05

erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch from d53aa90 to 06ebfad Compare May 27, 2025 07:12

ScottTodd requested changes May 27, 2025

View reviewed changes

ScottTodd mentioned this pull request May 27, 2025

Add indexer.py under third-party/indexer/ #717

Merged

erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch 3 times, most recently from 547d2da to c590e63 Compare May 28, 2025 19:32

erman-gurses closed this May 28, 2025

erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch from 79f8994 to e2934dc Compare May 28, 2025 21:27

github-project-automation Bot moved this from TODO to Done in TheRock Triage May 28, 2025

erman-gurses reopened this May 28, 2025

Clean rebase

c70a0fd

erman-gurses requested a review from ScottTodd May 28, 2025 21:39

ScottTodd requested changes May 29, 2025

View reviewed changes

ScottTodd mentioned this pull request May 29, 2025

[ci] Reorganizing artifact upload #730

Merged

ScottTodd mentioned this pull request Jun 13, 2025

[ci] Organizing TheRock build_linux_packages #782

Merged

erman-gurses closed this Mar 18, 2026

erman-gurses deleted the users/erman-gurses/upload-failed-logs-using-teatime branch March 27, 2026 20:42

Conversation

erman-gurses commented May 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

erman-gurses commented May 21, 2025

Uh oh!

marbre commented May 21, 2025

Uh oh!

erman-gurses commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

erman-gurses commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ScottTodd commented May 21, 2025

Uh oh!

erman-gurses commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marbre commented May 21, 2025

Uh oh!

erman-gurses commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

erman-gurses commented May 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ScottTodd commented May 22, 2025 • edited by erman-gurses Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

erman-gurses commented May 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ScottTodd commented May 22, 2025

Uh oh!

erman-gurses commented May 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

erman-gurses commented May 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

erman-gurses commented May 28, 2025

Uh oh!

ScottTodd commented May 28, 2025

Uh oh!

erman-gurses commented May 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ScottTodd left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

erman-gurses commented May 17, 2025 •

edited

Loading

erman-gurses commented May 21, 2025 •

edited

Loading

erman-gurses commented May 21, 2025 •

edited

Loading

erman-gurses commented May 21, 2025 •

edited

Loading

erman-gurses commented May 21, 2025 •

edited

Loading

erman-gurses commented May 22, 2025 •

edited

Loading

ScottTodd commented May 22, 2025 •

edited by erman-gurses

Loading

erman-gurses commented May 22, 2025 •

edited

Loading

erman-gurses commented May 22, 2025 •

edited

Loading

erman-gurses commented May 27, 2025 •

edited

Loading

erman-gurses commented May 28, 2025 •

edited

Loading