Skip to content

Add log upload support for build target failed using teatime.py script#648

Closed
erman-gurses wants to merge 1 commit into
mainfrom
users/erman-gurses/upload-failed-logs-using-teatime
Closed

Add log upload support for build target failed using teatime.py script#648
erman-gurses wants to merge 1 commit into
mainfrom
users/erman-gurses/upload-failed-logs-using-teatime

Conversation

@erman-gurses
Copy link
Copy Markdown
Contributor

@erman-gurses erman-gurses commented May 17, 2025

This PR integrates logging mechanism with teatime.py script.
Also handles the TODO below:
# TODO: Fork indexer.py locally to avoid relying on an external GitHub source at runtime.

@erman-gurses erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch 3 times, most recently from 2ff3bc3 to 26f7875 Compare May 20, 2025 17:53
@erman-gurses
Copy link
Copy Markdown
Contributor Author

@erman-gurses erman-gurses requested review from ScottTodd and marbre May 21, 2025 17:49
@erman-gurses erman-gurses marked this pull request as ready for review May 21, 2025 17:52
@marbre
Copy link
Copy Markdown
Member

marbre commented May 21, 2025

Is this ready to review or do you just want initial feedback as this is still marked as a draft. Regarding the TODO in the PR description: I would rather replace indexer.py with an AWS Lambda and use boto3.

@erman-gurses
Copy link
Copy Markdown
Contributor Author

erman-gurses commented May 21, 2025

Is this ready to review or do you just want initial feedback as this is still marked as a draft. Regarding the TODO in the PR description: I would rather replace indexer.py with an AWS Lambda and use boto3.

It is ready to review - I just make it ready PR from WIP - I can test with boto3.

@erman-gurses erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch 2 times, most recently from da48855 to 920bed8 Compare May 21, 2025 19:59
@erman-gurses
Copy link
Copy Markdown
Contributor Author

erman-gurses commented May 21, 2025

Is this ready to review or do you just want initial feedback as this is still marked as a draft. Regarding the TODO in the PR description: I would rather replace indexer.py with an AWS Lambda and use boto3.

It is ready to review - I just make it ready PR from WIP - I can test with boto3.

@marbre @ScottTodd boto3 gives an error on Windows side:

Traceback (most recent call last):
  File "C:\home\runner\_work\TheRock\TheRock\build_tools\create_log_index.py", line 6, in <module>
    import boto3
ModuleNotFoundError: No module named 'boto3'
[WARN] create_log_index.py failed: Command '['C:\\home\\runner\\_work\\_tool\\Python\\3.12.10\\x64\\python3.exe', 'C:\\home\\runner\\_work\\TheRock\\TheRock\\build_tools\\create_log_index.py', '--build-dir', 'B:\\tmpbuild', '--amdgpu-family', 'gfx110X-dgpu']' returned non-zero exit status 1.

@ScottTodd
Copy link
Copy Markdown
Member

Is the workflow installing the package? It could be either explicitly installed with pip install boto3 or we could include it in a requirements.txt file.

@erman-gurses
Copy link
Copy Markdown
Contributor Author

erman-gurses commented May 21, 2025

Is the workflow installing the package? It could be either explicitly installed with pip install boto3 or we could include it in a requirements.txt file.

Sounds good! I can install it on the workflow.

@marbre
Copy link
Copy Markdown
Member

marbre commented May 21, 2025

Is this ready to review or do you just want initial feedback as this is still marked as a draft. Regarding the TODO in the PR description: I would rather replace indexer.py with an AWS Lambda and use boto3.

It is ready to review - I just make it ready PR from WIP - I can test with boto3.

Taking a quick look: Let's not solve too many problems at once! For now I would replace uploading files via AWS cli by using boto3. The AWS Lambda is a different thing and will replace indexer.py but that should be done in a separate PR. What I was saying is that the TODO note is probably not what we aim for but as said that's something for a follow up.

@erman-gurses
Copy link
Copy Markdown
Contributor Author

erman-gurses commented May 21, 2025

Is this ready to review or do you just want initial feedback as this is still marked as a draft. Regarding the TODO in the PR description: I would rather replace indexer.py with an AWS Lambda and use boto3.

It is ready to review - I just make it ready PR from WIP - I can test with boto3.

Taking a quick look: Let's not solve too many problems at once! For now I would replace uploading files via AWS cli by using boto3. The AWS Lambda is a different thing and will replace indexer.py but that should be done in a separate PR. What I was saying is that the TODO note is probably not what we aim for but as said that's something for a follow up.

Sounds good to me! I can raise another PR for AWS Lambda. So I am using the indexer as it was and just updating AWS cli by using boto3.

@erman-gurses erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch from 8ddb393 to 0d2ed8a Compare May 22, 2025 00:34
@erman-gurses
Copy link
Copy Markdown
Contributor Author

erman-gurses commented May 22, 2025

Is this ready to review or do you just want initial feedback as this is still marked as a draft. Regarding the TODO in the PR description: I would rather replace indexer.py with an AWS Lambda and use boto3.

It is ready to review - I just make it ready PR from WIP - I can test with boto3.

Taking a quick look: Let's not solve too many problems at once! For now I would replace uploading files via AWS cli by using boto3. The AWS Lambda is a different thing and will replace indexer.py but that should be done in a separate PR. What I was saying is that the TODO note is probably not what we aim for but as said that's something for a follow up.

Sounds good to me! I can raise another PR for AWS Lambda. So I am using the indexer as it was and just updating AWS cli by using boto3.

The last tests using boto3:
Linux: https://github.com/ROCm/TheRock/actions/runs/15177259483
Windows: https://github.com/ROCm/TheRock/actions/runs/15177261545

Comment thread .github/workflows/build_linux_packages.yml Outdated
Comment thread build_tools/create_log_index.py Outdated
Comment thread build_tools/create_log_index.py Outdated
Comment thread build_tools/teatime.py Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread build_tools/teatime.py Outdated
Comment thread build_tools/teatime.py
@erman-gurses erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch from 00b136b to 4ef02aa Compare May 22, 2025 16:42
@ScottTodd
Copy link
Copy Markdown
Member

ScottTodd commented May 22, 2025

@erman-gurses if you want a code review, please batch your changes and comments more or make the PR smaller. I can't follow the stream of commits and multiple review threads.

If you are still working on this, please keep the changes local or convert to a draft PR.

@erman-gurses
Copy link
Copy Markdown
Contributor Author

erman-gurses commented May 22, 2025

@erman-gurses if you want a code review, please batch your changes and comments more or make the PR smaller. I can't follow the stream of commits and multiple review threads.

If you are still working on this, please keep the changes local or convert to a draft PR.

@ScottTodd you are so right - will be more careful for that. My changes are done for now to address Marius's comments.

@ScottTodd
Copy link
Copy Markdown
Member

(Where by "If you are still working on this," I mean "if this is not yet ready for review")

@erman-gurses
Copy link
Copy Markdown
Contributor Author

erman-gurses commented May 22, 2025

(Where by "If you are still working on this," I mean "if this is not yet ready for review")

Tried to say I addressed all comments from Marius and would like to take additional comments if it is needed.
(Needed to fix one more thing - INDEXER_PATH)

@erman-gurses erman-gurses requested a review from marbre May 22, 2025 21:38
@erman-gurses erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch 3 times, most recently from bda1285 to d53aa90 Compare May 27, 2025 07:05
@erman-gurses erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch from d53aa90 to 06ebfad Compare May 27, 2025 07:12
@erman-gurses
Copy link
Copy Markdown
Contributor Author

erman-gurses commented May 27, 2025

Comment thread .github/workflows/build_linux_packages.yml Outdated
Comment thread build_tools/create_log_index.py Outdated
Comment thread build_tools/teatime.py Outdated
Comment thread build_tools/teatime.py Outdated
Comment thread build_tools/create_log_index.py Outdated
Comment thread build_tools/teatime.py Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread build_tools/teatime.py Outdated
Comment thread build_tools/teatime.py Outdated
@erman-gurses erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch 3 times, most recently from 547d2da to c590e63 Compare May 28, 2025 19:32
@erman-gurses erman-gurses force-pushed the users/erman-gurses/upload-failed-logs-using-teatime branch from 79f8994 to e2934dc Compare May 28, 2025 21:27
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage May 28, 2025
@erman-gurses erman-gurses reopened this May 28, 2025
@erman-gurses erman-gurses requested a review from ScottTodd May 28, 2025 21:39
@erman-gurses
Copy link
Copy Markdown
Contributor Author

@ScottTodd, regarding this PR - any other comments that I can address?

@ScottTodd
Copy link
Copy Markdown
Member

@ScottTodd, regarding this PR - any other comments that I can address?

It's in my queue. Patience :)

@erman-gurses
Copy link
Copy Markdown
Contributor Author

erman-gurses commented May 28, 2025

@ScottTodd, regarding this PR - any other comments that I can address?

It's in my queue. Patience :)

Thanks @ScottTodd - forgot how busy you are

Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure about the architecture here. Probably want more feedback from Marius when he gets back too.

Comment thread build_tools/teatime.py
file=sys.stderr,
)

self.log_dir = os.getenv("LOG_DIR", "build/logs")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a single source of truth for each configuration option. The self.log_path and self.log_file values are already defined above and set via the file argument. There should be no need to also specify an environment variable.

Where this gets tricky is if the logs are nested and self.log_path.parent is not actually the root that should be indexed and uploaded.

Comment thread build_tools/teatime.py
Comment on lines +108 to +120
self.s3_bucket = os.getenv("TEATIME_S3_BUCKET")
if not self.s3_bucket and self.upload_to_s3:
print(
"warning: TEATIME_S3_BUCKET is not set (S3 upload will likely fail)",
file=sys.stderr,
)

self.s3_subdir = os.getenv("TEATIME_S3_SUBDIR")
if not self.s3_subdir and self.upload_to_s3:
print(
"warning: TEATIME_S3_SUBDIR is not set (S3 upload will likely fail)",
file=sys.stderr,
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be errors (sys.exit), not warnings.

Comment thread build_tools/teatime.py
Comment on lines +147 to +159
# Call coordinate_index_and_logs if indexing is enabled.
if (
not self.skip_index
and self.upload_to_s3
and self.s3_bucket
and self.log_dir
):
self.coordinate_index_and_logs()

def coordinate_index_and_logs(self):
# Determine paths
log_dir = self.log_dir
amdgpu_family = os.getenv("AMDGPU_FAMILIES")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not totally convinced that this indexing code should be in teatime.py. Also, as mentioned before, the logging script should not need to know low level details about individual build options like AMDGPU_FAMILIES.

There are two scenarios to design for:

  1. CI usage, where we run a controlled sequence of commands (install deps, configure credentials, load from cache, build, report statistics, upload results, trigger other jobs)
  2. Developer usage, where a developer just runs a CMake build.

The goal with this sort of integration work is to make it easier for both scenarios to upload logs to the cloud. At the moment, teatime.py is only responsible for creating individual log files. The cloud logging also has the wrinkle of having a generated static index page for the logs directory. What we really should have IMO is a server with dynamic index pages or some server-side processing of the uploaded logs, instead of pushing that responsibility on the client here.

We could look into having a CMake POST_BUILD (https://cmake.org/cmake/help/latest/command/add_custom_command.html) command that generates and uploads the index page after building therock-archives... or we could add a new helper target like therock-logs-upload. Those would integrate into the build system itself at some "finalization" point so developers could get their cmake build command to automatically upload in a similar way to how Bazel/Blaze can upload to resultstore/sponge, but that would still be only working around the fact that these index pages are statically generated.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatting with @marbre . We can move the index generation to an AWS lambda so whenever logs are uploaded to the bucket the script will run server-side. There are other ways to solve that it in the cloud, but the general approach is that we shouldn't need to generate index pages explicitly on the client side at all.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, so no index generation in the workflow.

Comment thread build_tools/teatime.py
Comment on lines +295 to +297
p.add_argument(
"--skip-index", action="store_true", help="Skip indexing and uploading logs"
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see value in having this argument in teatime.py. It's a big mode switch to have cmake/therock_subproject.cmake run one command and then .github/workflows/build_*_packages.yml run another command. At that point just have the workflow files run the index generator and upload script directly.

If keeping it, flipping the condition from opt-out to opt-in would make more sense to me. That way, running the script on a single file would not generate an index page.

Comment thread build_tools/teatime.py
indexing/upload scripts.
* AMDGPU_FAMILIES: Required by `create_log_index.py` to organize logs.

CI systems can set `TEATIME_LABEL_GH_GROUP=1` in the environment, which will
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All environment variables should be explained in the same section

Comment on lines +24 to +36
def get_indexer_path() -> Path:
"""
Resolve to the local third-party/indexer/indexer.py copy.
"""
indexer_path = (
Path(__file__).resolve().parent.parent / "third-party/indexer/indexer.py"
)
if indexer_path.is_file():
log(f"[INFO] Using bundled indexer.py: {indexer_path}")
return indexer_path
else:
log(f"[ERROR] Bundled indexer.py not found at: {indexer_path}")
sys.exit(2)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else clause should never be hit. I would drop the error handling here.

Comment on lines +105 to +107
- name: Configure AWS Credentials
if: always()
uses: aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 # v4.1.0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geomin12 we're getting a lot of these if: always() and if: ${{ !cancelled() }} conditions in these workflows. Let's see if we can restructure the flow a bit through migrating to more scripts such that the flow is easier to understand.

Here in particular, if fetching sources fails, we'd still configure AWS credentials and try to upload logs, but we wouldn't try to build. In that case I would prefer for the workflow to fail fast and not continue to try cleanup steps that will fail.

Comment thread build_tools/teatime.py
Comment on lines +190 to +218
# Step 2: S3 upload using --bucket and --subdir
try:
upload_script = (
Path(__file__).resolve().parent.parent
/ "build_tools"
/ "upload_logs_to_s3.py"
)
print(f"[TEATIME] Uploading logs to s3://{self.s3_bucket}/{self.s3_subdir}")
subprocess.run(
[
sys.executable,
str(upload_script),
"--log-dir",
str(log_dir),
"--bucket",
self.s3_bucket,
"--subdir",
self.s3_subdir,
],
check=True,
)
except subprocess.CalledProcessError as e:
print(f"[WARN] Log upload failed: {e}", file=sys.stderr)
if os.getenv("TEATIME_FAIL_ON_UPLOAD_ERROR") == "1":
raise
except Exception as e:
print(f"[WARN] Unexpected error during log upload: {e}", file=sys.stderr)
if os.getenv("TEATIME_FAIL_ON_UPLOAD_ERROR") == "1":
raise
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to subprocess.run Python scripts we control from other Python scripts we control. Could switch to a Python library model where we import the file and call a function directly.

@erman-gurses
Copy link
Copy Markdown
Contributor Author

Closing this since outdated - will be open if there is any remaining development is needed.

@erman-gurses erman-gurses deleted the users/erman-gurses/upload-failed-logs-using-teatime branch March 27, 2026 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants