Add log upload support for build target failed #587

Merged
erman-gurses merged 22 commits into main from users/erman-gurses/upload-failes-logs-in-workflow
May 16, 2025

Conversation

@erman-gurses
Contributor

@erman-gurses erman-gurses commented May 9, 2025

This PR adds log upload support for when a build target fails within the workflow. It does not use the teatime.py script.

@erman-gurses
Contributor Author

erman-gurses commented May 9, 2025

The question is how we would test this. Maybe I can purposely break the build and see how it behaves.

@erman-gurses
Contributor Author

@erman-gurses erman-gurses requested a review from marbre May 10, 2025 02:49
Comment thread .github/workflows/build_linux_packages.yml Outdated
Comment on lines 147 to 150
Member

Why create an empty directory and then upload it?

Contributor Author

What if it does not exist? Isn't it possible?

Member

If there aren't any files to upload, don't upload any files?

Generally, think through the process here and what the state should be at each step:

  1. Job starts. Do we set up some metadata here and create a folder in S3 with that metadata now? Did a prior job already do that and are we just writing into an already established location?
  2. Job installs requirements, initializes caches, etc.
  3. Job starts building. Build logs start to get produced. Build artifacts start to get produced.
  4. Job finishes building. Artifacts and logs are uploaded.

What should happen if the build fails? Should we upload partial artifacts? What about logs?
What should happen if the job is cancelled? Should we upload partial artifacts? What about logs?

Notice that with log (and artifact) streaming, the answers change.

Contributor Author

What should happen if the build fails? Should we upload partial artifacts? What about logs?
What should happen if the job is cancelled? Should we upload partial artifacts? What about logs?

  1. If the build fails at the very beginning, there would not be any logs to upload. I assume that at that point build/logs does not even exist.
  2. If the build fails in the middle, there must be some logs to upload, so build/logs exists and we upload whatever logs are there.
  3. If the job is canceled in the middle of the build, we should still upload whatever logs we have.
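
The three cases above can be sketched in Python roughly as follows. This is a minimal illustration, not the PR's actual upload_logs_to_s3.py: the names collect_log_files and main are hypothetical, the build/logs default is taken from the discussion, and the upload itself is replaced by a print so the sketch stays self-contained.

```python
from pathlib import Path


def collect_log_files(log_dir: Path) -> list[Path]:
    """Return the log files to upload, or an empty list if there are none."""
    if not log_dir.is_dir():
        # Case 1: the build failed before build/logs was ever created.
        return []
    # Cases 2 and 3: take whatever was written before the failure or cancellation.
    return sorted(p for p in log_dir.rglob("*") if p.is_file())


def main(log_dir: Path = Path("build/logs")) -> int:
    files = collect_log_files(log_dir)
    if not files:
        # Nothing to upload, so do not create or upload an empty directory.
        print(f"No logs found in {log_dir}; skipping upload.")
        return 0
    for path in files:
        # Placeholder for the real upload step (hypothetical, for example to S3).
        print(f"Would upload {path.relative_to(log_dir).as_posix()}")
    return 0
```

With this shape, the workflow step never needs to create an empty directory first: a missing or empty build/logs simply results in no upload.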

Contributor Author

@erman-gurses erman-gurses May 12, 2025

I am looking into whether case 3 is possible, but @ScottTodd, what is your opinion?

Contributor Author

@erman-gurses erman-gurses May 12, 2025

From my brief research, my understanding is that always() does work for user-triggered cancellations.

https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/evaluate-expressions-in-workflows-and-actions#always

Causes the step to always execute, and returns true, even when canceled. The always expression is best used at the step level or on tasks that you expect to run even when a job is canceled. For example, you can use always to send logs even when a job is canceled.

Contributor Author

@erman-gurses erman-gurses May 13, 2025

Tested the canceling case - worked fine and uploaded the logs: https://github.com/ROCm/TheRock/actions/runs/14985473348/job/42098511841

Comment thread .github/workflows/build_linux_packages.yml Outdated
@erman-gurses erman-gurses requested a review from ScottTodd May 12, 2025 16:45
@erman-gurses erman-gurses force-pushed the users/erman-gurses/upload-failes-logs-in-workflow branch 2 times, most recently from 5452f72 to 1ec90cd Compare May 13, 2025 01:28
@erman-gurses
Contributor Author

Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread .github/workflows/build_linux_packages.yml Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread .github/workflows/build_linux_packages.yml Outdated
@erman-gurses erman-gurses requested a review from marbre May 13, 2025 16:46
Member

@marbre marbre left a comment

There are some issues that need to be addressed before this can be merged. You might want to ask @ScottTodd for approval if you need another review today.

Comment thread .github/workflows/build_linux_packages.yml Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread .github/workflows/build_linux_packages.yml Outdated
Comment thread .github/workflows/build_linux_packages.yml Outdated
Member

@marbre marbre left a comment

There is a missing terminator, so the workflow probably fails with a bash syntax error. Furthermore, now that #608 has landed, this also needs to address Windows.

Comment thread .github/workflows/build_linux_packages.yml Outdated
Comment thread .github/workflows/build_linux_packages.yml Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread .github/workflows/build_linux_packages.yml Outdated
@erman-gurses erman-gurses force-pushed the users/erman-gurses/upload-failes-logs-in-workflow branch from 5c12c12 to 87b107d Compare May 15, 2025 00:49
@erman-gurses erman-gurses requested review from ScottTodd and marbre May 15, 2025 01:21
@erman-gurses
Contributor Author

erman-gurses commented May 15, 2025

@ScottTodd @marbre
The last test completed fine (after I addressed all of the comments) with proper log files:
https://github.com/ROCm/TheRock/actions/runs/15035331614

@marbre, currently, Build Windows Packages on the main branch does not complete successfully.
https://github.com/ROCm/TheRock/actions/runs/15035119910
If you want, I can raise another PR for Build Windows Packages after it has full functionality. What do you think?

@marbre
Member

marbre commented May 15, 2025

@marbre, currently, Build Windows Packages on the main branch does not complete successfully. https://github.com/ROCm/TheRock/actions/runs/15035119910 If you want, I can raise another PR for Build Windows Packages after it has full functionality. What do you think?

It does succeed on the main branch on push, see https://github.com/ROCm/TheRock/actions/runs/15031073217/job/42243505016. However, there is obviously a permission issue when manually dispatching the workflow. This should be fixed with #622, but that shouldn't block adding Windows-specific adjustments to this PR anyway.

Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread build_tools/create_log_index.py Outdated
@erman-gurses erman-gurses force-pushed the users/erman-gurses/upload-failes-logs-in-workflow branch from 75f2fba to 5a16be1 Compare May 15, 2025 17:50
@ScottTodd
Member

Aside: I'm seeing lots of formatting commits pushed individually. If you locally set up pre-commit (or the specific formatters manually), that will let you automatically format your commits before you push.

@erman-gurses
Contributor Author

erman-gurses commented May 15, 2025

Aside: I'm seeing lots of formatting commits pushed individually. If you locally set up pre-commit (or the specific formatters manually), that will let you automatically format your commits before you push.

Yes, I will figure out how to run black formatting automatically. Setting up pre-commit locally sounds more comprehensive, so I can try that as well. Thanks for pointing it out.

@erman-gurses
Contributor Author

erman-gurses commented May 15, 2025

@marbre, currently, Build Windows Packages on the main branch does not complete successfully. https://github.com/ROCm/TheRock/actions/runs/15035119910 If you want, I can raise another PR for Build Windows Packages after it has full functionality. What do you think?

It does succeed on the main branch on push, see https://github.com/ROCm/TheRock/actions/runs/15031073217/job/42243505016. However, there is obviously a permission issue when manually dispatching the workflow. This should be fixed with #622, but that shouldn't block adding Windows-specific adjustments to this PR anyway.

#622 helps a lot. I am making progress on the Windows side.

@erman-gurses erman-gurses force-pushed the users/erman-gurses/upload-failes-logs-in-workflow branch 2 times, most recently from bf66621 to c1bc0a1 Compare May 15, 2025 23:10
@erman-gurses erman-gurses requested a review from ScottTodd May 16, 2025 02:52
@erman-gurses erman-gurses force-pushed the users/erman-gurses/upload-failes-logs-in-workflow branch from 6b64687 to eacea5b Compare May 16, 2025 17:53
Member

@ScottTodd ScottTodd left a comment

Workflow changes LGTM. Just some Python style comments now.

Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
Comment thread build_tools/upload_logs_to_s3.py Outdated
@ScottTodd
Member

When responding to review feedback, you can batch your commit pushes to avoid re-triggering the CI so many times. Each push starts several hours of build jobs. Pushes do cancel previous jobs, but CI time is not cheap.

@erman-gurses
Contributor Author

erman-gurses commented May 16, 2025

When responding to review feedback, you can batch your commit pushes to avoid re-triggering the CI so many times. Each push starts several hours of build jobs. Pushes do cancel previous jobs, but CI time is not cheap.

Sure thing, I can do that.

@erman-gurses erman-gurses requested a review from ScottTodd May 16, 2025 20:02
@erman-gurses
Contributor Author

erman-gurses commented May 16, 2025

Member

@ScottTodd ScottTodd left a comment

Looks good enough to me now. A few lingering comments for future work.

Comment on lines +20 to +21
def normalize_path(p: Path) -> str:
    return str(p).replace("\\", "/") if is_windows() else str(p)
Member

You can probably use as_posix() here instead of defining a new helper function, if the indexer script really needs a unix style path. Fine as-is for now.

Contributor Author

Ohh - did not know that.
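
For reference, a small illustration of the as_posix() suggestion. It uses the pure path classes from pathlib so the Windows behavior can be shown on any platform; the file name ninja.log is only a hypothetical example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# A Windows-style path, as the build would produce it on a Windows runner.
windows_log = PureWindowsPath(r"build\logs\ninja.log")
# The same location expressed as a POSIX path.
posix_log = PurePosixPath("build/logs/ninja.log")

# as_posix() always returns forward slashes, so no platform check is needed.
print(windows_log.as_posix())  # build/logs/ninja.log
print(posix_log.as_posix())    # build/logs/ninja.log
```

A regular Path object exposes the same method, so p.as_posix() could stand in for normalize_path(p) wherever a forward-slash string is wanted.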

Member

Instead of forking indexer.py, I propose implementing a solution based on boto3, which can then be used in an AWS Lambda.
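
One possible shape for such a boto3-based indexer is sketched below. It is only an illustration under stated assumptions, not a proposal from this PR: the function names iter_log_keys and render_index are hypothetical, and s3_client is assumed to be an object such as the one returned by boto3.client("s3"). The sketch relies only on list_objects_v2 and its Contents, IsTruncated, and NextContinuationToken response fields, and it does not import boto3 itself, so it can be exercised with a stand-in client.

```python
import html
from typing import Any, Iterator


def iter_log_keys(s3_client: Any, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under `prefix`, following pagination."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when no objects match the prefix.
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def render_index(keys: list[str], prefix: str) -> str:
    """Build a minimal HTML index linking to each key relative to `prefix`."""
    items = []
    for key in sorted(keys):
        name = key[len(prefix):].lstrip("/")
        # Skip the prefix placeholder itself and any previously written index.
        if not name or name == "index.html":
            continue
        escaped = html.escape(name)
        items.append(f'<li><a href="{escaped}">{escaped}</a></li>')
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>\n"
```

Keeping the listing and the rendering separate means the rendering can be tested without AWS access, and a Lambda handler would only need to call the two functions and write the result back to the bucket.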

Comment thread build_tools/create_log_index.py
@erman-gurses erman-gurses merged commit c7190c1 into main May 16, 2025
5 checks passed
@erman-gurses erman-gurses deleted the users/erman-gurses/upload-failes-logs-in-workflow branch May 16, 2025 23:09
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage May 16, 2025
Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Upload build logs even if a build target fails (not using teatime.py)

3 participants