Skip to content

Add indexer.py under third-party/indexer/#717

Merged
erman-gurses merged 6 commits into
mainfrom
users/erman-gurses/add-indexer
May 27, 2025
Merged

Add indexer.py under third-party/indexer/#717
erman-gurses merged 6 commits into
mainfrom
users/erman-gurses/add-indexer

Conversation

@erman-gurses
Copy link
Copy Markdown
Contributor

@erman-gurses erman-gurses commented May 27, 2025

Adding the indexer from https://github.com/joshbrunty/Indexer/ under third-party/indexer/, sourced from commit joshbrunty/Indexer@6d8cbfd.

@erman-gurses
Copy link
Copy Markdown
Contributor Author

First time handling third-party. Is this way we want?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First time handling third-party. Is this way we want?

Close. I would also add a README.md and a LICENSE file... which would be the full contents of https://github.com/joshbrunty/Indexer/. Here's an example third_party folder in another project: https://github.com/iree-org/iree/tree/main/third_party/nccl . Could also directly use a git submodule, if we don't need to carry local patches.

@stellaraccident may have other suggestions too. For context, we've been downloading this file then running it CI workflows to generate logs index pages:

# TODO: Move to script
- name: Create Index Files
if: always()
run: |
curl --silent --fail --show-error --location \
https://raw.githubusercontent.com/joshbrunty/Indexer/6d8cbfd15d3853b482e6a49f2d875ded9188b721/indexer.py \
--output build/indexer.py
python build/indexer.py -f '*.tar.xz*' build/artifacts/

I've been suggesting that we pull the file in directly and ideally fork it so it does exactly what we want: #648 (comment) . @marbre has also proposed a new implementation: #587 (comment)

Instead of forking indexer.py I propose to rather implement a solution based on boto3 which can then be used in an AWS Lambda.

I want to at least stop downloading something only in a CI workflow, so scripts work locally without extra setup. Refactoring/replacing can come later.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info. Added as submodule.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does fetch_sources.py automatically pull that submodule down, or does it need an explicit git submodule update --init? I'm leaning towards forking the files instead of a submodule given that our other submodule deps are using by the source build, while this is an optional side script... could be convinced either way though. Regardless, the longer term solutions are to drop our dependency on this entirely. It's just 400 lines of Python and we will want to heavily change its behavior.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked fetch_sources.py did not fetch it. I can go for the fork option. Will add manually the code , license, and README.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the indexer from https://raw.githubusercontent.com/joshbrunty/Indexer/6d8cbfd15d3853b482e6a49f2d875ded9188b721/indexer.py under third-party/indexer/ for TheRock project.

I would just say https://github.com/joshbrunty/Indexer/. That's a URL for humans, unlike the raw.githubusercontent.com URL. You can specify the commit hash also (joshbrunty/Indexer@6d8cbfd) if you want to be exact.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Made a few tweaks to the PR description myself.

@erman-gurses erman-gurses requested a review from ScottTodd May 27, 2025 21:06
@erman-gurses erman-gurses force-pushed the users/erman-gurses/add-indexer branch 2 times, most recently from 0e323d7 to a7d0b06 Compare May 27, 2025 21:22
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option for this README file is to replace it with a link to the upstream https://github.com/joshbrunty/Indexer/. I don't feel too strongly either way.

# Indexer

This is a fork of https://github.com/joshbrunty/Indexer/, a Python script used to generate
an .html index of file within a selected directory.

## Local modifications

None yet.

Copy link
Copy Markdown
Contributor Author

@erman-gurses erman-gurses May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. For me, either the same as well, I can keep it as is for now. If we want, I can update it.

@erman-gurses erman-gurses merged commit e9539fa into main May 27, 2025
5 checks passed
@erman-gurses erman-gurses deleted the users/erman-gurses/add-indexer branch May 27, 2025 21:35
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage May 27, 2025
@erman-gurses erman-gurses restored the users/erman-gurses/add-indexer branch May 27, 2025 22:42
@erman-gurses erman-gurses deleted the users/erman-gurses/add-indexer branch May 27, 2025 22:43
@marbre
Copy link
Copy Markdown
Member

marbre commented Jun 2, 2025

Instead of forking indexer.py, we also have https://github.com/ROCm/TheRock/blob/main/build_tools/packaging/python/generate_release_index.py which works different in the sense that it doesn't index local files but rather indexes a subdirectory in an S3 bucket. indexer.py in contrast requires that all files to index are locally available which is not the case in all of our use cases.

Furthermore, I am not sure if third-party is the best location as the dependencies in third-party are for building TheRock whereas indexer.py is only used in CI (if that hasn't changed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants