-
Notifications
You must be signed in to change notification settings - Fork 7k
[core] Add .rayignore #58500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Add .rayignore #58500
Conversation
Signed-off-by: iamjustinhsu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces the .rayignore file, providing a dedicated way to specify files to be excluded from uploads to a Ray cluster, separate from .gitignore. The implementation correctly handles the new .rayignore file, respects the RAY_RUNTIME_ENV_IGNORE_GITIGNORE environment variable, and maintains backward compatibility by considering both files by default. The changes are well-tested. I've found a couple of minor issues: a misleading docstring and a typo in a log message. Overall, this is a great addition that improves the usability of runtime environments.
Signed-off-by: iamjustinhsu <[email protected]>
edoakes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for picking this up!
If possible, let's add a log statement to the effect of: "loading exclude paths from .gitignore and .rayignore", but modify the message so it tells you that it's reading from one/the other/both
| and returns True if the path should be excluded based on the ignore | ||
| patterns in the respective ignore file. | ||
| """ | ||
| ignore_gitignore = os.environ.get(RAY_RUNTIME_ENV_IGNORE_GITIGNORE, "0") == "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I generally prefer to try to follow a more functional style and keep external dependencies (like env vars) higher in the call stack. in this case, that means adding an explicit, required argument to this function, like include_gitignore and then populating that argument based on the environment variable at the callsite.
this makes it easier to test, makes the code more self-documenting, and reduces "surprise" for future developers, who may not realize that something deep in the call stack is controlled by an environment variable
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Update test for new function signature
The test calls _dir_travel(root, [exclude_spec], handler) without passing the required include_gitignore parameter. The function signature was updated to require this parameter, but this test call wasn't updated. The test should pass include_gitignore=not ignore_gitignore based on the test's setup.
python/ray/tests/test_runtime_env_packaging.py#L1035-L1036
ray/python/ray/tests/test_runtime_env_packaging.py
Lines 1035 to 1036 in 8b1e060
| _dir_travel(root, [exclude_spec], handler) |
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
|
@ray-project/ray-docs do u think u can take a look at my changes? |
bveeramani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc changes LGTM.
| Note: Setting a local directory per-task or per-actor is currently unsupported; it can only be set per-job (i.e., in ``ray.init()``). | ||
|
|
||
| Note: If the local directory contains a ``.gitignore`` file, the files and paths specified there are not uploaded to the cluster. You can disable this by setting the environment variable `RAY_RUNTIME_ENV_IGNORE_GITIGNORE=1` on the machine doing the uploading. | ||
| Note: By default, if the local directory contains a ``.gitignore`` and/or ``.rayignore`` file, the files and paths specified in both will not be uploaded to the cluster. To disable the ``.gitignore`` from being considered, set ``RAY_RUNTIME_ENV_IGNORE_GITIGNORE=1`` on the machine doing the uploading. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: In accordance with technical writing style guide:
- Active tense
- Avoiding future tense
- Contractions
Not sure if "Ray" is the right subject in "Ray doesn't upload". Might need to check me on that.
| Note: By default, if the local directory contains a ``.gitignore`` and/or ``.rayignore`` file, the files and paths specified in both will not be uploaded to the cluster. To disable the ``.gitignore`` from being considered, set ``RAY_RUNTIME_ENV_IGNORE_GITIGNORE=1`` on the machine doing the uploading. | |
| Note: By default, if the local directory contains a ``.gitignore`` and/or ``.rayignore`` file, Ray doesn't upload the specified files to the cluster. To disable the ``.gitignore`` from being considered, set ``RAY_RUNTIME_ENV_IGNORE_GITIGNORE=1`` on the machine doing the uploading. |
| Note: For option (1), by default, if the local directory contains a ``.gitignore`` and/or ``.rayignore`` file, the files and paths specified in both will not be uploaded to the cluster. To disable the ``.gitignore`` from being considered, set ``RAY_RUNTIME_ENV_IGNORE_GITIGNORE=1`` on the machine doing the uploading. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same suggestion here
Signed-off-by: iamjustinhsu <[email protected]>
## Description ### Status Quo Previously, `.gitignore` files handled both uploading to cluster _and_ uploading to github. This PR essentially allows the ability to break those 2 functionalities apart by creating a `.rayignore` file which will handle uploading to cluster. ### Purpose Any path or file specified in `.rayignore` will be ignored when uploading to the cluster. This is useful for local development when you don't want random files being uploaded and taking up space. ### How it works By default, directories containing both `.gitignore` and `.rayignore` will both be considered (so existing behavior is preserved). To make `.gitignore` only ignore files uploaded to github, and `.rayignore` only ignore files uploaded to cluster (essentially making them independent of each other), you can use the existing `RAY_RUNTIME_ENV_IGNORE_GITIGNORE` and set that to `1` ## Related issues ray-project#53648 ## Additional information Since `.rayignore` is part of the ray ecosystem, I did not create an env var to disable ignoring all-together. If users do not want to ignore files, they can leave `.rayignore` empty, or not create the file at all. --------- Signed-off-by: iamjustinhsu <[email protected]>
## Description ### Status Quo Previously, `.gitignore` files handled both uploading to cluster _and_ uploading to github. This PR essentially allows the ability to break those 2 functionalities apart by creating a `.rayignore` file which will handle uploading to cluster. ### Purpose Any path or file specified in `.rayignore` will be ignored when uploading to the cluster. This is useful for local development when you don't want random files being uploaded and taking up space. ### How it works By default, directories containing both `.gitignore` and `.rayignore` will both be considered (so existing behavior is preserved). To make `.gitignore` only ignore files uploaded to github, and `.rayignore` only ignore files uploaded to cluster (essentially making them independent of each other), you can use the existing `RAY_RUNTIME_ENV_IGNORE_GITIGNORE` and set that to `1` ## Related issues ray-project#53648 ## Additional information Since `.rayignore` is part of the ray ecosystem, I did not create an env var to disable ignoring all-together. If users do not want to ignore files, they can leave `.rayignore` empty, or not create the file at all. --------- Signed-off-by: iamjustinhsu <[email protected]> Signed-off-by: YK <[email protected]>
## Description ### Status Quo Previously, `.gitignore` files handled both uploading to cluster _and_ uploading to github. This PR essentially allows the ability to break those 2 functionalities apart by creating a `.rayignore` file which will handle uploading to cluster. ### Purpose Any path or file specified in `.rayignore` will be ignored when uploading to the cluster. This is useful for local development when you don't want random files being uploaded and taking up space. ### How it works By default, directories containing both `.gitignore` and `.rayignore` will both be considered (so existing behavior is preserved). To make `.gitignore` only ignore files uploaded to github, and `.rayignore` only ignore files uploaded to cluster (essentially making them independent of each other), you can use the existing `RAY_RUNTIME_ENV_IGNORE_GITIGNORE` and set that to `1` ## Related issues ray-project#53648 ## Additional information Since `.rayignore` is part of the ray ecosystem, I did not create an env var to disable ignoring all-together. If users do not want to ignore files, they can leave `.rayignore` empty, or not create the file at all. --------- Signed-off-by: iamjustinhsu <[email protected]>
Description
Status Quo
Previously,
.gitignorefiles handled both uploading to cluster and uploading to github. This PR essentially allows the ability to break those 2 functionalities apart by creating a.rayignorefile which will handle uploading to cluster.Purpose
Any path or file specified in
.rayignorewill be ignored when uploading to the cluster. This is useful for local development when you don't want random files being uploaded and taking up space.How it works
By default, directories containing both
.gitignoreand.rayignorewill both be considered (so existing behavior is preserved). To make.gitignoreonly ignore files uploaded to github, and.rayignoreonly ignore files uploaded to cluster (essentially making them independent of each other), you can use the existingRAY_RUNTIME_ENV_IGNORE_GITIGNOREand set that to1Related issues
#53648
Additional information
Since
.rayignoreis part of the ray ecosystem, I did not create an env var to disable ignoring all-together. If users do not want to ignore files, they can leave.rayignoreempty, or not create the file at all.