Inspect training data without data indices #593
os.environ["FS_LOCAL_RANK"] = "1"

for step in steps:
    dataloader = build_train_dataloader(cfg, world_size=world_size)
This would have to rebuild the indices file every time, right? That could be slow, but we could probably avoid rebuilding for every rank.
Setting FS_LOCAL_RANK=1 avoids rebuilding the indices file every time, since that is only done for local FS rank 0.
# Set FS_LOCAL_RANK to a non-zero number so that global data indices are not rewritten
Ah, right. Could you just add a comment explaining that for future reference?
LGTM
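The FS_LOCAL_RANK behavior discussed in this thread can be illustrated with a minimal sketch. It is not the project's actual code: `get_fs_local_rank` and `maybe_build_indices` are hypothetical names, and the only assumption taken from the thread is that the indices file is written solely when the filesystem-local rank is 0.

```python
import os
from typing import Callable


def get_fs_local_rank(default: int = 0) -> int:
    """Hypothetical helper: read FS_LOCAL_RANK from the environment."""
    return int(os.environ.get("FS_LOCAL_RANK", default))


def maybe_build_indices(build_fn: Callable[[], None]) -> bool:
    """Run build_fn only on filesystem-local rank 0; report whether it ran."""
    if get_fs_local_rank() == 0:
        build_fn()
        return True
    return False


calls = []

# A non-zero rank skips the (potentially slow) rebuild.
os.environ["FS_LOCAL_RANK"] = "1"
ran_with_rank_1 = maybe_build_indices(lambda: calls.append("built"))

# Rank 0 performs the rebuild.
os.environ["FS_LOCAL_RANK"] = "0"
ran_with_rank_0 = maybe_build_indices(lambda: calls.append("built"))
```

Under this assumption, setting the variable once before the loop means every subsequent dataloader construction skips the rebuild.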
This PR updates the inspect_train_data.py script so that training data can be inspected when the device data indices are not present. Our runs save these indices locally but not in remote storage. An advantage of the implementation is that the script defaults to the original behavior when data indices are present.
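The fallback described here could look roughly like the sketch below. Everything in it is an assumption for illustration: `resolve_indices`, the plain-text file format, and the `rebuild` callback are hypothetical and do not reflect how inspect_train_data.py actually stores or reconstructs indices.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple


def resolve_indices(
    indices_path: Path, rebuild: Callable[[], List[int]]
) -> Tuple[str, List[int]]:
    """Use saved indices when the file exists; otherwise reconstruct them."""
    if indices_path.is_file():
        # Original behavior: read the indices that the run saved.
        saved = [int(tok) for tok in indices_path.read_text().split()]
        return "saved", saved
    # New behavior: no indices file, so rebuild the indices instead.
    return "rebuilt", rebuild()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "indices.txt"  # hypothetical file name and format

    # No file yet: the rebuild callback is used.
    missing_result = resolve_indices(path, lambda: [0, 1, 2])

    # File present: the saved indices win and the callback is ignored.
    path.write_text("7 8 9")
    present_result = resolve_indices(path, lambda: [0, 1, 2])
```

Checking for the saved file first keeps existing workflows unchanged, which matches the stated goal of defaulting to the original behavior.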