-
Couldn't load subscription status.
- Fork 45
Metric Logging updates 4/N - better actor name #351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metric Logging updates 4/N - better actor name #351
Conversation
…estamp_logging_diff2
…estamp_logging_diff3
|
could you add some more comments in the description about what this PR is enabling? |
yes! sorry. I should have marked as a draft. I am doing a 2.5/4.0 before i ask you to review it |
| self.timestamp = datetime.now(pytz.UTC).timestamp() | ||
|
|
||
|
|
||
| def get_actor_name_with_rank() -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to observability/utils.py
| if process_name is None: | ||
| process_name = detect_actor_name_from_call_stack() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get name here and pass it to
local_fetcher_actor = proc.spawn(
"local_fetcher_actor", LocalFetcherActor, global_logger, process_name
)
this function is called in provisioner.py, and thats how we get the process_name for every wandb run
src/forge/observability/utils.py
Outdated
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| def detect_actor_name_from_call_stack() -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main file to review
|
in the near future the mesh might hold a name. When this happens, we can delete the function to use the call stack and just get it from the mesh. The rest of the PR stands. |
…estamp_logging_diff3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few small comments but stamping to unblock
| # Spawn services first (triggers registrations via provisioner hook) | ||
| trainer = await TrainActor.options(**service_config).as_service() | ||
| generator = await GeneratorActor.options(**service_config).as_service() | ||
| trainer = await TrainActor.options( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumb q but what are we doing with this toy_rl stuff anyways? (Like is there some reason we're still keeping it around?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i still find it useful, e.g. i dont need to have gpus available to test it. Eventually this should become an integration test.
src/forge/observability/metrics.py
Outdated
| Timestamp is automatically set to current EST time if not provided. | ||
| Args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: give an actual docstring here (otherwise i can just read this info 5 lines below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
| if process_name is None: | ||
| ctx = context() | ||
| process_name = ctx.actor_instance.actor_id.actor_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a small thing but it's not immediately clear why we do this here vs get_proc_name_with_rank in other places. (after looking at the code i think it's a global vs local thing, but imo this could be more clearly documented)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_proc_name_with_rank returns it with replica id and rank. Here i just want the process name, so i would have to parse it :/
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #351 +/- ##
==========================================
- Coverage 73.68% 65.14% -8.54%
==========================================
Files 81 82 +1
Lines 7729 7850 +121
==========================================
- Hits 5695 5114 -581
- Misses 2034 2736 +702 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Co-authored-by: Felipe Mello <[email protected]>
)" This reverts commit 1f45470.
…#351)" (meta-pytorch#429) This reverts commit 633b219.
…#351)" (meta-pytorch#429) This reverts commit 633b219.
Co-authored-by: Felipe Mello <[email protected]>
Co-authored-by: Felipe Mello <[email protected]>
When logging per rank, we need good naming to make it easier to debug. Now provisioner.py::get_proc_mesh provides a mesh_name (#378), so i am just using it.
User can also pass a process name as input. Thats how we get the
Controllername.The process_name then goes:
LocalFetcherActor->MetricCollector->backend.init-->wandb.init(name)