28 changes: 24 additions & 4 deletions docs/training/callbacks.md
@@ -9,8 +9,12 @@ Megatron Bridge provides a lightweight callback system for injecting custom logi
Subclass {py:class}`bridge.training.callbacks.Callback` and override event methods:

```python
import time

from megatron.bridge.training.callbacks import Callback
from megatron.bridge.training.gpt_step import forward_step
from megatron.bridge.training.pretrain import pretrain
from megatron.bridge.recipes.qwen import qwen25_500m_pretrain_config

class MyCallback(Callback):
    def on_train_start(self, context):
@@ -25,8 +29,11 @@ class MyCallback(Callback):
        elapsed = time.time() - context.user_state['start_time']
        print(f"Training completed in {elapsed:.2f}s")

+# Create a config that fits on a single GPU
+config = qwen25_500m_pretrain_config()
+
# Pass callbacks to pretrain
-pretrain(config, forward_step_func, callbacks=[MyCallback()])
+pretrain(config, forward_step, callbacks=[MyCallback()])
```

### Functional Callbacks
@@ -35,7 +42,9 @@ Register functions directly with {py:class}`bridge.training.callbacks.CallbackMa

```python
from megatron.bridge.training.callbacks import CallbackManager
-from megatron.bridge.training import pretrain
+from megatron.bridge.training.gpt_step import forward_step
+from megatron.bridge.training.pretrain import pretrain
+from megatron.bridge.recipes.qwen import qwen25_500m_pretrain_config

def log_step(context):
    step = context.state.train_state.step
@@ -45,20 +54,31 @@ def log_step(context):
callback_manager = CallbackManager()
callback_manager.register("on_train_step_end", log_step)

-pretrain(config, forward_step_func, callbacks=callback_manager)
+# Create a config that fits on a single GPU
+config = qwen25_500m_pretrain_config()
+
+pretrain(config, forward_step, callbacks=callback_manager)
```

### Mixing Both Patterns

Both registration patterns can be combined:

```python
+from megatron.bridge.training.callbacks import CallbackManager
+from megatron.bridge.training.gpt_step import forward_step
+from megatron.bridge.training.pretrain import pretrain
+from megatron.bridge.recipes.qwen import qwen25_500m_pretrain_config
+
manager = CallbackManager()
manager.add(MyCallback())
manager.add([TimingCallback(), MetricsCallback()])
manager.register("on_eval_end", lambda ctx: print("Evaluation complete!"))

-pretrain(config, forward_step_func, callbacks=manager)
+# Create a config that fits on a single GPU
+config = qwen25_500m_pretrain_config()
+
+pretrain(config, forward_step, callbacks=manager)
```
Comment on lines 67 to 82

⚠️ Potential issue | 🟡 Minor

Mixing example still isn’t copy‑paste runnable.

This snippet references MyCallback, TimingCallback, and MetricsCallback without definitions/imports in the block. Given the PR objective (“directly runnable when copy/pasted”), please either add minimal class definitions in this block or replace with fully in‑block example callbacks.

🤖 Prompt for AI Agents
In `@docs/training/callbacks.md` around lines 67 - 82, The example uses undefined
callbacks (MyCallback, TimingCallback, MetricsCallback) which breaks copy‑paste;
update the code block to include minimal in‑block callback definitions or
replace their usages with simple inline callbacks that implement the expected
interface used by CallbackManager (e.g., define a small class MyCallback with
the relevant hook methods or add lambdas/anonymous functions) so the snippet
from CallbackManager, manager.register(...), and pretrain(config, forward_step,
callbacks=manager) is fully runnable without external imports.
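
One possible shape for the in-block definitions requested above is sketched here. The bodies of `TimingCallback` and `MetricsCallback` are hypothetical (the docs diff never defines them), `make_fake_context` is an illustrative helper that is not part of Megatron Bridge, and the sketch assumes `context.user_state` behaves like a plain `dict`, as the `context.user_state['start_time']` lookup in the first example suggests. The `ImportError` fallback exists only so the hooks can be exercised on a machine without the library installed.

```python
import time
from types import SimpleNamespace

try:
    from megatron.bridge.training.callbacks import Callback
except ImportError:
    # Stand-in base class (NOT part of Megatron Bridge) so the hooks below
    # can be tried out without the library installed.
    class Callback:
        pass


class TimingCallback(Callback):
    """Hypothetical example: record wall-clock training time in user_state."""

    def on_train_start(self, context):
        context.user_state["start_time"] = time.time()

    def on_train_end(self, context):
        elapsed = time.time() - context.user_state["start_time"]
        context.user_state["elapsed"] = elapsed
        print(f"Training completed in {elapsed:.2f}s")


class MetricsCallback(Callback):
    """Hypothetical example: remember which training steps have finished."""

    def on_train_step_end(self, context):
        # Assumes user_state is dict-like and shared across callbacks.
        steps = context.user_state.setdefault("steps_seen", [])
        steps.append(context.state.train_state.step)


def make_fake_context(step=0):
    """Illustrative helper: a throwaway context exposing only the
    attributes the callbacks above touch."""
    train_state = SimpleNamespace(step=step)
    return SimpleNamespace(
        user_state={},
        state=SimpleNamespace(train_state=train_state),
    )
```

With definitions like these pasted above the `manager = CallbackManager()` line, the `manager.add([TimingCallback(), MetricsCallback()])` call in the mixing block would resolve. `MyCallback` would still need its definition copied from the first example, or the `manager.add(MyCallback())` line dropped.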


## Available Events