2 changes: 1 addition & 1 deletion 3rdparty/Megatron-LM
Submodule Megatron-LM updated 134 files
143 changes: 143 additions & 0 deletions GIT_COMMIT_SKILLS.md
@@ -0,0 +1,143 @@
# Git Commit Workflow

## Standard Workflow for Committing Changes

### 1. Check Current Branch
```bash
git branch --show-current
```

### 2. Create Feature Branch (REQUIRED if on main)
**Important**: You must create a feature branch before committing if you're on `main`. Never commit directly to `main`.

Check your current branch:
```bash
CURRENT_BRANCH=$(git branch --show-current)
if [ "$CURRENT_BRANCH" = "main" ]; then
  git checkout -b feature/your-feature-name
fi
```

Or manually:
```bash
git checkout -b feature/your-feature-name
```

### 3. Stage Your Changes
```bash
git add <file1> <file2> ...
# Or stage all modified files:
git add -u
```

### 4. Run Pre-commit Hooks
```bash
export PATH="/Users/yuya/Library/Python/3.9/bin:$PATH"
pre-commit run
```

> **Review comment on lines +35 to +36 (Contributor):** ⚠️ Potential issue | 🟠 Major
>
> Hardcoded user-specific PATH leaks a local username and won't work for other contributors. Lines 35 and 79 both contain `/Users/yuya/Library/Python/3.9/bin`, a macOS-specific path tied to a single developer's machine. It won't work for anyone else and inadvertently exposes a username. If pre-commit is installed properly (e.g., via uv or pip), it should already be on `$PATH`; remove the `export PATH=...` line or replace it with a generic instruction. Proposed fix (the same change applies in the example section at line 79):
>
> ```diff
> -export PATH="/Users/yuya/Library/Python/3.9/bin:$PATH"
>  pre-commit run
> ```
This will run all pre-commit hooks (formatting, linting, etc.) before committing.

### 5. Commit with Sign-off
```bash
git commit -s -m "[module] type: Your descriptive commit message"
```
The `-s` flag adds a Signed-off-by line to the commit message. See [Commit and PR Title Format](#commit-and-pr-title-format) for proper formatting.
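To see exactly what the `-s` flag produces, here is a self-contained check in a throwaway repository (the user name and email are placeholder values, not project conventions):

```shell
# Demonstrate the Signed-off-by trailer in a disposable repo.
# "Example Dev" / dev@example.com are placeholder identity values.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -s -m "[doc] chore: demo sign-off"
# Print the commit message; its final line is the DCO trailer.
git log -1 --format=%B
```

The last command shows the message body ending with `Signed-off-by: Example Dev <dev@example.com>`, which git builds from your configured `user.name` and `user.email`.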

### 6. Push to Remote
```bash
git push -u origin feature/your-feature-name
```
The `-u` flag sets the upstream the first time you push a new branch; after that, plain `git push` suffices.

### 7. Check for Existing PR
```bash
gh pr list --head <your-branch-name> --json number --jq '.[0].number'
```

### 8. Trigger CI Testing
If a PR exists, comment on it to trigger CI:
```bash
COMMIT_HASH=$(git rev-parse HEAD)
gh pr comment <PR_NUMBER> --body "/ok to test $COMMIT_HASH"
```

## Example Workflow

```bash
# 1. Check branch
CURRENT_BRANCH=$(git branch --show-current)
echo "Current branch: $CURRENT_BRANCH"

# 2. Create feature branch if on main (REQUIRED)
if [ "$CURRENT_BRANCH" = "main" ]; then
  git checkout -b feature/your-feature-name
fi

# 3. Stage changes
git add tests/unit_tests/models/gemma_vl/test_gemma3_vl_bridge.py

# 4. Run pre-commit
export PATH="/Users/yuya/Library/Python/3.9/bin:$PATH"
pre-commit run

# 5. Commit with sign-off
git commit -s -m "[test] fix: Fix gemma3_vl bridge test for image_token_id default"

# 6. Push (set the upstream on the first push of a new branch)
git push -u origin "$(git branch --show-current)"

# 7. Check for PR and trigger CI
PR_NUMBER=$(gh pr list --head "$(git branch --show-current)" --json number --jq '.[0].number')
COMMIT_HASH=$(git rev-parse HEAD)
gh pr comment $PR_NUMBER --body "/ok to test $COMMIT_HASH"
```

## Commit and PR Title Format

Format your commit messages and PR titles as:

```text
[{modules}] {type}: {description}
```

### Modules
Use the most relevant ones, separate multiple with `,`:
- `model` - Model implementations and bridges
- `recipe` - Training recipes
- `training` - Training loop and utilities
- `data` - Data loading and processing
- `ckpt` - Checkpoint conversion and saving
- `peft` - Parameter-efficient fine-tuning (LoRA, etc.)
- `perf` - Performance optimizations
- `ci` - CI/CD configuration
- `doc` - Documentation
- `test` - Tests
- `build` - Build system and dependencies
- `misc` - Other changes

### Types
- `feat` - New feature
- `fix` - Bug fix
- `refactor` - Code refactoring without changing functionality
- `chore` - Maintenance tasks
- `test` - Adding or updating tests

### Breaking Changes
If your PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.

### Examples
```text
[model] feat: Add Qwen3 model bridge
[recipe, doc] feat: Add Llama 3.1 70B recipe with documentation
[ckpt] fix: Handle missing keys in HF checkpoint conversion
[BREAKING][training] refactor: Change optimizer config structure
[ci, build] chore: Update ruff version
[test] fix: Fix gemma3_vl bridge test for image_token_id default
```
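The convention can also be checked mechanically. The helper below is a sketch (the function name and regex are ours, not part of the repository's tooling); it accepts the documented types, comma-separated lowercase modules, and the optional `[BREAKING]` prefix:

```shell
# Hypothetical lint helper for the [{modules}] {type}: {description} convention.
# Returns 0 for a conforming title, non-zero otherwise.
check_title() {
  echo "$1" | grep -Eq \
    '^(\[BREAKING\])?\[[a-z]+(, ?[a-z]+)*\] (feat|fix|refactor|chore|test): .+'
}

check_title "[model] feat: Add Qwen3 model bridge" && echo "ok"
check_title "update stuff" || echo "rejected"
```

Running this prints `ok` followed by `rejected`. A check like this could be wired into a local `commit-msg` hook if the project ever wants to enforce the format.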

## Notes

- **Never commit directly to `main`** - Always create a feature branch first
- Always run `pre-commit run` before committing to catch formatting/linting issues early
- Use descriptive commit messages following the format above
- The `-s` flag is required for DCO (Developer Certificate of Origin) compliance
- If pre-commit modifies files, you may need to stage them again before committing
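The last note above can be sketched as a small helper. The function name is ours, not project tooling, and it assumes the hooks are idempotent (a second pass over already-fixed files makes no further changes):

```shell
# Sketch of "stage again after hooks modify files":
# if the first pre-commit pass fails because hooks auto-fixed files,
# stage those fixes and run the hooks once more.
run_hooks_and_restage() {
  if ! pre-commit run; then
    git add -u       # stage the files the hooks just modified
    pre-commit run   # second pass should now be clean
  fi
}
```

Call `run_hooks_and_restage` before `git commit`; if the second pass still fails, the hooks found issues that need a manual fix.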
Empty file added REMOTE_DEBUG_SKILLS.md
Empty file.
12 changes: 9 additions & 3 deletions examples/rl/rlhf_with_bridge.py
@@ -33,16 +33,15 @@
 
 Run (single GPU)
 ```bash
-export CUDA_VISIBLE_DEVICES=0
-python examples/rl/rlhf_with_bridge.py \
+uv run python examples/rl/rlhf_with_bridge.py \
     --hf-policy-model Qwen/Qwen3-0.6B \
     --hf-reward-model distilbert-base-uncased-finetuned-sst-2-english \
     --train-iters 5 --mbs 1 --gbs 1 --seq-length 256 --max-new-tokens 32
 ```
 
 Run (multi-GPU)
 ```bash
-torchrun --nproc_per_node=2 examples/rl/rlhf_with_bridge.py \
+uv run python -m torch.distributed.run --nproc_per_node=2 examples/rl/rlhf_with_bridge.py \
     --hf-policy-model Qwen/Qwen3-0.6B \
     --hf-reward-model distilbert-base-uncased-finetuned-sst-2-english \
     --train-iters 20 --mbs 1 --gbs 2 --seq-length 256 --max-new-tokens 32
@@ -62,6 +61,7 @@
 import torch
 import torch.nn.functional as F
 from megatron.core.pipeline_parallel import get_forward_backward_func
+from megatron.core.process_groups_config import ProcessGroupCollection
 from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
 
 from megatron.bridge import AutoBridge
@@ -94,6 +94,7 @@ class Args:
     global_batch_size: int
     train_iters: int
     seq_length: int
+    trust_remote_code: bool = False
 
 
 def build_config(provider, args: Args) -> ConfigContainer:
@@ -233,6 +234,7 @@ def main() -> None:
         global_batch_size=ns.gbs,
         train_iters=ns.train_iters,
         seq_length=ns.seq_length,
+        trust_remote_code=ns.trust_remote_code,
     )
 
     # Resolve per-rank device up front for multi-GPU runs
@@ -296,13 +298,17 @@ def main() -> None:
     initialize_megatron(cfg=cfg)
     set_jit_fusion_options(cfg.model, cfg.train.micro_batch_size)
 
+    # Get process group collection after initialization
+    pg_collection = ProcessGroupCollection.use_mpu_process_groups()
+
     # Build model + optimizer + scheduler
     model_list = get_model(
         cfg.model,
         cfg.ddp,
         overlap_param_gather_with_optimizer_step=False,
         use_torch_fsdp2=cfg.dist.use_torch_fsdp2,
         data_parallel_random_init=cfg.rng.data_parallel_random_init,
+        pg_collection=pg_collection,
     )
     model = model_list[0]
     optimizer, scheduler = setup_optimizer(
2 changes: 1 addition & 1 deletion tutorials/training/reduced_precision_training.ipynb
@@ -119,7 +119,7 @@
 "# - train_text_document.bin (document/sequence data)\n",
 "# - train_text_document.idx (document/sequence metadata)\n",
 "\n",
-"MEGATRON_LM_PATH=/opt/megatron-lm/\n",
+"MEGATRON_LM_PATH=/opt/Megatron-Bridge/3rdparty/Megatron-LM/\n",
 "\n",
 "echo \"Tokenizing training data...\"\n",
 "python3 $MEGATRON_LM_PATH/tools/preprocess_data.py \\\n",