
Conversation

@parambharat (Owner) commented Jan 5, 2023

What does this PR do?

The PR updates the WandbCallback with the following changes:

  • Adds an on_save method to upload model checkpoints as artifacts.
  • Changes the default value of the WANDB_WATCH environment variable from gradients to false. This speeds up training when defaults are used; users can easily re-enable gradient logging by setting the env variable.
  • Changes the WANDB_LOG_MODEL variable from a bool to a str, allowing different settings for uploading artifacts.
  • Modifies the class docstring to reflect the above changes.
  • Fixes a broken link to the wandb documentation.
  • Changes the wandb run_name from output_dir to the wandb auto-generated name. This avoids duplicated run names in the wandb workspace.
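The env-var behavior described above could be exercised roughly like this (a minimal sketch; the values shown are illustrative, and the callback itself reads these variables at training time):

```python
import os

# WANDB_WATCH now defaults to "false"; gradient logging is opt-in.
os.environ["WANDB_WATCH"] = "gradients"  # opt back in to gradient histograms

# WANDB_LOG_MODEL is now a string rather than a bool:
# "end" uploads the final model, "checkpoint" uploads every saved
# checkpoint as a W&B Artifact, "false" disables uploads.
os.environ["WANDB_LOG_MODEL"] = "checkpoint"
```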

Before submitting

Examples

  • Example colab reflecting all the changes to the WandbCallback
  • Example Weights & Biases workspace with runs that show different settings.
  • Example Weights & Biases Artifact created for checkpoints

Bharat Ramanathan added 8 commits January 4, 2023 12:55
- add a more descriptive callback documentation.
- update wandb document link
- change env_var `WANDB_LOG_MODEL` docstring to reflect new changed values.
- change env_var `WANDB_WATCH` docstring to reflect new defaults
The default value "gradients" slows down training. Advanced users can turn it on if required.
Checking for `WANDB_WATCH != "false"` can be too wide. Restricting it to allowed values instead.
- Change env var `WANDB_LOG_MODEL` from `bool` to `str`
- Change logic in `on_train_end` method to save model when `WANDB_LOG_MODEL` is `end` or `checkpoint`
- Add logging info to inform user about time taken to save models and checkpoints
- Add logic to upload model checkpoint artifacts and metadata in `on_save` method.
Currently, the run name always defaults to the output_dir name when not specified by the user.
This leads to duplicated run names in the wandb dashboard.
wandb auto-generates run names, so we can fall back to an auto-generated name when the run name is the same as the output_dir name.
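The run-name defaulting described in this commit can be sketched as follows (a hypothetical helper, not the actual callback code; the real values come from TrainingArguments):

```python
def resolve_run_name(run_name, output_dir):
    """Return None (letting wandb auto-generate a name) when the run
    name merely echoes the output directory; keep the user's choice
    otherwise. wandb.init(name=None) produces an auto-generated name."""
    if run_name is None or run_name == output_dir:
        return None
    return run_name
```

So an explicitly chosen name survives, while the old output_dir fallback yields an auto-generated, non-duplicated name.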
- Update class docstrings
- Add backward compatibility and deprecation warning for `WANDB_LOG_MODEL` env var
- Change artifacts to have run name instead of model name
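The backward-compatibility shim for the old boolean WANDB_LOG_MODEL values might look roughly like this (a hedged sketch with hypothetical names, not the actual implementation):

```python
import os
import warnings


def read_log_model_setting():
    """Map legacy boolean WANDB_LOG_MODEL values onto the new string
    settings ("end", "checkpoint", "false"), warning on deprecated use."""
    value = os.getenv("WANDB_LOG_MODEL", "false").lower()
    if value in {"true", "false"}:
        warnings.warn(
            "Boolean values for WANDB_LOG_MODEL are deprecated; "
            "use 'end', 'checkpoint' or 'false' instead.",
            FutureWarning,
        )
        # The old TRUE behavior uploaded the model at the end of training.
        return "end" if value == "true" else "false"
    return value
```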
@parambharat parambharat closed this Oct 4, 2023
parambharat pushed a commit that referenced this pull request Jan 10, 2024
…gface#26681)

* Draft version of new KV Caching

This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
in a third-party or in transformers directly

* Address numerous PR suggestions

1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
3. Remove __bool__ and __getitem__ magic as they're confusing.
4. past_key_values.update(key, value, idx) now returns key, value.
5. Add use_legacy_cache flag, defaults to None, i.e. falsy. This breaks generate for now, until 1) the cache is used in generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
6. Separate key_cache and value_cache.

Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.
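The update signature from point 4 above, with the separate key and value caches from point 6, might look roughly like this heavily simplified sketch (not the actual transformers implementation; plain per-token lists stand in for tensors, where the real code concatenates along the sequence-length dimension):

```python
class DynamicCacheSketch:
    """Per-layer key/value caches that grow along the sequence axis."""

    def __init__(self):
        self.key_cache = []    # one list of key vectors per layer
        self.value_cache = []  # one list of value vectors per layer

    def update(self, key_states, value_states, layer_idx):
        if len(self.key_cache) <= layer_idx:
            # First time we see this layer: start its cache.
            self.key_cache.append(list(key_states))
            self.value_cache.append(list(value_states))
        else:
            # Append the new tokens to the existing sequence.
            self.key_cache[layer_idx].extend(key_states)
            self.value_cache[layer_idx].extend(value_states)
        # Per point 4 above: update() returns the full key/value sequences.
        return self.key_cache[layer_idx], self.value_cache[layer_idx]
```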

* Implement the SinkCache through backward+forward rotations
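A heavily simplified illustration of the sink-cache idea: keep the first few "sink" tokens plus a recent window, evicting the oldest non-sink entries. The actual SinkCache must also correct the rotary position embeddings of cached keys when entries shift position (the "backward+forward rotations" above), which this sketch omits entirely:

```python
class SinkCacheSketch:
    """Keeps `num_sink` initial tokens plus the last `window` tokens.

    Simplified illustration only: RoPE re-rotation of shifted keys,
    which the real SinkCache performs, is omitted.
    """

    def __init__(self, num_sink=4, window=1020):
        self.num_sink = num_sink
        self.window = window
        self.tokens = []  # stand-in for per-layer key/value entries

    def append(self, token):
        self.tokens.append(token)
        if len(self.tokens) > self.num_sink + self.window:
            # Evict the oldest non-sink entry; sinks are never evicted.
            del self.tokens[self.num_sink]
        return self.tokens
```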

* Integrate (Sink)Cache with Llama FA2

* Set use_legacy_cache=True as default, allows for test passes

* Move from/to_legacy_cache to ...Model class

* Undo unnecessary newline change

* Remove copy utility from deprecated OpenLlama

* Match import style

* manual rebase with main

* Cache class working with generate (#1)

* working generate

* Add tests; Simplify code; Apply changes to Mistral and Persimmon

* fix rebase mess

* a few more manual fixes

* last manual fix

* propagate changes to phi

* upgrade test

* add use_legacy_cache docstring; beef up tests

* reintroduce unwanted deletes

---------

Co-authored-by: Tom Aarsen <[email protected]>

* move import

* add default to model_kwargs.get('use_legacy_cache')

* correct failing test

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <[email protected]>

* apply PR suggestions

* fix failing test

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Tom Aarsen <[email protected]>

* PR comments

* tmp commit

* add docstrings

* more tests, more docstrings, add to docs

* derp

* tmp commit

* tmp dbg

* more dbg

* fix beam search bug

* cache can be a list of tuples in some models

* fix group beam search

* all but sinkcache integration tests

* fix sink cache and add hard integration test

* now also compatible with input_embeds input

* PR comments

* add Cache support to Phi+FA2

* make fixup

---------

Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: Patrick von Platen <[email protected]>
@parambharat parambharat deleted the feature/update-wandb-callback branch January 10, 2024 11:00
parambharat pushed a commit that referenced this pull request Mar 21, 2024
* Cohere Model Release (#1)

Cohere Model Release

* Remove unnecessary files and code (#2)

Some cleanup

* Delete cohere-model directory (huggingface#3)

* Make Fix (huggingface#5)

* Pr fixes (huggingface#6)

* fixes for pr

* pr fixes for the format

* pr fixes for the format

* src/transformers/models/auto/tokenization_auto.py

* Tokenizer test (huggingface#8)

* tokenizer test

* format fix

* Adding Docs and other minor changes (huggingface#7)

* Add modeling tests (huggingface#9)

* Smol Fix (huggingface#11)

* tokenization tests are fixed

* format fixes

* fix pr doc tests

* fix pr doc tests

* fix pr doc tests

* fix pr style check

* small changes in cohere.md

* FIX: Address final comments for transformers integration (huggingface#13)

* fix modeling final nits and add proper test file

* for now leave empty tests

* add integration test

* push new test

* fix modeling cohere (huggingface#14)

* Update chat templates to use the new API (huggingface#15)

---------

Co-authored-by: ahmetustun <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Matt <[email protected]>