
Add unit test to verify target_modules defaults correctly#281

Merged
anhuong merged 7 commits into foundation-model-stack:main from willmj:1143-test
Aug 6, 2024

Conversation

@willmj (Collaborator) commented Aug 2, 2024

Description of the change

Add a unit test for a recent change (PR #269) to verify that the HF library correctly sets target_modules to the default values.

Related issue number

How to verify the PR

Run unit tests.

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
@aluu317 (Collaborator) left a comment

Thank you @willmj .
This is helpful!

  • I would argue this test isn't just testing launch-script functionality; it should live in test_sft_trainer, since it is really about sft_trainer.
  • I would also suggest adding another test in test_config_utils.py to verify that when nothing is set for target_modules in our local peft_config.LoraConfig, the HF peft.LoraConfig still has target_modules set to None (and not any other value, like we set before).
  • Add inline comments to be more specific/readable for future maintenance.

@willmj (Collaborator, Author) commented Aug 2, 2024

It seems like the second test you mentioned to add is covered already in test_get_hf_peft_config_returns_lora_config_correctly(). Let me know if there's any additional functionality there that needs testing.

    assert (
        config.target_modules is None
    )  # default value from local peft_config.LoraConfig

@aluu317 (Collaborator) commented Aug 5, 2024

It seems like the second test you mentioned to add is covered already in test_get_hf_peft_config_returns_lora_config_correctly(). Let me know if there's any additional functionality there that needs testing.

    assert (
        config.target_modules is None
    )  # default value from local peft_config.LoraConfig

Yup, this is fine. Thank you!
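The check being discussed can be sketched as a standalone test. This is a minimal sketch, not the repo's actual code: the `LoraConfig` below is a hypothetical stand-in for the local `peft_config.LoraConfig`, with field names assumed from the error output quoted later in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for the repo's peft_config.LoraConfig; fields assumed.
@dataclass
class LoraConfig:
    r: int = 8
    lora_alpha: int = 32
    # Default None lets the HF library pick target modules per model type.
    target_modules: Optional[List[str]] = None
    lora_dropout: float = 0.05

def test_lora_target_modules_defaults_to_none():
    config = LoraConfig()
    assert config.target_modules is None  # default value from local LoraConfig

test_lora_target_modules_defaults_to_none()
```

The point is that the local config never pins a value, so the HF side is free to substitute its per-model default.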

willmj added 3 commits August 6, 2024 10:39
Add sft_trainer.main test to ensure target modules properly default for LoRA when set to None from CLI

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
aluu317 previously approved these changes Aug 6, 2024
@aluu317 (Collaborator) left a comment

This looks good thanks @willmj . I approved but please wait for Anh or Sukriti!

@willmj willmj closed this Aug 6, 2024
@willmj willmj reopened this Aug 6, 2024
src/peft/tuners/lora/model.py#L432
"""
with tempfile.TemporaryDirectory() as tempdir:
TRAIN_KWARGS = {**BASE_LORA_KWARGS, **{"output_dir": tempdir}}
Collaborator

Instead of importing from launch_scripts, you can create your own dict using the full set of args we defined within this test.

Suggested change
TRAIN_KWARGS = {**BASE_LORA_KWARGS, **{"output_dir": tempdir}}
TRAIN_KWARGS = {**MODEL_ARGS, **TRAIN_ARGS, **DATA_ARGS, **PEFT_LORA_ARGS, **{"output_dir": tempdir}}

This should create the same dict but keeps the params all in one file to track.
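The suggested merge can be illustrated with plain dicts. This is a sketch only: the values below are illustrative stand-ins, not the test's real arg dicts.

```python
import tempfile

# Illustrative stand-ins for the per-category arg dicts named in the suggestion.
MODEL_ARGS = {"model_name_or_path": "Maykeye/TinyLLama-v0"}
TRAIN_ARGS = {"num_train_epochs": 5, "learning_rate": 1e-5}
DATA_ARGS = {"dataset_text_field": "output"}
PEFT_LORA_ARGS = {"peft_method": "lora", "r": 8}

with tempfile.TemporaryDirectory() as tempdir:
    # Later dicts win on key collisions; output_dir is layered on last.
    train_kwargs = {**MODEL_ARGS, **TRAIN_ARGS, **DATA_ARGS, **PEFT_LORA_ARGS,
                    "output_dir": tempdir}
    assert train_kwargs["output_dir"] == tempdir
    # Nothing set target_modules, so the HF library will apply its default.
    assert "target_modules" not in train_kwargs
```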

Comment on lines 409 to 414
"""Check that if target_modules is not set, or set to None in KWARGS sent to main,
the default value by model type will be using in training.
We use TRAIN_KWARGS from test_launch_script.py to run training with, which has no
target_modules set. During HF training process, the correct default target modules
will be used for model type Llama and "q_proj", "v_proj" will then exist in the
resulting in adapter_config.json.
Collaborator

nit: these details are a little off

set to None in KWARGS sent to main

We are not passing args via KWARGS in this test; we are passing them via JSON.

the default value by model type will be using in training.

should specify LoRA tuning

We use TRAIN_KWARGS from test_launch_script.py to run training with, which has no target_modules set.

Why pull the args from test_launch_script? It's confusing that changing a param in test_launch_script could fail a test in test_sft_trainer.

and "q_proj", "v_proj" will then exist in the resulting in adapter_config.json.

I don't think we need to specify this, since you already say the correct target modules are used for model type Llama, include the HF link, and the test shows this.

model type Llama

Good to be more specific, use LlamaForCausalLM

Collaborator Author

Thanks for looking out! Yeah it's definitely confusing to pull from test_launch_script, I didn't want to add too much to test_sft_trainer for just one test, but that makes sense - I'll go ahead and change that.

Good to be more specific, use LlamaForCausalLM

I actually think the model type I mean (the one HF uses for defaulting) is llama; let me know what you think.

from tests.build.test_launch_script import BASE_LORA_KWARGS
from tests.data import (
EMPTY_DATA,
MALFORMATTED_DATA,
Collaborator

Would be good to add a test that runs parse_arguments and verifies that not passing in target_modules still gives None. This can be added to the existing test on line 149.
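The behavior being requested can be illustrated with a bare argparse parser. The project's parse_arguments is not shown here; this is only an analogy for "an absent flag stays None":

```python
import argparse

parser = argparse.ArgumentParser()
# Illustrative flag; the real parser is built from the repo's dataclass args.
parser.add_argument("--target_modules", nargs="*", default=None)

# Simulate not passing --target_modules on the command line.
args = parser.parse_args([])
assert args.target_modules is None  # absent flag stays None, so HF applies its default
```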

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
"""
with tempfile.TemporaryDirectory() as tempdir:
TRAIN_KWARGS = {
**MODEL_ARGS.__dict__,
Collaborator

Ahh I see, since these aren't dicts you can't combine them as easily. Nice way of combining them! Although a small refactor would be to combine your custom ones:

**PEFT_LORA_ARGS.__dict__,
**{"peft_method": "lora", "output_dir": tempdir},
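Merging dataclass instances via `__dict__`, as done above, can be sketched like this. Class and field names here are illustrative, not the repo's actual argument classes:

```python
from dataclasses import dataclass

@dataclass
class ModelArguments:
    model_name_or_path: str = "Maykeye/TinyLLama-v0"

@dataclass
class PeftLoraArgs:
    r: int = 8
    lora_alpha: int = 32

# __dict__ turns each instance into a plain dict so ** unpacking works.
train_kwargs = {
    **ModelArguments().__dict__,
    **PeftLoraArgs().__dict__,
    **{"peft_method": "lora", "output_dir": "/tmp/test-out"},
}
assert train_kwargs["r"] == 8 and train_kwargs["peft_method"] == "lora"
```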

Comment on lines 450 to 454
"""Check that if target_modules is not set, or set to None via JSON, the
default value by model type will be using in LoRA tuning.
We use MODEL_ARGS to run tuning, which has no target_modules set. During HF
training process, the correct default target modules will be used for model
type llama and will then exist in the resulting in adapter_config.json.
@anhuong (Collaborator) commented Aug 6, 2024

nit: slight rewording for clarity, and making it a little more succinct

Suggested change
"""Check that if target_modules is not set, or set to None via JSON, the
default value by model type will be using in LoRA tuning.
We use MODEL_ARGS to run tuning, which has no target_modules set. During HF
training process, the correct default target modules will be used for model
type llama and will then exist in the resulting in adapter_config.json.
"""Check that if target_modules is not set, or set to None via JSON, the
default value by model type will be using in LoRA tuning.
The correct default target modules will be used for model
type llama and will exist in the resulting adapter_config.json.

**{"peft_method": "lora"},
**{"output_dir": tempdir},
}
# TRAIN_KWARGS = {**BASE_LORA_KWARGS, **{"output_dir": tempdir}}
Collaborator

Can remove comment

Comment on lines 48 to 81
MODEL_NAME = "Maykeye/TinyLLama-v0"
BASE_KWARGS = {
"model_name_or_path": MODEL_NAME,
"training_data_path": TWITTER_COMPLAINTS_DATA,
"num_train_epochs": 5,
"per_device_train_batch_size": 4,
"per_device_eval_batch_size": 4,
"gradient_accumulation_steps": 4,
"learning_rate": 0.00001,
"weight_decay": 0,
"warmup_ratio": 0.03,
"lr_scheduler_type": "cosine",
"logging_steps": 1,
"include_tokens_per_second": True,
"packing": False,
"response_template": "\n### Label:",
"dataset_text_field": "output",
"use_flash_attn": False,
"torch_dtype": "float32",
"max_seq_length": 4096,
}
BASE_PEFT_KWARGS = {
**BASE_KWARGS,
**{
"peft_method": "pt",
"prompt_tuning_init": "RANDOM",
"num_virtual_tokens": 8,
"prompt_tuning_init_text": "hello",
"save_strategy": "epoch",
"output_dir": "tmp",
},
}
BASE_LORA_KWARGS = {
**BASE_KWARGS,
Collaborator

This isn't used, so it can be removed.

parser, job_config_lora
)
assert isinstance(tune_config, peft_config.LoraConfig)
assert "target_modules" not in job_config_lora
Collaborator

You can also assert not hasattr(tune_config, "target_modules") so that it verifies the input from job_config_lora, while the assertion above verifies that after parse_arguments runs, the value is still None for target_modules.

Collaborator Author

When I add this check I get the following error:

>       assert not hasattr(tune_config, "target_modules")
E       AssertionError: assert not True
E        +  where True = hasattr(LoraConfig(r=8, lora_alpha=32, target_modules=None, lora_dropout=0.05), 'target_modules')

I think this is because the attribute target_modules exists but is None. Would this suffice?
assert tune_config.target_modules is None

Collaborator

yes that makes sense and yes that check looks good! i think you can refactor it down to assert not tune_config.target_modules
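Why `hasattr` fails here while the `is None` check passes can be shown with a tiny class (names are illustrative):

```python
class TuneConfig:
    def __init__(self):
        # The attribute exists on the instance, but its value is None.
        self.target_modules = None

cfg = TuneConfig()
assert hasattr(cfg, "target_modules")   # True: the attribute is defined
assert cfg.target_modules is None       # precise check for the default
assert not cfg.target_modules           # truthiness shorthand; also True for [] or ""
```

Note the shorthand `not cfg.target_modules` also passes for an empty list or string, so `is None` is the stricter check.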

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
@anhuong (Collaborator) left a comment

LGTM, thanks Will!

@anhuong anhuong merged commit 06614b6 into foundation-model-stack:main Aug 6, 2024
@willmj willmj deleted the 1143-test branch August 7, 2024 00:39
anhuong added a commit that referenced this pull request Aug 14, 2024
* Set default value of target_modules to be None in LoraConfig

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* Removal of transformers logger and addition of python logger

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* FMT and lint check: Removal of transformers logger and addition of python logger

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: remove lm_head for granite with llama arch models (#258)

* initial code for deleting lm_head

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* fix logic for copying checkpoint

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* fix check that embed_tokens and lm_head weights are the same

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* fix warning assertion

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* fix lm_head check, remove test

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* small fixes from code review

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* fmt

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

---------

Signed-off-by: Anh-Uong <anh.uong@ibm.com>
Co-authored-by: Anh-Uong <anh.uong@ibm.com>
Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* Add config_utils tests

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

* Fix fmt

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

* Separate tests out and use docstrings

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

* Update more field/value checks from HF defaults

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

* Fix: Addition of env var TRANSFORMERS_VERBOSITY check

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* FMT Fix: Addition of env var TRANSFORMERS_VERBOSITY check

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* Add test for tokenizer in lora config (should be ignored)

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

* Adding logging support to accelerate launch

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* FMT_FIX: Adding logging support to accelerate launch

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* bug: On save event added to callback (#256)

* feat: On save event added to callback

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

* fix: Removed additional bracket

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

* fix: Removed additional bracket

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

* fix: Format issues resolved

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

* fix: rebase with upstream and add new line

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: All metric handling changes (#263)

* feat: All metric handling changes

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

* fix: Format issues

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

---------

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

* feat: Configuration to set logging level for trigger log (#241)

* feat: Added the triggered login in the operation

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

* fix: Formatting issues

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

* fix: Added default config

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

* fix: Moved the variable to right scope

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

* fix: Checked added to validate config log level

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

* fix: Removed some unwanted log file

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

---------

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

* limit peft deps until investigate (#274)

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* Data custom collator (#260)

* refactor code to preprocess datasets

Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix formatting

Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* allow input/output in validate args

Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* format input/output JSON and mask

Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* function to return suitable collator

Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* add tests for SFT Trainer input/output format

Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* remove unused functions

Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* add eos token to input/output format

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix tests

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* improve docstrings

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* keeping JSON keys constant

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* support for input/output format

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* formatting fixes

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update rEADME formats

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* formatting README

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

---------

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Revert "limit peft deps until investigate (#274)" (#275)

This reverts commit f57ff63.

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* feat: per process state metric (#239)

Signed-off-by: Harikrishnan Balagopal <harikrishmenon@gmail.com>

* Modify test to pass with target_modules: None

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* Logging changes and unit tests added

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* feat: Add a dockerfile argument to enable aimstack (#261)

* Add a dockerfile argument at the end of final layer to enable aimstack.
Currently guarded by a dockerfile argument.

Signed-off-by: Dushyant Behl <dushyantbehl@users.noreply.github.com>

* Set the default value of ENABLE_AIM to false

Signed-off-by: Dushyant Behl <dushyantbehl@users.noreply.github.com>

---------

Signed-off-by: Dushyant Behl <dushyantbehl@users.noreply.github.com>

* Solved conflict with main

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* FMT:Fix Solved conflict with main

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* enabling tests for prompt tuning

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* feat: Support pretokenized (#272)

* feat: support pretokenized datasets

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: rebase with upstream and review commits

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: rebase with upstream and review commits

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: rebase with upstream and review commits

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* consolidate collator code

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* add valuerrors for incorrect args

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* feat: add unit tests for validate_data_args and format_dataset

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: add unit tests for validate_data_args and format_dataset

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: add unit tests for validate_data_args and format_dataset

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: add unit tests for validate_data_args and format_dataset

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Alex Brooks <alex.brooks@ibm.com>

* Update packaging requirement from <24,>=23.2 to >=23.2,<25 (#212)

Updates the requirements on [packaging](https://github.com/pypa/packaging) to permit the latest version.
- [Release notes](https://github.com/pypa/packaging/releases)
- [Changelog](https://github.com/pypa/packaging/blob/main/CHANGELOG.rst)
- [Commits](pypa/packaging@23.2...24.1)

---
updated-dependencies:
- dependency-name: packaging
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Anh Uong <anh.uong@ibm.com>

* enabling tests for prompt tuning (#278)

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
Co-authored-by: Anh Uong <anh.uong@ibm.com>

* fix: do not add special tokens for custom tokenizer (#279)

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* PR changes for changing logger

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: bug where the logger was not being used properly (#286)

Signed-off-by: Hari <harikrishmenon@gmail.com>

* Unit Tests changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* Add functionality to free disk space from Github Actions (#287)

* Add functionality to free disk space from Github Actions

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* Add functionality to free disk space from Github Actions, relocate from build-and-publish.yaml to image.yaml

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* Move freeing space step to before building image

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

---------

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* commented os.environ[LOG_LEVEL] in accelerate.py for testing

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* FIX:FMT

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* Add unit test to verify target_modules defaults correctly (#281)

* Add unit test to verify target_modules defaults correctly

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* Add sft_trainer.main test to ensure target modules properly default for LoRA when set to None from CLI

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* fmt

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* Use model_args instead of importing, fix nits

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* Add test to ensure target_modules defaults to None in job config

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* Add additional check, fix nits

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

---------

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* docs: Add documentation on experiment tracking. (#257)

Signed-off-by: Dushyant Behl <dushyantbehl@users.noreply.github.com>

* Ensure additional metadata to trackers don't throw error in happy case. (#290)

Signed-off-by: Dushyant Behl <dushyantbehl@users.noreply.github.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix multiple runid creation bug with accelerate. (#268)

Signed-off-by: Dushyant Behl <dushyantbehl@users.noreply.github.com>

* feat: logging control operation (#264)

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

* Metrics file epoch indexing from 0

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* Revert last commit

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix run evaluation to get base model path (#273)

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* feat: Added additional events such as on_step_begin, on_optimizer_step, on_substep_end (#293)

Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>

* Always update setuptools to latest (#288)

Signed-off-by: James Busche <jbusche@us.ibm.com>
Co-authored-by: Anh Uong <anh.uong@ibm.com>

* Rename all fixtures with correct .jsonl extension (#295)

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Co-authored-by: Anh Uong <anh.uong@ibm.com>

* feat: add save_model_dir flag where final checkpoint saved (#291)

* add save_model_dir flag for final checkpoint

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* remove output_dir logic, add save method

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* update accelerate_launch, remove save tokenizer

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* fix: put back creation of .complete file

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* fix failing tests and add new ones

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* tests: add sft_trainer test to train and save

- small refactor of tests

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* add docs on saving checkpoints and fix help msg

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* update example and note best checkpoint

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* changes based on PR review

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

* add logging to save, fix error out properly

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

---------

Signed-off-by: Anh-Uong <anh.uong@ibm.com>

---------

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
Signed-off-by: Anh-Uong <anh.uong@ibm.com>
Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
Signed-off-by: Padmanabha V Seshadri <seshapad@in.ibm.com>
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Signed-off-by: Harikrishnan Balagopal <harikrishmenon@gmail.com>
Signed-off-by: Dushyant Behl <dushyantbehl@users.noreply.github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Hari <harikrishmenon@gmail.com>
Signed-off-by: James Busche <jbusche@us.ibm.com>
Co-authored-by: Abhishek <maurya.abhishek@ibm.com>
Co-authored-by: Sukriti Sharma <Ssukriti@users.noreply.github.com>
Co-authored-by: Anh-Uong <anh.uong@ibm.com>
Co-authored-by: Abhishek Maurya <124327945+Abhishek-TAMU@users.noreply.github.com>
Co-authored-by: Angel Luu <angel.luu@us.ibm.com>
Co-authored-by: Angel Luu <an317gel@gmail.com>
Co-authored-by: Padmanabha V Seshadri <seshapad@in.ibm.com>
Co-authored-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Hari <harikrishmenon@gmail.com>
Co-authored-by: Dushyant Behl <dushyantbehl@users.noreply.github.com>
Co-authored-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: James Busche <jbusche@us.ibm.com>