Finetune on unstructured dataset #278

lun-4 · 2023-05-17T02:06:05Z

I noticed that the finetuning scripts assume the dataset is instructional, but the one I plan to use isn't, so I drafted an implementation of dataset preparation, plus the relevant changes to LLaMA-Adapter, to support this kind of tuning.

The dataset preparation script was copied from prepare_alpaca.py, with most of the instruction-specific setup stripped out.

I'm currently testing this code by training a model, and it seems to be working well, but I'm opening it up for discussion and review in the meantime.
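
For readers who want a feel for what "dataset preparation without the instructional details" can look like, here is a minimal sketch. It is not the code from scripts/prepare_any_text.py: the function names, the byte-level stand-in tokenizer, the chunk size, and the choice to use the inputs as labels are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


def byte_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer (assumption): one token id per UTF-8 byte."""
    return list(text.encode("utf-8"))


def prepare_plain_text(
    text: str,
    tokenize: Callable[[str], List[int]] = byte_tokenize,
    max_seq_length: int = 256,
    test_fraction: float = 0.1,
) -> Tuple[List[Dict[str, List[int]]], List[Dict[str, List[int]]]]:
    """Split raw text into fixed-size token chunks, with no prompt template."""
    token_ids = tokenize(text)
    chunks = [
        token_ids[i : i + max_seq_length]
        for i in range(0, len(token_ids), max_seq_length)
    ]
    # Assumption: for plain language modelling the labels are the inputs
    # themselves, since there is no instruction prefix to mask out.
    samples = [{"input_ids": c, "labels": list(c)} for c in chunks]
    # Hold out at least one chunk for evaluation when there is more than one.
    n_test = max(1, int(len(samples) * test_fraction)) if len(samples) > 1 else 0
    split = len(samples) - n_test
    return samples[:split], samples[split:]
```

Calling `prepare_plain_text(open("input.txt").read())` would return a train list and a test list of samples; a real script would swap in the model's tokenizer and save the results to disk.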

lantiga

Thank you @lun-4, looks good!

Could you:

- add the same instruction_tuning parameter (defaulting to True) to lora.py and full.py?
- also add the corresponding CLI option to the finetune_ scripts?

Thanks a lot!
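
To illustrate the idea behind the requested parameter, the sketch below shows one way a boolean `instruction_tuning` flag defaulting to True could switch between an Alpaca-style prompt and the raw text. The `build_prompt` helper and the `"text"` key are hypothetical and are not taken from lora.py, full.py, or adapter.py.

```python
from typing import Dict

# Alpaca-style prompt (no-input variant), used here only as an example template.
ALPACA_STYLE_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(sample: Dict[str, str], instruction_tuning: bool = True) -> str:
    """Return the text fed to the model for one sample (hypothetical helper)."""
    if instruction_tuning:
        # Default path: wrap the instruction in the prompt template.
        return ALPACA_STYLE_TEMPLATE.format(instruction=sample["instruction"])
    # Unstructured path: pass the raw text through unchanged.
    return sample["text"]
```

Keeping True as the default means existing instruction-tuned workflows behave as before, and only callers that opt out get the plain-text path.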

lantiga · 2023-05-29T16:18:55Z

Thank you @lun-4, merging! A quick howto would be super appreciated :-)

lun-4 added 2 commits May 16, 2023 22:53

add scripts/prepare_any_text.py

e549c82

make Adapter support non-instruction-tuned datasets

f45e714

lun-4 requested review from awaelchli, carmocca and lantiga as code owners May 17, 2023 02:06

lantiga approved these changes May 18, 2023

lun-4 added 2 commits May 20, 2023 00:19

add instruction_tuning parameter to lora and full

84df5af

add CLI option for instruction_tuning

2ede3c0

carmocca mentioned this pull request May 22, 2023

generate.py and generate/full.py could be merged #313

Open

lun-4 requested a review from lantiga May 22, 2023 21:43

awaelchli reviewed May 24, 2023

finetune/adapter.py (outdated, resolved)

lun-4 added 2 commits May 28, 2023 23:24

Revert "add CLI option for instruction_tuning"

526b0bc

This reverts commit 2ede3c0.

add instruction_tuning parameter to evaluation scripts

33522d9

lantiga merged commit ffba202 into Lightning-AI:main May 29, 2023

lun-4 mentioned this pull request May 30, 2023

add guide on finetuning any text dataset #344

Merged
