Include the scripts for preprocessing OAST and unit tests for chat sft datasets #7112

Merged: 23 commits, Aug 7, 2023

Commits on Jul 14, 2023

  1. scripts for sft

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 14, 2023
    06c9490
  2. fix style

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 14, 2023
    825a9de
  3. added special token only for huggingface model

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 14, 2023
    8d1bc6a
  4. change default name

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 14, 2023
    871f068

Commits on Jul 16, 2023

  1. print out error datapoint content

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 16, 2023
    b49e619
  2. show error id

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 16, 2023
    f3374f7

Commits on Jul 17, 2023

  1. annotation script working

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 17, 2023
    199d575
  2. try to be compatible with huggingface tokenizer

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 17, 2023
    01d49d5
  3. added examples

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 17, 2023
    726effd

Commits on Jul 20, 2023

  1. added lang

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 20, 2023
    39192be

Commits on Jul 21, 2023

  1. added lang

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 21, 2023
    1912e97

Commits on Jul 23, 2023

  1. text to value special case

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 23, 2023
    d0c18f0
  2. ca1526b

Commits on Jul 25, 2023

  1. configure the slider

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 25, 2023
    5059a9e
  2. annotation handles lang

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 25, 2023
    cb67e93
  3. added the unit test for chat sft dataset

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 25, 2023
    8972863

Commits on Jul 26, 2023

  1. used the file in the test dir

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 26, 2023
    5e9fbc8

Commits on Jul 27, 2023

  1. fix json error

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 27, 2023
    f737755
  2. load local tokenizer

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 27, 2023
    d56302d
  3. remove mask count check

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 27, 2023
    199f900
  4. cd9331b

Commits on Jul 29, 2023

  1. added HF dataset backend

    Signed-off-by: Yi Dong <[email protected]>
    yidong72 committed Jul 29, 2023
    5016e69
  2. c53662f