Killed when using long list of training data, How to solve it ? #154

Open
1 of 2 tasks
housebaby opened this issue Oct 15, 2024 · 1 comment

Comments

@housebaby

System Info

What does split mean here, given that there seems to be no difference between train and the other splits?

    self.data_list = []
    if split == "train":
        with open(dataset_config.train_data_path, encoding='utf-8') as fin:
            for line in fin:
                data_dict = json.loads(line.strip())
                self.data_list.append(data_dict)
    else:
        with open(dataset_config.val_data_path, encoding='utf-8') as fin:
            for line in fin:
                data_dict = json.loads(line.strip())
                self.data_list.append(data_dict)

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

image

Error logs

image

Expected behavior

If I replace the long training list with a small one, it works.
How can I adapt this to a large training dataset?
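
One possible direction, offered only as a sketch: on Linux a bare "Killed" often means the kernel's out-of-memory killer stopped the process, although the logs in this issue are not visible, so that cause is an assumption. If memory is the problem, the quoted loader could keep only a byte offset per line and parse each record on demand, instead of appending every parsed dict to self.data_list. The class name LazyJsonlList is hypothetical and not part of the project.

```python
import json


class LazyJsonlList:
    """Hypothetical helper (not from the project): list-like, read-only
    access to a JSONL file that keeps only byte offsets in memory."""

    def __init__(self, path):
        self.path = path
        self.offsets = []  # byte offset where each non-empty line starts
        with open(path, "rb") as fin:
            offset = 0
            for line in fin:
                if line.strip():
                    self.offsets.append(offset)
                offset += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, index):
        # Open per access so the object stays safe to share across
        # data-loading worker processes; parse only the requested line.
        with open(self.path, "rb") as fin:
            fin.seek(self.offsets[index])
            return json.loads(fin.readline().decode("utf-8"))
```

Assuming the dataset only calls len() and indexes into the list, something like self.data_list = LazyJsonlList(dataset_config.train_data_path) could stand in for the loop. The trade-off is one file seek and one JSON parse per sample in exchange for a much smaller resident footprint.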

@ddlBoJack
Collaborator

Here, split refers to your train, val, and test sets.
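
To make the role of split concrete, here is a minimal sketch that mirrors the snippet quoted in the issue. The function name load_data_list is hypothetical; the config attributes train_data_path and val_data_path come from the quoted code. It shows that split only decides which file is read, and that in the quoted snippet any value other than "train" falls through to val_data_path.

```python
import json


def load_data_list(dataset_config, split):
    """Hypothetical helper mirroring the quoted snippet:
    split only chooses which JSONL file is read."""
    if split == "train":
        path = dataset_config.train_data_path
    else:
        # Any other value of split reads the validation file.
        path = dataset_config.val_data_path

    data_list = []
    with open(path, encoding="utf-8") as fin:
        for line in fin:
            data_list.append(json.loads(line.strip()))
    return data_list
```

The parsing is identical for both branches, which is why they look the same in the original code; only the source path differs.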
