Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataset validation fix for explanation generation #441

Merged
merged 3 commits into from
Jul 11, 2023

Conversation

Abhinav-Naikawadi
Copy link
Contributor

@Abhinav-Naikawadi Abhinav-Naikawadi commented Jul 10, 2023

2 fixes here:

  1. read_file no longer exists in the DatasetLoader class so updated that

  2. There was a subsequent set of errors due to dataset validation expecting an explanation column in the dataset (in seed.csv before explanation column was generated and in test.csv since explanation column isn't supposed to exist in the test dataset). Added a fix to not include the explanation column in all dataset validation. We do already have a separate check for the existence of the explanation column prior to example selection when appropriate here: https://github.com/refuel-ai/autolabel/blob/main/src/autolabel/labeler.py#L107

Also added some minor changes for slight cleanups (typos etc.)

"""
if self.explanation_column in dataset_columns:
dataset_columns.remove(self.explanation_column)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will modify the list being passed into the function - might be difficult for the caller to understand

Is it possible to remove this code from validate_dataset_columns and let expected_columns handle the removal of explanation column?


import pandas as pd
from autolabel.configs import AutolabelConfig
from autolabel.data_loaders.read_datasets import ( # SqlDatasetReader,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was just a black formatting reordering of imports

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants