Dataset validation fix for explanation generation #441
This PR contains two fixes:
1. `read_file` no longer exists in the `DatasetLoader` class, so the code that used it has been updated.
2. Dataset validation expected an explanation column in every dataset, which caused a further set of errors: `seed.csv` does not have the column until explanations have been generated, and `test.csv` is never supposed to have it. The explanation column is now excluded from all dataset validation. A separate check for the existence of the explanation column already runs before example selection, when appropriate: https://github.com/refuel-ai/autolabel/blob/main/src/autolabel/labeler.py#L107

Also included are a few minor cleanups (typos etc.).
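For reviewers, the idea behind the second fix can be sketched roughly as below. This is a minimal, standalone illustration and not the actual autolabel code: the function names (`columns_to_validate`, `missing_columns`, `require_explanations`) and the column names are hypothetical, chosen only to show general validation skipping the explanation column while a separate, later check still enforces it.

```python
from typing import List, Optional, Sequence


def columns_to_validate(
    configured_columns: Sequence[str],
    explanation_column: Optional[str] = None,
) -> List[str]:
    """Columns that general dataset validation should require.

    The explanation column is left out on purpose: it may not exist yet
    in the seed data and should never exist in the test data.
    """
    return [c for c in configured_columns if c != explanation_column]


def missing_columns(
    dataset_columns: Sequence[str], required: Sequence[str]
) -> List[str]:
    """Required columns that are absent from the dataset."""
    present = set(dataset_columns)
    return [c for c in required if c not in present]


def require_explanations(
    dataset_columns: Sequence[str], explanation_column: Optional[str]
) -> None:
    """Separate check, meant to run only right before example selection."""
    if explanation_column and explanation_column not in dataset_columns:
        raise ValueError(
            f"Explanation column '{explanation_column}' not found in dataset"
        )


# Hypothetical column names, for illustration only.
configured = ["example", "label", "explanation"]
seed_columns = ["example", "label"]  # seed data before explanations exist

required = columns_to_validate(configured, explanation_column="explanation")
seed_missing = missing_columns(seed_columns, required)
```

With this split, general validation passes for a seed dataset that has no explanations yet, while `require_explanations` still raises if example selection is attempted without them.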