[DataFrame] can't handle separators in data #5647

Closed
Tracked by #6144
terrajobst opened this issue Sep 11, 2020 · 4 comments
Labels
Microsoft.Data.Analysis All DataFrame related issues and PRs
Milestone

Comments

@terrajobst

Related to dotnet/corefxlab#2968

It looks like DataFrame can't load a CSV file in which the separator appears inside a column's data (here, a comma inside a double-quoted field).

Repro

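using System;
using Microsoft.Data.Analysis;

// Hypothetical path to a file containing the CSV contents below
var fileName = "data.csv";
var frame = DataFrame.LoadCsv(fileName);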

foreach (var row in frame.Rows)
{
    Console.WriteLine(row[0]);
    Console.WriteLine(row[1]);
    Console.WriteLine(row[2]);
    Console.WriteLine();
}

CSV contents:

Name,Age,Description
Paul,34,"Paul lives in Vermont, VA."
Victor,29,"Victor: Funny guy"
Maria,31,

Expected behavior

Prints the contents of the CSV

Actual behavior

Exception:

Unhandled exception. System.FormatException: Line 2 has less columns than expected
   at Microsoft.Data.Analysis.DataFrame.GuessKind(Int32 col, List`1 read)
   at Microsoft.Data.Analysis.DataFrame.LoadCsv(Stream csvStream, Char separator, Boolean header, String[] columnNames, Type[] dataTypes, Int64 numberOfRowsToRead, Int32 guessRows, Boolean addIndexColumn, Encoding encoding)
   at Microsoft.Data.Analysis.DataFrame.LoadCsv(String filename, Char separator, Boolean header, String[] columnNames, Type[] dataTypes, Int32 numRows, Int32 guessRows, Boolean addIndexColumn, Encoding encoding)
   at ConsoleApp49.Program.Main(String[] args)
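
For context, the CSV above relies on quoting: under the common CSV convention (RFC 4180), a field wrapped in double quotes may contain the separator, and a doubled quote (`""`) inside it stands for a literal quote character. A reader that simply breaks each line on `,` turns `Paul,34,"Paul lives in Vermont, VA."` into four pieces instead of three. Below is a minimal sketch of quote-aware splitting, assuming each record fits on one line. `CsvSketch` and `SplitLine` are hypothetical names for illustration, not the DataFrame implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of Microsoft.Data.Analysis.
public static class CsvSketch
{
    // Splits one CSV record into fields, honoring RFC 4180-style quoting:
    // a separator inside double quotes is data, and "" inside a quoted
    // field is an escaped quote. Records spanning multiple lines are not handled.
    public static List<string> SplitLine(string line, char separator = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // "" is an escaped literal quote
                        i++;                 // skip the second quote
                    }
                    else
                    {
                        inQuotes = false;    // closing quote ends the quoted section
                    }
                }
                else
                {
                    current.Append(c);       // separators here are kept as data
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // quote outside a quoted section opens one (lenient)
            }
            else if (c == separator)
            {
                fields.Add(current.ToString()); // unquoted separator ends the field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // last field; empty for a trailing separator like "Maria,31,"
        return fields;
    }
}
```

With this sketch, `CsvSketch.SplitLine("Paul,34,\"Paul lives in Vermont, VA.\"")` yields three fields, the last being `Paul lives in Vermont, VA.`, while `"Paul,34,\"Paul lives in Vermont, VA.\"".Split(',')` yields four. A production reader also has to handle quoted fields that span line breaks, which this sketch deliberately skips.
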
@MgSam
Contributor

MgSam commented Sep 16, 2020

Maybe CSV parsing should get its own library in the BCL, similar to JSON and XML? It is an extremely common need. CsvHelper is a great (and very popular) library, but obviously you can't reference it in your packages.

@luisquintanilla
Contributor

Related to dotnet/corefxlab#2787

@pgovind pgovind transferred this issue from dotnet/corefxlab Mar 6, 2021
@pgovind pgovind added the Microsoft.Data.Analysis All DataFrame related issues and PRs label Mar 6, 2021
@luisquintanilla luisquintanilla changed the title DataFrame can't handle separators in data [DataFrame] can't handle separators in data Aug 1, 2022
@michaelgsharp michaelgsharp added this to the ML.NET Future milestone Aug 8, 2022
@luisquintanilla
Contributor

luisquintanilla commented Aug 23, 2022

Unable to repro as of version 0.20.0-preview.22313.1

@dakersnar
Contributor

dakersnar commented Aug 23, 2022

As Luis mentioned, this issue appears to have been solved at some point. Regardless, I'm adding unit tests that confirm this behavior here: #6301.

Edit: I closed that PR and included those tests in a larger PR linked below.

@ghost ghost locked as resolved and limited conversation to collaborators Oct 7, 2022

6 participants