[DataFrame] LoadCsv throws IndexOutOfRangeException #5654
Comments
Using similar functionality in Pandas does not throw these errors and the dataset loads successfully. |
I looked at this locally and found the issue. The description column has many |
Great! Thanks @pgovind . This is a tricky dataset. I tried using LINQ and IO but ran into a similar issue so I understand the challenges. At least it's a way to see all the edge cases 🙂 |
Make sure to use only '\t' as the separator; ',' does not work properly. If a cell contains a string such as "string1, string2" and you use ',' as the separator, Microsoft.Data.Analysis will interpret it as two cells. This is why I have to convert all my comma-separated text files to tab-separated ones with Excel first. |
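To illustrate the comma problem described above, here is a minimal sketch using only the .NET base class library. It compares a naive `Split(',')` with a quote-aware split on a line containing "string1, string2". `CsvLineDemo` and `SplitQuoted` are hypothetical names introduced for this example; they are not part of Microsoft.Data.Analysis, and the sketch says nothing about how that library parses CSV internally.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLineDemo
{
    // Hypothetical helper: splits one CSV line and treats separators
    // that appear inside double quotes as data, not as field boundaries.
    public static List<string> SplitQuoted(string line, char separator = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // a doubled quote is an escaped quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);   // separators in here are plain data
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == separator)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());
        return fields;
    }

    public static void Main()
    {
        string line = "1,\"string1, string2\",3";
        Console.WriteLine(line.Split(',').Length);   // naive split: 4 fields
        Console.WriteLine(SplitQuoted(line).Count);  // quote-aware split: 3 fields
    }
}
```

The naive split cuts the quoted cell in two, which shifts every later column by one. Switching to tabs avoids this only as long as the cells themselves contain no tabs.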
Related to #5647 |
I can't comment on #5647, but I'm seeing several folks say this issue was resolved in a past version, and I want to share that it happened to me just last night. I tried to combine CSVs that all have the same column headers using the code below and got the error below. I ended up having to use Pandas, but I would prefer to immerse myself in C# so I can get better at it. If I don't specify a separator, the data loads into a DataFrame, but it is misaligned in many places. Pandas handled it without issue. If I'm doing something incorrectly, or not doing something I should be, I would appreciate being pointed in the right direction. It's difficult to find solid documentation on working with DataFrames in C# like this.

EDIT: With the updated code below, I can read the files into a DataFrame without issue. However, when I save it to a CSV with DataFrame.SaveCsv(), the issue persists. I tried specifying a separator, but the output file still has misaligned data.

Context: using a Polyglot notebook.

//Building one data frame from a folder of CSV files
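using System.IO;
using System.Linq;
using Microsoft.Data.Analysis;

string folder_path = @"<path/to/folder>";
var files = new DirectoryInfo(folder_path).GetFiles("*.*", SearchOption.TopDirectoryOnly);
// Seed the combined data frame with the first file
var df = DataFrame.LoadCsv(files[0].FullName);
// Skip(1): the first file is already loaded, so Skip(0) would append it a second time
foreach (var file in files.Skip(1))
{
    var tempDF = DataFrame.LoadCsv(file.FullName);
    // Append the rows of each remaining file to the combined frame
    df = df.Append(tempDF.Rows, inPlace: true);
}
DataFrame.SaveCsv(df, @"path\to\csv");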
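One way to narrow down where the misalignment comes from is to scan the raw file before loading it. The sketch below uses only the .NET base class library and flags lines whose field count differs from the header's, or that end inside an open quote. It assumes the misalignment is caused by separators or line breaks inside quoted cells, which this thread does not confirm. `CsvShapeCheck`, `CountFields`, and `FindSuspectLines` are hypothetical names, not part of Microsoft.Data.Analysis.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvShapeCheck
{
    // Hypothetical helper: counts fields in one physical line, ignoring
    // separators inside double quotes. endsInsideQuotes is set when the
    // line has an unbalanced quote, which suggests a quoted cell that
    // continues on the next line.
    public static int CountFields(string line, char separator, out bool endsInsideQuotes)
    {
        int count = 1;
        bool inQuotes = false;
        foreach (char c in line)
        {
            if (c == '"') inQuotes = !inQuotes; // doubled quotes toggle twice and cancel out
            else if (c == separator && !inQuotes) count++;
        }
        endsInsideQuotes = inQuotes;
        return count;
    }

    // Returns 1-based line numbers that do not match the header's shape.
    // This is a line-by-line heuristic: quote state is reset on every line.
    public static List<int> FindSuspectLines(TextReader reader, char separator = ',')
    {
        var suspects = new List<int>();
        string header = reader.ReadLine();
        if (header == null) return suspects;
        int expected = CountFields(header, separator, out _);

        int lineNumber = 1;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            int actual = CountFields(line, separator, out bool open);
            if (actual != expected || open) suspects.Add(lineNumber);
        }
        return suspects;
    }

    public static void Main()
    {
        // Made-up sample data: row 2 has a line break inside a quoted cell.
        string csv = "id,description,price\n" +
                     "1,\"clean, one owner\",500\n" +
                     "2,\"runs great\nno rust\",900\n" +
                     "3,ok,100\n";
        foreach (int n in FindSuspectLines(new StringReader(csv)))
            Console.WriteLine($"Check line {n}");
    }
}
```

For a real file, pass `new StreamReader(path)` instead of the `StringReader`. Lines flagged here are candidates to inspect by hand; a clean result does not prove the file will load correctly.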
|
Using the following dataset: https://www.kaggle.com/austinreese/craigslist-carstrucks-data
The dataset itself is large (1.4 GB) and sparse.
Using the following code:
Throws the following exception:
Here is the stacktrace: