Data is dropped from ragged headerless TSV or CSV-lite input #1749

Open
cebamps opened this issue Feb 3, 2025 · 1 comment

cebamps commented Feb 3, 2025

Hello. First of all, thank you for Miller :)

In both TSV and CSV-lite formats, ragged input with implicit headers sometimes drops fields. This seems to depend on the number of fields in the first line of data: when that line has N fields, fields N+2 and beyond on later lines appear to be dropped (in the reproductions below N = 2, so field 3 survives but field 4 is lost).

This seems specific to --itsv --ragged --hi (or --icsvlite): I cannot reproduce this with --icsv or with explicit headers.

Here are two small reproductions yielding the same outcomes.

Reproduced with mlr 6.13.0.

mlr --t2x --ragged --hi cat << EOF
a	b
a	b	c	d
EOF

mlr --icsvlite --oxtab --ragged --hi cat << EOF
a,b
a,b,c,d
EOF

Actual outcome:

1 a
2 b

1 a
2 b
3 c

Expected:

1 a
2 b

1 a
2 b
3 c
4 d
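
For cross-checking, the expected records can be produced without Miller at all. The sketch below is only a reference under stated assumptions, not Miller's implementation: `positional_records` is a helper name made up for this illustration, and it merely mimics XTAB-style output with 1-based positional keys for tab-separated input.

```shell
#!/bin/sh
# Hypothetical helper (not part of Miller): print every field of every
# tab-separated input line under a 1-based positional key, one field per
# output line, with a blank line after each record.
positional_records() {
  awk -F'\t' '{ for (i = 1; i <= NF; i++) print i, $i; print "" }'
}

# Same two-line ragged input as in the reproduction above.
printf 'a\tb\na\tb\tc\td\n' | positional_records
```

The second record printed this way contains all four fields, which matches the expected outcome listed above.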

In the meantime, I have two workarounds for my personal use case:

  • Drop --hi and insert a blank header line at the top of the input.
  • Use --icsv --ifs tab --ragged --hi or --inidx --ifs tab instead of TSV input.
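
The first workaround could be scripted roughly as follows. This is a sketch under assumptions: `with_blank_header` is a hypothetical helper name, the sample input is the one from the reproduction, and the Miller flags are those already used in this report with --hi dropped. Miller is only invoked if `mlr` is found on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: emit one blank line, then copy stdin unchanged, so the
# blank line takes the place of the header line.
with_blank_header() {
  echo
  cat
}

# Only invoke Miller when it is installed. The flags are the ones from the
# report minus --hi, since the blank line now occupies the header slot.
if command -v mlr >/dev/null 2>&1; then
  printf 'a\tb\na\tb\tc\td\n' | with_blank_header | mlr --t2x --ragged cat
fi
```

Keeping the preprocessing in a separate function makes it easy to remove again once the underlying behavior is fixed.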

aborruso (Contributor) commented Feb 3, 2025

  • Use --icsv --ifs tab --ragged --hi or --inidx --ifs tab instead of TSV input.

Same for me: it works when using the CSV reader

mlr --icsv --oxtab --ragged --hi cat << EOF
a,b
a,b,c,d
EOF

or like this:

mlr --c2x --ifs "\t" --ragged --hi cat << EOF
a	b
a	b	c	d
EOF
