# colA,colB
# aaaaa...aaaaa zzzzz...zzzzz \
# ...                          } 10 or 100 rows
# aaaaa...aaaaa zzzzz...zzzzz /
#
# \___________/ \___________/
#   1000chars     1000chars

# 10 rows
# "," is used as delimiter
python3 -c "print('colA,colB') ; [print('a'*1000 + ' ' + 'z'*1000) for _ in range(10)]" | csvstat
# => ok

# 100 rows
# " " is used as delimiter
python3 -c "print('colA,colB') ; [print('a'*1000 + ' ' + 'z'*1000) for _ in range(100)]" | csvstat
# => Row 0 has 3 values, but Table only has 2 columns.
In the latter case, the sample is trimmed, losing the header colA,colB, so the space character " " is used as the delimiter.
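To check what the sniffer guesses for inputs like these, here is a minimal sketch that calls Python's standard `csv.Sniffer` directly. It assumes csvkit's guessing behaves like the standard library sniffer applied to a (possibly truncated) sample. The helper names `sniff_delimiter` and `make_input` and the `sniff_limit` parameter are made up for illustration and are not csvkit's API.

```python
import csv


def sniff_delimiter(text, sniff_limit=None):
    """Return the delimiter csv.Sniffer guesses, or None if it cannot decide.

    sniff_limit is a hypothetical character limit: only the first
    sniff_limit characters are handed to the sniffer.
    """
    sample = text if sniff_limit is None else text[:sniff_limit]
    try:
        return csv.Sniffer().sniff(sample).delimiter
    except csv.Error:
        # The sniffer raises csv.Error when it cannot determine a delimiter.
        return None


def make_input(rows):
    """Build the input from the report: a header plus `rows` long data rows."""
    header = "colA,colB\n"
    row = "a" * 1000 + " " + "z" * 1000 + "\n"
    return header + row * rows


if __name__ == "__main__":
    for rows in (10, 100):
        # Print what the sniffer reports for each input size.
        print(rows, "rows ->", repr(sniff_delimiter(make_input(rows))))
```

Running it prints the guessed delimiter (or `None`) for the 10-row and 100-row inputs, which is the piece of information the requests below ask csvkit to surface.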
It was tough for me to figure out this behavior. So how about showing which delimiter is used in:
Debug output
$ csvstat -v ...
inferred delimiter: ' '
Error message
$ csvstat -v ...
Row 0 has 3 values, but Table only has 2 columns (delimiter: ' ').
And how about showing a warning when the input exceeds SNIFF_LIMIT?
$ csvstat -v ...
warning: input (XXX bytes) exceeds SNIFF_LIMIT (YYY bytes), delimiter guessing may be incorrect (NOTE: SNIFF_LIMIT can be changed by -y flag)
warning: guessed delimiter: ' '
Row 0 has 3 values, but Table only has 2 columns.
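As a rough sketch of the proposed diagnostics, the function below builds the warning and the guessed-delimiter line from an input size, a sniff limit, and a delimiter. The function name `sniff_diagnostics` and the exact message wording are assumptions taken from the proposal above, not existing csvkit output.

```python
import sys


def sniff_diagnostics(input_size, sniff_limit, delimiter):
    """Return the proposed diagnostic lines as a list of strings.

    All names and message formats here are hypothetical; they mirror the
    wording suggested in this issue rather than any existing csvkit code.
    """
    lines = []
    if sniff_limit is not None and input_size > sniff_limit:
        lines.append(
            "warning: input (%d bytes) exceeds SNIFF_LIMIT (%d bytes), "
            "delimiter guessing may be incorrect" % (input_size, sniff_limit)
        )
    # %r quotes the delimiter so that whitespace is visible.
    lines.append("guessed delimiter: %r" % delimiter)
    return lines


if __name__ == "__main__":
    for line in sniff_diagnostics(200000, 1024, " "):
        print(line, file=sys.stderr)
```

Formatting the delimiter with `%r` is a deliberate choice: a bare space or tab would otherwise be invisible in the message.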
wataash changed the title from "Want delimiter to be shown when raise or -v" to "Want delimiter to be shown on exception" on Jan 9, 2019
Hmm, agate raises ValueError for errors like "Row 0 has 3 values, but Table only has 2 columns." in agate/table/__init__.py. We'd have to introduce a new error class (subclassing ValueError, in case anyone catches these). We'd also have to handle it all over the place, because we need access to the reader to print the dialect.
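A minimal sketch of what such an error class could look like, assuming a hypothetical name `RowLengthError` and a hypothetical `delimiter` argument. Neither exists in agate; this only illustrates that subclassing ValueError keeps existing `except ValueError` handlers working.

```python
class RowLengthError(ValueError):
    """Hypothetical ValueError subclass that also records the delimiter."""

    def __init__(self, message, delimiter=None):
        if delimiter is not None:
            # Append the delimiter in the format proposed in this issue.
            message = "%s (delimiter: %r)." % (message.rstrip("."), delimiter)
        super().__init__(message)
        self.delimiter = delimiter


if __name__ == "__main__":
    try:
        raise RowLengthError(
            "Row 0 has 3 values, but Table only has 2 columns.", delimiter=" "
        )
    except ValueError as e:  # existing handlers still catch the subclass
        print(e)
```

The hard part described above, getting the reader's dialect to the place where the error is raised, is not addressed by this sketch.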
Debug output
This is a good idea. As above, we'd have to add it in a lot of places. Happy to merge a PR!
And how about showing a warning when the input exceeds SNIFF_LIMIT?
The snifflimit was reduced in 1.0.7 to avoid sniffing huge files (which is very slow). So, this warning would now be emitted too frequently to be useful.