Unexpected Automatic Date Conversion #2196

Closed
matthew-macgregor opened this issue Dec 11, 2020 · 10 comments

@matthew-macgregor

Issue Overview

One of our systems which uses sheetjs for parsing csv/xlsx data produced an unexpected result this week. The data which caused the issue was "Mayslanding, NJ 08234", which the library coerced to a date as { t: 'n', v: 2313568, w: '5/1/34' }. My hunch is that the code which attempts to guess if a cell might be a date is too lax given some inputs.

This happens when raw: false, which is the default. We have worked around it by simply setting raw: true and doing our own type conversions. Still, this unexpected result seems likely to trip up others using the default setting.
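
For anyone who wants to try the same workaround, here is a minimal sketch of the "do our own type conversions" step. It is not SheetJS code: `coerceNumericStrings` is a hypothetical helper, and it assumes that cells read with raw: true arrive as string cells of the form `{ t: 's', v: '...' }`. Skipping zero-padded values is an extra assumption made so that ZIP codes stay as text.

```javascript
// Hypothetical helper, not part of SheetJS: after reading with { raw: true },
// turn cells that look like plain decimal numbers into numeric cells and
// leave every other cell untouched.
const NUMERIC = /^[+-]?(\d+\.?\d*|\.\d+)(e[+-]?\d+)?$/i;

function coerceNumericStrings(sheet) {
  for (const addr of Object.keys(sheet)) {
    if (addr.startsWith('!')) continue;      // skip metadata such as '!ref'
    const cell = sheet[addr];
    if (!cell || cell.t !== 's') continue;   // only touch string cells
    const text = String(cell.v).trim();
    if (!NUMERIC.test(text)) continue;       // addresses and labels stay strings
    // Assumption: keep zero-padded values such as ZIP codes ("08234") as text.
    if (/^[+-]?0\d/.test(text)) continue;
    sheet[addr] = { t: 'n', v: Number(text) };
  }
  return sheet;
}

// Stand-in for a worksheet that was read with raw: true.
const sheet = coerceNumericStrings({
  A1: { t: 's', v: 'Mayslanding, NJ 08234' },
  A2: { t: 's', v: '3.14' },
  A3: { t: 's', v: '08234' },
  '!ref': 'A1:A3'
});
console.log(sheet);
```

In real use the object passed in would be a worksheet from a workbook read with the raw: true option instead of the hand-written stand-in.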

I am willing to volunteer to submit a PR to fix this issue but would like some feedback from a maintainer before writing any code. My guess is that any changes to this date conversion code could very easily cause other unexpected side effects, so advice would be appreciated.

Steps to Reproduce:

Given the (somewhat fabricated) csv data below, the following code will produce the result:

const xlsx = require('xlsx')
const wkbk = xlsx.readFile('./datesville.csv')

CSV input:

Line 1
"Januaryville, NJ 08234"
"Februaryville, NJ 08234"
"Marchville, NJ 08234"
"Aprilville, NJ 08234"
"Mayslanding, NJ 08234"
"Juneville, NJ 08234"
"Julyville, NJ 08234"
"Augustville, NJ 08234"
"Septemberville, NJ 08234"
"Octoberville, NJ 08234"
"Novemberlanding, NJ 08330"
"Decembertown, NJ 08330"

Unexpected result:

{
  A1: { t: 's', v: 'Line 1' },
  A2: { t: 'n', v: 2313448, w: '1/1/34' },
  A3: { t: 'n', v: 2313479, w: '2/1/34' },
  A4: { t: 'n', v: 2313507, w: '3/1/34' },
  A5: { t: 'n', v: 2313538, w: '4/1/34' },
  A6: { t: 'n', v: 2313568, w: '5/1/34' },
  A7: { t: 'n', v: 2313599, w: '6/1/34' },
  A8: { t: 'n', v: 2313629, w: '7/1/34' },
  A9: { t: 'n', v: 2313660, w: '8/1/34' },
  A10: { t: 'n', v: 2313691, w: '9/1/34' },
  A11: { t: 'n', v: 2313721, w: '10/1/34' },
  A12: { t: 'n', v: 2348815, w: '11/1/30' },
  A13: { t: 'n', v: 2348845, w: '12/1/30' },
  '!ref': 'A1:A13'
}
@SheetJSDev
Contributor

To be clear, this ultimately boils down to how V8 (Chrome/Node) handles dates:

new Date("Mayslanding, NJ 08234"); // Thu May 01 8234 00:00:00 in your timezone
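
A quick way to check what a particular engine does with a given string is sketched below. `engineParsesAsDate` is a hypothetical name used only for illustration. Strings outside the standard ISO format are parsed by implementation-specific rules, so the result for the address string may differ between engines and versions.

```javascript
// Hypothetical helper: reports whether the engine's Date constructor accepts
// the string. Non-ISO strings fall back to implementation-specific parsing.
function engineParsesAsDate(s) {
  return !Number.isNaN(new Date(s).getTime());
}

// Print the verdict for an ISO date, the address from this issue, and a
// string with no date-like content at all.
for (const s of ['2020-12-11', 'Mayslanding, NJ 08234', 'hello']) {
  console.log(JSON.stringify(s), engineParsesAsDate(s));
}
```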

We try to correct for it in fuzzydate by looking for one of the month labels. I suspect the month name regex can be cleaned up a bit to explicitly match the short or long form for names, like

if(s.toLowerCase().match(/\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\b/)) return o;
if(s.toLowerCase().match(/\b(january|february|march|april|may|june|july|august|september|october|november|december)\b/)) return o;
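
The effect of the `\b` word boundaries in those two patterns can be checked in isolation. This is only a sketch: `hasStandaloneMonth` is a hypothetical helper that combines the two regexes, not the actual fuzzydate code, and it says nothing about the other checks that function performs.

```javascript
// Hypothetical helper combining the two suggested patterns. The \b anchors
// require the month name to stand alone as a word.
const SHORT_MONTH = /\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\b/;
const LONG_MONTH = /\b(january|february|march|april|may|june|july|august|september|october|november|december)\b/;

function hasStandaloneMonth(s) {
  const lower = s.toLowerCase();
  return SHORT_MONTH.test(lower) || LONG_MONTH.test(lower);
}

console.log(hasStandaloneMonth('Mayslanding, NJ 08234'));  // false
console.log(hasStandaloneMonth('Decembertown, NJ 08330')); // false
console.log(hasStandaloneMonth('May 1, 2034'));            // true
console.log(hasStandaloneMonth('1 Sep 2020'));             // true
```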

@matthew-macgregor
Author

I'll take a closer look at those and put together a PR when I get a chance.

@jbull328

I am also seeing this when creating an Excel file in my project. In our case, the data is the string "1/$1.99". It gets turned into a date that uses some of the digits but otherwise looks arbitrary: 1/2/1999.

@matthew-macgregor
Author

@jbull328, just to confirm, you're seeing that behavior on creation/output of a document with that data? I am not seeing it on input in xlsx (with several column formats) or csv.

@jbull328

Correct, @matthew-macgregor, we did see that on output. I ended up abandoning the module.

@SheetJSDev
Contributor

@jbull328 If this is showing up when you write a CSV, that's Excel's automatic conversion. For example, consider the CSV

1/3,2/3,3/3

If you write that as plaintext to a file and open in Excel, it will interpret those as dates:

[screenshot: Excel displaying the values as dates]

In any case that's an issue unrelated to this one.

@flaushi

flaushi commented May 13, 2021

I want to confirm that the date parsing strategy of sheetjs is a bit too aggressive for my taste.

Effectively, I can only use raw reading mode for CSV files, because arbitrary decimal numbers get interpreted as dates, which is unpredictable and therefore unreliable.

There are dozens of issues on this topic.

If I switch on raw, this problem disappears; however, everything is then a string and I have to parse the numbers myself.

Is there something in between, i.e. a config that parses numbers where possible and leaves the rest as strings?

@SheetJSDev
Contributor

@flaushi it's aggressive because V8 (chrome/node) is aggressive. fuzzydate attempts some light validation. Unfortunately this story probably ends with a hard-coded list of acceptable date formats, as discussed in #1300 (comment)
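
To illustrate what a hard-coded list of acceptable formats could look like, here is a rough sketch. It is not the SheetJS implementation and not what #1300 settled on: `parseStrictDate` is a hypothetical function, and the two accepted formats (YYYY-MM-DD and M/D/YYYY) are assumptions picked only for the example.

```javascript
// Hypothetical allow-list parser; the accepted formats are illustrative only.
const FORMATS = [
  { re: /^(\d{4})-(\d{1,2})-(\d{1,2})$/, order: ['y', 'm', 'd'] },   // YYYY-MM-DD
  { re: /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/, order: ['m', 'd', 'y'] }  // M/D/YYYY
];

function parseStrictDate(s) {
  for (const { re, order } of FORMATS) {
    const match = re.exec(s.trim());
    if (!match) continue;
    const parts = {};
    order.forEach((key, i) => { parts[key] = Number(match[i + 1]); });
    const date = new Date(Date.UTC(parts.y, parts.m - 1, parts.d));
    // Reject impossible dates such as 2/30 that Date.UTC would roll over.
    const valid = date.getUTCFullYear() === parts.y &&
      date.getUTCMonth() === parts.m - 1 &&
      date.getUTCDate() === parts.d;
    return valid ? date : null;
  }
  return null; // anything outside the allow-list is not treated as a date
}

console.log(parseStrictDate('2020-12-11'));            // a Date for 11 Dec 2020 (UTC)
console.log(parseStrictDate('Mayslanding, NJ 08234')); // null
console.log(parseStrictDate('1/$1.99'));               // null
```

The trade-off is that any legitimate format missing from the list is no longer recognized, which is presumably why the choice of formats needs discussion.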

@SheetJSDev
Contributor

Moving this to #1300

@reviewher
Contributor

0.18.1 fixes this issue. To verify in NodeJS:

$ for v in 0.17.4 0.17.5 0.18.0 0.18.1; do npm i "xlsx@$v"; node -pe 'var XLSX = require("xlsx"); [XLSX.version, XLSX.readFile("t.csv").Sheets.Sheet1.A1]'; done
[ '0.17.4', { t: 'n', v: 2313448, w: '1/1/34' } ]
[ '0.17.5', { t: 'n', v: 2313448, w: '1/1/34' } ]
[ '0.18.0', { t: 'n', v: 2313448, w: '1/1/34' } ]
[ '0.18.1', { t: 's', v: 'Januaryville, NJ 08234' } ]
