Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

100% cpu with wrong date cell #974

Closed
eaglexiang opened this issue Jul 28, 2021 · 8 comments
Closed

100% cpu with wrong date cell #974

eaglexiang opened this issue Jul 28, 2021 · 8 comments
Labels
confirmed This issue can be reproduced

Comments

@eaglexiang
Copy link
Contributor

eaglexiang commented Jul 28, 2021

We are handling excel files uploaded by customers every day, I mean those excel files are created by customers on net.

Lately we found wrong date-style cell may cause 100% cpu issue.

steps to reproduce:

  1. create a blank excel file
  2. set one cell as text format
  3. enter a large number like someone's chinese id-card number 500123199005132245
  4. set the cell as date format
  5. read it with Rows.Columns() of excelize

Program will hang up because of countless time.Add, id-card number(500123199005132245) will be handled as days count.

// Duration is limited to aprox. 290 years
for intPart > MDD {
    durationDays := time.Duration(MDD) * time.Hour * 24
    date = date.Add(durationDays)
    intPart = intPart - MDD
}

I found a block about date in ECMA-376.

For compatibility, a SpreadsheetML application can interpret serial-number values in cells or in formulas
as dates. This subclause describes how serial number values can be converted to date values depending
on the compatibility mode.
A date that can be interpreted as a numeric value is a serial value. This is made up of a signed integer
date component and an unsigned fractional time component. Going forward in time, the date
component of a serial value increases by 1 each day. A serial value represents a UTC date and time, and,
as such, has no timezone information.

It may be right in ECMA-376 but danger in production environments. Stream Reader(Rows) is needed considering large file size and we cannot find a way to avoid the performance issue.

@eaglexiang
Copy link
Contributor Author

eaglexiang commented Jul 28, 2021

time.Time holds seconds since Jan 1 year 1885 with uint64, whose maximum value is 18446744073709551615.

It means the max days count is 18446744073709551615/60/60/24 == 213503982334601.

obviously:

213503982334601 (since 1885)
<
500123199005132245 (since 1904 or 1899)

In another word, unlimited time.Add will cause int64 overflow.

If excelize can check it before loop-time.Time.Add, this issue would be solved :)

@xuri xuri added confirmed This issue can be reproduced in progress Working in progress labels Jul 28, 2021
@xuri xuri closed this as completed in 7dbf88f Jul 28, 2021
@xuri
Copy link
Member

xuri commented Jul 28, 2021

Thanks for your feedback, I have fixed it, please try to use the master branch code, and this patch will be released in the next version.

@xuri xuri removed the in progress Working in progress label Jul 28, 2021
@eaglexiang
Copy link
Contributor Author

Thanks for your work.

@imom0
Copy link

imom0 commented Aug 3, 2021

@eaglexiang 这些日期格式的身份证,是不是WPS来的?我们在业务里也遇到了,很怀疑是WPS做了什么“优化”。

就算这样解决了CPU占用的问题,也还是没拿到身份证原文吧。

@eaglexiang
Copy link
Contributor Author

这些日期格式的身份证,是不是WPS来的?我们在业务里也遇到了,很怀疑是WPS做了什么“优化”。

我们处理的文件来自客户,并不清楚他们使用什么工具编辑这些文件。不过我在这个 issue 里有说复现方式。

就算这样解决了CPU占用的问题,也还是没拿到身份证原文吧。

现在是可以拿到的。

@imom0
Copy link

imom0 commented Aug 3, 2021

这些日期格式的身份证,是不是WPS来的?我们在业务里也遇到了,很怀疑是WPS做了什么“优化”。

我们处理的文件来自客户,并不清楚他们使用什么工具编辑这些文件。不过我在这个 issue 里有说复现方式。

就算这样解决了CPU占用的问题,也还是没拿到身份证原文吧。

现在是可以拿到的。

412822199406130000 为例解析出来 -258738860607/12/26

@eaglexiang
Copy link
Contributor Author

这些日期格式的身份证,是不是WPS来的?我们在业务里也遇到了,很怀疑是WPS做了什么“优化”。

我们处理的文件来自客户,并不清楚他们使用什么工具编辑这些文件。不过我在这个 issue 里有说复现方式。

就算这样解决了CPU占用的问题,也还是没拿到身份证原文吧。

现在是可以拿到的。

412822199406130000 为例解析出来 -258738860607/12/26

刚刚验证了一下,你是对的,的确存在这个问题。但是这个恐怕无法通过 excelize 来解决,除非 excelize 能提供一个绕过格式解析,直接获取单元格原始值的方法 @xuri

@xuri
Copy link
Member

xuri commented Aug 4, 2021

The excel will display ######## if apply date or time number format for the cell which contains Chinese 18 digits identity card number, set the valid number format by SetCellStyle before reading the cell value by Excelize, for example re-set number format for A1:

f, err := excelize.OpenFile("Book1.xlsx")
if err != nil {
    fmt.Println(err)
    return
}
styleID, err := f.NewStyle(&excelize.Style{NumFmt: 1})
if err != nil {
    fmt.Println(err)
}
if err := f.SetCellStyle("Sheet1", "A1", "A1", styleID); err != nil {
    fmt.Println(err)
}

jenbonzhang pushed a commit to jenbonzhang/excelize that referenced this issue Oct 22, 2023
- Escape XML character in the drop list
- Fix incorrect character count limit in the drop list
- Fix Excel time parse issue in some case
- Fix custom number format month parse issue in some case
- Fix corrupted file generated caused by concurrency adding pictures
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed This issue can be reproduced
Projects
None yet
Development

No branches or pull requests

3 participants