High memory usage when reading a 1,000,000-row Excel file with the latest master version #1096
Thanks for your feedback. The master branch code supports unzipping worksheets to temporary files to reduce memory usage. However, in some worksheets the cells don't use inline values. Their values are stored in the SST (shared string table), and Excelize doesn't currently support streaming reads or writes of the SST, so reading some spreadsheets may still use a lot of memory. Could you provide the attachments with any confidential info removed? I can analyze them and make improvements in the future.
Thank you very much for replying so quickly!
2) Regarding "some worksheets cell not using inline value, the cell value has been storage in the SST (shared string table), and Excelize doesn't support streaming reading/writing SST":
If you create a spreadsheet with Excel or another spreadsheet application, or with the common Excelize API, string cell values are stored in the SST by default. If you create it with the streaming writer (the streaming API), all cell values are stored inline. I'll consider improving reading performance for the SST next.
- Unzip the shared string table to a system temporary file when its inner XML is large, reducing memory usage by about 70%
- Remove the unnecessary exported variable `XMLHeader`; the `encoding/xml` package's `xml.Header` can be used instead
- Use a constant instead of inline text for the default XML path
- Rename the exported option field `WorksheetUnzipMemLimit` to `UnzipXMLSizeLimit`
- Update unit tests and documentation
I have optimized memory usage. Excelize now supports unzipping the shared string table to a system temporary file when its inner XML is large, which reduces memory usage by about 60%. Please try upgrading to the master branch code. This feature will be released in the next version.
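Thanks a lot! I pulled the master code and re-ran the earlier read of the 1,000,000-row data. In-use memory dropped considerably, from 470 MB to 175 MB (the total allocated, 3.65 GB, also decreased somewhat).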
Hi @aceshot, I have further reduced the memory usage of reading the SST by about 50%. Note that the time cost increases by about 30%. If you'd like, please upgrade to the master branch code.
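A trade-off of less memory for more time is typical when lookups move from RAM to disk. The sketch below shows one way it can arise, and is not how Excelize implements it. The string bytes live in a temporary file, and memory holds only one `int64` offset per string. The cost is a `ReadAt` call on every lookup. The type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// diskSST is a hypothetical shared-string store. String bytes are kept in a
// temporary file, and only their offsets are kept in memory.
type diskSST struct {
	f       *os.File
	offsets []int64 // string i occupies bytes [offsets[i], offsets[i+1])
}

func newDiskSST() (*diskSST, error) {
	f, err := os.CreateTemp("", "sst-*.bin")
	if err != nil {
		return nil, err
	}
	return &diskSST{f: f, offsets: []int64{0}}, nil
}

// Add appends one string, e.g. each <si> item as it is decoded from sharedStrings.xml.
func (s *diskSST) Add(v string) error {
	n, err := s.f.WriteString(v)
	if err != nil {
		return err
	}
	s.offsets = append(s.offsets, s.offsets[len(s.offsets)-1]+int64(n))
	return nil
}

// Get reads string i back from disk; each call costs one ReadAt.
func (s *diskSST) Get(i int) (string, error) {
	if i < 0 || i >= len(s.offsets)-1 {
		return "", fmt.Errorf("SST index %d out of range", i)
	}
	buf := make([]byte, s.offsets[i+1]-s.offsets[i])
	if _, err := s.f.ReadAt(buf, s.offsets[i]); err != nil {
		return "", err
	}
	return string(buf), nil
}

// Close closes and deletes the temporary file.
func (s *diskSST) Close() error {
	err := s.f.Close()
	os.Remove(s.f.Name())
	return err
}

func main() {
	table, err := newDiskSST()
	if err != nil {
		panic(err)
	}
	defer table.Close()
	for _, v := range []string{"hello", "world"} {
		if err := table.Add(v); err != nil {
			panic(err)
		}
	}
	v, err := table.Get(1)
	fmt.Println(v, err)
}
```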
For reference, see: https://github.com/qax-os/excelize/issues/845#issuecomment-922466641.
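I tested with the latest master code (the version that supports temporary files and `Close`). The test file is a roughly 75 MB Excel file with 1,000,000 rows: a single sheet with a single column, all text, each value a 64-byte (256-bit hex) string. Cumulative memory allocation comes to about 4 GB, and about 400 MB is still held at the end. Both numbers are quite high. I don't see anything like the 93.7% reduction mentioned in that comment. Am I using something incorrectly? Or, because an Excel file carries a lot of metadata and is itself compressed, does loading a 75 MB file necessarily take 400 MB+ of memory? Is there room for optimization?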
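On an ECS test machine with 2 GB of memory, memory usage jumps above 90% as soon as reading starts. Our business logic has many rows to handle, so we read them in a `for rows.Next()` loop and do some computation on each. The whole process takes quite a while. During reading and processing, the allocated memory stays held by the process and counted in `HeapInuse`, and it ultimately shows up in the process's RSS. It stays high throughout and triggers alerts. Memory only drops after the whole file has been read and processed and the Go GC has returned the memory to the OS.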
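The counters mentioned above can be sampled from inside the loop with `runtime.ReadMemStats`. This shows whether `HeapInuse` grows with the row count or stays flat. `TotalAlloc` is cumulative and never decreases, so it is comparable in spirit to pprof's `alloc_space`. The rows iterator here is simulated: `processRows`, `next`, and `handle` are hypothetical stand-ins, not Excelize APIs. `debug.FreeOSMemory` forces a GC and returns as much memory as possible to the OS. It can shorten the tail after processing, but it does not lower the peak.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// memSnapshot holds a few runtime.MemStats counters, converted to MiB.
type memSnapshot struct {
	HeapInuseMiB    float64 // bytes in in-use heap spans
	TotalAllocMiB   float64 // cumulative bytes allocated; never decreases
	HeapReleasedMiB float64 // heap memory already returned to the OS
}

func snapshot() memSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	const mib = 1 << 20
	return memSnapshot{
		HeapInuseMiB:    float64(m.HeapInuse) / mib,
		TotalAllocMiB:   float64(m.TotalAlloc) / mib,
		HeapReleasedMiB: float64(m.HeapReleased) / mib,
	}
}

// processRows stands in for a `for rows.Next()` loop. next and handle are
// hypothetical callbacks; report is called every `every` rows.
func processRows(next func() ([]string, bool), handle func([]string), every int, report func(int, memSnapshot)) int {
	n := 0
	for {
		row, ok := next()
		if !ok {
			return n
		}
		handle(row)
		n++
		if every > 0 && n%every == 0 {
			report(n, snapshot())
		}
	}
}

func main() {
	total, i := 200000, 0
	next := func() ([]string, bool) { // simulated rows of 64-character strings
		if i >= total {
			return nil, false
		}
		i++
		return []string{fmt.Sprintf("%064d", i)}, true
	}
	sum := 0
	n := processRows(next, func(r []string) { sum += len(r[0]) }, 50000, func(n int, s memSnapshot) {
		fmt.Printf("rows=%d heapInuse=%.1fMiB totalAlloc=%.1fMiB\n", n, s.HeapInuseMiB, s.TotalAllocMiB)
	})
	debug.FreeOSMemory()
	fmt.Printf("rows=%d chars=%d heapReleased=%.1fMiB\n", n, sum, snapshot().HeapReleasedMiB)
}
```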
Attachments:
1. Local test code, with these additions:
1) `defer profile.Start(profile.MemProfile).Stop()`
2) a count of the total rows
2. pprof output from `defer profile.Start(profile.MemProfile).Stop()`:
inuse_space: 400 MB
alloc_space: 4 GB