-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Description
This is:
- [x] a bug report
- [ ] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)
What is the expected behavior?
Xlsx reader should not read rows and columns filtered out in readFilter
at all.
What is the current behavior?
Xlsx reader first reads all rows and columns referenced in xml source files sheets and then it uses filter on all loaded cells.
But if some xlsx workbook has auto-generated empty max column of 1025 and max row of 1048576 without cells and values (my libreoffice 5.6 do this in some cases - I don't know when and why, but it is doing this - mabye after import from ms excel or csv etc...) - it reads all useless empty rows and columns and so reading and writing a spreadsheet back to xml with writer takes a looong time because of loaded rows and columns number.
What are the steps to reproduce?
in xml source is (for file /xl/worksheets/sheet1.xml):
...
<cols>
<col collapsed="false" hidden="false" max="1" min="1" style="0" width="8.17818181818182"/>
<col collapsed="false" hidden="false" max="2" min="2" style="0" width="55.8981818181818"/>
<col collapsed="false" hidden="false" max="1025" min="3" style="0" width="8.17818181818182"/>
</cols>
...
<sheetData>
<row collapsed="false" customFormat="false" customHeight="false" hidden="false" ht="17.35" outlineLevel="0" r="12">
<c r="B12" s="1"/>
</row>
...
<row collapsed="false" customFormat="false" customHeight="true" hidden="false" ht="19.45" outlineLevel="0" r="43">
<c r="B43" s="6" t="s">
<v>25</v>
</c>
</row>
<row collapsed="false" customFormat="false" customHeight="false" hidden="false" ht="12.8" outlineLevel="0" r="1048576"/>
</sheetData>
...
I use following readFilter to read cells
...
public function readCell($column, $row, $worksheetName = '') {
$r = (int) $row;
// rows max and min below are int typed already
if($r > $this->coords['rows']['max'] ||
$r < $this->coords['rows']['min'])
return false;
$col = sprintf("%04s", $column);
// columns max and min below are sprintf'ed already
if($col > $this->coords['columns']['max'] ||
$col < $this->coords['columns']['min'])
return false;
return true;
}
...
Which versions of PhpSpreadsheet and PHP are affected?
develop branch and at least PHP 5.6
My temp solution
is to extend xlsx reader, and add "my code" in Xlsx Reader
(extending xlsx is in fact copy/pasting almost all xlsx reader class because almost all methods are private and load
method is too long to partially extend)
<?php
namespace PhpOffice\PhpSpreadsheet\Reader
...
class Xlsx extends BaseReader
{
...
public function load($pFilename)
{
...
// row 831
if ((int) ($col['max']) == 16384) {
break;
}
// my code
if ($this->getReadFilter() !== null) {
if (!$this->getReadFilter()->readCell(
Coordinate::stringFromColumnIndex((int) $col['max']),
1,
$docSheet->getTitle())
) {
break;
}
}
...
// row 853
if ($xmlSheet && $xmlSheet->sheetData && $xmlSheet->sheetData->row) {
$cIndex = 1; // Cell Start from 1
foreach ($xmlSheet->sheetData->row as $row) {
// my code
if ($this->getReadFilter() !== null) {
if (!$this->getReadFilter()->readCell('A', $row['r'], $docSheet->getTitle()))
continue;
}
...
}
...
}