Skip to content

Reader\Xlsx incorrect use of readFilter #370

@xklid101

Description

@xklid101

This is:

- [x] a bug report
- [ ] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

Xlsx reader should not read rows and columns filtered out in readFilter at all.

What is the current behavior?

Xlsx reader first reads all rows and columns referenced in xml source files sheets and then it uses filter on all loaded cells.
But if some xlsx workbook has auto-generated empty max column of 1025 and max row of 1048576 without cells and values (my libreoffice 5.6 do this in some cases - I don't know when and why, but it is doing this - mabye after import from ms excel or csv etc...) - it reads all useless empty rows and columns and so reading and writing a spreadsheet back to xml with writer takes a looong time because of loaded rows and columns number.

What are the steps to reproduce?

in xml source is (for file /xl/worksheets/sheet1.xml):

...
<cols>
    <col collapsed="false" hidden="false" max="1" min="1" style="0" width="8.17818181818182"/>
    <col collapsed="false" hidden="false" max="2" min="2" style="0" width="55.8981818181818"/>
    <col collapsed="false" hidden="false" max="1025" min="3" style="0" width="8.17818181818182"/>
</cols>
...
<sheetData>
    <row collapsed="false" customFormat="false" customHeight="false" hidden="false" ht="17.35" outlineLevel="0" r="12">
        <c r="B12" s="1"/>
    </row>
    ...
    <row collapsed="false" customFormat="false" customHeight="true" hidden="false" ht="19.45" outlineLevel="0" r="43">
        <c r="B43" s="6" t="s">
            <v>25</v>
        </c>
    </row>
    <row collapsed="false" customFormat="false" customHeight="false" hidden="false" ht="12.8" outlineLevel="0" r="1048576"/>
</sheetData>
...

I use following readFilter to read cells

...
public function readCell($column, $row, $worksheetName = '') {
    $r = (int) $row;
    // rows max and min below are int typed already
    if($r > $this->coords['rows']['max'] ||
        $r < $this->coords['rows']['min']) 
            return false;

    $col = sprintf("%04s", $column);
    // columns max and min below are sprintf'ed already
    if($col > $this->coords['columns']['max'] ||
        $col < $this->coords['columns']['min'])
            return false;

    return true;
}
...

Which versions of PhpSpreadsheet and PHP are affected?

develop branch and at least PHP 5.6


My temp solution

is to extend xlsx reader, and add "my code" in Xlsx Reader
(extending xlsx is in fact copy/pasting almost all xlsx reader class because almost all methods are private and load method is too long to partially extend)

<?php

namespace PhpOffice\PhpSpreadsheet\Reader
...
class Xlsx extends BaseReader
{
    ...
    public function load($pFilename)
    {
        ...
        // row 831
        if ((int) ($col['max']) == 16384) {
            break;
        }
        // my code
        if ($this->getReadFilter() !== null) {
            if (!$this->getReadFilter()->readCell(
                                            Coordinate::stringFromColumnIndex((int) $col['max']),
                                            1,
                                            $docSheet->getTitle())
                ) {

                break;
            }
        }
        ...
        // row 853
        if ($xmlSheet && $xmlSheet->sheetData && $xmlSheet->sheetData->row) {
            $cIndex = 1; // Cell Start from 1
            foreach ($xmlSheet->sheetData->row as $row) {
                // my code
                if ($this->getReadFilter() !== null) {
                    if (!$this->getReadFilter()->readCell('A', $row['r'], $docSheet->getTitle()))
                        continue;
                }
        ...
    }
    ...
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions