Skip to content

Data Validations are generated one-per-cell instead of per-range #131

@billblume

Description

@billblume

Data Validation elements written by the Xlsx writer are generated one-per-cell instead of by column ranges. This causes two issues:

  1. The generated Xlsx files may be much larger than necessary
  2. Excel has a limit on the number of data validations per worksheet. I don't know what this limit is exactly, but I suspect that it is 64k. If you exceed this limit, excel will drop all data validations for the worksheet.

For us, we had a worksheet with 15,000 rows and 8 data validations spanning 8 different columns. This resulted in 120,000 data validation elements in the Xslx file, causing both issues described above.

Data validations should be generated per range of cells, not per cell.

Here is the fix we implemented for PHPExcel 1.8.1. Given that the data validation generation code is unchanged for PhpSpreadsheet, you should be able to use this fix with few or no changes,

        /* This function is unchanged except for the addition of the call to the new function _mergeDataValidationRanges() */
	private function _writeDataValidations(PHPExcel_Shared_XMLWriter $objWriter = null, PHPExcel_Worksheet $pSheet = null)
	{
		// Datavalidation collection
		$dataValidationCollection = $pSheet->getDataValidationCollection();

		// Write data validations?
		if (!empty($dataValidationCollection)) {
			$dataValidationCollection = $this->_mergeDataValidationRanges($dataValidationCollection);

			$objWriter->startElement('dataValidations');
			$objWriter->writeAttribute('count', count($dataValidationCollection));

			foreach ($dataValidationCollection as $coordinate => $dv) {
				$objWriter->startElement('dataValidation');
                                ...
				$objWriter->writeAttribute('sqref', $coordinate);
                                ...
				$objWriter->endElement();
			}

			$objWriter->endElement();
		}
	}

        /* New function that converts a map of cell-coordinates to data-validations to a map cell-ranges to data-validations. */
	private function _mergeDataValidationRanges(array $dataValidationCollection)
	{
		$hashedDvs = array();

		foreach ($dataValidationCollection as $coord => $dv) {
			list($column, $row) = PHPExcel_Cell::coordinateFromString($coord);
			$row = intval(ltrim($row,'$'));
			$hashCode = $column . '-' . $dv->getHashCode();

			if (! isset($hashedDvs[$hashCode])) {
				$hashedDvs[$hashCode] = (object) array(
					'dv' => $dv,
					'col' => $column,
					'rows' => array($row)
				);
			} else {
				$hashedDvs[$hashCode]->rows[] = $row;
			}
		}

		$mergedDataValidationCollection = array();
		ksort($hashedDvs);

		foreach ($hashedDvs as $hashedDv) {
			sort($hashedDv->rows);
			$rowStart = null;
			$rowEnd = null;
			$ranges = array();

			foreach ($hashedDv->rows as $row) {
				if ($rowStart === null) {
					$rowStart = $row;
					$rowEnd = $row;
				} else if ($rowEnd === $row - 1) {
					$rowEnd = $row;
				} else {
					if ($rowStart == $rowEnd) {
						$ranges[] = $hashedDv->col . $rowStart;
					} else {
						$ranges[] = $hashedDv->col . $rowStart . ':' . $hashedDv->col . $rowEnd;
					}

					$rowStart = $row;
					$rowEnd = $row;
				}
			}

			if ($rowStart !== null) {
				if ($rowStart == $rowEnd) {
					$ranges[] = $hashedDv->col . $rowStart;
				} else {
					$ranges[] = $hashedDv->col . $rowStart . ':' . $hashedDv->col . $rowEnd;
				}
			}

			foreach ($ranges as $range) {
				$mergedDataValidationCollection[$range] = $hashedDv->dv;
			}
		}

		return $mergedDataValidationCollection;
	}

Metadata

Metadata

Assignees

No one assigned

    Labels

    help wantedwriter/xlsxWriter for MS OfficeOpenXML-format (xlsx) spreadsheet files

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions