You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am unable to unflatten some UK government procurement data because flatten-tool's unflatten routine is internally making use of openpyxl.utils.cell's _get_column_letter routine to convert column indices into strings of the form [A-Z]+.
The issue is that _get_column_letter explicitly enforces the constraint that at most three letters are generated, which corresponds to a constraint of at most 26 + 26 * 26 + 26 * 26 * 26 = 18,278 columns.
Thus, for spreadsheets with more than 18,278 columns, the unflatten routine is failing with a dubious error message about an invalid column index.
Thankfully, the fix is quite simple: the _get_column_letter routine is self-contained and the limitation to three characters is self-imposed and easily removed.
The implementation in its entirety is:
def _get_column_letter(col_idx):
"""Convert a column number into a column letter (3 -> 'C')
Right shift the column col_idx by 26 to find column letters in reverse
order. These numbers are 1-based, and can be converted to ASCII
ordinals by adding 64.
"""
# these indicies corrospond to A -> ZZZ and include all allowed
# columns
if not 1 <= col_idx <= 18278:
raise ValueError("Invalid column index {0}".format(col_idx))
letters = []
while col_idx > 0:
col_idx, remainder = divmod(col_idx, 26)
# check for exact division and borrow if needed
if remainder == 0:
remainder = 26
col_idx -= 1
letters.append(chr(remainder+64))
return ''.join(reversed(letters))
The routine functions for arbitrarily large column indices if you simply remove these four lines.
# these indicies corrospond to A -> ZZZ and include all allowed
# columns
if not 1 <= col_idx <= 18278:
raise ValueError("Invalid column index {0}".format(col_idx))
I recommend avoiding the external dependency on _get_column_letter via an internally implemented function of the form:
def get_column_string(col_idx):
"""Convert a column number into a string of characters between A and Z."""
letters = []
while col_idx > 0:
col_idx, remainder = divmod(col_idx, 26)
# check for exact division and borrow if needed
if remainder == 0:
remainder = 26
col_idx -= 1
letters.append(chr(remainder+64))
return ''.join(reversed(letters))
The text was updated successfully, but these errors were encountered:
I am unable to unflatten some UK government procurement data because flatten-tool's unflatten routine is internally making use of openpyxl.utils.cell's _get_column_letter routine to convert column indices into strings of the form [A-Z]+.
The issue is that _get_column_letter explicitly enforces the constraint that at most three letters are generated, which corresponds to a constraint of at most 26 + 26 * 26 + 26 * 26 * 26 = 18,278 columns.
Thus, for spreadsheets with more than 18,278 columns, the unflatten routine is failing with a dubious error message about an invalid column index.
Thankfully, the fix is quite simple: the _get_column_letter routine is self-contained and the limitation to three characters is self-imposed and easily removed.
The implementation in its entirety is:
The routine functions for arbitrarily large column indices if you simply remove these four lines.
I recommend avoiding the external dependency on _get_column_letter via an internally implemented function of the form:
The text was updated successfully, but these errors were encountered: