
ENH: Prevent filling in unstack() #62704

@jlumpe


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

When using DataFrame.unstack(), the fill_value argument is often unnecessary because there is a 1:1 mapping between the cells of the stacked and unstacked tables. This is the case, for example, when the DataFrame/Series was produced with stack().

Oftentimes this is the expectation and if any filling does occur it represents a problem in the data. It would be good if this assumption could be made explicit with an argument that causes an exception to be raised if it is violated.
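To make the problem concrete, here is a small illustration with made-up data (the level names `row` and `col` are just placeholders). One combination is missing from the long-format series, and unstack() fills the gap without any warning:

```python
import pandas as pd

# Long-format series where the combination ("b", "y") is absent.
s = pd.Series(
    [1, 2, 3],
    index=pd.MultiIndex.from_tuples(
        [("a", "x"), ("a", "y"), ("b", "x")], names=["row", "col"]
    ),
)

# No error is raised; the missing cell is silently filled with NaN.
wide = s.unstack("col")

# With fill_value the gap is filled with 0, which is indistinguishable
# from a genuine 0 in the data.
wide_zero = s.unstack("col", fill_value=0)
```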

Feature Description

Add a new Boolean keyword argument named nofill or similar to DataFrame.unstack() and Series.unstack(). The default value should be False. If True and the value for any cell is missing, raise a ValueError instead of substituting the fill value. The error message would ideally contain the index and column of the missing value.
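As a rough sketch of the requested behavior, the check can be emulated today with a user-level helper. The name `unstack_nofill` is hypothetical and not part of pandas, and the approach shown (unstacking an all-True indicator over the same index) is only one possible way to detect filled cells. It assumes the input has a MultiIndex:

```python
import numpy as np
import pandas as pd


def unstack_nofill(obj, level=-1):
    """Unstack `obj`, raising ValueError if any cell would be filled.

    Hypothetical helper for illustration; not a pandas API.
    """
    # Unstack an all-True indicator that shares the index of `obj`.
    # Cells that come back False have no source row and would be filled.
    present = pd.Series(True, index=obj.index).unstack(level, fill_value=False)
    mask = ~present.to_numpy(dtype=bool)
    if mask.any():
        rows, cols = np.nonzero(mask)
        missing = [
            (present.index[r], present.columns[c]) for r, c in zip(rows, cols)
        ]
        raise ValueError(
            f"unstack() would fill {len(missing)} cell(s); "
            f"first missing (index, column): {missing[0]}"
        )
    return obj.unstack(level)


# Example data (made up): a complete 2 x 2 index, then one row removed.
idx = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]], names=["row", "col"])
complete = pd.Series([1, 2, 3, 4], index=idx)
incomplete = complete.iloc[:3]  # drops ("b", "y")

wide = unstack_nofill(complete, "col")  # succeeds
try:
    unstack_nofill(incomplete, "col")
except ValueError as exc:
    message = str(exc)  # names the missing (index, column) pair
```

Because the indicator is unstacked instead of the data, no sentinel value has to be chosen from the data's own domain.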

Alternative Solutions

There might be a better solution, but all I can think of are the following:

  1. Check that the index of the table/series prior to calling unstack() is "complete" - contains the full Cartesian product of the labels of the to-be-unstacked level and all unique combinations of the remaining ones. This is somewhat complicated, especially if there are more than 2 levels.
  2. Check for the presence of the fill value in the result. A suitable sentinel value needs to be chosen which is known not to occur in the input, so NA does not always work.
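For reference, the two workarounds above might look roughly like this. The helper name `index_is_complete`, the example data, and the choice of -1 as a sentinel are all assumptions made for illustration. The completeness check counts labels rather than materializing the full Cartesian product:

```python
import pandas as pd


def index_is_complete(index, level):
    """Alternative 1: True if every combination of the remaining levels
    occurs with every observed label of `level` (hypothetical helper)."""
    n_labels = len(index.get_level_values(level).unique())
    n_rest = len(index.droplevel(level).unique())
    # With unique entries, a complete index has exactly n_rest * n_labels rows.
    return index.is_unique and len(index) == n_labels * n_rest


idx = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]], names=["row", "col"])
complete = pd.Series([1, 2, 3, 4], index=idx)
incomplete = complete.iloc[:3]  # drops ("b", "y")

ok_complete = index_is_complete(complete.index, "col")
ok_incomplete = index_is_complete(incomplete.index, "col")

# Alternative 2: fill with a sentinel and look for it in the result.
# Assumption: the data are known to be non-negative, so -1 cannot occur.
SENTINEL = -1
wide = incomplete.unstack("col", fill_value=SENTINEL)
n_filled = int((wide == SENTINEL).sum().sum())
```

The sentinel approach only works when such a value can be ruled out in the input, which is the limitation noted in item 2.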

Additional Context

No response

Metadata

    Labels

    Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)
