Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
It's pretty common when using DataFrame.unstack() that the fill_value argument does not need to be used because there is a 1:1 mapping of the cells in the stacked and unstacked tables. This is the case when the dataframe/series was produced with stack().
Oftentimes this is the expectation and if any filling does occur it represents a problem in the data. It would be good if this assumption could be made explicit with an argument that causes an exception to be raised if it is violated.
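A minimal sketch of the situation, using made-up data (the level names `key` and `field` are only for illustration): with a complete index nothing is filled, but a single missing row is silently replaced by NaN.

```python
import pandas as pd

# Every (key, field) combination is present, as it would be after stack().
index = pd.MultiIndex.from_product(
    [["a", "b"], ["x", "y"]], names=["key", "field"]
)
stacked = pd.Series([1, 2, 3, 4], index=index)

# Complete index: unstack() is a pure reshape and fills nothing.
complete = stacked.unstack("field")
n_missing_complete = int(complete.isna().sum().sum())

# Drop the last row, ("b", "y"), to simulate a problem in the data.
broken = stacked.iloc[:-1]
filled = broken.unstack("field")
n_missing_broken = int(filled.isna().sum().sum())

print(n_missing_complete, n_missing_broken)
print(filled)
```

No error or warning is raised in the second case; the only trace is the NaN in the result and the change of dtype to float.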
Feature Description
Add a new Boolean keyword argument named nofill or similar to DataFrame.unstack() and Series.unstack(). The default value should be False. If True and the value for any cell is missing, raise a ValueError instead of substituting the fill value. The error message would ideally contain the index and column of the missing value.
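The requested behavior could be emulated roughly as follows. This is only a sketch: `unstack_nofill` is a hypothetical helper, not an existing pandas function, and the error message format is an assumption. It unstacks row positions rather than the data itself, so the marker -1 cannot collide with real values.

```python
import numpy as np
import pandas as pd


def unstack_nofill(obj, level=-1):
    """Hypothetical helper: unstack, raising ValueError if any cell would be filled."""
    # Unstack the row positions 0..n-1 instead of the values. Cells with no
    # source row receive -1, which is never a valid position.
    positions = pd.Series(np.arange(len(obj)), index=obj.index)
    layout = positions.unstack(level, fill_value=-1)

    # Collect the (index, column) labels of every cell that had to be filled.
    mask = layout.to_numpy() == -1
    missing = [
        (layout.index[r], layout.columns[c]) for r, c in zip(*np.nonzero(mask))
    ]
    if missing:
        raise ValueError(
            f"unstack would fill {len(missing)} cell(s); "
            f"first missing (index, column): {missing[0]}"
        )
    return obj.unstack(level)
```

A built-in keyword would avoid the second reshape that this workaround needs.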
Alternative Solutions
There might be a better solution, but all I can think of are the following:
- Check that the index of the table/series prior to calling unstack() is "complete", that is, it contains the full Cartesian product of the labels of the to-be-unstacked level and all unique combinations of the remaining ones. This is somewhat complicated, especially if there are more than 2 levels.
- Check for the presence of the fill value in the result. A suitable sentinel value needs to be chosen which is known not to occur in the input, so NA does not always work.
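For the first alternative, one possible shortcut is to count instead of building the Cartesian product: if the index has no duplicates, it is complete exactly when its length equals the number of unstacked labels times the number of unique remaining combinations. The helper name `is_complete_for_unstack` below is hypothetical, and the sketch assumes the index contains no missing labels.

```python
import pandas as pd


def is_complete_for_unstack(index: pd.MultiIndex, level=-1) -> bool:
    """Hypothetical check: True if unstacking `level` would fill no cells."""
    if not index.is_unique:
        # Duplicate entries cannot be unstacked at all.
        return False
    # Labels that would become columns.
    n_unstacked = len(index.get_level_values(level).unique())
    # Unique combinations of all other levels, which would become rows.
    n_remaining = len(index.droplevel(level).unique())
    # Each entry occupies a distinct cell of the rows-by-columns grid, so the
    # grid is full exactly when the counts match.
    return len(index) == n_unstacked * n_remaining


full = pd.MultiIndex.from_product([["a", "b"], [1, 2], ["x", "y"]])
print(is_complete_for_unstack(full))       # all 8 combinations present
print(is_complete_for_unstack(full[:-1]))  # ("b", 2, "y") removed
```

This works for any number of levels, but it only reports whether filling would occur, not which cells are missing.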
Additional Context
No response