Description
I have always found the mechanics of combine_first very unintuitive, and I constantly need to look at the docs to see what it does. I haven't checked the git history, but the method seems to have been a direct response from wesm to an SO question (https://stackoverflow.com/a/9794891). I think the same operation would be much more intuitive with df.update, using a subset of what #21855 proposes: that issue introduces join='outer' for DataFrame.update (currently only 'left' is supported, though the source code already notes # TODO: Support other joins).
With that new option, df1.combine_first(df2) would be equivalent to df1.update(df2, join='outer', overwrite=False), except that combine_first offers far fewer options and controls (it has no counterpart to filter_func or raise_conflict). The only remaining difference is that df.update currently returns None; see #21858.
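To make the comparison concrete, here is a minimal sketch with made-up frames. Since join='outer' is only proposed and not available in DataFrame.update, the last step emulates it by reindexing df1 to the union of both indexes and columns first; that emulation is an assumption for illustration, not the implementation proposed in #21855.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: df2 has an extra row (index 2) and an extra column "b".
df1 = pd.DataFrame({"a": [1.0, np.nan]}, index=[0, 1])
df2 = pd.DataFrame(
    {"a": [10.0, 20.0, 30.0], "b": [1.0, 2.0, 3.0]}, index=[0, 1, 2]
)

# combine_first: keep df1's values, fill its gaps from df2,
# and extend the result to the union of both indexes and columns.
combined = df1.combine_first(df2)

# update today: in-place, left join only, and returns None.
# overwrite=False means only NaN entries of the caller are filled.
left_updated = df1.copy()
return_value = left_updated.update(df2, overwrite=False)

# Emulating the proposed join='outer' (assumption): expand df1 to the
# union of labels, then fill the gaps with the existing left-join update.
outer_updated = df1.reindex(
    index=df1.index.union(df2.index),
    columns=df1.columns.union(df2.columns),
)
outer_updated.update(df2, overwrite=False)

print(combined)
print(left_updated)
print(outer_updated)
```

In this sketch, left_updated keeps only df1's original rows and columns, while outer_updated ends up with the same labels and values as combined.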
Since combine_first is quite well established, the deprecation cycle might have to be longer than usual, but I think the update variant is much cleaner and more versatile than this single-purpose function.