@@ -797,37 +797,52 @@ Assigning a ``Categorical`` to parts of a column of other types will use the val
     df.dtypes
 
 .. _categorical.merge:
+.. _categorical.concat:
 
-Merging
-~~~~~~~
+Merging / Concatenation
+~~~~~~~~~~~~~~~~~~~~~~~
 
-You can concat two ``DataFrames`` containing categorical data together,
-but the categories of these categoricals need to be the same:
+By default, combining ``Series`` or ``DataFrames`` which contain the same
+categories results in ``category`` dtype, otherwise results will depend on the
+dtype of the underlying categories. Merges that result in non-categorical
+dtypes will likely have higher memory usage. Use ``.astype`` or
+``union_categoricals`` to ensure ``category`` results.
 
 .. ipython:: python
 
-    cat = pd.Series(["a", "b"], dtype="category")
-    vals = [1, 2]
-    df = pd.DataFrame({"cats": cat, "vals": vals})
-    res = pd.concat([df, df])
-    res
-    res.dtypes
+    from pandas.api.types import union_categoricals
 
-In this case the categories are not the same, and therefore an error is raised:
+    # same categories
+    s1 = pd.Series(['a', 'b'], dtype='category')
+    s2 = pd.Series(['a', 'b', 'a'], dtype='category')
+    pd.concat([s1, s2])
 
-.. ipython:: python
+    # different categories
+    s3 = pd.Series(['b', 'c'], dtype='category')
+    pd.concat([s1, s3])
 
-    df_different = df.copy()
-    df_different["cats"].cat.categories = ["c", "d"]
-    try:
-        pd.concat([df, df_different])
-    except ValueError as e:
-        print("ValueError:", str(e))
+    # Output dtype is inferred based on categories values
+    int_cats = pd.Series([1, 2], dtype="category")
+    float_cats = pd.Series([3.0, 4.0], dtype="category")
+    pd.concat([int_cats, float_cats])
+
+    pd.concat([s1, s3]).astype('category')
+    union_categoricals([s1.array, s3.array])
 
-The same applies to ``df.append(df_different)``.
+The following table summarizes the results of merging ``Categoricals``:
 
-See also the section on :ref:`merge dtypes<merging.dtypes>` for notes about preserving merge dtypes and performance.
++-------------------+------------------------+----------------------+-----------------------------+
+| arg1              | arg2                   | identical            | result                      |
++===================+========================+======================+=============================+
+| category          | category               | True                 | category                    |
++-------------------+------------------------+----------------------+-----------------------------+
+| category (object) | category (object)      | False                | object (dtype is inferred)  |
++-------------------+------------------------+----------------------+-----------------------------+
+| category (int)    | category (float)       | False                | float (dtype is inferred)   |
++-------------------+------------------------+----------------------+-----------------------------+
 
+See also the section on :ref:`merge dtypes<merging.dtypes>` for notes about
+preserving merge dtypes and performance.
 
 .. _categorical.union:
 
@@ -918,46 +933,6 @@ the resulting array will always be a plain ``Categorical``:
     # "b" is coded to 0 throughout, same as c1, different from c2
     c.codes
 
-.. _categorical.concat:
-
-Concatenation
-~~~~~~~~~~~~~
-
-This section describes concatenations specific to ``category`` dtype. See :ref:`Concatenating objects<merging.concat>` for general description.
-
-By default, ``Series`` or ``DataFrame`` concatenation which contains the same categories
-results in ``category`` dtype, otherwise results in ``object`` dtype.
-Use ``.astype`` or ``union_categoricals`` to get ``category`` result.
-
-.. ipython:: python
-
-    # same categories
-    s1 = pd.Series(['a', 'b'], dtype='category')
-    s2 = pd.Series(['a', 'b', 'a'], dtype='category')
-    pd.concat([s1, s2])
-
-    # different categories
-    s3 = pd.Series(['b', 'c'], dtype='category')
-    pd.concat([s1, s3])
-
-    pd.concat([s1, s3]).astype('category')
-    union_categoricals([s1.array, s3.array])
-
-
-Following table summarizes the results of ``Categoricals`` related concatenations.
-
-+----------+--------------------------------------------------------+----------------------------+
-| arg1     | arg2                                                   | result                     |
-+==========+========================================================+============================+
-| category | category (identical categories)                        | category                   |
-+----------+--------------------------------------------------------+----------------------------+
-| category | category (different categories, both not ordered)      | object (dtype is inferred) |
-+----------+--------------------------------------------------------+----------------------------+
-| category | category (different categories, either one is ordered) | object (dtype is inferred) |
-+----------+--------------------------------------------------------+----------------------------+
-| category | not category                                           | object (dtype is inferred) |
-+----------+--------------------------------------------------------+----------------------------+
-
 
 Getting data in/out
 -------------------