Add section on handling categorical values

Patrick Park · Patrick Park · commit 6b22d1211f31 · 2018-10-11T08:21:40.000-07:00
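
For reviewers, a minimal sketch of the ``observed`` behaviour that the new section documents, using two categorical groupers at once. The frame and its column names (``size``, ``color``, ``value``) are made up for illustration and do not appear in the patch.

```python
import pandas as pd

# Hypothetical data: two categorical groupers, each with one unused category
# ("l" for size, "blue" for color).
df = pd.DataFrame({
    "size": pd.Categorical(["s", "s", "m"], categories=["s", "m", "l"]),
    "color": pd.Categorical(["red", "red", "red"], categories=["red", "blue"]),
    "value": [1, 2, 3],
})

# observed=False: one group per combination of categories (3 * 2 = 6 rows),
# with a count of 0 for combinations that never occur in the data.
all_groups = df.groupby(["size", "color"], observed=False)["value"].count()

# observed=True: only the combinations that actually occur (2 rows).
seen_groups = df.groupby(["size", "color"], observed=True)["value"].count()

print(all_groups)
print(seen_groups)
```

Passing ``observed`` explicitly in both calls keeps the sketch independent of whatever default the installed pandas version uses.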
diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
@@ -989,6 +989,60 @@ Note that ``df.groupby('A').colname.std().`` is more efficient than
 is only interesting over one column (here ``colname``), it may be filtered 
 *before* applying the aggregation function.
 
+.. _groupby.observed:
+
+Handling of (un)observed Categorical values
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When using a ``Categorical`` grouper (as a single grouper, or as part of multiple groupers), the ``observed`` keyword
+controls whether to return a cartesian product of all possible grouper values (``observed=False``) or only those
+combinations that are actually observed in the data (``observed=True``).
+
+Show all values:
+
+.. ipython:: python
+
+   pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=False).count()
+
+Show only the observed values:
+
+.. ipython:: python
+
+   pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=True).count()
+
+The dtype of the resulting group index will *always* include *all* of the categories that were grouped.
+
+.. ipython:: python
+
+   s = pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=False).count()
+   s.index.dtype
+
+.. note::
+   Decimal and object columns are also "nuisance" columns. They are excluded from aggregate functions automatically in groupby.
+
+   If you do wish to include decimal or object columns in an aggregation with other non-nuisance data types, you must do so explicitly.
+
+.. ipython:: python
+
+    from decimal import Decimal
+    dec = pd.DataFrame(
+        {'id': [123, 456, 123, 456],
+         'int_column': [1, 2, 3, 4],
+         'dec_column': [Decimal('0.50'), Decimal('0.15'), Decimal('0.25'), Decimal('0.40')]
+         },
+        columns=['id', 'int_column', 'dec_column']
+    )
+
+    # Decimal columns can be summed explicitly by themselves...
+    dec.groupby(['id'], as_index=False)['dec_column'].sum()
+
+    # ...but cannot be combined with standard data types or they will be excluded
+    dec.groupby(['id'], as_index=False)[['int_column', 'dec_column']].sum()
+
+    # Use .agg function to aggregate over standard and "nuisance" data types at the same time
+    dec.groupby(['id'], as_index=False).agg({'int_column': 'sum', 'dec_column': 'sum'})
+
+
 .. _groupby.missing:
 
 NA and NaT group handling