API/DOC: Deprecate and Advise against having `np.nan` in Categoricals #10748

This came out of work on https://github.com/pydata/pandas/pull/10729

In the [documentation](http://pandas.pydata.org/pandas-docs/version/0.16.2/categorical.html#missing-data), we mention that

> There are two ways a np.nan can be represented in categorical data: either the value is not available (“missing value”) or np.nan is a valid category.

In the first case, `NaN` is not in `.categories`; in the second case it is. I think we should recommend only the first.

Allowing `NaN` in the categories makes the code in #10729 less pleasant than it would be otherwise. I don't think we should raise an error if `NaN`s are included, just advise against it in the docs. Perhaps a deprecation, but I worry that I'm missing some obvious reason why `NaN`s were allowed in `.categories`.
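To make the "warn, don't error" idea concrete, here is a rough sketch of what such a check could look like. The helper name `check_categories`, the warning text, and the choice of `FutureWarning` are all assumptions made for illustration; none of this is existing pandas code.

```python
import math
import warnings


def _is_nan(value):
    # Only floats can be NaN; strings and other objects never are.
    return isinstance(value, float) and math.isnan(value)


def check_categories(categories):
    """Hypothetical helper: warn, but do not raise, if NaN is a category."""
    categories = list(categories)
    if any(_is_nan(c) for c in categories):
        warnings.warn(
            "NaN in categories is discouraged; leave it out so that "
            "missing values are represented by a code of -1.",
            FutureWarning,
            stacklevel=2,
        )
    # The categories are returned unchanged: this only advises.
    return categories
```

Existing code that passes `NaN` as a category would keep working under this approach and would only see the warning.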

@JanSchulz, do you remember the initial reason for allowing either representation?

Some bad things that come out of `NaN` in `.categories`:
- Can't rely on a value of `nan` mapping to a code of `-1`:

``` python
In [2]: s = pd.Categorical(['a', 'b', 'a', np.nan], categories=['a', 'b', np.nan])

In [3]: s
Out[3]:
[a, b, a, NaN]
Categories (3, object): [a, b, NaN]

In [4]: s.categories
Out[4]: Index(['a', 'b', nan], dtype='object')

In [5]: s.codes
Out[5]: array([0, 1, 0, 2], dtype=int8)
```
- Potentially have to upcast the index type, or mix strings and floats (`nan`) in the `.categories` Index.
- Extra code is needed if you want to generically handle Categoricals that may or may not have `NaN` in categories.
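
For comparison, here is a minimal sketch of the first representation, the one I'd like us to recommend, using the same values as above but leaving `NaN` out of `categories`. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# NaN is left out of the categories, so it is treated as a missing value.
cat = pd.Categorical(['a', 'b', 'a', np.nan], categories=['a', 'b'])

codes = cat.codes          # missing values get the sentinel code -1
missing = cat.codes == -1  # boolean mask of the missing entries
```

With this representation the categories stay a plain Index of strings, and a single check against `-1` is enough to find missing values.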

