Commit 998920c
Merge branch 'master' into PR_TOOL_MERGE_PR_19762
2 parents: 6aea33d + e97be6f

File tree: 95 files changed, +2815 / -1529 lines changed
ci/lint.sh
Lines changed: 10 additions & 0 deletions

@@ -111,6 +111,15 @@ if [ "$LINT" ]; then
         RET=1
     fi

+    # Check for the following code in the extension array base tests
+    # tm.assert_frame_equal
+    # tm.assert_series_equal
+    grep -r -E --include '*.py' --exclude base.py 'tm.assert_(series|frame)_equal' pandas/tests/extension/base
+
+    if [ $? = "0" ]; then
+        RET=1
+    fi
+
     echo "Check for invalid testing DONE"

     # Check for imports from pandas.core.common instead
@@ -156,6 +165,7 @@ if [ "$LINT" ]; then
         RET=1
     fi
     echo "Check for deprecated messages without sphinx directive DONE"
+
 else
     echo "NOT Linting"
 fi
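The new lint rule rejects direct use of the ``tm.assert_*_equal`` helpers under ``pandas/tests/extension/base``. The grep pattern is an extended regular expression; a minimal Python sketch of what it matches (the example test lines are hypothetical, not taken from the repository):

```python
import re

# Same ERE as the grep call above. Note the unescaped '.' matches any
# character, exactly as it does in grep's pattern syntax.
pattern = re.compile(r"tm.assert_(series|frame)_equal")

offending = "tm.assert_series_equal(result, expected)"  # hypothetical test line
clean = "self.assert_series_equal(result, expected)"    # hypothetical replacement

print(bool(pattern.search(offending)))  # True
print(bool(pattern.search(clean)))      # False
```

When the grep finds any match it exits with status 0, which the script above turns into a lint failure (`RET=1`).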

doc/source/basics.rst
Lines changed: 1 addition & 1 deletion

@@ -2312,4 +2312,4 @@ All NumPy dtypes are subclasses of ``numpy.generic``:
 .. note::

     Pandas also defines the types ``category``, and ``datetime64[ns, tz]``, which are not integrated into the normal
-    NumPy hierarchy and wont show up with the above function.
+    NumPy hierarchy and won't show up with the above function.

doc/source/dsintro.rst
Lines changed: 14 additions & 23 deletions

@@ -364,6 +364,19 @@ and returns a DataFrame. It operates like the ``DataFrame`` constructor except
 for the ``orient`` parameter which is ``'columns'`` by default, but which can be
 set to ``'index'`` in order to use the dict keys as row labels.

+
+.. ipython:: python
+
+   pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]))
+
+If you pass ``orient='index'``, the keys will be the row labels. In this
+case, you can also pass the desired column names:
+
+.. ipython:: python
+
+   pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]),
+                          orient='index', columns=['one', 'two', 'three'])
+
 .. _basics.dataframe.from_records:

 **DataFrame.from_records**
@@ -378,28 +391,6 @@ dtype. For example:
    data
    pd.DataFrame.from_records(data, index='C')

-.. _basics.dataframe.from_items:
-
-**DataFrame.from_items**
-
-``DataFrame.from_items`` works analogously to the form of the ``dict``
-constructor that takes a sequence of ``(key, value)`` pairs, where the keys are
-column (or row, in the case of ``orient='index'``) names, and the value are the
-column values (or row values). This can be useful for constructing a DataFrame
-with the columns in a particular order without having to pass an explicit list
-of columns:
-
-.. ipython:: python
-
-   pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])])
-
-If you pass ``orient='index'``, the keys will be the row labels. But in this
-case you must also pass the desired column names:
-
-.. ipython:: python
-
-   pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])],
-                           orient='index', columns=['one', 'two', 'three'])

 Column selection, addition, deletion
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -539,7 +530,7 @@ To write code compatible with all versions of Python, split the assignment in tw
 you'll need to take care when passing ``assign`` expressions that

 * Updating an existing column
-* Refering to the newly updated column in the same ``assign``
+* Referring to the newly updated column in the same ``assign``

 For example, we'll update column "A" and then refer to it when creating "B".
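The ``from_dict`` calls added in this hunk can be run as-is; a quick check of the ``orient='index'`` behavior (note that the ``columns`` parameter of ``from_dict`` is only available in pandas versions that include this change, 0.23+):

```python
import pandas as pd

# With orient='index' the dict keys become the row labels, and columns=
# supplies the column names (this replaces the removed DataFrame.from_items)
df = pd.DataFrame.from_dict(
    {'A': [1, 2, 3], 'B': [4, 5, 6]},
    orient='index', columns=['one', 'two', 'three'],
)
print(list(df.index))       # ['A', 'B']
print(list(df.columns))     # ['one', 'two', 'three']
print(df.loc['B', 'three']) # 6
```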

doc/source/gotchas.rst
Lines changed: 59 additions & 55 deletions

@@ -22,22 +22,22 @@ Frequently Asked Questions (FAQ)

 DataFrame memory usage
 ----------------------
-The memory usage of a dataframe (including the index)
-is shown when accessing the ``info`` method of a dataframe. A
-configuration option, ``display.memory_usage`` (see :ref:`options`),
-specifies if the dataframe's memory usage will be displayed when
-invoking the ``df.info()`` method.
+The memory usage of a ``DataFrame`` (including the index) is shown when calling
+the :meth:`~DataFrame.info`. A configuration option, ``display.memory_usage``
+(see :ref:`the list of options <options.available>`), specifies if the
+``DataFrame``'s memory usage will be displayed when invoking the ``df.info()``
+method.

-For example, the memory usage of the dataframe below is shown
-when calling ``df.info()``:
+For example, the memory usage of the ``DataFrame`` below is shown
+when calling :meth:`~DataFrame.info`:

 .. ipython:: python

    dtypes = ['int64', 'float64', 'datetime64[ns]', 'timedelta64[ns]',
              'complex128', 'object', 'bool']
    n = 5000
-   data = dict([ (t, np.random.randint(100, size=n).astype(t))
-                 for t in dtypes])
+   data = dict([(t, np.random.randint(100, size=n).astype(t))
+                for t in dtypes])
    df = pd.DataFrame(data)
    df['categorical'] = df['object'].astype('category')
@@ -48,7 +48,7 @@ pandas does not count the memory used by values in columns with
 ``dtype=object``.

 Passing ``memory_usage='deep'`` will enable a more accurate memory usage report,
-that accounts for the full usage of the contained objects. This is optional
+accounting for the full usage of the contained objects. This is optional
 as it can be expensive to do this deeper introspection.

 .. ipython:: python
@@ -58,11 +58,11 @@ as it can be expensive to do this deeper introspection.
 By default the display option is set to ``True`` but can be explicitly
 overridden by passing the ``memory_usage`` argument when invoking ``df.info()``.

-The memory usage of each column can be found by calling the ``memory_usage``
-method. This returns a Series with an index represented by column names
-and memory usage of each column shown in bytes. For the dataframe above,
-the memory usage of each column and the total memory usage of the
-dataframe can be found with the memory_usage method:
+The memory usage of each column can be found by calling the
+:meth:`~DataFrame.memory_usage` method. This returns a ``Series`` with an index
+represented by column names and memory usage of each column shown in bytes. For
+the ``DataFrame`` above, the memory usage of each column and the total memory
+usage can be found with the ``memory_usage`` method:

 .. ipython:: python
@@ -71,18 +71,18 @@ dataframe can be found with the memory_usage method:
    # total memory usage of dataframe
    df.memory_usage().sum()

-By default the memory usage of the dataframe's index is shown in the
-returned Series, the memory usage of the index can be suppressed by passing
+By default the memory usage of the ``DataFrame``'s index is shown in the
+returned ``Series``, the memory usage of the index can be suppressed by passing
 the ``index=False`` argument:

 .. ipython:: python

    df.memory_usage(index=False)

-The memory usage displayed by the ``info`` method utilizes the
-``memory_usage`` method to determine the memory usage of a dataframe
-while also formatting the output in human-readable units (base-2
-representation; i.e., 1KB = 1024 bytes).
+The memory usage displayed by the :meth:`~DataFrame.info` method utilizes the
+:meth:`~DataFrame.memory_usage` method to determine the memory usage of a
+``DataFrame`` while also formatting the output in human-readable units (base-2
+representation; i.e. 1KB = 1024 bytes).

 See also :ref:`Categorical Memory Usage <categorical.memory>`.
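The ``memory_usage`` behavior described in these hunks is easy to verify directly; a minimal sketch (the column name and sizes are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'int64': np.arange(100, dtype='int64')})

# memory_usage() returns a Series indexed by column name, plus an
# 'Index' entry for the DataFrame's index; index=False drops that entry
with_index = df.memory_usage()
no_index = df.memory_usage(index=False)

print(list(with_index.index))  # ['Index', 'int64']
print(int(no_index['int64']))  # 800 bytes: 100 rows * 8 bytes each
```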

@@ -91,17 +91,18 @@ See also :ref:`Categorical Memory Usage <categorical.memory>`.
 Using If/Truth Statements with pandas
 -------------------------------------

-pandas follows the NumPy convention of raising an error when you try to convert something to a ``bool``.
-This happens in a ``if`` or when using the boolean operations, ``and``, ``or``, or ``not``. It is not clear
-what the result of
+pandas follows the NumPy convention of raising an error when you try to convert
+something to a ``bool``. This happens in an ``if``-statement or when using the
+boolean operations: ``and``, ``or``, and ``not``. It is not clear what the result
+of the following code should be:

 .. code-block:: python

    >>> if pd.Series([False, True, False]):
    ...

-should be. Should it be ``True`` because it's not zero-length? ``False`` because there are ``False`` values?
-It is unclear, so instead, pandas raises a ``ValueError``:
+Should it be ``True`` because it's not zero-length, or ``False`` because there
+are ``False`` values? It is unclear, so instead, pandas raises a ``ValueError``:

 .. code-block:: python
@@ -111,9 +112,9 @@ It is unclear, so instead, pandas raises a ``ValueError``:
    ...
    ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

-
-If you see that, you need to explicitly choose what you want to do with it (e.g., use `any()`, `all()` or `empty`).
-or, you might want to compare if the pandas object is ``None``
+You need to explicitly choose what you want to do with the ``DataFrame``, e.g.
+use :meth:`~DataFrame.any`, :meth:`~DataFrame.all` or :meth:`~DataFrame.empty`.
+Alternatively, you might want to compare if the pandas object is ``None``:

 .. code-block:: python
@@ -122,15 +123,16 @@ or, you might want to compare if the pandas object is ``None``
    >>> I was not None

-or return if ``any`` value is ``True``.
+Below is how to check if any of the values are ``True``:

 .. code-block:: python

    >>> if pd.Series([False, True, False]).any():
          print("I am any")
    >>> I am any

-To evaluate single-element pandas objects in a boolean context, use the method ``.bool()``:
+To evaluate single-element pandas objects in a boolean context, use the method
+:meth:`~DataFrame.bool`:

 .. ipython:: python
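The ambiguity these hunks describe can be demonstrated in a few lines; a short sketch of the explicit alternatives:

```python
import pandas as pd

s = pd.Series([False, True, False])

# bool(s) is ambiguous, so pandas raises instead of guessing
try:
    if s:
        pass
except ValueError as err:
    print(type(err).__name__)  # ValueError

# Be explicit about what you mean instead:
print(s.any())   # True:  at least one value is True
print(s.all())   # False: not every value is True
print(s.empty)   # False: the Series has elements
```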
@@ -161,25 +163,25 @@ See :ref:`boolean comparisons<basics.compare>` for more examples.
 Using the ``in`` operator
 ~~~~~~~~~~~~~~~~~~~~~~~~~

-Using the Python ``in`` operator on a Series tests for membership in the
+Using the Python ``in`` operator on a ``Series`` tests for membership in the
 index, not membership among the values.

-.. ipython::
+.. ipython:: python

    s = pd.Series(range(5), index=list('abcde'))
    2 in s
    'b' in s

 If this behavior is surprising, keep in mind that using ``in`` on a Python
-dictionary tests keys, not values, and Series are dict-like.
-To test for membership in the values, use the method :func:`~pandas.Series.isin`:
+dictionary tests keys, not values, and ``Series`` are dict-like.
+To test for membership in the values, use the method :meth:`~pandas.Series.isin`:

-.. ipython::
+.. ipython:: python

    s.isin([2])
    s.isin([2]).any()

-For DataFrames, likewise, ``in`` applies to the column axis,
+For ``DataFrames``, likewise, ``in`` applies to the column axis,
 testing for membership in the list of column names.

 ``NaN``, Integer ``NA`` values and ``NA`` type promotions
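The ``in`` vs. ``isin`` distinction from the ``in``-operator hunk above can be checked quickly:

```python
import pandas as pd

s = pd.Series(range(5), index=list('abcde'))

# `in` tests the index (like dict keys), not the values
print('b' in s)           # True:  'b' is an index label
print(2 in s)             # False: 2 is a value, not a label
print(s.isin([2]).any())  # True:  isin tests the values
```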
@@ -189,12 +191,12 @@ Choice of ``NA`` representation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 For lack of ``NA`` (missing) support from the ground up in NumPy and Python in
-general, we were given the difficult choice between either
+general, we were given the difficult choice between either:

 - A *masked array* solution: an array of data and an array of boolean values
-  indicating whether a value is there or is missing
+  indicating whether a value is there or is missing.
 - Using a special sentinel value, bit pattern, or set of sentinel values to
-  denote ``NA`` across the dtypes
+  denote ``NA`` across the dtypes.

 For many reasons we chose the latter. After years of production use it has
 proven, at least in my opinion, to be the best decision given the state of
@@ -226,15 +228,16 @@ arrays. For example:
    s2.dtype

 This trade-off is made largely for memory and performance reasons, and also so
-that the resulting Series continues to be "numeric". One possibility is to use
-``dtype=object`` arrays instead.
+that the resulting ``Series`` continues to be "numeric". One possibility is to
+use ``dtype=object`` arrays instead.

 ``NA`` type promotions
 ~~~~~~~~~~~~~~~~~~~~~~

-When introducing NAs into an existing Series or DataFrame via ``reindex`` or
-some other means, boolean and integer types will be promoted to a different
-dtype in order to store the NAs. These are summarized by this table:
+When introducing NAs into an existing ``Series`` or ``DataFrame`` via
+:meth:`~Series.reindex` or some other means, boolean and integer types will be
+promoted to a different dtype in order to store the NAs. The promotions are
+summarized in this table:

 .. csv-table::
    :header: "Typeclass","Promotion dtype for storing NAs"
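The promotion this hunk refers to can be observed with a reindex that introduces a missing label:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], index=list('abc'))
print(s.dtype)  # int64

# Reindexing with a label that has no value introduces NaN, which
# forces promotion from int64 to float64 to store the missing value
s2 = s.reindex(list('abcd'))
print(s2.dtype)  # float64
print(s2['d'])   # nan
```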
@@ -289,19 +292,19 @@ integer arrays to floating when NAs must be introduced.

 Differences with NumPy
 ----------------------
-For Series and DataFrame objects, ``var`` normalizes by ``N-1`` to produce
-unbiased estimates of the sample variance, while NumPy's ``var`` normalizes
-by N, which measures the variance of the sample. Note that ``cov``
-normalizes by ``N-1`` in both pandas and NumPy.
+For ``Series`` and ``DataFrame`` objects, :meth:`~DataFrame.var` normalizes by
+``N-1`` to produce unbiased estimates of the sample variance, while NumPy's
+``var`` normalizes by N, which measures the variance of the sample. Note that
+:meth:`~DataFrame.cov` normalizes by ``N-1`` in both pandas and NumPy.
297300

298301
Thread-safety
299302
-------------
300303

301304
As of pandas 0.11, pandas is not 100% thread safe. The known issues relate to
302-
the ``DataFrame.copy`` method. If you are doing a lot of copying of DataFrame
303-
objects shared among threads, we recommend holding locks inside the threads
304-
where the data copying occurs.
305+
the :meth:`~DataFrame.copy` method. If you are doing a lot of copying of
306+
``DataFrame`` objects shared among threads, we recommend holding locks inside
307+
the threads where the data copying occurs.
305308

306309
See `this link <https://stackoverflow.com/questions/13592618/python-pandas-dataframe-thread-safe>`__
307310
for more information.
@@ -310,7 +313,8 @@ for more information.
310313
Byte-Ordering Issues
311314
--------------------
312315
Occasionally you may have to deal with data that were created on a machine with
313-
a different byte order than the one on which you are running Python. A common symptom of this issue is an error like
316+
a different byte order than the one on which you are running Python. A common
317+
symptom of this issue is an error like:
314318

315319
.. code-block:: python
316320
@@ -320,8 +324,8 @@ a different byte order than the one on which you are running Python. A common sy
320324
321325
To deal
322326
with this issue you should convert the underlying NumPy array to the native
323-
system byte order *before* passing it to Series/DataFrame/Panel constructors
324-
using something similar to the following:
327+
system byte order *before* passing it to ``Series`` or ``DataFrame``
328+
constructors using something similar to the following:
325329

326330
.. ipython:: python
327331
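One way to do the conversion this hunk describes; this is a sketch under the assumption of a little-endian host, not the exact snippet from the docs:

```python
import numpy as np
import pandas as pd

# An array stored in big-endian byte order ('>i8'), e.g. data read from
# a file that was written on a big-endian machine
big_endian = np.array([1, 2, 3], dtype='>i8')

# Convert to the native byte order before constructing the Series;
# astype to the '=' (native) variant of the dtype performs the swap
native = big_endian.astype(big_endian.dtype.newbyteorder('='))
s = pd.Series(native)
print(s.dtype)       # int64
print(int(s.sum()))  # 6
```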

doc/source/tutorials.rst
Lines changed: 11 additions & 9 deletions

@@ -26,32 +26,34 @@ repository <http://github.com/jvns/pandas-cookbook>`_. To run the examples in th
 clone the GitHub repository and get IPython Notebook running.
 See `How to use this cookbook <https://github.com/jvns/pandas-cookbook#how-to-use-this-cookbook>`_.

-- `A quick tour of the IPython Notebook: <http://nbviewer.ipython.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/A%20quick%20tour%20of%20IPython%20Notebook.ipynb>`_
+- `A quick tour of the IPython Notebook: <http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/A%20quick%20tour%20of%20IPython%20Notebook.ipynb>`_
   Shows off IPython's awesome tab completion and magic functions.
-- `Chapter 1: <http://nbviewer.ipython.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%201%20-%20Reading%20from%20a%20CSV.ipynb>`_
+- `Chapter 1: <http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%201%20-%20Reading%20from%20a%20CSV.ipynb>`_
   Reading your data into pandas is pretty much the easiest thing. Even
   when the encoding is wrong!
-- `Chapter 2: <http://nbviewer.ipython.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%202%20-%20Selecting%20data%20&%20finding%20the%20most%20common%20complaint%20type.ipynb>`_
+- `Chapter 2: <http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%202%20-%20Selecting%20data%20%26%20finding%20the%20most%20common%20complaint%20type.ipynb>`_
   It's not totally obvious how to select data from a pandas dataframe.
   Here we explain the basics (how to take slices and get columns)
-- `Chapter 3: <http://nbviewer.ipython.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%203%20-%20Which%20borough%20has%20the%20most%20noise%20complaints%3F%20%28or%2C%20more%20selecting%20data%29.ipynb>`_
+- `Chapter 3: <http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%203%20-%20Which%20borough%20has%20the%20most%20noise%20complaints%20%28or%2C%20more%20selecting%20data%29.ipynb>`_
   Here we get into serious slicing and dicing and learn how to filter
   dataframes in complicated ways, really fast.
-- `Chapter 4: <http://nbviewer.ipython.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%204%20-%20Find%20out%20on%20which%20weekday%20people%20bike%20the%20most%20with%20groupby%20and%20aggregate.ipynb>`_
+- `Chapter 4: <http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%204%20-%20Find%20out%20on%20which%20weekday%20people%20bike%20the%20most%20with%20groupby%20and%20aggregate.ipynb>`_
   Groupby/aggregate is seriously my favorite thing about pandas
   and I use it all the time. You should probably read this.
-- `Chapter 5: <http://nbviewer.ipython.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%205%20-%20Combining%20dataframes%20and%20scraping%20Canadian%20weather%20data.ipynb>`_
+- `Chapter 5: <http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%205%20-%20Combining%20dataframes%20and%20scraping%20Canadian%20weather%20data.ipynb>`_
   Here you get to find out if it's cold in Montreal in the winter
   (spoiler: yes). Web scraping with pandas is fun! Here we combine dataframes.
-- `Chapter 6: <http://nbviewer.ipython.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%206%20-%20String%20operations%21%20Which%20month%20was%20the%20snowiest%3F.ipynb>`_
+- `Chapter 6: <http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%206%20-%20String%20Operations-%20Which%20month%20was%20the%20snowiest.ipynb>`_
   Strings with pandas are great. It has all these vectorized string
   operations and they're the best. We will turn a bunch of strings
   containing "Snow" into vectors of numbers in a trice.
-- `Chapter 7: <http://nbviewer.ipython.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%207%20-%20Cleaning%20up%20messy%20data.ipynb>`_
+- `Chapter 7: <http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%207%20-%20Cleaning%20up%20messy%20data.ipynb>`_
   Cleaning up messy data is never a joy, but with pandas it's easier.
-- `Chapter 8: <http://nbviewer.ipython.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%208%20-%20How%20to%20deal%20with%20timestamps.ipynb>`_
+- `Chapter 8: <http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%208%20-%20How%20to%20deal%20with%20timestamps.ipynb>`_
   Parsing Unix timestamps is confusing at first but it turns out
   to be really easy.
+- `Chapter 9: <http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%209%20-%20Loading%20data%20from%20SQL%20databases.ipynb>`_
+  Reading data from SQL databases.


 Lessons for new pandas users

doc/source/whatsnew/v0.14.1.txt
Lines changed: 1 addition & 1 deletion

@@ -145,7 +145,7 @@ Performance
 ~~~~~~~~~~~
 - Improvements in dtype inference for numeric operations involving yielding performance gains for dtypes: ``int64``, ``timedelta64``, ``datetime64`` (:issue:`7223`)
 - Improvements in Series.transform for significant performance gains (:issue:`6496`)
-- Improvements in DataFrame.transform with ufuncs and built-in grouper functions for signifcant performance gains (:issue:`7383`)
+- Improvements in DataFrame.transform with ufuncs and built-in grouper functions for significant performance gains (:issue:`7383`)
 - Regression in groupby aggregation of datetime64 dtypes (:issue:`7555`)
 - Improvements in `MultiIndex.from_product` for large iterables (:issue:`7627`)
