Memory leak in pandas.read_msgpack when reading from string #16647

#### Code Sample (copy-pastable)

```python
from __future__ import division, print_function
import pandas as pd
import numpy as np
import os
import gc
import psutil


def log_memory(label):
    for i in range(3):  # run a collection for each GC generation (0, 1, 2)
        gc.collect(i)
    process = psutil.Process(os.getpid())
    mem_usage = process.memory_info().rss / float(2 ** 20)
    print("[Memory usage] {:<25s} {:12.1f} MB".format(
        label, mem_usage
    ))


def generate_test_data(num_partitions=20):
    for i in range(num_partitions):
        N = 10 * 1000 * 1000
        # randomness required, identical files don't have the issue
        df = pd.DataFrame({
            "A": np.random.uniform(0, 1, size=N),
        })
        df.to_msgpack("/tmp/pd_test_{:02d}.msg".format(i), compress='zlib')


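def load_msgpack(f):
    # Read the whole file into a string and parse from that string;
    # this string-input path is the one that leaks.
    with open(f, "rb") as fh:
        data = fh.read()
    df = pd.read_msgpack(data)
    return df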


def load_partitions_sequentially(num_partitions=20):
    for i in range(num_partitions):
        fn = "/tmp/pd_test_{:02d}.msg".format(i)
        df = load_msgpack(fn)
        del df
        log_memory("After partition {}".format(i+1))


log_memory("At initialization")
generate_test_data()
log_memory("After data generation")

load_partitions_sequentially()

```

#### Problem description

There is a memory leak in `pandas.read_msgpack` when reading from a string. Calling `pandas.read_msgpack(str_data)` increments the reference count of `str_data` if and only if `read_msgpack` is seeing the content of `str_data` for the first time. As a result, memory leaks only when reading _different_ files: when the same file is read over and over again, `str_data` leaks only once.
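
One way to check the reference-count claim is to compare `sys.getrefcount` before and after the call. The sketch below is only an illustration and assumes CPython. `refcount_delta`, `leaky_reader` and `clean_reader` are hypothetical names introduced here, and the two readers are stand-ins rather than pandas code. On an affected install, `pd.read_msgpack` would be passed as `func` instead.

```python
import os
import sys


def refcount_delta(func, payload):
    """Return how many extra references to `payload` survive func(payload)."""
    before = sys.getrefcount(payload)
    result = func(payload)
    del result  # drop the result so it cannot keep `payload` alive
    after = sys.getrefcount(payload)
    return after - before


# Hypothetical stand-ins for a reader (not pandas code).
_retained = []


def leaky_reader(data):
    _retained.append(data)  # keeps a reference alive after returning
    return len(data)


def clean_reader(data):
    return len(data)


payload = os.urandom(1024)  # fresh bytes object created at runtime
print(refcount_delta(clean_reader, payload))  # no reference retained
print(refcount_delta(leaky_reader, payload))  # one reference retained
```

A non-zero delta on first sight of a payload, and zero on a repeat of the same content, would match the behaviour described above.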

The problem does not exist when reading from file handles or `BytesIO`.
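
Assuming that observation holds, a possible workaround is to never hand the raw string to the reader, and to pass a `BytesIO` buffer or an open file handle instead. This is a minimal sketch: `load_via_buffer` and `load_via_handle` are hypothetical helpers, and the reader is passed in as a parameter so the snippet does not depend on a particular pandas version.

```python
import io


def load_via_buffer(path, reader):
    """Read `path` fully, wrap the bytes in BytesIO, and pass the buffer to `reader`."""
    with open(path, "rb") as fh:
        buf = io.BytesIO(fh.read())
    return reader(buf)


def load_via_handle(path, reader):
    """Pass an open binary file handle straight to `reader`."""
    with open(path, "rb") as fh:
        return reader(fh)


# Intended use on an affected install (not executed here):
#   df = load_via_buffer("/tmp/pd_test_00.msg", pd.read_msgpack)
#   df = load_via_handle("/tmp/pd_test_00.msg", pd.read_msgpack)
```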

#### Output of above example

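The output shows the effect of the leak: when data frame partitions are loaded sequentially, resident memory grows by roughly 144 MB per partition and is never released, even though each `df` is deleted and garbage collection is run: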

```
[Memory usage] At initialization                 39.4 MB
[Memory usage] After data generation             39.9 MB
[Memory usage] After partition 1                185.9 MB
[Memory usage] After partition 2                329.8 MB
[Memory usage] After partition 3                473.7 MB
[Memory usage] After partition 4                617.6 MB
[Memory usage] After partition 5                761.5 MB
[Memory usage] After partition 6                905.4 MB
[Memory usage] After partition 7               1049.3 MB
[Memory usage] After partition 8               1193.2 MB
[Memory usage] After partition 9               1337.1 MB
[Memory usage] After partition 10              1481.0 MB
[Memory usage] After partition 11              1624.9 MB
[Memory usage] After partition 12              1768.8 MB
[Memory usage] After partition 13              1912.7 MB
[Memory usage] After partition 14              2056.6 MB
[Memory usage] After partition 15              2200.4 MB
[Memory usage] After partition 16              2344.3 MB
[Memory usage] After partition 17              2488.2 MB
[Memory usage] After partition 18              2631.7 MB
[Memory usage] After partition 19              2775.6 MB
[Memory usage] After partition 20              2919.5 MB
```
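
To quantify the growth, the per-partition increase can be computed from the figures in the log. The values in `rss_mb` are copied from the output above; the variable names are introduced here for illustration.

```python
# RSS after each partition, in MB, copied from the log above.
rss_mb = [
    185.9, 329.8, 473.7, 617.6, 761.5, 905.4, 1049.3, 1193.2, 1337.1, 1481.0,
    1624.9, 1768.8, 1912.7, 2056.6, 2200.4, 2344.3, 2488.2, 2631.7, 2775.6, 2919.5,
]

# Growth between consecutive partitions.
deltas = [b - a for a, b in zip(rss_mb, rss_mb[1:])]
mean_growth = sum(deltas) / len(deltas)

print("mean growth per partition: {:.1f} MB".format(mean_growth))
print("min/max growth: {:.1f} / {:.1f} MB".format(min(deltas), max(deltas)))
```

A near-constant increment with no drop after `del df` is what one would expect if one buffer per file is retained.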

#### Output of ``pd.show_versions()``

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-100-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 36.0.1
Cython: None
numpy: 1.13.0
scipy: None
xarray: None
IPython: 5.4.1
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None
</details>

