Description
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
Code Sample, a copy-pastable example
Only one value is returned with this:
import pandas as pd

def gen():
    # Yield the records one at a time instead of returning a list.
    test = [{'created_at': '2020-08-24T09:30:05Z',
             '_id': '5f43889de6a98fd57afce7be'},
            {'created_at': '2020-08-23T11:16:09Z',
             '_id': '5f44b03799944352493d9317'},
            ]
    for val in test:
        yield val

results = gen()
pd.json_normalize(results)
This returns all values though:
results = gen()
list_ = [x for x in results]
pd.json_normalize(list_)
And so does this:
def list_():
    # Build the complete list up front and return it.
    final = []
    test = [{'created_at': '2020-08-24T09:30:05Z',
             '_id': '5f43889de6a98fd57afce7be'},
            {'created_at': '2020-08-23T11:16:09Z',
             '_id': '5f44b03799944352493d9317'},
            ]
    for val in test:
        final.append(val)
    return final

results = list_()
pd.json_normalize(results)
Problem description
Calling pd.json_normalize() on a generator consistently returns one record fewer than expected. I first noticed this with a REST API whose response included a field telling me to expect 901 results, yet I got 900 rows every time. When I collected the results into a list first and normalized that, I got the expected 901 rows.
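One plausible explanation, which I have not confirmed against the pandas source, is that the input gets inspected once before it is iterated for real. A generator can only be consumed once, so any record read during that inspection would be lost. The sketch below reproduces the symptom in plain Python without pandas. `peek_then_collect` is a hypothetical helper written for illustration, not a pandas function.

```python
def gen():
    # Same two records as in the code sample above.
    yield {"created_at": "2020-08-24T09:30:05Z", "_id": "5f43889de6a98fd57afce7be"}
    yield {"created_at": "2020-08-23T11:16:09Z", "_id": "5f44b03799944352493d9317"}


def peek_then_collect(records):
    """Hypothetical helper: inspect the input, then collect what is left."""
    # any() stops at the first truthy result. With a generator, the record
    # it looked at has already been consumed and cannot be read again.
    any(isinstance(record, dict) for record in records)
    return list(records)


from_generator = peek_then_collect(gen())
from_list = peek_then_collect(list(gen()))

print(len(from_generator))  # 1
print(len(from_list))       # 2
```

A list can be iterated any number of times, so the same inspection does no harm there. That would match the observation that materializing the generator into a list before normalizing gives the full result.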
Expected Output
Perhaps this is the expected behavior. It caused me some headaches, though, because it was not immediately obvious that one record was missing. I would expect the generator example above to produce the same 2-row DataFrame as the list-based examples.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : d9fff27
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-1028-azure
Version : #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 49.6.0.post20200814
Cython : 0.29.21
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.17.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.3.0
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 1.0.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : 3.6.1
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.48.0