Description
Sent to [email protected] on June 23 ([email protected]); posted here as recommended by Marc Garcia.
Code Sample, a copy-pastable example if possible
from io import StringIO

import pandas as pd

path = 'file://localhost/Users/vlb/Learn/DSC_Intro/'
filename = path + 'yelp_dataset/review_test.json'
# read the entire file -- this works
reviews = pd.read_json(filename, lines=True)
reviews.info()
# create a reader to read in chunks -- this part seems to work
review_reader = pd.read_json(StringIO(filename), lines=True, chunksize=1)
type(review_reader)
# But trying to read from the reader throws an error
# ValueError: Unexpected character found when decoding 'false'
for chunk in review_reader:
    print(chunk)
Data Samples
Either or both of the following records can be used
{"review_id":"rEITo90tpyKmEfNDp3Ou3A","user_id":"6Fz_nus_OG4gar721OKgZA","business_id":"6lj2BJ4tJeu7db5asGHQ4w","stars":5.0,"useful":0,"funny":0,"cool":0,"text":"We've been a huge Slim's fan since they opened one up in Texas about two years ago when we used to live there. This place never disappoints. They even have great salads and grilled chicken. Plus they have fresh brewed sweet tea, it's the best!","date":"2017-05-26 01:23:19"}
{"review_id":"Amo5gZBvCuPc_tZNpHwtsA","user_id":"DzZ7piLBF-WsJxqosfJgtA","business_id":"qx6WhZ42eDKmBchZDax4dQ","stars":5.0,"useful":1,"funny":0,"cool":0,"text":"Our family LOVES the food here. Quick, friendly, delicious, and a great restaurant to take kids to. 5 stars!","date":"2017-03-27 01:14:37"}
Problem description
I am working through a tutorial that uses a JSON data file from Yelp. The file is huge, so it needs to be read in chunks.
I get an unexpected error: ValueError: Unexpected character found when decoding 'false'
For testing purposes, I have reduced the dataset to a much smaller file with only 3 lines. I can reproduce the error with that file as well as with a file containing only one (any one) of the three lines.
Note that if I simply read in the entire (test) data set in one go, that works. It's only when I create a reader and try to iterate over the chunks that I get the error.
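One possible explanation, not confirmed in this report: the two calls do not receive the same input. The full read passes `filename` directly, while the chunked read passes `StringIO(filename)`, which is an in-memory buffer holding the path text itself, not the file's contents. If that is the cause, the parser would be trying to decode a string beginning with `f` (from `file://`), which could account for the mention of `'false'`. The sketch below is a minimal illustration under that assumption. The temporary file, the two short records, and the names `local_path`, `path_error`, and `chunks` are hypothetical stand-ins, not the Yelp data.

```python
import json
import os
import tempfile
from io import StringIO

import pandas as pd

# Hypothetical stand-in records; the real Yelp rows have more fields.
records = [
    {"review_id": "a1", "stars": 5.0, "useful": 0},
    {"review_id": "b2", "stars": 4.0, "useful": 1},
]

# Write the records as JSON Lines (one object per line) to a temp file.
tmp_dir = tempfile.mkdtemp()
local_path = os.path.join(tmp_dir, "review_test.json")
with open(local_path, "w", encoding="utf-8") as fh:
    for record in records:
        fh.write(json.dumps(record) + "\n")

# Assumed cause: StringIO(<path text>) hands pandas the path string as if
# it were JSON, so parsing is expected to fail once a chunk is requested.
path_error = None
try:
    bad_reader = pd.read_json(
        StringIO("file://localhost/tmp/review_test.json"),
        lines=True,
        chunksize=1,
    )
    for _ in bad_reader:
        pass
except ValueError as exc:
    path_error = exc

# Passing the path itself lets pandas open the file and yield one
# single-row DataFrame per line.
chunks = list(pd.read_json(local_path, lines=True, chunksize=1))

print("error from StringIO(path):", path_error)
print("number of chunks:", len(chunks))
```

If this reading is right, it would also explain why creating the reader appears to succeed: nothing is parsed until the first chunk is requested.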
Expected Output
No errors. A chunk should print.
If there is an error, it should be less opaque than "Unexpected character found when decoding 'false'".
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 1.8.5
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml.etree: 4.3.2
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.7.0
gcsfs: None