Closed
Description
- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage related question on StackOverflow.
Question about pandas
Why do I need to use nrows when reading large JSON Lines files with the chunksize option?
Since version 1.1 I'm having trouble with the function read_json(): even if I specify the chunksize option with the correct value (the value that used to work with pandas 1.0.5), the file seems to be read all at once, which in my case ends in a memory error. If I add the nrows option this doesn't happen, but why? And what value do you have to pass to the nrows parameter in order to load the entire file? Do you have to know the maximum number of rows in advance? Is there a special value meaning "all rows", like -1 or 0?
Thanks
import pandas as pd

# This raises a MemoryError (with a 4GB file) - it worked on version 1.0.5
reader = pd.read_json(f"{path}map_records.json", orient='records', lines=True, chunksize=100000)
chunks = [chunk[(chunk.bidbasket == "BSKGEOALL00000000001") & (chunk.tipomappa == "AULTIPMPS_GIT")]
          [['bidsubzona', 'idoriginale', 'bidciv', 'bidbasket', 'tipomappa']] for chunk in reader]
# This works, but it loads at most <nrows> rows and I have to know the maximum number of rows in advance
reader = pd.read_json(f"{path}map_records.json", orient='records', lines=True, chunksize=100000, nrows=20000000)
chunks = [chunk[(chunk.bidbasket == "BSKGEOALL00000000001") & (chunk.tipomappa == "AULTIPMPS_GIT")]
          [['bidsubzona', 'idoriginale', 'bidciv', 'bidbasket', 'tipomappa']] for chunk in reader]
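
For what it's worth, this is a minimal sketch of how I combine the filtered chunks afterwards (it assumes the `chunks` list built by the working nrows variant above); the memory error happens before this point is ever reached:

import pandas as pd

# Concatenate the per-chunk filtered frames into a single DataFrame
result = pd.concat(chunks, ignore_index=True)
print(result.shape)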