Raise FileNotFoundError in read_json if input looks like file path but file is missing #46718
Changes from all commits
876f133
fcccaaf
2b84027
e9b1fe6
72bdb93
50b48ea
87c0490
008cbae
1c38a6b
@@ -52,6 +52,7 @@

from pandas.io.common import (
    IOHandles,
    _extension_to_compression,
    file_exists,
    get_handle,
    is_fsspec_url,

@@ -698,6 +699,9 @@ def _get_data_from_filepath(self, filepath_or_buffer):

        This method turns (1) into (2) to simplify the rest of the processing.
        It returns input types (2) and (3) unchanged.

        It raises FileNotFoundError if the input is a string ending in
        one of .json, .json.gz, .json.bz2, etc. but no such file exists.
        """
        # if it is a string but the file does not exist, it might be a JSON string
        filepath_or_buffer = stringify_path(filepath_or_buffer)

@@ -716,6 +720,14 @@ def _get_data_from_filepath(self, filepath_or_buffer):
                errors=self.encoding_errors,
            )
            filepath_or_buffer = self.handles.handle
        elif (

jreback marked this conversation as resolved.

            isinstance(filepath_or_buffer, str)
            and filepath_or_buffer.lower().endswith(
                (".json",) + tuple(f".json{c}" for c in _extension_to_compression)
            )
            and not file_exists(filepath_or_buffer)
        ):
            raise FileNotFoundError(f"File {filepath_or_buffer} does not exist")

If you can investigate if this logic can be moved according to #46718 (comment) (such that not just json files raise this) that would be great

What are the safe assumptions to make here about how the file path ends? Is there sth like

    # Only for write methods
    if "r" not in mode and is_path:
        check_parent_directory(str(handle))
    _supported_formats = ["json", "csv", "xls", "xlsx", ...]
    _allowed_extensions = tuple(itertools.chain(
        (x,) + tuple(f"{x}{c}" for c in _extension_to_compression)
        for x in _supported_formats
    ))
    if (
        is_path
        and handle.lower().endswith(_allowed_extensions)
        and not file_exists(handle)
    ):
        raise FileNotFoundError(f"File {handle} does not exist")

After read_json knows that it is a file, I think you can just raise inside get_handle when the file doesn't exist:

    if "r" not in mode and is_path and not file_exists(handle):
        raise ...

You mean this, right?

    if "r" in mode and is_path and not file_exists(handle):
        raise ...

Actually

    if (
        not isinstance(filepath_or_buffer, str)
        or is_url(filepath_or_buffer)
        or is_fsspec_url(filepath_or_buffer)
        or file_exists(filepath_or_buffer)
    ):
        self.handles = get_handle(...)

I think there were attempts at creating an efficient way of checking whether a str is a legit JSON string. I assume these attempts were unsuccessful - we check the reverse. If that's the case, then moving the check into get_handle, will not work.

So leave PR as is?

Probably.
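
The extension tuple proposed earlier in this thread can be tried on its own. One detail is worth noting: `itertools.chain(...)` called with a single generator yields the inner tuples themselves, and `str.endswith` rejects a tuple that contains tuples, so the sketch below uses `itertools.chain.from_iterable` instead. The format list, the compression suffixes, the leading dot on each suffix, and the helper name are illustrative assumptions, not pandas' actual values.

```python
import itertools

# Assumptions for illustration only; pandas' real values may differ.
SUPPORTED_FORMATS = ["json", "csv", "xls", "xlsx"]
COMPRESSION_SUFFIXES = [".gz", ".bz2", ".zip", ".xz"]


def build_allowed_extensions(formats, compression_suffixes):
    """Return a flat tuple of suffixes such as ".csv" and ".csv.gz"."""
    return tuple(
        # from_iterable flattens the per-format tuples into one sequence.
        itertools.chain.from_iterable(
            (f".{fmt}",) + tuple(f".{fmt}{c}" for c in compression_suffixes)
            for fmt in formats
        )
    )


ALLOWED_EXTENSIONS = build_allowed_extensions(SUPPORTED_FORMATS, COMPRESSION_SUFFIXES)
```

A flat tuple of strings can be passed directly to `str.endswith`, which is what the suffix check in the diff relies on.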

        return filepath_or_buffer
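
The new `elif` branch can be exercised outside pandas with a minimal sketch. It assumes a hypothetical `COMPRESSION_SUFFIXES` tuple standing in for the keys of `_extension_to_compression`, and uses `os.path.exists` in place of pandas' `file_exists`. The helper name `check_json_path` is not part of the PR.

```python
import os

# Assumption: stand-in for the keys of pandas' _extension_to_compression.
COMPRESSION_SUFFIXES = (".gz", ".bz2", ".zip", ".xz")

# Same construction as the diff: ".json" plus ".json<compression suffix>".
JSON_SUFFIXES = (".json",) + tuple(f".json{c}" for c in COMPRESSION_SUFFIXES)


def check_json_path(path_or_json):
    """Hypothetical helper mirroring the condition of the new elif branch.

    Raise FileNotFoundError when the input is a string that ends like a
    JSON file name but no such file exists; otherwise return it unchanged.
    """
    if (
        isinstance(path_or_json, str)
        and path_or_json.lower().endswith(JSON_SUFFIXES)
        and not os.path.exists(path_or_json)
    ):
        raise FileNotFoundError(f"File {path_or_json} does not exist")
    return path_or_json
```

Under these assumptions, a literal JSON string such as `'{"a": [1, 2]}'` passes through untouched because it does not end in a JSON file suffix, while a string ending in `.json` or `.json.gz` that names no existing file raises `FileNotFoundError`.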