-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
read_json engine keyword and pyarrow integration #49249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice start
Thanks, I appreciate it 🙏🏻 Took me 11 days but finally starting to figure it out. |
So I pushed my recent changes but after I successfully synced my fork, I started to have some problems when running the tests:
I think the pyarrow version in the @WillAyd I successfully added engine fixture to all the tests; however, since pyarrows |
Generally you will need to rebuild your cython code each time you pull in code ( |
Hey @mroeschke, Sorry for no updates. Work has been really busy the past two weeks, I haven't had the time to sit and concentrate on the issue. I will start working on it tonight. First I will address your latest comments and we can take it from there. Appreciate the support! Thanks! |
And also I don't know why this says there are conflicts because I don't what it shows in my fork. |
Did you happen to follow these update instructions? https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#updating-your-pull-request |
Yep I checked it out but for me, it didn't work with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice job - keep up the good work on this
I am trying to run the tests, but I keep getting an error, making it hard for me to run tests and update something. After each rebase I also do
I also tried to recreate the development environment, but it didn't work. |
How are you running the tests? And to confirm you created your development environment according to https://pandas.pydata.org/pandas-docs/stable/development/contributing_environment.html? |
Hey @abkosar mind if I push some commits to your PR? I would be very interested to see this feature in the next major release tentatively scheduled for next month |
Hey @mroeschke! Please go ahead, but I was going to do some changes so should I hold off? |
This is fixed too. |
- Added reason to fail to each pyarrow xfail test.
- removed xfail fixture from conftest.py
def test_read_datetime(): | ||
def test_read_jsonl_engine_pyarrow(json_dir_path, engine): | ||
result = read_json( | ||
os.path.join(json_dir_path, "line_delimited.json"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.path.join(json_dir_path, "line_delimited.json"), | |
datapath("io", "json", "data", "line_delimited.json") |
And then you can replace json_dir_path
with datapath
and remove json_dir_path
from pandas/tests/io/json/conftest.py
pandas/tests/io/json/conftest.py
Outdated
def engine(request): | ||
if request.param == "pyarrow": | ||
pytest.importorskip("pyarrow.json") | ||
return request.param |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Instead of the else, could you do
if request.param == "pyarrow":
pytest.importorskip(...)
return request.param
I pushed a few commits addressing my comments. I think this should be okay for 2.0. Mind taking a look @phofl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good in general, some small comments.
Can you update https://pandas.pydata.org/docs/user_guide/io.html#reading-json as well?
We are also documenting all key-words in the user guide I think, right at the top of the section. Otherwise lgtm |
Can't merge without this, the last open comment (documenting the xfails) was addressed
thx @mroeschke and thx @abkosar |
Thanks @mroeschke for all your patience! Happy to be involved! |
engine
keyword toread_json
to enable reading from pyarrow #48893doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.