Skip to content

Read from HDF with empty where throws an error #26746

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 11, 2019
Merged

Read from HDF with empty where throws an error #26746

merged 7 commits into from
Jun 11, 2019

Conversation

BeforeFlight
Copy link
Contributor

@jreback jreback changed the title FIX: issue #26610. Tests and fix. Read from HDF with empty where throws an error Jun 9, 2019
@@ -649,6 +649,7 @@ I/O
- Bug in :func:`read_json` where date strings with ``Z`` were not converted to a UTC timezone (:issue:`26168`)
- Added ``cache_dates=True`` parameter to :meth:`read_csv`, which allows to cache unique dates when they are parsed (:issue:`25990`)
- :meth:`DataFrame.to_excel` now raises a ``ValueError`` when the caller's dimensions exceed the limitations of Excel (:issue:`26051`)
- Bug while selecting from HDF store with `where`='' specified (:issue:`26610`). Now the whole DataFrame will be returned from store.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use :class`HDFStore` , put where='' in double backtikcs. Don't put anything after the issue number (e.g. remove the Now the whole DataFrame ...), you can put that before if you want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check please - not sure if I get :class correctly.

@@ -95,10 +95,10 @@ def _ensure_term(where, scope_level):
wlist.append(w)
else:
wlist.append(Term(w, scope_level=level))
where = wlist
where = wlist if wlist else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is connected with next request. Check it please.

elif maybe_expression(where):
where = Term(where, scope_level=level)
return where
return where if where != "" else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just do
where if where else None as bool('') and bool([]) evaluate to False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did id at first, but in case some input where is some valid expression as string - get after elif where as Index. So this comparsion throws an The truth value of a Int64Index is ambiguous.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then just do

return where if len(where) else None

@@ -4731,6 +4731,34 @@ def test_read_py2_hdf_file_in_py3(self, datapath):
result = store['p']
assert_frame_equal(result, expected)

@pytest.mark.parametrize("call", [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not necessary to parameterize over the method calls, read_hdf is enough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified.

with pd.HDFStore(path) as store:
key1 = "df1"
key2 = "df2"
store.put("df1", df[["a"]], "t")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use

result = read_hdf(....)

assert_frame_equal(result, df)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jreback jreback added API Design IO HDF5 read_hdf, HDFStore labels Jun 9, 2019
@@ -649,6 +649,7 @@ I/O
- Bug in :func:`read_json` where date strings with ``Z`` were not converted to a UTC timezone (:issue:`26168`)
- Added ``cache_dates=True`` parameter to :meth:`read_csv`, which allows to cache unique dates when they are parsed (:issue:`25990`)
- :meth:`DataFrame.to_excel` now raises a ``ValueError`` when the caller's dimensions exceed the limitations of Excel (:issue:`26051`)
- Bug while selecting from :class: `pandas.HDFstore` with `where=''` specified (:issue:`26610`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no space in :class:`HDFStore` (also spelling), use double-backticks around where

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Check please. Sorry - have never worked with .rst before.

elif maybe_expression(where):
where = Term(where, scope_level=level)
return where
return where if where != "" else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then just do

return where if len(where) else None

@codecov
Copy link

codecov bot commented Jun 9, 2019

Codecov Report

Merging #26746 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #26746   +/-   ##
=======================================
  Coverage    91.7%    91.7%           
=======================================
  Files         179      179           
  Lines       50767    50767           
=======================================
  Hits        46555    46555           
  Misses       4212     4212
Flag Coverage Δ
#multiple 90.29% <50%> (ø) ⬆️
#single 41.21% <100%> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/io/pytables.py 90.29% <100%> (ø) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.88% <0%> (-0.12%) ⬇️
pandas/core/indexes/base.py 96.93% <0%> (+0.21%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c7748ca...f75ae8c. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 9, 2019

Codecov Report

Merging #26746 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26746      +/-   ##
==========================================
- Coverage   91.71%   91.71%   -0.01%     
==========================================
  Files         178      178              
  Lines       50776    50776              
==========================================
- Hits        46571    46567       -4     
- Misses       4205     4209       +4
Flag Coverage Δ
#multiple 90.3% <100%> (ø) ⬆️
#single 41.19% <100%> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/io/pytables.py 90.29% <100%> (ø) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.88% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.84% <0%> (-0.11%) ⬇️
pandas/core/computation/expr.py 97.8% <0%> (+0.27%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13023c6...59a6815. Read the comment docs.

@BeforeFlight
Copy link
Contributor Author

BeforeFlight commented Jun 9, 2019

return where if len(where) else None fails when where is None.

@BeforeFlight
Copy link
Contributor Author

BeforeFlight commented Jun 10, 2019

In one of the tests the where is assigned as Int64Index directly. Is it designed to (have not found any info on that in manual or API reference)? If so - in case of empty Index the desired result should be empty df (well I suppose; now it's just an error too btw). So it differs from empty list (i.e. list of terms) where desired output will be unfiltered = full df. And this difference should be processed somehow in _ensure_term.

Besides I'm confused with type checking - now where allows all the abc.iterable (almost): isinstance(w, (Expr, string_types)) or is_list_like(w). But at the same time inside _ensure_term only isinstance(where, (list, tuple)) are processed. Other iterables are just passed as is (one may pass range(3), [1,2,3], pd.Series etc.). So - is it intentionally or (as I understand from the API reference - "list of Term (or convertible) objects, optional" - i.e. it is supposed to be an iterable of terms) all is_list_likeshould be hooked inside _ensure_term too? E.g. instead of isinstance(where, (list, tuple)) should be is_list_like(where)?

@jreback jreback added this to the 0.25.0 milestone Jun 10, 2019
@jreback
Copy link
Contributor

jreback commented Jun 10, 2019

lgtm. can you merge master and ping on green.

@BeforeFlight
Copy link
Contributor Author

lgtm. can you merge master and ping on green.

I'm sorry - what is "ping on green"?
Besides what about type checking (in my questions) - leave it as is? Then, for example, if where = pd.Index([]) - the whole df will be returned as for the empty list.

@jreback
Copy link
Contributor

jreback commented Jun 10, 2019

lgtm. can you merge master and ping on green.

I'm sorry - what is "ping on green"?
Besides what about type checking (in my questions) - leave it as is? Then, for example, if where = pd.Index([]) - the whole df will be returned as for the empty list.

ping on green - when the CI is green

where=‘’ is the same as where=None so yes return the entire frame

@BeforeFlight
Copy link
Contributor Author

@jreback all is green.

@jreback jreback merged commit f4b6e92 into pandas-dev:master Jun 11, 2019
@jreback
Copy link
Contributor

jreback commented Jun 11, 2019

thanks @BeforeFlight

@BeforeFlight BeforeFlight deleted the HDF_where_26610 branch June 11, 2019 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design IO HDF5 read_hdf, HDFStore
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read from HDF with empty where throws an error
2 participants