REF: make coordinates not a state variable in io.pytables #29805
```diff
@@ -1634,7 +1634,6 @@ def __init__(
         self.start = start
         self.stop = stop

-        self.coordinates = None
         if iterator or chunksize is not None:
             if chunksize is None:
                 chunksize = 100000
@@ -1644,14 +1643,12 @@ def __init__(

         self.auto_close = auto_close

-    def __iter__(self):
```
So there is an open issue to have this just subclass BaseIterator, which should preserve the correct iterator behavior. I would rather fix it that way.

I don't see the issue you're referring to, but before long I'll do a dedicated look through the HDF5 issues to see what can be closed. It isn't clear that subclassing BaseIterator would affect the statefulness that this PR is addressing.

Umm, not those. But in any event I think if you subclass BaseIterator this will just work.

Do you mean collections.abc.Iterator? That would mean we'd have to define

This is a non-standard way of doing things; it needs to inherit from https://github.com/pandas-dev/pandas/blob/master/pandas/io/common.py#L81

Thanks for the link. If I'm reading you right, you are adamant that
```diff
-
-        # iterate
+    def iter_result(self, coordinates):
         current = self.start
         while current < self.stop:

             stop = min(current + self.chunksize, self.stop)
-            value = self.func(None, None, self.coordinates[current:stop])
+            value = self.func(None, None, coordinates[current:stop])
             current = stop
             if value is None or not len(value):
                 continue
@@ -1671,9 +1668,8 @@ def get_result(self, coordinates: bool = False):
             if not self.s.is_table:
                 raise TypeError("can only use an iterator or chunksize on a table")

-            self.coordinates = self.s.read_coordinates(where=self.where)
-
-            return self
+            coordinates = self.s.read_coordinates(where=self.where)
+            return self.iter_result(coordinates)

         # if specified read via coordinates (necessary for multiple selections
         if coordinates:
```
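For readers following the refactor, here is a minimal standalone sketch of the pattern this change appears to move toward: the coordinates are passed into a generator method instead of being stored on the instance, so each call to `get_result` yields an independent stream of chunks. `ChunkReader`, its simplified `func` signature, and the sample data are hypothetical stand-ins, not the pandas `TableIterator`; the trailing `yield` is also an assumption, since that line falls outside the hunk.

```python
class ChunkReader:
    """Hypothetical stand-in for TableIterator; not the pandas class."""

    def __init__(self, func, start, stop, chunksize):
        self.func = func
        self.start = start
        self.stop = stop
        self.chunksize = chunksize

    def iter_result(self, coordinates):
        # Coordinates arrive as an argument, so nothing is cached on self.
        current = self.start
        while current < self.stop:
            stop = min(current + self.chunksize, self.stop)
            value = self.func(coordinates[current:stop])
            current = stop
            if value is None or not len(value):
                continue
            # Assumed: the real method yields each non-empty chunk here.
            yield value

    def get_result(self, coordinates):
        # Returns a fresh generator rather than `self`.
        return self.iter_result(coordinates)


reader = ChunkReader(func=list, start=0, stop=5, chunksize=2)

# Two result streams created from the same reader before either is consumed.
first = reader.get_result([10, 11, 12, 13, 14])
second = reader.get_result([20, 21, 22, 23, 24])

chunks_first = list(first)
chunks_second = list(second)
```

Because no coordinates live on the instance, the second `get_result` call cannot overwrite what the first stream reads. The trade-off, raised in the review thread, is that the returned object is a plain generator rather than an instance of the class.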
```diff
@@ -52,7 +52,6 @@
 )

 from pandas.io import pytables as pytables  # noqa: E402 isort:skip
-from pandas.io.pytables import TableIterator  # noqa: E402 isort:skip


 _default_compressor = "blosc"
@@ -4528,10 +4527,8 @@ def test_read_hdf_iterator(self, setup_path):
             df.to_hdf(path, "df", mode="w", format="t")
             direct = read_hdf(path, "df")
             iterator = read_hdf(path, "df", iterator=True)
-            assert isinstance(iterator, TableIterator)
             indirect = next(iterator.__iter__())
             tm.assert_frame_equal(direct, indirect)
-            iterator.store.close()
```
I think it's a problem not to close the iterator; the fact that you had to change this is suspicious.
```diff

     def test_read_hdf_errors(self, setup_path):
         df = DataFrame(np.random.rand(4, 5), index=list("abcd"), columns=list("ABCDE"))
```
Hmm, on board with the goal, but is this really what we want to do? Wouldn't this mean that the TableIterator class is actually no longer an iterator by definition?

I guess. In the case that doesn't go through 1674-1675 below, TableIterator is already not really an iterator, so in that sense this makes the name consistently inaccurate.

Ha, sounds good. I'll defer to others more familiar with this code.
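The "iterator by definition" question in this thread can be checked against the standard library's abstract base classes. The sketch below uses two hypothetical classes (neither is pandas code) to show how `collections.abc.Iterable` and `collections.abc.Iterator` classify an object that defines `__iter__` as a generator function versus one that only exposes a generator-returning method such as `iter_result`.

```python
from collections.abc import Iterable, Iterator


class WithIter:
    """Hypothetical class that defines only __iter__, as a generator function."""

    def __iter__(self):
        yield from (1, 2, 3)


class WithoutIter:
    """Hypothetical class that exposes a generator-returning method instead."""

    def iter_result(self):
        yield from (1, 2, 3)


with_iter = WithIter()
without_iter = WithoutIter()

# Defining __iter__ alone makes an object Iterable, but not an Iterator:
# the Iterator ABC also requires __next__.
with_iter_is_iterable = isinstance(with_iter, Iterable)
with_iter_is_iterator = isinstance(with_iter, Iterator)

# Without __iter__, the instance itself is not even Iterable...
without_iter_is_iterable = isinstance(without_iter, Iterable)

# ...but the generator object it hands back is a full Iterator.
generator_is_iterator = isinstance(without_iter.iter_result(), Iterator)
```

Under these definitions, a class whose `__iter__` is a generator function was only ever an iterable, and after removing `__iter__` the iterator protocol is satisfied by the returned generator rather than by the class instance.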