
Optimization for LRUStoreCache.__contains__ #447


Closed

Conversation

shikharsg
Contributor

Closes #295.

As mentioned in #295, LRUStoreCache.__contains__ made a call to self._keys, which can take a long time: for example, with an array holding terabytes of data and millions of keys stored on cloud storage, the call to self._keys will first list all those keys and cache them. This PR removes the self._keys call from __contains__. The call to self._store.keys() still exists in the keys method, so keys will still be cached, just not when the __contains__ method is called.
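The change described above can be sketched roughly as follows. This is an illustrative, simplified model, not the actual zarr implementation: the class name, the `_keys_cache` and `_contains_cache` attributes, and the lack of locking are all assumptions made for the sake of a compact example.

```python
# Sketch of the optimization: __contains__ consults a per-key membership
# cache and queries the underlying store for a single key, instead of
# listing every key via self._keys. Illustrative names only.

class LRUStoreCacheSketch:
    def __init__(self, store):
        self._store = store
        self._keys_cache = None    # full key listing, filled lazily by keys()
        self._contains_cache = {}  # per-key membership results

    def __contains__(self, key):
        # Before the change this was effectively `key in self._keys()`,
        # which forced a full listing of the underlying store.
        if key not in self._contains_cache:
            self._contains_cache[key] = key in self._store
        return self._contains_cache[key]

    def keys(self):
        # Full key listing is still cached, but only when keys() is called.
        if self._keys_cache is None:
            self._keys_cache = list(self._store.keys())
        return iter(self._keys_cache)
```

With this shape, checking membership of one key touches the store once per key rather than triggering a full listing.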

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • Docs build locally (e.g., run tox -e docs)
  • AppVeyor and Travis CI pass
  • Test coverage is 100% (Coveralls passes)

@shikharsg shikharsg requested a review from alimanfoo June 30, 2019 22:24
@pep8speaks

pep8speaks commented Jun 30, 2019

Hello @shikharsg! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-30 23:02:42 UTC

@shikharsg shikharsg requested a review from a team June 30, 2019 22:25
@martindurant
Member

My immediate thought here: what happens if the underlying store has changed since the last caching of the keys?

@shikharsg
Contributor Author

shikharsg commented Jun 30, 2019

My immediate thought here: what happens if the underlying store has changed since the last caching of the keys?

If you look at the __delitem__ and __setitem__ methods of LRUStoreCache, you'll see that the cache is invalidated when they are called. Are you talking about the case where some other user of the underlying store changes it since the last caching of the keys? In that case, I guess it was a problem even before this PR, right?

Or is there some special case that I have missed?
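The invalidation behaviour referred to above can be sketched like this: writes and deletes that go through the cache drop the cached key listing, so a later keys() call re-lists from the underlying store. This is a minimal model with illustrative names, not the actual LRUStoreCache code; it does not capture changes made to the store from outside the cache, which is exactly the case raised in the question.

```python
# Sketch of cache invalidation on mutation: __setitem__ and __delitem__
# discard the cached key listing so keys() re-lists the store next time.

class InvalidatingCacheSketch:
    def __init__(self, store):
        self._store = store
        self._keys_cache = None

    def _invalidate_keys(self):
        self._keys_cache = None

    def keys(self):
        if self._keys_cache is None:
            self._keys_cache = sorted(self._store.keys())
        return list(self._keys_cache)

    def __setitem__(self, key, value):
        self._store[key] = value
        self._invalidate_keys()  # cached listing is now stale

    def __delitem__(self, key):
        del self._store[key]
        self._invalidate_keys()
```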

@martindurant
Member

OK, OK, I get what this is doing - sorry it didn't immediately click - this is for calls to __contains__ on the target store where you haven't explicitly listed the keys.

Is there a case for having the contains_cache be size-limited, since we are thinking there may be a very large number of keys?

I am OK with the implementation, but I haven't looked at the tests yet.

(the point about the target storage changing from some other process does stand, but is general to this caching mechanism, so not relevant to this particular change)

@shikharsg
Contributor Author

I initially did think of limiting the size of contains_cache, but if you look at LRUStoreCache, keys are still cached, just through the keys method rather than the contains method: if a call to _keys is made, ALL the keys will be cached. So I was wondering whether it makes sense to put a size limit on the contains cache when there is no size limit on the keys cache. I guess we could:

  1. put a size limit on both keys cache and contains cache(the default size for both being unlimited to maintain backwards compatibility)
  2. put a size limit on contains cache and completely remove keys caching

I like the 2nd one, but it would break backwards compatibility so actually I'm not sure which one of the two options we should do.
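Option 1 above could be sketched as a size-limited contains cache with LRU eviction, where max_size=None means "unlimited" to preserve backwards compatibility. This is a hypothetical design sketch, not code from the PR; the class name and parameters are assumptions.

```python
# Sketch of a size-limited contains cache using collections.OrderedDict
# for least-recently-used eviction. max_size=None disables the limit.
from collections import OrderedDict

class BoundedContainsCache:
    def __init__(self, store, max_size=None):
        self._store = store
        self._max_size = max_size
        self._contains_cache = OrderedDict()

    def __contains__(self, key):
        if key in self._contains_cache:
            self._contains_cache.move_to_end(key)  # mark as recently used
            return self._contains_cache[key]
        result = key in self._store
        self._contains_cache[key] = result
        if self._max_size is not None and len(self._contains_cache) > self._max_size:
            self._contains_cache.popitem(last=False)  # evict least recently used
        return result
```

The same OrderedDict pattern could bound the keys cache for option 1, or stand alone as the only cache for option 2.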

@joshmoore
Member

Hi @shikharsg. Just going through open issues for a review and found this linked from #295. Any thoughts from your side?

@jhamman
Member

jhamman commented Dec 7, 2023

This has unfortunately gone stale. Seems like it was quite close. Will it be revived or should we close this?

@jhamman jhamman added the stale label Dec 7, 2023
@jhamman
Member

jhamman commented Feb 13, 2024

Closing this out as stale. We'll keep the feature request open.

@jhamman jhamman closed this Feb 13, 2024
Successfully merging this pull request may close these issues.

Optimisation for the __contains__ method of storage.LRUStoreCache