Improve performance of very large collections - count expensive (and inaccurate, atm) #290

Open
@joepio

Two problems:

  • Counts are slow at the moment
  • Counts do not check authorization

The /commits collection contains almost 4000 resources, and getting a single page takes about 200ms. I think the culprit is the count field: to compute it, the server iterates over every resource in the collection. Basically, each page request performs about 4000 read operations.

We could choose not to return the count field, but that would also mean that we can't let the client know how many pages there are. So no count, no max_page.

Also, this count does not take into account the include_external filter.

The count field might also give malicious users a way to find out whether a resource they do not have access to has some attribute: they can perform multiple queries at multiple moments in time and check whether the count increased.

How can we solve this?

Don't have a count

Simple for the server, but it means the client has to make assumptions about pagination: does a next page exist, for example? We would need to change the collection model to do this.
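
One way to answer "does a next page exist?" without a count is to read one item more than the page size. The sketch below is only an illustration of that idea, not how the server currently works: `Page`, `has_next_page` and `get_page` are hypothetical names, and the iterator stands in for whatever the query index returns.

```rust
/// One page of results plus a flag telling the client whether more exist.
/// Hypothetical shape; the real collection model has different fields.
#[derive(Debug, PartialEq)]
pub struct Page<T> {
    pub members: Vec<T>,
    pub has_next_page: bool,
}

/// Reads at most `page_size + 1` items after the offset, so the cost of
/// building a page no longer depends on the total collection size
/// (apart from skipping to the offset).
pub fn get_page<T, I>(matches: I, page_nr: usize, page_size: usize) -> Page<T>
where
    I: Iterator<Item = T>,
{
    let mut members: Vec<T> = matches
        .skip(page_nr * page_size)
        .take(page_size + 1) // one look-ahead item
        .collect();
    // If the look-ahead item was found, there is at least one more page.
    let has_next_page = members.len() > page_size;
    members.truncate(page_size);
    Page { members, has_next_page }
}
```

With this, the client can still render a "next" button, but it cannot show a total or a max_page. Note that `skip` on a plain iterator still walks the skipped items; a real index would need to seek to the offset instead.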

Hope that sled has some clever method for counting

But I would not count (ha-ha) on this being possible.

Keep track of the count per collection

Add a new key to the query_index with a shape like QueryObject - count - {number}.
Every time an atom is added or removed, increment or decrement this count.
This makes updating an atom a little slower, and the count might get out of sync.
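
A minimal sketch of the bookkeeping described above, assuming an in-memory `HashMap` as a stand-in for the query_index and a string as the serialized QueryObject. `CountIndex`, `atom_added` and `atom_removed` are hypothetical names used for illustration only.

```rust
use std::collections::HashMap;

/// Stand-in for the count entries in the query_index.
/// Key: a serialized query (assumption); value: number of matching atoms.
#[derive(Default)]
pub struct CountIndex {
    counts: HashMap<String, u64>,
}

impl CountIndex {
    /// Call when an atom that matches `query_key` is added.
    pub fn atom_added(&mut self, query_key: &str) {
        *self.counts.entry(query_key.to_string()).or_insert(0) += 1;
    }

    /// Call when an atom that matches `query_key` is removed.
    /// Saturates at zero so an out-of-sync index never underflows.
    pub fn atom_removed(&mut self, query_key: &str) {
        if let Some(n) = self.counts.get_mut(query_key) {
            *n = n.saturating_sub(1);
        }
    }

    /// Reads the stored count in one lookup instead of iterating all members.
    pub fn count(&self, query_key: &str) -> u64 {
        self.counts.get(query_key).copied().unwrap_or(0)
    }
}
```

Reading the count becomes a single lookup. The out-of-sync risk remains: if an add or remove is ever missed, the stored number stays wrong until it is recomputed from the members.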

Limit pagination count

We can stop counting after, say, 10 extra pages. If the real count is higher than that, the reported count is capped at that maximum.
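
The capped count could look roughly like this. It is a sketch under assumptions: `Count` and `capped_count` are hypothetical names, and `remaining` stands for an iterator over the matches that come after the current page.

```rust
/// Result of a bounded count: either the exact number of remaining items,
/// or a lower bound when the cap was reached.
#[derive(Debug, PartialEq)]
pub enum Count {
    Exact(usize),
    AtLeast(usize),
}

/// Counts at most `page_size * max_extra_pages` items, plus one look-ahead
/// item to tell "exactly at the cap" apart from "more than the cap".
pub fn capped_count<I: Iterator>(
    remaining: I,
    page_size: usize,
    max_extra_pages: usize,
) -> Count {
    let cap = page_size * max_extra_pages;
    let seen = remaining.take(cap + 1).count();
    if seen > cap {
        Count::AtLeast(cap)
    } else {
        Count::Exact(seen)
    }
}
```

For a collection of about 4000 resources with a page size of 30, this reads at most 301 items instead of all of them. The client would then have to display something like "300+" when it receives the capped variant.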

Authorized Query counts

  • Update an index value every time a filter matches an atom, and re-calculate the auth... For every user
  • This costs a lot of resources, probably
  • We could run this periodically
