This repository was archived by the owner on Jan 28, 2021. It is now read-only.

Implement a new pilosa (server-less) index driver  #302

Closed
@kuba--

So far we have a pilosa index driver that uses the pilosa client library github.com/pilosa/go-pilosa. This is an HTTP client, so it requires pilosa to be running as an external HTTP service.

It is now possible to use github.com/pilosa/pilosa as a library. A working (throwaway) prototype is here: https://github.com/kuba--/go-mysql-server/tree/noserver-pilosa/sql/index/pilosa

Instead of refactoring the current pilosa driver, I propose creating a new one (pilosalib) that keeps the same functionality. Having a separate driver lets us compare results and performance; when we're ready, we can switch to the new driver and get rid of the old one.

Pilosa as a library will create new files and directories inside the root folder (passed to the NewDriver function). Because of that, and to avoid overwriting and loading issues across drivers, we'll have to refactor the index folder structure.
I suggest adding an extra directory (DriverID) under the root folder, so each driver has its own space for the mapping file, config file (if needed) and processing file, e.g.:

[root]
|- [pilosalib]
|    |- [db]
|        |- [table]
|            |- id.map # mapping file
|            |- id.cfg # config file (optional for pilosalib)
|            |- [idx-sha1(id, expressions)] # pilosa folder
|            |- [idx-sha1(id, expressions)] # pilosa folder
|
|- [pilosa]
     |- [db]
         |- [table]
             |- id.map # mapping file
             |- id.cfg # config file
             |- id.lock # optional lock/processing file

In other words, the driver will create only the following folders under the root: driver_id/db/table.
The mapping file will be renamed to index_id.map.
The config file will be renamed to index_id.cfg.
The processing/lock file will be renamed to index_id.lock.
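
As a rough illustration of the layout above, the per-index paths could be derived like this. This is only a sketch: the IndexPaths type, its field names and the NewIndexPaths helper are hypothetical and are not taken from the prototype.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// IndexPaths groups the files a driver keeps for one index.
// The type and its field names are hypothetical, for illustration only.
type IndexPaths struct {
	Dir     string // root/driver_id/db/table
	Mapping string // index_id.map
	Config  string // index_id.cfg
	Lock    string // index_id.lock
}

// NewIndexPaths builds the proposed layout: the driver only creates
// root/driver_id/db/table and places its per-index files inside it.
func NewIndexPaths(root, driverID, db, table, indexID string) IndexPaths {
	dir := filepath.Join(root, driverID, db, table)
	return IndexPaths{
		Dir:     dir,
		Mapping: filepath.Join(dir, indexID+".map"),
		Config:  filepath.Join(dir, indexID+".cfg"),
		Lock:    filepath.Join(dir, indexID+".lock"),
	}
}

func main() {
	// Two drivers sharing one root never touch each other's files.
	for _, driverID := range []string{"pilosalib", "pilosa"} {
		p := NewIndexPaths("root", driverID, "db", "table", "id")
		fmt.Println(p.Mapping)
		fmt.Println(p.Config)
		fmt.Println(p.Lock)
	}
}
```

Because the driver ID is the first path component under the root, both drivers can index the same db/table without overwriting each other's mapping or config files.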

Any other subfolders may be created by third parties. For instance, pilosalib creates the following substructure per index:

├── i-d9b85cddd6ac716f0326c32c7ba4bd9ae2aeb558
│   ├── f-daef88b79b2d1e52c779c70d4aa814546a1b10c2
│   │   └── views
│   │       └── standard
│   │           └── fragments
│   │               ├── 0
│   │               └── 0.cache

where i-d9b85cddd6ac716f0326c32c7ba4bd9ae2aeb558 is an example index name, and f-daef88b79b2d1e52c779c70d4aa814546a1b10c2 is an example field name.
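
For illustration, names of this shape (a short prefix followed by a 40-character hex SHA-1) could be produced as below. This is a sketch under assumptions: the hashedName helper is hypothetical, and the exact inputs and their encoding (here each part followed by a zero byte) are not specified above, so the prototype may hash something different.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// hashedName returns prefix followed by the hex SHA-1 of the given parts.
// Each part is terminated with a zero byte so that ("ab", "c") and
// ("a", "bc") hash differently. The encoding is an assumption.
func hashedName(prefix string, parts ...string) string {
	h := sha1.New()
	for _, p := range parts {
		h.Write([]byte(p))
		h.Write([]byte{0})
	}
	return prefix + hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Hypothetical inputs: the index name is derived from db, table and
	// index id; the field name additionally includes the expression.
	index := hashedName("i-", "mydb", "mytable", "myindex")
	field := hashedName("f-", "mydb", "mytable", "myindex", "col_a")
	fmt.Println(filepath.Join(index, field, "views", "standard", "fragments", "0"))
}
```

Hashing keeps the folder names fixed-length and free of characters that are awkward in paths, at the cost of the names no longer being human-readable.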

Caveats

  • mapping: Pilosa's API allows setting column and row attributes as map[uint64]interface{}, but that would require either implementing our own storage to satisfy the StoreAttr interface or trying to reuse pilosa's internal boltdb cache implementation. The second approach sounds promising in theory, but our current mapping also relies on boltdb, is more customised and optimised for our needs, and gives us more independence (instead of coupling us tightly to pilosa). Moreover, we don't have to change this component; we can simply reuse it in the new driver.
    So in the first step we don't need to change the mapping. In the next step we can consider using column attributes to store the colID -> location mapping, but for value -> rowID we may still use the external mapping.

  • versioning: The latest pilosa release, v1.0.1, has some bugs that are already fixed on the master branch. We can start the implementation by vendoring master and later switch to the v1.0.2 release.
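
To make the two lookups from the mapping caveat concrete (value -> rowID and colID -> location), here is a minimal in-memory sketch. It is not the existing boltdb-backed mapping and not pilosa's attribute store; the memMapping type and its methods are hypothetical and only show the shape of the data.

```go
package main

import "fmt"

// memMapping is an in-memory stand-in for the driver's external mapping.
// The real mapping is boltdb-backed; this type is hypothetical.
type memMapping struct {
	rowIDs    map[string]uint64 // indexed value -> pilosa row ID
	locations map[uint64][]byte // pilosa column ID -> row location
}

func newMemMapping() *memMapping {
	return &memMapping{
		rowIDs:    make(map[string]uint64),
		locations: make(map[uint64][]byte),
	}
}

// rowID returns the row ID for value, assigning the next free ID
// the first time the value is seen.
func (m *memMapping) rowID(value string) uint64 {
	if id, ok := m.rowIDs[value]; ok {
		return id
	}
	id := uint64(len(m.rowIDs))
	m.rowIDs[value] = id
	return id
}

// putLocation records where the source row for colID lives.
func (m *memMapping) putLocation(colID uint64, loc []byte) {
	m.locations[colID] = loc
}

// location looks up the stored location for colID.
func (m *memMapping) location(colID uint64) ([]byte, bool) {
	loc, ok := m.locations[colID]
	return loc, ok
}

func main() {
	m := newMemMapping()
	// Index two rows that share the value "go".
	for colID, value := range []string{"go", "rust", "go"} {
		row := m.rowID(value)
		m.putLocation(uint64(colID), []byte(fmt.Sprintf("loc-%d", colID)))
		fmt.Printf("set bit row=%d col=%d\n", row, colID)
	}
}
```

In the second step described above, only the locations half would move into pilosa column attributes; the rowIDs half would stay in the external mapping.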

Labels: enhancement, performance, proposal
