Implement a new pilosa (server-less) index driver #302
Description
So far we have a pilosa index driver that uses the pilosa client library github.com/pilosa/go-pilosa. It is an HTTP client, which requires running pilosa as an external HTTP service.
Now it's possible to use github.com/pilosa/pilosa as a library. A working (trash-code) prototype can be found here: https://github.com/kuba--/go-mysql-server/tree/noserver-pilosa/sql/index/pilosa
Instead of refactoring the current pilosa driver, I propose creating a new one (pilosalib) that keeps the same functionality. Having a new driver lets us compare results and performance, and when we're ready we can just switch to the new driver and get rid of the old one.
Pilosa as a library will create new files and directories inside the root folder (passed to the NewDriver function). Because of that, and to avoid overwriting and loading issues across drivers, we'll have to refactor the index folder structure.
I suggest adding an extra directory (DriverID) under the root folder, so each driver will have its own space, mapping, config files (if needed) and processing file, e.g.:
[root]
 |- [pilosalib]
 |    |- [db]
 |        |- [table]
 |            |- id.map # mapping file
 |            |- id.cfg # config file is optional for pilosalib
 |            |- [idx-sha1(id, expressions)] # pilosa folder
 |            |- [idx-sha1(id, expressions)] # pilosa folder
 |
 |- [pilosa]
      |- [db]
          |- [table]
              |- id.map # mapping file
              |- id.cfg # config file
              |- id.lock # optional lock/processing file
In other words, the driver will create only the following folders under the root: driver_id/db/table.
The mapping file will be renamed to index_id.map.
The config file will be renamed to index_id.cfg.
The processing/lock file will be renamed to index_id.lock.
All other potential subfolders may be created by third parties. For instance, pilosalib creates the following substructure per index:
├── i-d9b85cddd6ac716f0326c32c7ba4bd9ae2aeb558
│   ├── f-daef88b79b2d1e52c779c70d4aa814546a1b10c2
│   │   └── views
│   │       └── standard
│   │           └── fragments
│   │               ├── 0
│   │               └── 0.cache
where i-d9b85cddd6ac716f0326c32c7ba4bd9ae2aeb558 is an example index name, and f-daef88b79b2d1e52c779c70d4aa814546a1b10c2 is an example field name.
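The names above look like a short prefix followed by a hex-encoded SHA-1 digest. A rough sketch of how such names could be produced follows. Which inputs are hashed, and in what order, is an assumption here (the tree only says `sha1(id, expressions)`), and `hashName`, `indexName` and `fieldName` are hypothetical helpers.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// hashName hashes all parts with SHA-1 and prepends prefix.
// A zero byte is written after each part so that ("ab", "c") and
// ("a", "bc") produce different digests.
func hashName(prefix string, parts ...string) string {
	h := sha1.New()
	for _, p := range parts {
		h.Write([]byte(p))
		h.Write([]byte{0})
	}
	return prefix + hex.EncodeToString(h.Sum(nil))
}

// indexName derives an "i-<sha1>" name from the index id and its expressions
// (assumed inputs).
func indexName(id string, expressions ...string) string {
	return hashName("i-", append([]string{id}, expressions...)...)
}

// fieldName derives an "f-<sha1>" name from the index id and one expression
// (assumed inputs).
func fieldName(id, expression string) string {
	return hashName("f-", id, expression)
}

func main() {
	fmt.Println(indexName("idx", "lang", "hash"))
	fmt.Println(fieldName("idx", "lang"))
}
```

Hashing keeps the folder names fixed-length and safe for any file system, regardless of what characters the expressions contain.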
Caveats
- mapping: Pilosa's API allows setting column and row attributes as map[uint64]interface{}, but that would require implementing our own storage to satisfy the StoreAttr interface, or we could try to reuse pilosa's internal boltdb cache implementation. The second approach sounds promising in theory, but our current mapping also relies on boltdb, is more customised and optimised for our needs, and gives us more independence (instead of tightly coupling with pilosa). Moreover, we don't have to change this component, just reuse it in the new driver.
  So in the first step we don't need to change the mapping. In the next step we can consider using column attributes to store the colID -> location mapping, but for value -> rowID we may still use the external mapping.
- versioning: The latest pilosa release, v1.0.1, has some bugs which are already fixed on the master branch. We can start the implementation by vendoring master and later switch to the v1.0.2 release.
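For the mapping caveat, the two lookups the driver needs from the external mapping can be sketched as below. This is a simplified in-memory stand-in, not the existing boltdb-backed component: the `mapping` type, its methods, and the sequential row ID assignment are all assumptions made for illustration.

```go
package main

import "fmt"

// mapping is a hypothetical in-memory stand-in for the external mapping.
type mapping struct {
	rowIDs    map[string]uint64 // value -> rowID
	locations map[uint64][]byte // colID -> location
}

func newMapping() *mapping {
	return &mapping{
		rowIDs:    make(map[string]uint64),
		locations: make(map[uint64][]byte),
	}
}

// rowID returns the row ID for value, assigning the next free ID
// the first time the value is seen.
func (m *mapping) rowID(value string) uint64 {
	if id, ok := m.rowIDs[value]; ok {
		return id
	}
	id := uint64(len(m.rowIDs))
	m.rowIDs[value] = id
	return id
}

// putLocation records where the row behind colID lives.
func (m *mapping) putLocation(colID uint64, location []byte) {
	m.locations[colID] = location
}

// location looks up the location stored for colID.
func (m *mapping) location(colID uint64) ([]byte, bool) {
	loc, ok := m.locations[colID]
	return loc, ok
}

func main() {
	m := newMapping()
	fmt.Println(m.rowID("go"), m.rowID("python"), m.rowID("go"))

	m.putLocation(7, []byte("some-location"))
	loc, ok := m.location(7)
	fmt.Println(string(loc), ok)
}
```

If column attributes are adopted later, only the `colID -> location` half would move into pilosa, while the `value -> rowID` half could stay behind the same kind of interface.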