Closed
Description
With a growing number of databases now supporting vector data (arrays of floating-point numbers or quantized int8 (8-bit integer) values), we should explore introducing a Vector abstraction.
The main goals are to simplify declaration, improve portability, and provide default storage options.
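In a domain model, one could declare the vector property in any of the following ways (the declarations are alternatives; a real class would pick one):
class Article {
    // store-agnostic candidates
    List<Double> embedding;
    List<Float> embedding;
    double[] embedding;
    Double[] embedding;
    // store-specific candidates
    CqlVector embedding; // Cassandra
    Vector embedding; // MongoDB
}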
Using store-specific types makes a domain type no longer portable across databases. We aim to answer the following questions:
- What is the ideal property type to declare a vector?
- How should the vector be persisted (and configured) if the underlying store provides several storage options?
- How can vector data be handled efficiently, optimizing for zero-copy access while addressing mutability issues?
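To make the zero-copy and mutability question more concrete, here is a minimal sketch of what a store-agnostic vector type could look like. Everything in it is hypothetical and only meant for discussion: the name `FloatVector`, the factory methods `copyOf` and `wrap`, and the accessors are assumptions, not an existing API. The sketch assumes `float` as the backing representation; a real design would also need to cover `double` and quantized `int8` storage.

```java
import java.util.Arrays;

// Hypothetical sketch for discussion; none of these names are an existing API.
final class FloatVector {

    private final float[] values;

    private FloatVector(float[] values) {
        this.values = values;
    }

    // Safe factory: copies the input, so later changes to the caller's array are not visible.
    static FloatVector copyOf(float... values) {
        return new FloatVector(Arrays.copyOf(values, values.length));
    }

    // Zero-copy factory: wraps the array as-is. The caller must not mutate it afterwards.
    static FloatVector wrap(float[] values) {
        return new FloatVector(values);
    }

    int size() {
        return values.length;
    }

    // Returns a defensive copy so that callers cannot alter the vector's state.
    float[] toFloatArray() {
        return Arrays.copyOf(values, values.length);
    }

    // Widens each component to double for stores or drivers that expect double[].
    double[] toDoubleArray() {
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = values[i];
        }
        return result;
    }
}

// The domain type no longer references any store-specific vector class.
class PortableArticle {
    String title;
    FloatVector embedding;
}
```

With a type along these lines, `FloatVector.copyOf(...)` would be the default choice, while `FloatVector.wrap(...)` trades safety for avoiding an array copy when the caller already owns an array that is never modified again. How a store module maps such a type to its native representation is left open here.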