Oihana PHP Arango

FaithParam uses ConstantsTrait

Enumerates the Faiss library parameters accepted by the "params" option of a vector index definition.

Tags
see
https://docs.arango.ai/arangodb/stable/develop/http-api/indexes/vector/
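A minimal sketch of how these keys could be used to build the "params" object of a vector index definition. The stand-in class `VectorParamKeys` is hypothetical and only mirrors the constant values documented on this page, so the snippet runs without the library. The field name `embedding` and the numeric values are illustrative assumptions.

```php
<?php

// Hypothetical stand-in mirroring the constant values documented on this page.
final class VectorParamKeys
{
    public const DEFAULT_N_PROBE     = 'defaultNProbe';
    public const DIMENSION           = 'dimension';
    public const FACTORY             = 'factory';
    public const METRIC              = 'metric';
    public const N_LISTS             = 'nLists';
    public const TRAINING_ITERATIONS = 'trainingIterations';
}

// Index definition as an associative array; all values are assumptions.
$definition = [
    'type'   => 'vector',
    'fields' => ['embedding'],
    'params' => [
        VectorParamKeys::METRIC    => 'cosine',
        VectorParamKeys::DIMENSION => 128,
        VectorParamKeys::N_LISTS   => 100,
    ],
];

// Serialize the definition as it would appear in a request body.
$json = json_encode($definition);
echo $json, PHP_EOL;
```

Using the constants instead of string literals keeps the keys in one place and lets static analysis catch typos such as `nList`.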

Table of Contents

Constants

DEFAULT_N_PROBE  : string = 'defaultNProbe'
How many neighboring centroids to consider for the search results by default.
DIMENSION  : string = 'dimension'
The vector dimension.
FACTORY  : string = 'factory'
You can specify an index factory string that is forwarded to the underlying Faiss library, allowing you to combine different advanced options.
METRIC  : string = 'metric'
Possible values: "cosine", "innerProduct", "l2"
N_LISTS  : string = 'nLists'
The number of Voronoi cells to partition the vector space into, that is, the number of centroids in the index.
TRAINING_ITERATIONS  : string = 'trainingIterations'
The number of iterations in the training process. The default is 25.

Constants

DEFAULT_N_PROBE

How many neighboring centroids to consider for the search results by default.

public string DEFAULT_N_PROBE = 'defaultNProbe'

The larger the number, the slower the search but the better the search results.

The default is 1. You should generally use a higher value here or per query via the nProbe option of the vector similarity functions.

DIMENSION

The vector dimension.

public string DIMENSION = 'dimension'

The attribute to index needs to have this many elements in the array that stores the vector embedding.
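The length requirement can be checked on the client before a document is stored. The helper below is a hypothetical sketch and not part of the library; it only verifies that the array has exactly the configured number of numeric elements.

```php
<?php

/**
 * Hypothetical helper: true when $embedding holds exactly $dimension
 * integer or float elements.
 */
function hasExpectedDimension(array $embedding, int $dimension): bool
{
    if (count($embedding) !== $dimension) {
        return false;
    }
    foreach ($embedding as $value) {
        // Numeric strings are rejected on purpose: only real numbers count.
        if (!is_int($value) && !is_float($value)) {
            return false;
        }
    }
    return true;
}

var_dump(hasExpectedDimension([0.1, 0.2, 0.3], 3)); // bool(true)
var_dump(hasExpectedDimension([0.1, 0.2], 3));      // bool(false)
```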

FACTORY

You can specify an index factory string that is forwarded to the underlying Faiss library, allowing you to combine different advanced options.

public string FACTORY = 'factory'

Examples:

  • "IVF100_HNSW10,Flat"
  • "IVF100,SQ4"
  • "IVF10_HNSW5,Flat"
  • "IVF100_HNSW5,PQ256x16"

The base index must be an inverted file (IVF) to work with ArangoDB. If you don’t specify an index factory, the value is equivalent to IVF<nLists>,Flat.

For more information on how to create these custom indexes, see the Faiss Wiki.

Tags
see
https://github.com/facebookresearch/faiss/wiki/The-index-factory
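As an illustration of the default described above, the following sketch builds the `IVF<nLists>,Flat` string and applies a rough check that a factory string contains an IVF component. Both function names are hypothetical, and the regular expression is a simple heuristic assumed for this example, not a parser for the Faiss factory grammar.

```php
<?php

/** Hypothetical helper: the factory string used when none is specified. */
function defaultIvfFactory(int $nLists): string
{
    return sprintf('IVF%d,Flat', $nLists);
}

/**
 * Hypothetical heuristic: true when one comma-separated component
 * starts with "IVF" followed by digits.
 */
function looksIvfBased(string $factory): bool
{
    return preg_match('/(^|,)IVF\d+/', $factory) === 1;
}

echo defaultIvfFactory(100), PHP_EOL;            // IVF100,Flat
var_dump(looksIvfBased('IVF100_HNSW10,Flat'));   // bool(true)
var_dump(looksIvfBased('HNSW32,Flat'));          // bool(false)
```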

METRIC

Possible values: "cosine", "innerProduct", "l2"

public string METRIC = 'metric'

The measure for calculating the vector similarity:

  • "cosine": Angular similarity. Vectors are automatically normalized before insertion and search.
  • "innerProduct" (introduced in v3.12.6): Similarity in terms of angle and magnitude. Vectors are not normalized, making it faster than cosine.
  • "l2": Euclidean distance.
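To make the difference between the three measures concrete, here is a small sketch that computes each one for two example vectors. It illustrates the textbook definitions only and is not how Faiss or ArangoDB implement them; the function names are hypothetical, and the vectors are assumed to be non-zero and of equal length.

```php
<?php

/** Inner product of two equally long vectors. */
function innerProduct(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += $value * $b[$i];
    }
    return $sum;
}

/** Cosine similarity: the inner product divided by both vector lengths. */
function cosineSimilarity(array $a, array $b): float
{
    // Assumes neither vector is all zeros (division by zero otherwise).
    return innerProduct($a, $b)
        / (sqrt(innerProduct($a, $a)) * sqrt(innerProduct($b, $b)));
}

/** Euclidean (L2) distance between two equally long vectors. */
function l2Distance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }
    return sqrt($sum);
}

$a = [3.0, 4.0];
$b = [4.0, 3.0];

printf("innerProduct: %.2f\n", innerProduct($a, $b));     // 24.00
printf("cosine:       %.2f\n", cosineSimilarity($a, $b)); // 0.96
printf("l2:           %.2f\n", l2Distance($a, $b));       // 1.41
```

The cosine function performs the extra normalization step that the inner product skips, which is the additional work the description above refers to.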

N_LISTS

The number of Voronoi cells to partition the vector space into, that is, the number of centroids in the index.

public string N_LISTS = 'nLists'

What value to choose depends on the data distribution and chosen metric.

According to the Faiss library paper, it should be around 15 * sqrt(N), where N is the number of documents in the collection or, for cluster deployments, the number of documents in the shard.

A bigger value produces more correct results but increases the training time and thus how long it takes to build the index. It cannot be bigger than the number of documents.

Tags
see
https://arxiv.org/abs/2401.08281
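The rule of thumb above can be sketched as a small helper. The function name is hypothetical; it rounds 15 * sqrt(N) to the nearest integer and caps the result at N, since the value cannot be bigger than the number of documents. Treat the result as a starting point, because the best value also depends on the data distribution and metric.

```php
<?php

/**
 * Hypothetical helper: suggests an nLists value of about 15 * sqrt(N),
 * capped at the document count N.
 */
function suggestNLists(int $documentCount): int
{
    if ($documentCount < 1) {
        throw new InvalidArgumentException('documentCount must be at least 1');
    }
    $estimate = (int) round(15 * sqrt($documentCount));

    // Never exceed the number of documents, never go below one cell.
    return max(1, min($estimate, $documentCount));
}

echo suggestNLists(10000), PHP_EOL; // 1500
echo suggestNLists(100), PHP_EOL;   // 100 (150 capped at the document count)
```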

TRAINING_ITERATIONS

The number of iterations in the training process. The default is 25.

public string TRAINING_ITERATIONS = 'trainingIterations'

Smaller values lead to a faster index creation but may yield worse search results.
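If it helps to make the documented defaults explicit on the client side, the following sketch fills in `defaultNProbe` (1) and `trainingIterations` (25) when they are missing. The constant and function names are hypothetical, and the server applies its own defaults regardless; this only makes the effective values visible in the PHP code.

```php
<?php

// Defaults as stated on this page; the constant name is hypothetical.
const DOCUMENTED_DEFAULTS = [
    'defaultNProbe'      => 1,
    'trainingIterations' => 25,
];

/** Hypothetical helper: adds documented defaults without overriding given values. */
function withDocumentedDefaults(array $params): array
{
    // The array union operator keeps existing keys from the left operand.
    return $params + DOCUMENTED_DEFAULTS;
}

$params = withDocumentedDefaults([
    'metric'        => 'l2',
    'dimension'     => 4,
    'nLists'        => 2,
    'defaultNProbe' => 8,
]);

print_r($params);
```

Here `defaultNProbe` stays at 8 because it was supplied, while `trainingIterations` is added with the value 25.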
