Oihana PHP Arango

aqlVectorSearch.php

Table of Contents

Functions

aqlVectorSearch()  : string
Builds a complete AQL approximate nearest-neighbour (ANN) query over a vector index.

Functions

aqlVectorSearch()

Builds a complete AQL approximate nearest-neighbour (ANN) query over a vector index.

aqlVectorSearch(string $collection, string $attribute, string $vector, int $limit[, string $metric = VectorMetric::COSINE ][, int|null $nProbe = null ][, string $docRef = 'doc' ][, string|null $return = null ]) : string

The generated query follows the canonical ANN form:

FOR <docRef> IN <collection>
  SORT APPROX_NEAR_<METRIC>(<docRef>.<attribute>, <vector>) <ASC|DESC>
  LIMIT <limit>
  RETURN <return>

The $metric parameter selects both the AQL function and the sort direction; the direction is the part developers most often get wrong:

  • 'cosine' → APPROX_NEAR_COSINE, sorted DESC (closer to 1 is nearer),
  • 'l2' → APPROX_NEAR_L2, sorted ASC (closer to 0 is nearer).
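The metric-to-clause rule above fits in a small lookup table. The following is a minimal sketch, not the library's implementation. It assumes PHP 8, which the examples on this page already require for named arguments, and the function name annSortFor() is hypothetical.

```php
<?php

/**
 * Hypothetical helper: maps a metric name to the AQL ANN function and the
 * sort direction that puts the nearest documents first.
 *
 * @return array{0: string, 1: string} [function name, 'ASC'|'DESC']
 */
function annSortFor(string $metric): array
{
    return match ($metric) {
        // Cosine similarity: larger values (up to 1) mean nearer, so sort descending.
        'cosine' => ['APPROX_NEAR_COSINE', 'DESC'],
        // L2 (Euclidean) distance: smaller values (down to 0) mean nearer, so sort ascending.
        'l2'     => ['APPROX_NEAR_L2', 'ASC'],
        // Any other metric is rejected, mirroring the documented exception.
        default  => throw new InvalidArgumentException(
            "Unsupported vector metric '$metric', expected 'cosine' or 'l2'."
        ),
    };
}
```

Keeping the function name and the direction in one table entry means they cannot drift apart when a caller changes the metric.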

The metric must match the metric of the VectorIndex covering $attribute; otherwise the optimiser cannot use the index to accelerate the query.
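One way to catch a mismatch early is to compare the requested metric with the index definition before building the query. This is a minimal sketch. It assumes the index definition is available as a PHP array shaped like ArangoDB's vector index description (type, fields, params.metric). The helper metricMatchesIndex() and the sample values are hypothetical.

```php
<?php

/**
 * Hypothetical guard: true when $index is a vector index on $attribute
 * whose configured metric equals $metric.
 */
function metricMatchesIndex(array $index, string $attribute, string $metric): bool
{
    return ($index['type'] ?? null) === 'vector'
        && in_array($attribute, $index['fields'] ?? [], true)
        // ?? also covers a missing 'params' key without raising a warning.
        && ($index['params']['metric'] ?? null) === $metric;
}

// Assumed shape of a vector index definition (illustrative values only).
$index = [
    'type'   => 'vector',
    'fields' => ['embedding'],
    'params' => ['metric' => 'cosine', 'dimension' => 768, 'nLists' => 100],
];

var_dump(metricMatchesIndex($index, 'embedding', 'cosine')); // bool(true)
var_dump(metricMatchesIndex($index, 'embedding', 'l2'));     // bool(false)
```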

Requires an ArangoDB server started with the experimental vector index feature enabled (the --experimental-vector-index startup option).

Example: cosine search with a bound query vector

use function oihana\arango\db\operations\aqlVectorSearch;

$aql = aqlVectorSearch
(
    collection : 'items' ,
    attribute  : 'embedding' ,
    vector     : '@query' ,
    limit      : 10 ,
) ;
// FOR doc IN items SORT APPROX_NEAR_COSINE(doc.embedding,@query) DESC LIMIT 10 RETURN doc

Example: L2 search, custom nProbe, projection and iteration variable

$aql = aqlVectorSearch
(
    collection : 'items' ,
    attribute  : 'embedding' ,
    vector     : '@query' ,
    limit      : 5 ,
    metric     : 'l2' ,
    nProbe     : 20 ,
    docRef     : 'd' ,
    return     : '{ key: d._key, score: APPROX_NEAR_L2(d.embedding, @query) }' ,
) ;
// FOR d IN items SORT APPROX_NEAR_L2(d.embedding,@query,{"nProbe":20}) ASC LIMIT 5
//   RETURN { key: d._key, score: APPROX_NEAR_L2(d.embedding, @query) }
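The canonical form and both expected outputs above can be reproduced with the following standalone sketch. It is not the library's code, and the name sketchAqlVectorSearch() is hypothetical. It assumes PHP 8 (named arguments, match). It only assembles the documented query shape. It does not validate or escape $collection, $attribute or $docRef, which are inserted verbatim.

```php
<?php

/**
 * Hypothetical sketch that assembles the canonical ANN query shape:
 * FOR <docRef> IN <collection> SORT <fn>(...) <dir> LIMIT <n> RETURN <expr>
 */
function sketchAqlVectorSearch(
    string $collection,
    string $attribute,
    string $vector,
    int $limit,
    string $metric = 'cosine',
    ?int $nProbe = null,
    string $docRef = 'doc',
    ?string $return = null,
): string {
    // The metric decides both the AQL function and the sort direction.
    [$function, $direction] = match ($metric) {
        'cosine' => ['APPROX_NEAR_COSINE', 'DESC'],
        'l2'     => ['APPROX_NEAR_L2', 'ASC'],
        default  => throw new InvalidArgumentException("Unsupported metric '$metric'."),
    };

    // Function arguments: the indexed attribute, the query vector and,
    // only when nProbe is given, an options object such as {"nProbe":20}.
    $args = ["$docRef.$attribute", $vector];
    if ($nProbe !== null) {
        $args[] = json_encode(['nProbe' => $nProbe], JSON_THROW_ON_ERROR);
    }

    return sprintf(
        'FOR %s IN %s SORT %s(%s) %s LIMIT %d RETURN %s',
        $docRef,
        $collection,
        $function,
        implode(',', $args),
        $direction,
        $limit,
        $return ?? $docRef, // default: return the whole document
    );
}
```

Called with the arguments of the two examples, it returns the strings shown in their comments. The second is produced on a single line; the comment wraps it only for readability.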

Parameters
$collection : string

The collection to scan (or any AQL iterable expression).

$attribute : string

The document attribute holding the indexed vector (e.g. 'embedding').

$vector : string

The query vector: typically a bind placeholder ('@query') or an AQL array literal.

$limit : int

The number of nearest neighbours to return (the LIMIT).

$metric : string = VectorMetric::COSINE

The similarity metric: 'cosine' (default) or 'l2'. Must match the vector index.

$nProbe : int|null = null

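Optional number of neighbouring centroids to probe. Higher values give more accurate results but slower queries. When null, no options object is passed to the AQL function and the server-side default applies.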

$docRef : string = 'doc'

The iteration variable name (default 'doc').

$return : string|null = null

Optional RETURN expression. Defaults to the iteration variable (the whole document).
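The $vector parameter accepts either a bind placeholder or an AQL array literal. When a literal is needed, a PHP array of numbers can be encoded with json_encode(), because a JSON array of numbers is also a valid AQL array. This is a minimal sketch, and the helper vectorLiteral() is hypothetical. Passing the vector through a bind parameter ('@query') is usually preferable, since it keeps large vectors out of the query string.

```php
<?php

/**
 * Hypothetical helper: encodes a list of numbers as an AQL array literal.
 *
 * @param array<int|float> $vector
 */
function vectorLiteral(array $vector): string
{
    foreach ($vector as $component) {
        if (!is_int($component) && !is_float($component)) {
            throw new InvalidArgumentException('Vector components must be numeric.');
        }
    }
    // array_values() guarantees a JSON array rather than a JSON object;
    // JSON_THROW_ON_ERROR turns NAN/INF components into a JsonException.
    return json_encode(array_values($vector), JSON_THROW_ON_ERROR);
}

echo vectorLiteral([0.12, -0.5, 0.25]); // [0.12,-0.5,0.25]
```

The result can be passed directly as the vector argument, for example vector : vectorLiteral($embedding).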

Tags
throws
InvalidArgumentException

If $metric is neither 'cosine' nor 'l2'.

see
https://docs.arangodb.com/stable/aql/functions/vector/
approxNearCosine()
approxNearL2()
since
1.1.0
author

Marc Alcaraz

Return values
string

The complete AQL ANN query.
