Oihana PHP Arango

search

Table of Contents

Functions

analyzer()  : string
Set the Analyzer for a search expression.
bm25()  : string
Score documents with the Best Matching 25 algorithm (Okapi BM25).
boost()  : string
Override the boost value of a search sub-expression.
exists()  : string
Match documents where an attribute is present (optionally of a given type).
inRange()  : string
Match documents where an attribute is within a range (index-accelerated).
levenshteinMatch()  : string
Match documents within a (Damerau-)Levenshtein distance of a target string.
minhashMatch()  : string
Match documents with an approximate Jaccard similarity of at least a threshold.
minMatch()  : string
Match documents where at least a minimum number of search expressions are true.
ngramMatch()  : string
Match documents whose attribute has an n-gram similarity above a threshold.
phrase()  : string
Match documents containing a phrase — tokens in the given order.
tfidf()  : string
Score documents with the term frequency–inverse document frequency algorithm (TF-IDF).

Functions

analyzer()

Set the Analyzer for a search expression.

analyzer(string $expr, string $analyzer) : string

Wraps the ArangoDB AQL function ANALYZER(expr, analyzer), which sets the Analyzer for the wrapped search expression and for every nested function that accepts an Analyzer argument, so the name does not have to be repeated. A nested function that passes its own Analyzer takes precedence. This only applies to queries against arangosearch Views; with search-alias Views and inverted indexes, the Analyzer is inferred from the index definition.

The TOKENS() function is an exception: it always requires its own Analyzer argument, even when wrapped, because it is a regular string function.
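
The precedence rule can be illustrated with a small sketch. sketchAnalyzer() below is a hypothetical stand-in written for this example, not the library's source, and the doc.title attribute and "text_de" Analyzer are assumed names.

```php
<?php
// Hypothetical stand-in for analyzer(); not the library's actual source.
function sketchAnalyzer(string $expr, string $analyzer): string
{
    // The expression is kept raw; the Analyzer name becomes a quoted literal.
    return 'ANALYZER(' . $expr . ',' . json_encode($analyzer) . ')';
}

// The second PHRASE() names its own Analyzer ("text_de"), which takes
// precedence for that call; the first PHRASE() inherits "text_en".
$inner = 'PHRASE(doc.text,"foo") OR PHRASE(doc.title,"bar","text_de")';

echo sketchAnalyzer($inner, 'text_en'), PHP_EOL;
```

Only the outer wrapper changes here; the nested expression is passed through untouched, which is what "kept raw" means for $expr.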

Example AQL usage:

ANALYZER(PHRASE(doc.text, "foo") OR PHRASE(doc.text, "bar"), "text_en")

Parameters
$expr : string

Any valid search expression (kept raw).

$analyzer : string

Name of the Analyzer (emitted as a quoted string literal).

Tags
example
use function oihana\arango\db\functions\search\analyzer;
use function oihana\arango\db\functions\search\phrase;

$expr = analyzer( phrase( 'doc.text' , 'quick fox' ) , 'text_en' ) ;
// 'ANALYZER(PHRASE(doc.text,"quick fox"),"text_en")'
see
https://docs.arangodb.com/stable/aql/functions/arangosearch/#analyzer
boost()
since
1.2.0
author

Marc Alcaraz

Return values
string

The formatted AQL expression.

bm25()

Score documents with the Best Matching 25 algorithm (Okapi BM25).

bm25(string $doc[, float|null $k = null ][, float|null $b = null ]) : string

Wraps the ArangoDB AQL scoring function BM25(doc, k, b). The first argument must be the document variable emitted by a FOR … IN viewName operation, and the function can only be used together with a SEARCH operation. Sort by the score in descending order to get the most relevant documents first.

AQL arguments are positional: when $b is provided without $k, the helper fills $k with the official server default (1.2). The Analyzers used for indexing must have the "frequency" feature enabled (and "norm" for meaningful length normalization), otherwise the score is 0.
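
The default-filling rule described above could be implemented along these lines. This is a minimal sketch under assumptions: sketchBm25() and the constant are hypothetical names, and numbers are rendered with a plain string cast (PHP 8), which may differ from what the library does internally.

```php
<?php
// Hypothetical stand-in for bm25(); not the library's actual source.
const SKETCH_BM25_DEFAULT_K = 1.2; // server default for k, as documented above

function sketchBm25(string $doc, ?float $k = null, ?float $b = null): string
{
    $args = [$doc];

    // BM25() takes k before b, so b can only be emitted once k is present.
    if ($b !== null && $k === null) {
        $k = SKETCH_BM25_DEFAULT_K;
    }
    if ($k !== null) {
        $args[] = (string) $k;
    }
    if ($b !== null) {
        $args[] = (string) $b;
    }

    return 'BM25(' . implode(',', $args) . ')';
}

echo sketchBm25('doc'), PHP_EOL;
echo sketchBm25('doc', b: 0.5), PHP_EOL;
```

Filling $k only when $b is present keeps the short form BM25(doc) intact, so the server stays in charge of its own defaults whenever possible.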

Example AQL usage:

FOR doc IN viewName
  SEARCH ...
  SORT BM25(doc) DESC
  RETURN doc

Parameters
$doc : string

The document variable emitted by FOR … IN viewName (kept raw).

$k : float|null = null

Optional term-frequency calibration, >= 0.0 (server default 1.2; 0 = binary model).

$b : float|null = null

Optional text-length scaling in [0.0, 1.0] (server default 0.75; 1 = BM11, 0 = BM15).

Tags
example
use function oihana\arango\db\functions\search\bm25;

echo bm25( 'doc' ) ;              // 'BM25(doc)'
echo bm25( 'doc' , 2.4 , 1.0 ) ;  // 'BM25(doc,2.4,1)'
echo bm25( 'doc' , b: 0.5 ) ;     // 'BM25(doc,1.2,0.5)'
see
https://docs.arangodb.com/stable/aql/functions/arangosearch/#bm25
tfidf()
boost()
since
1.2.0
author

Marc Alcaraz

Return values
string

The formatted AQL expression.

boost()

Override the boost value of a search sub-expression.

boost(string $expr, float|int $boost) : string

Wraps the ArangoDB AQL function BOOST(expr, boost). The boost value is made available to scorer functions (bm25(), tfidf()) so that matches of the wrapped sub-expression weigh more (or less) in the final score. The default boost of any search context is 1.0.
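
To show where a boosted sub-expression ends up relative to a scorer, the sketch below assembles a complete query string. sketchBoost() is a hypothetical stand-in written for this example, and the query text is built by hand rather than with the library's helpers.

```php
<?php
// Hypothetical stand-in for boost(); not the library's actual source.
function sketchBoost(string $expr, float|int $boost): string
{
    return 'BOOST(' . $expr . ',' . (string) $boost . ')';
}

// Matches on "foo" weigh 2.5 times more than matches on "bar" (boost 1.0).
$search = 'ANALYZER(' . sketchBoost('doc.text == "foo"', 2.5)
        . ' OR doc.text == "bar","text_en")';

// The scorer in SORT is what actually consumes the boost value.
$query = implode("\n", [
    'FOR doc IN viewName',
    '  SEARCH ' . $search,
    '  SORT BM25(doc) DESC',
    '  RETURN doc',
]);

echo $query, PHP_EOL;
```

Without a scorer such as BM25() or TFIDF() in the query, the boost value has no visible effect on the result order.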

Example AQL usage:

ANALYZER(BOOST(doc.text == "foo", 2.5) OR doc.text == "bar", "text_en")

Parameters
$expr : string

Any valid search expression (kept raw).

$boost : float|int

Numeric boost value.

Tags
example
use function oihana\arango\db\functions\search\boost;

$expr = boost( 'doc.name == "wood"' , 2.5 ) ;
// 'BOOST(doc.name == "wood",2.5)'
see
https://docs.arangodb.com/stable/aql/functions/arangosearch/#boost
analyzer()
bm25()
since
1.2.0
author

Marc Alcaraz

Return values
string

The formatted AQL expression.

exists()

Match documents where an attribute is present (optionally of a given type).

exists(string $path[, string|null $type = null ][, string|null $analyzer = null ]) : string

Wraps the ArangoDB AQL function EXISTS(path[, type[, analyzer]]):

  • exists('doc.text') — the attribute is present;
  • exists('doc.text', 'string') — present and of the given data type ("null", "bool"/"boolean", "numeric", "type", "string", "analyzer", "nested");
  • exists('doc.text', analyzer: 'text_en') — present and indexed by the given Analyzer; the "analyzer" type literal is filled in automatically.
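
The three forms above can be sketched as a single function. sketchExists() is a hypothetical stand-in, not the library's source; it only illustrates how the "analyzer" type literal might be filled in automatically.

```php
<?php
// Hypothetical stand-in for exists(); not the library's actual source.
function sketchExists(string $path, ?string $type = null, ?string $analyzer = null): string
{
    $args = [$path]; // the attribute path is kept raw

    // An Analyzer can only appear in third position, so the type
    // literal "analyzer" is supplied when the caller left it out.
    if ($analyzer !== null) {
        $type ??= 'analyzer';
    }
    if ($type !== null) {
        $args[] = json_encode($type);
    }
    if ($analyzer !== null) {
        $args[] = json_encode($analyzer);
    }

    return 'EXISTS(' . implode(',', $args) . ')';
}

echo sketchExists('doc.text'), PHP_EOL;
echo sketchExists('doc.text', 'string'), PHP_EOL;
echo sketchExists('doc.text', analyzer: 'text_en'), PHP_EOL;
```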

With arangosearch Views, EXISTS() only matches values if the storeValues link property is set to "id" (default "none").

Example AQL usage:

EXISTS(doc.text)
EXISTS(doc.text, "string")
EXISTS(doc.text, "analyzer", "text_en")

Parameters
$path : string

Attribute path expression to test (kept raw).

$type : string|null = null

Optional data type to test for (emitted as a quoted string literal). Defaults to "analyzer" when $analyzer is provided.

$analyzer : string|null = null

Optional Analyzer name (emitted as a quoted string literal).

Tags
example
use function oihana\arango\db\functions\search\exists;

echo exists( 'doc.text' ) ;                       // 'EXISTS(doc.text)'
echo exists( 'doc.text' , 'string' ) ;            // 'EXISTS(doc.text,"string")'
echo exists( 'doc.text' , analyzer: 'text_en' ) ; // 'EXISTS(doc.text,"analyzer","text_en")'
see
https://docs.arangodb.com/stable/aql/functions/arangosearch/#exists
since
1.2.0
author

Marc Alcaraz

Return values
string

The formatted AQL expression.

inRange()

Match documents where an attribute is within a range (index-accelerated).

inRange(string $path, mixed $low, mixed $high, bool $includeLow, bool $includeHigh) : string

Wraps the ArangoDB AQL function IN_RANGE(path, low, high, includeLow, includeHigh). Inside a SEARCH operation it searches more efficiently than the equivalent pair of comparisons combined with AND; the same function also exists as a miscellaneous function outside of SEARCH (hence the constant living in MiscFunction).

low and high can be numbers or strings, but both must share the same data type. Note that string ranges in SEARCH follow byte order, not the Analyzer locale collation.
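
The encoding of the two bounds can be sketched with json_encode(), which quotes strings and leaves numbers and booleans bare. sketchInRange() is a hypothetical stand-in, and the type guard is an assumption added for illustration; the real helper may not perform such a check.

```php
<?php
// Hypothetical stand-in for inRange(); not the library's actual source.
function sketchInRange(string $path, mixed $low, mixed $high, bool $includeLow, bool $includeHigh): string
{
    // Assumed guard (the real helper may not check this): both bounds
    // must be strings, or neither of them.
    if (is_string($low) !== is_string($high)) {
        throw new InvalidArgumentException('low and high must share the same data type');
    }

    $args = [
        $path,                     // kept raw
        json_encode($low),         // numbers stay bare, strings get quoted
        json_encode($high),
        json_encode($includeLow),  // emitted as true / false
        json_encode($includeHigh),
    ];

    return 'IN_RANGE(' . implode(',', $args) . ')';
}

echo sketchInRange('doc.value', 3, 5, true, true), PHP_EOL;
echo sketchInRange('doc.value', 'a', 'f', true, false), PHP_EOL;
```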

Example AQL usage:

IN_RANGE(doc.value, 3, 5, true, true)     // 3 <= value <= 5
IN_RANGE(doc.value, "a", "f", true, false) // "a" <= value < "f"

Parameters
$path : string

Attribute path expression to test (kept raw).

$low : mixed

Minimum value of the range (JSON-encoded: strings are quoted, numbers kept raw).

$high : mixed

Maximum value of the range (JSON-encoded, same data type as $low).

$includeLow : bool

Whether the minimum value is included (left-closed interval).

$includeHigh : bool

Whether the maximum value is included (right-closed interval).

Tags
example
use function oihana\arango\db\functions\search\inRange;

echo inRange( 'doc.value' , 3 , 5 , true , true ) ;
// 'IN_RANGE(doc.value,3,5,true,true)'

echo inRange( 'doc.value' , 'a' , 'f' , true , false ) ;
// 'IN_RANGE(doc.value,"a","f",true,false)'
see
https://docs.arangodb.com/stable/aql/functions/arangosearch/#in_range
since
1.2.0
author

Marc Alcaraz

Return values
string

The formatted AQL expression.

levenshteinMatch()

Match documents within a (Damerau-)Levenshtein distance of a target string.

levenshteinMatch(string $path, string $target, int $distance[, bool|null $transpositions = null ][, int|null $maxTerms = null ][, string|null $prefix = null ]) : string

Wraps the ArangoDB AQL function LEVENSHTEIN_MATCH(path, target, distance, transpositions, maxTerms, prefix). By default a Damerau-Levenshtein distance is computed (transpositions count as one operation); pass transpositions: false for a pure Levenshtein distance. The maximum distance is 4 without transpositions and 3 with them.

AQL arguments are positional: when a later option is provided, the helper fills the earlier omitted ones with the official server defaults (transpositions = true, maxTerms = 64), so callers never need to know them. Trailing omitted options are not emitted at all. When using $prefix, remove the prefix from $target: the distance is computed on the remainders only (see the official documentation).
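
One possible way to implement this fill-the-gaps rule is to find the last option the caller supplied and emit everything up to it. The sketch below is hypothetical (sketchLevenshteinMatch() is not the library's source) and uses json_encode() to render booleans, numbers and quoted strings.

```php
<?php
// Hypothetical stand-in for levenshteinMatch(); not the library's actual source.
function sketchLevenshteinMatch(
    string  $path,
    string  $target,
    int     $distance,
    ?bool   $transpositions = null,
    ?int    $maxTerms       = null,
    ?string $prefix         = null
): string {
    $options  = [$transpositions, $maxTerms, $prefix];
    $defaults = [true, 64, null]; // documented server defaults; prefix has none

    // Index of the last option the caller actually provided (-1 if none).
    $last = -1;
    foreach ($options as $i => $value) {
        if ($value !== null) {
            $last = $i;
        }
    }

    $args = [$path, json_encode($target), (string) $distance];

    // Emit every option up to the last provided one, filling gaps with defaults.
    for ($i = 0; $i <= $last; $i++) {
        $args[] = json_encode($options[$i] ?? $defaults[$i]);
    }

    return 'LEVENSHTEIN_MATCH(' . implode(',', $args) . ')';
}

echo sketchLevenshteinMatch('doc.text', 'kc', 1, false, prefix: 'qui'), PHP_EOL;
```

The null-coalescing operator only replaces null, so an explicit false for $transpositions is preserved rather than overwritten by the default.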

Example AQL usage:

LEVENSHTEIN_MATCH(doc.text, "quikc", 2, false)        // pure Levenshtein, matches "quick"
LEVENSHTEIN_MATCH(doc.text, "kc", 1, false, 64, "qui") // prefix search

Parameters
$path : string

Attribute path expression to test (kept raw).

$target : string

String to compare against (emitted as a quoted string literal).

$distance : int

Maximum edit distance: 0…4 if $transpositions is false, 0…3 otherwise.

$transpositions : bool|null = null

Optional — false for a pure Levenshtein distance (server default true).

$maxTerms : int|null = null

Optional — number of most relevant terms to consider, 0 for all (server default 64).

$prefix : string|null = null

Optional — known common prefix (emitted as a quoted string literal); improves performance.

Tags
example
use function oihana\arango\db\functions\search\levenshteinMatch;

echo levenshteinMatch( 'doc.text' , 'quikc' , 1 ) ;
// 'LEVENSHTEIN_MATCH(doc.text,"quikc",1)'

echo levenshteinMatch( 'doc.text' , 'quikc' , 2 , false ) ;
// 'LEVENSHTEIN_MATCH(doc.text,"quikc",2,false)'

echo levenshteinMatch( 'doc.text' , 'kc' , 1 , false , prefix: 'qui' ) ;
// 'LEVENSHTEIN_MATCH(doc.text,"kc",1,false,64,"qui")'
see
https://docs.arangodb.com/stable/aql/functions/arangosearch/#levenshtein_match
since
1.2.0
author

Marc Alcaraz

Return values
string

The formatted AQL expression.

minhashMatch()

Match documents with an approximate Jaccard similarity of at least a threshold.

minhashMatch(string $path, string $target, string $analyzer[, float|null $threshold = null ]) : string

Wraps the ArangoDB AQL function MINHASH_MATCH(path, target, threshold, analyzer). The similarity is approximated with the given minhash Analyzer — an efficient first pass for entity resolution (duplicate detection) before an exact JACCARD() computation.

Argument order notice — in AQL the optional threshold sits before the mandatory analyzer; PHP forbids a required parameter after an optional one, so this helper takes the analyzer third and the optional threshold last, then re-orders the emitted AQL arguments.
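
The re-ordering can be sketched as follows. sketchMinhashMatch() is a hypothetical stand-in, not the library's source, and the threshold is rendered with a plain string cast (PHP 8), which is an assumption about formatting.

```php
<?php
// Hypothetical stand-in for minhashMatch(); not the library's actual source.
function sketchMinhashMatch(string $path, string $target, string $analyzer, ?float $threshold = null): string
{
    $args = [$path, json_encode($target)];

    // PHP signature: analyzer third, threshold last.
    // AQL signature: threshold third (optional), analyzer last.
    if ($threshold !== null) {
        $args[] = (string) $threshold;
    }
    $args[] = json_encode($analyzer);

    return 'MINHASH_MATCH(' . implode(',', $args) . ')';
}

echo sketchMinhashMatch('doc.text', 'the quick brown fox', 'myMinHash', 0.5), PHP_EOL;
echo sketchMinhashMatch('doc.text', 'the quick brown fox', 'myMinHash'), PHP_EOL;
```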

Example AQL usage:

MINHASH_MATCH(doc.text, "the quick brown fox", 0.5, "myMinHash")

Parameters
$path : string

Attribute path expression to test (kept raw).

$target : string

String to hash and compare against (emitted as a quoted string literal).

$analyzer : string

Name of the minhash Analyzer (emitted as a quoted string literal).

$threshold : float|null = null

Optional similarity threshold in [0.0, 1.0].

Tags
example
use function oihana\arango\db\functions\search\minhashMatch;

echo minhashMatch( 'doc.text' , 'the quick brown fox' , 'myMinHash' , 0.5 ) ;
// 'MINHASH_MATCH(doc.text,"the quick brown fox",0.5,"myMinHash")'

echo minhashMatch( 'doc.text' , 'the quick brown fox' , 'myMinHash' ) ;
// 'MINHASH_MATCH(doc.text,"the quick brown fox","myMinHash")'
see
https://docs.arangodb.com/stable/aql/functions/arangosearch/#minhash_match
ngramMatch()
since
1.2.0
author

Marc Alcaraz

Return values
string

The formatted AQL expression.

minMatch()

Match documents where at least a minimum number of search expressions are true.

minMatch(array<string|int, mixed> $expressions, int $minMatchCount) : string

Wraps the variadic ArangoDB AQL function MIN_MATCH(expr1, ... exprN, minMatchCount). Inside a SEARCH operation it is index-accelerated; the same function also exists as a miscellaneous function outside of SEARCH (hence the constant living in MiscFunction).
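
Because the AQL function is variadic while the PHP helper takes an array, the expressions have to be flattened into the argument list with the count appended last. A minimal sketch, assuming a hypothetical sketchMinMatch() that is not the library's source:

```php
<?php
// Hypothetical stand-in for minMatch(); not the library's actual source.
function sketchMinMatch(array $expressions, int $minMatchCount): string
{
    // Expressions are kept raw; array keys are irrelevant to the output.
    $args   = array_values($expressions);
    $args[] = (string) $minMatchCount; // the count is always the last argument

    return 'MIN_MATCH(' . implode(',', $args) . ')';
}

echo sketchMinMatch(
    ['doc.text == "quick"', 'doc.text == "brown"', 'doc.text == "fox"'],
    2
), PHP_EOL;
```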

Example AQL usage:

MIN_MATCH(doc.text == "quick", doc.text == "brown", doc.text == "fox", 2)

Parameters
$expressions : array<string|int, mixed>

The candidate search expressions (kept raw).

$minMatchCount : int

Minimum number of expressions that must be satisfied.

Tags
example
use function oihana\arango\db\functions\search\minMatch;

$expr = minMatch( [ 'doc.text == "quick"' , 'doc.text == "brown"' , 'doc.text == "fox"' ] , 2 ) ;
// 'MIN_MATCH(doc.text == "quick",doc.text == "brown",doc.text == "fox",2)'
see
https://docs.arangodb.com/stable/aql/functions/arangosearch/#min_match
since
1.2.0
author

Marc Alcaraz

Return values
string

The formatted AQL expression.

ngramMatch()

Match documents whose attribute has an n-gram similarity above a threshold.

ngramMatch(string $path, string $target, string $analyzer[, float|null $threshold = null ]) : string

Wraps the ArangoDB AQL function NGRAM_MATCH(path, target, threshold, analyzer). The n-grams of both the attribute and the target are produced by the given Analyzer (use an ngram Analyzer with preserveOriginal: false and min equal to max, with the "position" and "frequency" features enabled).

Argument order notice — in AQL the optional threshold sits before the mandatory analyzer; PHP forbids a required parameter after an optional one, so this helper takes the analyzer third and the optional threshold last, then re-orders the emitted AQL arguments. When $threshold is null the three-argument AQL form is emitted (server default: 0.7).

Example AQL usage:

NGRAM_MATCH(doc.text, "quick fox", "bigram")           // threshold defaults to 0.7
NGRAM_MATCH(doc.text, "quick blue fox", 0.4, "bigram")

Parameters
$path : string

Attribute path expression to test (kept raw).

$target : string

String to compare against (emitted as a quoted string literal).

$analyzer : string

Name of the ngram Analyzer (emitted as a quoted string literal).

$threshold : float|null = null

Optional similarity threshold in [0.0, 1.0] (server default 0.7).

Tags
example
use function oihana\arango\db\functions\search\ngramMatch;

echo ngramMatch( 'doc.text' , 'quick fox' , 'bigram' ) ;
// 'NGRAM_MATCH(doc.text,"quick fox","bigram")'

echo ngramMatch( 'doc.text' , 'quick blue fox' , 'bigram' , 0.4 ) ;
// 'NGRAM_MATCH(doc.text,"quick blue fox",0.4,"bigram")'
see
https://docs.arangodb.com/stable/aql/functions/arangosearch/#ngram_match
minhashMatch()
since
1.2.0
author

Marc Alcaraz

Return values
string

The formatted AQL expression.

phrase()

Match documents containing a phrase — tokens in the given order.

phrase(string $path, string|array<string|int, mixed> $phrase[, string|null $analyzer = null ]) : string

Wraps the ArangoDB AQL function PHRASE(path, phrasePart, analyzer). The phrase can be:

  • a string — the simple form, emitted as a quoted string literal;
  • an array — the advanced AQL array form, emitted with json_encode, mirroring the official syntax one-to-one: string tokens are quoted, integers act as skipTokens wildcards, and associative arrays become object tokens ({IN_RANGE: …}, {LEVENSHTEIN_MATCH: …}, {STARTS_WITH: …}, {TERM: …}, {TERMS: …}, {WILDCARD: …}).
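
The mapping from PHP arrays to the AQL array form follows from how json_encode() works: a list (sequential integer keys starting at 0) becomes a JSON array, and an associative array becomes a JSON object. The sketch below uses a hypothetical sketchPhrase(), not the library's source, to show both forms.

```php
<?php
// Hypothetical stand-in for phrase(); not the library's actual source.
function sketchPhrase(string $path, string|array $phrase, ?string $analyzer = null): string
{
    // json_encode() quotes strings, keeps integers bare (skipTokens),
    // and turns associative arrays into object tokens.
    $args = [$path, json_encode($phrase)];

    if ($analyzer !== null) {
        $args[] = json_encode($analyzer);
    }

    return 'PHRASE(' . implode(',', $args) . ')';
}

echo sketchPhrase('doc.text', 'quick fox', 'text_en'), PHP_EOL;
echo sketchPhrase('doc.text', ['ipsum', 2, 'amet'], 'text_en'), PHP_EOL;
echo sketchPhrase('doc.text', ['lorem', ['STARTS_WITH' => ['ips']]], 'text_en'), PHP_EOL;
```

Note that json_encode() emits quoted object keys ({"STARTS_WITH": …}) where the AQL examples use unquoted ones; the documented output of phrase() shows the quoted form as well.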

The Analyzer must have the "position" and "frequency" features enabled, otherwise PHRASE() finds nothing. When $analyzer is omitted, the Analyzer of a wrapping analyzer() call applies (default "identity").

Example AQL usage:

PHRASE(doc.text, "quick fox", "text_en")
PHRASE(doc.text, ["ipsum", 2, "amet"], "text_en")              // 2 wildcard tokens between
PHRASE(doc.text, ["lorem", {STARTS_WITH: ["ips"]}], "text_en") // prefix object token

Parameters
$path : string

Attribute path expression to test (kept raw).

$phrase : string|array<string|int, mixed>

The phrase: a plain string, or the AQL array form (tokens, skipTokens numbers, object tokens).

$analyzer : string|null = null

Optional Analyzer name (emitted as a quoted string literal).

Tags
example
use function oihana\arango\db\functions\search\phrase;

echo phrase( 'doc.text' , 'quick fox' , 'text_en' ) ;
// 'PHRASE(doc.text,"quick fox","text_en")'

echo phrase( 'doc.text' , [ 'ipsum' , 2 , 'amet' ] , 'text_en' ) ;
// 'PHRASE(doc.text,["ipsum",2,"amet"],"text_en")'

echo phrase( 'doc.text' , [ 'lorem' , [ 'STARTS_WITH' => [ 'ips' ] ] ] , 'text_en' ) ;
// 'PHRASE(doc.text,["lorem",{"STARTS_WITH":["ips"]}],"text_en")'
see
https://docs.arangodb.com/stable/aql/functions/arangosearch/#phrase
analyzer()
since
1.2.0
author

Marc Alcaraz

Return values
string

The formatted AQL expression.

tfidf()

Score documents with the term frequency–inverse document frequency algorithm (TF-IDF).

tfidf(string $doc[, bool|null $normalize = null ]) : string

Wraps the ArangoDB AQL scoring function TFIDF(doc, normalize). The first argument must be the document variable emitted by a FOR … IN viewName operation, and the function can only be used together with a SEARCH operation. Sort by the score in descending order to get the most relevant documents first.

Example AQL usage:

FOR doc IN viewName
  SEARCH ...
  SORT TFIDF(doc) DESC
  RETURN doc

Parameters
$doc : string

The document variable emitted by FOR … IN viewName (kept raw).

$normalize : bool|null = null

Optional — whether to normalize the score (server default false).

Tags
example
use function oihana\arango\db\functions\search\tfidf;

echo tfidf( 'doc' ) ;        // 'TFIDF(doc)'
echo tfidf( 'doc' , true ) ; // 'TFIDF(doc,true)'
see
https://docs.arangodb.com/stable/aql/functions/arangosearch/#tfidf
bm25()
since
1.2.0
author

Marc Alcaraz

Return values
string

The formatted AQL expression.
