search
Table of Contents
Functions
- analyzer() : string
- Set the Analyzer for a search expression.
- bm25() : string
- Score documents with the Best Matching 25 algorithm (Okapi BM25).
- boost() : string
- Override the boost value of a search sub-expression.
- exists() : string
- Match documents where an attribute is present (optionally of a given type).
- inRange() : string
- Match documents where an attribute is within a range (index-accelerated).
- levenshteinMatch() : string
- Match documents within a (Damerau-)Levenshtein distance of a target string.
- minhashMatch() : string
- Match documents with an approximate Jaccard similarity of at least a threshold.
- minMatch() : string
- Match documents where at least a minimum number of search expressions are true.
- ngramMatch() : string
- Match documents whose attribute has an n-gram similarity above a threshold.
- phrase() : string
- Match documents containing a phrase — tokens in the given order.
- tfidf() : string
- Score documents with the term frequency–inverse document frequency algorithm (TF-IDF).
Functions
analyzer()
Set the Analyzer for a search expression.
analyzer(string $expr, string $analyzer) : string
Wraps the ArangoDB AQL function ANALYZER(expr, analyzer), which sets the
Analyzer used to evaluate the wrapped search expression and all the
nested functions that accept an Analyzer argument (so it does not have to be
repeated). A nested function passing its own Analyzer takes precedence.
Only applicable to queries against arangosearch Views — with search-alias
Views and inverted indexes the Analyzer is inferred from the index definition.
The TOKENS() function is an exception: it always requires its own Analyzer
argument, even when wrapped, because it is a regular string function.
Example AQL usage:
ANALYZER(PHRASE(doc.text, "foo") OR PHRASE(doc.text, "bar"), "text_en")
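To illustrate the difference between a raw expression and a quoted literal, here is a minimal PHP sketch that produces the AQL shown above. The name sketchAnalyzer() and the use of json_encode() for quoting are assumptions made for this example; this is not the library's actual implementation.

```php
<?php
// Hypothetical stand-in for analyzer(); not the library's code.
function sketchAnalyzer(string $expr, string $analyzer): string
{
    // $expr is passed through untouched; $analyzer becomes a quoted literal.
    // Quoting with json_encode() is an assumption of this sketch.
    return sprintf('ANALYZER(%s, %s)', $expr, json_encode($analyzer));
}

echo sketchAnalyzer('PHRASE(doc.text, "foo") OR PHRASE(doc.text, "bar")', 'text_en'), PHP_EOL;
```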
Parameters
- $expr : string
- Any valid search expression (kept raw).
- $analyzer : string
- Name of the Analyzer (emitted as a quoted string literal).
Return values
string: The formatted AQL expression.
bm25()
Score documents with the Best Matching 25 algorithm (Okapi BM25).
bm25(string $doc[, float|null $k = null ][, float|null $b = null ]) : string
Wraps the ArangoDB AQL scoring function BM25(doc, k, b). The first argument
must be the document variable emitted by a FOR … IN viewName operation, and
the function can only be used together with a SEARCH operation. Sort descending
by the score to get the most relevant documents first.
AQL arguments are positional: when $b is provided without $k, the helper
fills $k with the official server default (1.2). The Analyzers used for
indexing must have the "frequency" feature enabled (and "norm" for
meaningful length normalization), otherwise the score is 0.
Example AQL usage:
FOR doc IN viewName
SEARCH ...
SORT BM25(doc) DESC
RETURN doc
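The positional filling described above could look roughly like the following sketch. The function name sketchBm25() is hypothetical and the code is only an illustration of the documented behaviour (fill $k with 1.2 when only $b is given, drop trailing omitted options), not the library's implementation.

```php
<?php
// Hypothetical stand-in for bm25(); not the library's code.
function sketchBm25(string $doc, ?float $k = null, ?float $b = null): string
{
    $args = [$doc];
    if ($k !== null || $b !== null) {
        // AQL arguments are positional: k must be present whenever b is.
        $args[] = json_encode($k ?? 1.2); // 1.2 is the server default for k
    }
    if ($b !== null) {
        $args[] = json_encode($b);
    }
    return 'BM25(' . implode(', ', $args) . ')';
}

echo sketchBm25('doc'), PHP_EOL;            // no options emitted
echo sketchBm25('doc', null, 0.5), PHP_EOL; // k filled with the default
```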
Parameters
- $doc : string
- The document variable emitted by FOR … IN viewName (kept raw).
- $k : float|null = null
- Optional term-frequency calibration, >= 0.0 (server default 1.2; 0 = binary model).
- $b : float|null = null
- Optional text-length scaling in [0.0, 1.0] (server default 0.75; 1 = BM11, 0 = BM15).
Return values
string: The formatted AQL expression.
boost()
Override the boost value of a search sub-expression.
boost(string $expr, float|int $boost) : string
Wraps the ArangoDB AQL function BOOST(expr, boost). The boost value is made
available to scorer functions (bm25(), tfidf()) so that matches
of the wrapped sub-expression weigh more (or less) in the final score. The
default boost of any search context is 1.0.
Example AQL usage:
ANALYZER(BOOST(doc.text == "foo", 2.5) OR doc.text == "bar", "text_en")
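Because the helper returns a plain string, its output can be spliced into a larger search expression. The sketch below rebuilds the AQL example above under that assumption; sketchBoost() is a hypothetical name, the union type requires PHP 8.0 or later, and the surrounding ANALYZER(...) text is concatenated by hand for illustration only.

```php
<?php
// Hypothetical stand-in for boost(); not the library's code.
function sketchBoost(string $expr, float|int $boost): string
{
    // The expression stays raw; the boost is emitted as a bare number.
    return sprintf('BOOST(%s, %s)', $expr, json_encode($boost));
}

$boosted = sketchBoost('doc.text == "foo"', 2.5);
echo 'ANALYZER(' . $boosted . ' OR doc.text == "bar", "text_en")', PHP_EOL;
```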
Parameters
- $expr : string
- Any valid search expression (kept raw).
- $boost : float|int
- Numeric boost value.
Return values
string: The formatted AQL expression.
exists()
Match documents where an attribute is present (optionally of a given type).
exists(string $path[, string|null $type = null ][, string|null $analyzer = null ]) : string
Wraps the ArangoDB AQL function EXISTS(path[, type[, analyzer]]):
- exists('doc.text'): the attribute is present;
- exists('doc.text', 'string'): present and of the given data type ("null", "bool"/"boolean", "numeric", "type", "string", "analyzer", "nested");
- exists('doc.text', analyzer: 'text_en'): present and indexed by the given Analyzer; the "analyzer" type literal is filled in automatically.
With arangosearch Views, EXISTS() only matches values if the
storeValues link property is set to "id" (default "none").
Example AQL usage:
EXISTS(doc.text)
EXISTS(doc.text, "string")
EXISTS(doc.text, "analyzer", "text_en")
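The three AQL forms above can be produced from one function by filling in the "analyzer" type literal when only an Analyzer name is passed. The following is a sketch under that reading; sketchExists() is a hypothetical name, json_encode() quoting is an assumption, and the named-argument call requires PHP 8.0 or later.

```php
<?php
// Hypothetical stand-in for exists(); not the library's code.
function sketchExists(string $path, ?string $type = null, ?string $analyzer = null): string
{
    if ($analyzer !== null) {
        // The type literal is filled in automatically for the Analyzer form.
        $type ??= 'analyzer';
    }
    $args = [$path];
    if ($type !== null) {
        $args[] = json_encode($type);
    }
    if ($analyzer !== null) {
        $args[] = json_encode($analyzer);
    }
    return 'EXISTS(' . implode(', ', $args) . ')';
}

echo sketchExists('doc.text'), PHP_EOL;
echo sketchExists('doc.text', 'string'), PHP_EOL;
echo sketchExists('doc.text', analyzer: 'text_en'), PHP_EOL;
```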
Parameters
- $path : string
- Attribute path expression to test (kept raw).
- $type : string|null = null
- Optional data type to test for (emitted as a quoted string literal). Defaults to "analyzer" when $analyzer is provided.
- $analyzer : string|null = null
- Optional Analyzer name (emitted as a quoted string literal).
Return values
string: The formatted AQL expression.
inRange()
Match documents where an attribute is within a range (index-accelerated).
inRange(string $path, mixed $low, mixed $high, bool $includeLow, bool $includeHigh) : string
Wraps the ArangoDB AQL function
IN_RANGE(path, low, high, includeLow, includeHigh). Inside a SEARCH
operation it searches more efficiently than the equivalent pair of
comparisons combined with AND; the same function also exists as a
miscellaneous function outside of SEARCH (hence the constant living in
MiscFunction).
low and high can be numbers or strings, but both must share the same
data type. Note that string ranges in SEARCH follow byte order, not the
Analyzer locale collation.
Example AQL usage:
IN_RANGE(doc.value, 3, 5, true, true) // 3 <= value <= 5
IN_RANGE(doc.value, "a", "f", true, false) // "a" <= value < "f"
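As the parameter descriptions state, the bounds are JSON-encoded, so strings are quoted and numbers stay bare. A minimal sketch of that formatting follows; sketchInRange() is a hypothetical name, encoding the two booleans with json_encode() is an assumption, and the mixed type requires PHP 8.0 or later.

```php
<?php
// Hypothetical stand-in for inRange(); not the library's code.
function sketchInRange(string $path, mixed $low, mixed $high, bool $includeLow, bool $includeHigh): string
{
    return sprintf(
        'IN_RANGE(%s, %s, %s, %s, %s)',
        $path,                     // kept raw
        json_encode($low),         // 3 stays 3, "a" becomes "\"a\""
        json_encode($high),
        json_encode($includeLow),  // true / false
        json_encode($includeHigh)
    );
}

echo sketchInRange('doc.value', 3, 5, true, true), PHP_EOL;
echo sketchInRange('doc.value', 'a', 'f', true, false), PHP_EOL;
```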
Parameters
- $path : string
- Attribute path expression to test (kept raw).
- $low : mixed
- Minimum value of the range (JSON-encoded: strings are quoted, numbers kept raw).
- $high : mixed
- Maximum value of the range (JSON-encoded, same data type as $low).
- $includeLow : bool
- Whether the minimum value is included (left-closed interval).
- $includeHigh : bool
- Whether the maximum value is included (right-closed interval).
Return values
string: The formatted AQL expression.
levenshteinMatch()
Match documents within a (Damerau-)Levenshtein distance of a target string.
levenshteinMatch(string $path, string $target, int $distance[, bool|null $transpositions = null ][, int|null $maxTerms = null ][, string|null $prefix = null ]) : string
Wraps the ArangoDB AQL function
LEVENSHTEIN_MATCH(path, target, distance, transpositions, maxTerms, prefix).
By default a Damerau-Levenshtein distance is computed (transpositions count
as one operation); pass transpositions: false for a pure Levenshtein distance.
The maximum distance is 4 without transpositions and 3 with them.
AQL arguments are positional: when a later option is provided, the helper
fills the earlier omitted ones with the official server defaults
(transpositions = true, maxTerms = 64) so callers never need to know them.
Trailing omitted options are not emitted at all. When using $prefix, the
prefix must be removed from $target (the distance is computed on the
remainders — see the official documentation).
Example AQL usage:
LEVENSHTEIN_MATCH(doc.text, "quikc", 2, false) // pure Levenshtein, matches "quick"
LEVENSHTEIN_MATCH(doc.text, "kc", 1, false, 64, "qui") // prefix search
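One way to implement the default filling described above is to find the last option that was actually provided and emit every option up to it, substituting server defaults for the gaps. The sketch below assumes that approach; sketchLevenshteinMatch() is a hypothetical name and the code is not taken from the library. The named-argument call requires PHP 8.0 or later.

```php
<?php
// Hypothetical stand-in for levenshteinMatch(); not the library's code.
function sketchLevenshteinMatch(
    string $path,
    string $target,
    int $distance,
    ?bool $transpositions = null,
    ?int $maxTerms = null,
    ?string $prefix = null
): string {
    $options  = [$transpositions, $maxTerms, $prefix];
    $defaults = [true, 64]; // server defaults; prefix is last and never needs one

    // Index of the last option the caller provided (-1 if none).
    $last = -1;
    foreach ($options as $i => $value) {
        if ($value !== null) {
            $last = $i;
        }
    }

    $args = [$path, json_encode($target), (string) $distance];
    for ($i = 0; $i <= $last; $i++) {
        // Earlier omitted options are filled; trailing ones are never reached.
        $args[] = json_encode($options[$i] ?? $defaults[$i]);
    }
    return 'LEVENSHTEIN_MATCH(' . implode(', ', $args) . ')';
}

echo sketchLevenshteinMatch('doc.text', 'quikc', 2, false), PHP_EOL;
echo sketchLevenshteinMatch('doc.text', 'kc', 1, prefix: 'qui'), PHP_EOL;
```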
Parameters
- $path : string
- Attribute path expression to test (kept raw).
- $target : string
- String to compare against (emitted as a quoted string literal).
- $distance : int
- Maximum edit distance: 0…4 if $transpositions is false, 0…3 otherwise.
- $transpositions : bool|null = null
- Optional: false for a pure Levenshtein distance (server default true).
- $maxTerms : int|null = null
- Optional: number of most relevant terms to consider, 0 for all (server default 64).
- $prefix : string|null = null
- Optional: known common prefix (emitted as a quoted string literal); improves performance.
Return values
string: The formatted AQL expression.
minhashMatch()
Match documents with an approximate Jaccard similarity of at least a threshold.
minhashMatch(string $path, string $target, string $analyzer[, float|null $threshold = null ]) : string
Wraps the ArangoDB AQL function MINHASH_MATCH(path, target, threshold, analyzer).
The similarity is approximated with the given minhash Analyzer — an
efficient first pass for entity resolution (duplicate detection) before an
exact JACCARD() computation.
Argument order notice — in AQL the optional threshold sits before the
mandatory analyzer; PHP forbids a required parameter after an optional one,
so this helper takes the analyzer third and the optional threshold
last, then re-orders the emitted AQL arguments.
Example AQL usage:
MINHASH_MATCH(doc.text, "the quick brown fox", 0.5, "myMinHash")
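The re-ordering between the PHP signature and the emitted AQL can be sketched as follows. sketchMinhashMatch() is a hypothetical name, and leaving the threshold out of the AQL when it is null is an assumption of this sketch rather than documented behaviour of the helper.

```php
<?php
// Hypothetical stand-in for minhashMatch(); not the library's code.
function sketchMinhashMatch(string $path, string $target, string $analyzer, ?float $threshold = null): string
{
    $args = [$path, json_encode($target)];
    if ($threshold !== null) {
        // PHP takes the threshold last, AQL expects it third.
        $args[] = json_encode($threshold);
    }
    $args[] = json_encode($analyzer);
    return 'MINHASH_MATCH(' . implode(', ', $args) . ')';
}

echo sketchMinhashMatch('doc.text', 'the quick brown fox', 'myMinHash', 0.5), PHP_EOL;
```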
Parameters
- $path : string
- Attribute path expression to test (kept raw).
- $target : string
- String to hash and compare against (emitted as a quoted string literal).
- $analyzer : string
- Name of the minhash Analyzer (emitted as a quoted string literal).
- $threshold : float|null = null
- Optional similarity threshold in [0.0, 1.0].
Return values
string: The formatted AQL expression.
minMatch()
Match documents where at least a minimum number of search expressions are true.
minMatch(array<string|int, mixed> $expressions, int $minMatchCount) : string
Wraps the variadic ArangoDB AQL function
MIN_MATCH(expr1, ... exprN, minMatchCount). Inside a SEARCH operation it
is index-accelerated; the same function also exists as a miscellaneous
function outside of SEARCH (hence the constant living in
MiscFunction).
Example AQL usage:
MIN_MATCH(doc.text == "quick", doc.text == "brown", doc.text == "fox", 2)
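Since the AQL function is variadic with the count in last position, the PHP array of expressions has to be flattened and the count appended. A minimal sketch, assuming the hypothetical name sketchMinMatch() and that array keys are ignored:

```php
<?php
// Hypothetical stand-in for minMatch(); not the library's code.
function sketchMinMatch(array $expressions, int $minMatchCount): string
{
    // Expressions are kept raw; keys are dropped (an assumption of this sketch).
    $args   = array_values($expressions);
    $args[] = (string) $minMatchCount;
    return 'MIN_MATCH(' . implode(', ', $args) . ')';
}

echo sketchMinMatch(
    ['doc.text == "quick"', 'doc.text == "brown"', 'doc.text == "fox"'],
    2
), PHP_EOL;
```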
Parameters
- $expressions : array<string|int, mixed>
- The candidate search expressions (kept raw).
- $minMatchCount : int
- Minimum number of expressions that must be satisfied.
Return values
string: The formatted AQL expression.
ngramMatch()
Match documents whose attribute has an n-gram similarity above a threshold.
ngramMatch(string $path, string $target, string $analyzer[, float|null $threshold = null ]) : string
Wraps the ArangoDB AQL function NGRAM_MATCH(path, target, threshold, analyzer).
The n-grams of both the attribute and the target are produced by the given
Analyzer (use an ngram Analyzer with preserveOriginal: false and min
equal to max, with the "position" and "frequency" features enabled).
Argument order notice — in AQL the optional threshold sits before the
mandatory analyzer; PHP forbids a required parameter after an optional one,
so this helper takes the analyzer third and the optional threshold
last, then re-orders the emitted AQL arguments. When $threshold is
null the three-argument AQL form is emitted (server default: 0.7).
Example AQL usage:
NGRAM_MATCH(doc.text, "quick fox", "bigram") // threshold defaults to 0.7
NGRAM_MATCH(doc.text, "quick blue fox", 0.4, "bigram")
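Both AQL forms above, with and without a threshold, can come from a single PHP call. The sketch below shows one possible way to do that; sketchNgramMatch() is a hypothetical name and the code is illustrative only.

```php
<?php
// Hypothetical stand-in for ngramMatch(); not the library's code.
function sketchNgramMatch(string $path, string $target, string $analyzer, ?float $threshold = null): string
{
    $quotedTarget   = json_encode($target);
    $quotedAnalyzer = json_encode($analyzer);

    if ($threshold === null) {
        // Three-argument form: the server applies its default threshold (0.7).
        return sprintf('NGRAM_MATCH(%s, %s, %s)', $path, $quotedTarget, $quotedAnalyzer);
    }
    // Four-argument form: threshold moves in front of the Analyzer.
    return sprintf('NGRAM_MATCH(%s, %s, %s, %s)', $path, $quotedTarget, json_encode($threshold), $quotedAnalyzer);
}

echo sketchNgramMatch('doc.text', 'quick fox', 'bigram'), PHP_EOL;
echo sketchNgramMatch('doc.text', 'quick blue fox', 'bigram', 0.4), PHP_EOL;
```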
Parameters
- $path : string
- Attribute path expression to test (kept raw).
- $target : string
- String to compare against (emitted as a quoted string literal).
- $analyzer : string
- Name of the ngram Analyzer (emitted as a quoted string literal).
- $threshold : float|null = null
- Optional similarity threshold in [0.0, 1.0] (server default 0.7).
Return values
string: The formatted AQL expression.
phrase()
Match documents containing a phrase — tokens in the given order.
phrase(string $path, string|array<string|int, mixed> $phrase[, string|null $analyzer = null ]) : string
Wraps the ArangoDB AQL function PHRASE(path, phrasePart, analyzer).
The phrase can be:
- a string: the simple form, emitted as a quoted string literal;
- an array: the advanced AQL array form, emitted with json_encode, mirroring the official syntax one-to-one: string tokens are quoted, integers act as skipTokens wildcards, and associative arrays become object tokens ({IN_RANGE: …}, {LEVENSHTEIN_MATCH: …}, {STARTS_WITH: …}, {TERM: …}, {TERMS: …}, {WILDCARD: …}).
The Analyzer must have the "position" and "frequency" features enabled,
otherwise PHRASE() finds nothing. When $analyzer is omitted, the Analyzer
of a wrapping analyzer() call applies (default "identity").
Example AQL usage:
PHRASE(doc.text, "quick fox", "text_en")
PHRASE(doc.text, ["ipsum", 2, "amet"], "text_en") // 2 wildcard tokens between
PHRASE(doc.text, ["lorem", {STARTS_WITH: ["ips"]}], "text_en") // prefix object token
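To show what the array form looks like from the PHP side, the sketch below passes PHP arrays through json_encode() as the description states. sketchPhrase() is a hypothetical name, and the string|array union type requires PHP 8.0 or later. Note that json_encode() emits compact JSON with quoted object keys, which differs only cosmetically from the hand-written AQL above.

```php
<?php
// Hypothetical stand-in for phrase(); not the library's code.
function sketchPhrase(string $path, string|array $phrase, ?string $analyzer = null): string
{
    // Strings become quoted literals; arrays become the AQL array form.
    $args = [$path, json_encode($phrase)];
    if ($analyzer !== null) {
        $args[] = json_encode($analyzer);
    }
    return 'PHRASE(' . implode(', ', $args) . ')';
}

echo sketchPhrase('doc.text', 'quick fox', 'text_en'), PHP_EOL;
// The integer 2 skips two tokens between "ipsum" and "amet".
echo sketchPhrase('doc.text', ['ipsum', 2, 'amet'], 'text_en'), PHP_EOL;
// An associative array becomes an object token.
echo sketchPhrase('doc.text', ['lorem', ['STARTS_WITH' => ['ips']]], 'text_en'), PHP_EOL;
```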
Parameters
- $path : string
- Attribute path expression to test (kept raw).
- $phrase : string|array<string|int, mixed>
- The phrase: a plain string, or the AQL array form (tokens, skipTokens numbers, object tokens).
- $analyzer : string|null = null
- Optional Analyzer name (emitted as a quoted string literal).
Return values
string: The formatted AQL expression.
tfidf()
Score documents with the term frequency–inverse document frequency algorithm (TF-IDF).
tfidf(string $doc[, bool|null $normalize = null ]) : string
Wraps the ArangoDB AQL scoring function TFIDF(doc, normalize). The first
argument must be the document variable emitted by a FOR … IN viewName
operation, and the function can only be used together with a SEARCH operation.
Sort descending by the score to get the most relevant documents first.
Example AQL usage:
FOR doc IN viewName
SEARCH ...
SORT TFIDF(doc) DESC
RETURN doc
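A short sketch of how the optional flag might be handled: when $normalize is null nothing is emitted and the server default applies. sketchTfidf() is a hypothetical name and this is not the library's implementation.

```php
<?php
// Hypothetical stand-in for tfidf(); not the library's code.
function sketchTfidf(string $doc, ?bool $normalize = null): string
{
    if ($normalize === null) {
        return sprintf('TFIDF(%s)', $doc); // server default (false) applies
    }
    return sprintf('TFIDF(%s, %s)', $doc, $normalize ? 'true' : 'false');
}

echo 'SORT ' . sketchTfidf('doc', true) . ' DESC', PHP_EOL;
```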
Parameters
- $doc : string
- The document variable emitted by FOR … IN viewName (kept raw).
- $normalize : bool|null = null
- Optional: whether to normalize the score (server default false).
Return values
string: The formatted AQL expression.