NgramAnalyzer implements AnalyzerOptions
N-gram analyzer — emits every substring (n-gram) of its input whose length is between `min` and `max` characters, optionally keeping the original token. It is the building block of substring / "as-you-type" autocomplete search: indexing a field with an n-gram analyzer lets a partial term (`ate`) match a longer value (`Atelier`).
Unlike the edgeNgram option nested inside a TextAnalyzer (which
only emits prefixes), a standalone ngram analyzer emits n-grams from
every position. It is typically paired with a text analyzer on the
same field (multiple analyzers per field) so the field serves both
whole-word search and autocomplete.
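To make the contrast concrete, here is a minimal sketch of the two emission strategies in plain PHP. The helpers `ngrams()` and `edgeNgrams()` are hypothetical and exist only for this illustration. They are not part of the library and do not reproduce the server's tokenizer (markers and token ordering are ignored). The input is already lowercase, on the assumption that case folding is handled elsewhere.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative helper (not part of the library): every substring of $input
 * whose length, counted in code points, lies between $min and $max.
 *
 * @return list<string>
 */
function ngrams(string $input, int $min, int $max, bool $preserveOriginal = false): array
{
    // Split into UTF-8 code points so multi-byte characters stay intact.
    $chars = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $count = count($chars);
    $out   = [];

    // Slide a window of every allowed length over every start position.
    for ($start = 0; $start < $count; $start++) {
        for ($len = $min; $len <= $max && $start + $len <= $count; $len++) {
            $out[] = implode('', array_slice($chars, $start, $len));
        }
    }

    // Optionally keep the whole token when it was not already emitted.
    if ($preserveOriginal && !in_array($input, $out, true)) {
        $out[] = $input;
    }

    return $out;
}

/**
 * Prefix-only variant, shown for contrast with an edge n-gram option.
 *
 * @return list<string>
 */
function edgeNgrams(string $input, int $min, int $max): array
{
    $chars = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $out   = [];

    for ($len = $min; $len <= min($max, count($chars)); $len++) {
        $out[] = implode('', array_slice($chars, 0, $len));
    }

    return $out;
}

$all    = ngrams('atelier', 2, 5, true);
$prefix = edgeNgrams('atelier', 2, 5);

var_dump(in_array('ate', $all, true));    // bool(true): a prefix n-gram
var_dump(in_array('lie', $all, true));    // bool(true): taken from the middle
var_dump(in_array('lie', $prefix, true)); // bool(false): not a prefix
```

The middle n-gram `lie` is what a prefix-only scheme cannot produce, which is why substring search needs n-grams from every position.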
Example:
$db->createAnalyzer
(
'autocomplete' ,
new NgramAnalyzer
(
min : 2 ,
max : 5 ,
preserveOriginal : true ,
streamType : 'utf8' ,
) ,
[
AnalyzerFeature::FREQUENCY ,
AnalyzerFeature::POSITION ,
] ,
) ;
Table of Contents
Interfaces
- AnalyzerOptions
- Common contract for every analyzer definition consumable by `\oihana\arango\clients\Database::createAnalyzer()` and `Analyzer::create()`.
Properties
- $endMarker : string|null
- $max : int
- $min : int
- $preserveOriginal : bool
- $startMarker : string|null
- $streamType : string|null
Methods
- __construct() : mixed
- toArray() : array<string, mixed>
- Returns the `{ type, properties }` fragment of a `POST /_api/analyzer` body corresponding to this analyzer definition.
Properties
$endMarker
public
string|null
$endMarker
= null
$max
public
int
$max
$min
public
int
$min
$preserveOriginal
public
bool
$preserveOriginal
= false
$startMarker
public
string|null
$startMarker
= null
$streamType
public
string|null
$streamType
= null
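The `$streamType` property chooses between byte-wise (`"binary"`) and codepoint-wise (`"utf8"`) processing. The following sketch shows why that matters for non-ASCII input. It uses only core PHP functions and merely models the two views of a string. It does not call the server, and the variable names are illustrative.

```php
<?php
declare(strict_types=1);

// "\u{E9}" is 'é': one code point, two bytes in UTF-8 (0xC3 0xA9).
$input = "\u{E9}!";

// Byte-wise view, analogous to streamType "binary": three units.
$bytes = str_split($input);

// Code-point view, analogous to streamType "utf8": two units.
$codePoints = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY);

// A byte-wise window of length 2 can cut 'é' in half.
$byteBigrams = [];
for ($i = 0; $i + 2 <= count($bytes); $i++) {
    $byteBigrams[] = bin2hex($bytes[$i] . $bytes[$i + 1]);
}

// A codepoint-wise window of length 2 always holds whole characters.
$codePointBigrams = [];
for ($i = 0; $i + 2 <= count($codePoints); $i++) {
    $codePointBigrams[] = $codePoints[$i] . $codePoints[$i + 1];
}

print_r($byteBigrams);      // 'c3a9' and 'a921'; the second starts mid-character
print_r($codePointBigrams); // a single bigram: 'é!'
```

For fields that may contain accented or non-Latin text, this suggests that `'utf8'` is usually the safer choice for human-readable autocomplete, as in the example at the top of this page.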
Methods
__construct()
public
__construct(int $min, int $max[, bool $preserveOriginal = false ][, string|null $startMarker = null ][, string|null $endMarker = null ][, string|null $streamType = null ]) : mixed
Parameters
- $min : int
-
Lower bound of the n-gram length window (inclusive).
- $max : int
-
Upper bound of the n-gram length window (inclusive).
- $preserveOriginal : bool = false
-
Whether to also keep the original (un-split) token in the output stream.
- $startMarker : string|null = null
-
String prepended to the input before n-gram emission, so start-of-token n-grams can be distinguished. Defaults to the server default, an empty string.
- $endMarker : string|null = null
-
String appended to the input before n-gram emission, so end-of-token n-grams can be distinguished. Defaults to the server default, an empty string.
- $streamType : string|null = null
-
Input encoding (see StreamType): "binary" (byte-wise, server default) or "utf8" (codepoint-wise).
toArray()
Returns the `{ type, properties }` fragment of a `POST /_api/analyzer` body corresponding to this analyzer definition.
public
toArray() : array<string, mixed>
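As a rough picture of the fragment's shape, the sketch below builds a comparable array by hand. The function `ngramFragment()` is a hypothetical stand-in, not the library's code. The key names follow the constructor parameters and the `ngram` analyzer type of the ArangoDB HTTP API. Dropping `null` options so that the server defaults apply is an assumption, and the real `toArray()` may behave differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for NgramAnalyzer::toArray(); the real method may
 * differ, for instance in how it treats null options.
 *
 * @return array<string, mixed>
 */
function ngramFragment(
    int $min,
    int $max,
    bool $preserveOriginal = false,
    ?string $startMarker = null,
    ?string $endMarker = null,
    ?string $streamType = null,
): array {
    $properties = [
        'min'              => $min,
        'max'              => $max,
        'preserveOriginal' => $preserveOriginal,
        'startMarker'      => $startMarker,
        'endMarker'        => $endMarker,
        'streamType'       => $streamType,
    ];

    // Assumption: null means "use the server default", so the key is omitted.
    $properties = array_filter($properties, static fn ($value) => $value !== null);

    return ['type' => 'ngram', 'properties' => $properties];
}

$fragment = ngramFragment(min: 2, max: 5, preserveOriginal: true, streamType: 'utf8');

echo json_encode($fragment, JSON_PRETTY_PRINT), PHP_EOL;
```

Under these assumptions the fragment contains `type` set to `ngram` and a `properties` object with `min`, `max`, `preserveOriginal` and `streamType`. The analyzer name and features are supplied separately, as in the `createAnalyzer` example above.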