Oihana PHP Arango

NgramAnalyzer implements AnalyzerOptions

Read only: Yes

N-gram analyzer: emits every substring (n-gram) of its input whose length is between `min` and `max` characters, optionally keeping the original token. It is the building block of substring / "as-you-type" autocomplete search: indexing a field with an n-gram analyzer lets a partial term (`tel`) match a longer value (`Atelier`). N-grams are compared as-is, so matching is case-sensitive unless the input is normalized separately.

Unlike the edgeNgram option nested inside a TextAnalyzer (which only emits prefixes), a standalone ngram analyzer emits n-grams from every position. It is typically paired with a text analyzer on the same field (multiple analyzers per field) so the field serves both whole-word search and autocomplete.
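To make the emission rule concrete, here is a minimal sketch in plain PHP. `ngramSketch()` is a hypothetical helper written for illustration, not part of this library or of ArangoDB, and the order in which it lists n-grams is an assumption that may differ from the server's token stream. It assumes `min >= 1` and counts characters as UTF-8 code points.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (illustration only): lists every n-gram of $input
 * whose length, in UTF-8 code points, lies between $min and $max.
 * Assumes $min >= 1.
 */
function ngramSketch(string $input, int $min, int $max, bool $preserveOriginal = false): array
{
    // Split into code points; fall back to an empty list on invalid UTF-8.
    $chars = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $count = count($chars);
    $grams = [];

    // Emit n-grams from every start position, not only from the prefix.
    for ($start = 0; $start < $count; $start++) {
        for ($len = $min; $len <= $max && $start + $len <= $count; $len++) {
            $grams[] = implode('', array_slice($chars, $start, $len));
        }
    }

    // Optionally keep the whole input as one extra token.
    if ($preserveOriginal && !in_array($input, $grams, true)) {
        $grams[] = $input;
    }

    return $grams;
}

$grams = ngramSketch('Atelier', 3, 3, true);
// $grams is ['Ate', 'tel', 'eli', 'lie', 'ier', 'Atelier']
```

Because the n-grams keep the case of the input, `tel` is among the n-grams of `Atelier` while `ate` is not.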

Example:

```php
$db->createAnalyzer
(
    'autocomplete' ,
    new NgramAnalyzer
    (
        min              : 2 ,
        max              : 5 ,
        preserveOriginal : true ,
        streamType       : 'utf8' ,
    ) ,
    [
        AnalyzerFeature::FREQUENCY ,
        AnalyzerFeature::POSITION ,
    ] ,
) ;
```

Tags
author

Marc Alcaraz (ekameleon)

since
1.5.0

Table of Contents

Interfaces

AnalyzerOptions
Common contract for every analyzer definition consumable by `\oihana\arango\clients\Database::createAnalyzer()` and `Analyzer::create()`.

Properties

$endMarker  : string|null
$max  : int
$min  : int
$preserveOriginal  : bool
$startMarker  : string|null
$streamType  : string|null

Methods

__construct()  : mixed
toArray()  : array<string, mixed>
Returns the `{ type, properties }` fragment of a `POST /_api/analyzer` body corresponding to this analyzer definition.

Methods

__construct()

public __construct(int $min, int $max[, bool $preserveOriginal = false ][, string|null $startMarker = null ][, string|null $endMarker = null ][, string|null $streamType = null ]) : mixed
Parameters
$min : int

Lower bound of the n-gram length window (inclusive).

$max : int

Upper bound of the n-gram length window (inclusive).

$preserveOriginal : bool = false

Whether to also keep the original (un-split) token in the output stream.

$startMarker : string|null = null

String prepended to n-grams that include the beginning of the input, so start-of-token n-grams can be distinguished. When null, the server default (an empty string) applies.

$endMarker : string|null = null

String appended to n-grams that include the end of the input, so end-of-token n-grams can be distinguished. When null, the server default (an empty string) applies.

$streamType : string|null = null

Input encoding (see StreamType): "binary" (byte-wise, server default) or "utf8" (codepoint-wise).
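
The practical difference between the two stream types shows up as soon as the input contains multi-byte characters. The sketch below is plain PHP with two hypothetical helpers (`byteBigrams()` and `codepointBigrams()`, neither of which exists in this library); it only approximates what byte-wise and codepoint-wise splitting imply and does not reproduce the server's implementation.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: 2-grams taken byte by byte (the idea behind "binary"). */
function byteBigrams(string $input): array
{
    $out = [];
    for ($i = 0; $i + 2 <= strlen($input); $i++) {
        $out[] = substr($input, $i, 2);
    }
    return $out;
}

/** Hypothetical helper: 2-grams taken code point by code point (the idea behind "utf8"). */
function codepointBigrams(string $input): array
{
    $chars = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $out = [];
    for ($i = 0; $i + 2 <= count($chars); $i++) {
        $out[] = $chars[$i] . $chars[$i + 1];
    }
    return $out;
}

// "été": 3 code points, but 5 bytes once encoded as UTF-8.
$word = "\u{00E9}t\u{00E9}";

$byteGrams      = byteBigrams($word);      // 4 two-byte slices
$codepointGrams = codepointBigrams($word); // 2 two-character slices

// Byte-wise slices can cut a character in half and stop being valid UTF-8.
$brokenGrams = array_filter(
    $byteGrams,
    static fn (string $gram): bool => preg_match('//u', $gram) !== 1
);
```

For fields that hold accented or non-Latin text, this is the usual reason to pass `streamType: 'utf8'` rather than rely on the byte-wise default.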

toArray()

Returns the `{ type, properties }` fragment of a `POST /_api/analyzer` body corresponding to this analyzer definition.

public toArray() : array<string, mixed>
Tags
inheritDoc
Return values
array<string, mixed>
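
As a rough illustration of the `{ type, properties }` fragment, the following sketch builds a comparable array in plain PHP. `ngramDefinitionSketch()` is a hypothetical stand-in, not the library's `toArray()`: the property names follow ArangoDB's documented `ngram` analyzer options, while dropping null options from the output is an assumption about how server defaults are left in place.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for toArray() (illustration only).
 * The real method may treat null options differently.
 */
function ngramDefinitionSketch(
    int $min,
    int $max,
    bool $preserveOriginal = false,
    ?string $startMarker = null,
    ?string $endMarker = null,
    ?string $streamType = null
): array {
    $properties = [
        'min'              => $min,
        'max'              => $max,
        'preserveOriginal' => $preserveOriginal,
        'startMarker'      => $startMarker,
        'endMarker'        => $endMarker,
        'streamType'       => $streamType,
    ];

    // Assumption: null options are omitted so the server applies its defaults.
    $properties = array_filter($properties, static fn ($value): bool => $value !== null);

    return ['type' => 'ngram', 'properties' => $properties];
}

// Assemble a full request body around the fragment.
// "frequency" and "position" are ArangoDB's documented feature names.
$body = ['name' => 'autocomplete']
      + ngramDefinitionSketch(2, 5, true, streamType: 'utf8')
      + ['features' => ['frequency', 'position']];

$json = json_encode($body, JSON_PRETTY_PRINT);
```

Here `$json` holds a body of the kind `POST /_api/analyzer` expects, with `name` and `features` supplied by the caller and `type` and `properties` supplied by the analyzer definition.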