Oihana PHP Arango

TextAnalyzer implements AnalyzerOptions

Read only: Yes

Full-text analyzer — tokenises on word boundaries, optionally lower-cases, removes stopwords, applies stemming and accent folding, and optionally emits edge n-grams for prefix search.
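The following toy PHP sketch illustrates the pipeline stages described above: tokenisation, case folding, accent folding and stopword removal. It is only an illustration and does not reproduce ArangoDB's ICU-based "text" analyzer. Stemming and edge n-grams are omitted, accent folding uses a small hand-written map (an assumption made for brevity), and the function name `toyTextTokens` is hypothetical.

```php
<?php
// Toy illustration only (hypothetical helper, not part of the library):
// a simplified text pipeline. Stemming is omitted and accent folding uses
// a tiny hand-written map of lowercase Latin letters. Requires mbstring.
function toyTextTokens(string $text, array $stopwords = [], string $case = 'lower', bool $accent = false): array
{
    // 1. Split on anything that is not a letter or a digit (Unicode-aware).
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    $out = [];
    foreach ($tokens as $token) {
        // 2. Case folding: "lower", "upper" or "none".
        if ($case === 'lower') {
            $token = mb_strtolower($token, 'UTF-8');
        } elseif ($case === 'upper') {
            $token = mb_strtoupper($token, 'UTF-8');
        }
        // 3. Accent folding when $accent is false (lowercase letters only here).
        if (!$accent) {
            $token = strtr($token, [
                'à' => 'a', 'â' => 'a', 'ç' => 'c', 'é' => 'e', 'è' => 'e',
                'ê' => 'e', 'ë' => 'e', 'î' => 'i', 'ï' => 'i', 'ô' => 'o',
                'ù' => 'u', 'û' => 'u', 'ü' => 'u',
            ]);
        }
        // 4. Stopword removal (in this toy, applied after folding).
        if (in_array($token, $stopwords, true)) {
            continue;
        }
        $out[] = $token;
    }
    return $out;
}

print_r(toyTextTokens('Le Café de la Gare', ['le', 'la', 'de']));
```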

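Combined with the AnalyzerFeature::FREQUENCY and AnalyzerFeature::POSITION features, it is the usual building block of ArangoSearch views intended for BM25() / PHRASE() queries. PHRASE() relies on token positions, while BM25() also uses AnalyzerFeature::NORM for document-length normalization, as in the example below.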
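For reference, the standard Okapi BM25 score is shown below. It shows why the features matter: $f(q_i, D)$ is the term frequency of query term $q_i$ in document $D$ (what FREQUENCY stores), and $|D| / \mathrm{avgdl}$ is the document length relative to the average (the normalization that NORM enables). ArangoDB's BM25() defaults to $k = 1.2$ and $b = 0.75$; the exact IDF variant it uses is not covered here, so treat $\mathrm{IDF}(q_i)$ as generic.

```latex
\operatorname{score}(D, Q) = \sum_{i=1}^{n} \operatorname{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k + 1)}
       {f(q_i, D) + k \left( 1 - b + b \, \frac{|D|}{\operatorname{avgdl}} \right)}
```

With $b = 0$ the length term vanishes, which is why normalization only matters when $b \neq 0$.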

Example:

// $db is assumed to be a \oihana\arango\clients\Database instance.
$db->createAnalyzer
(
    'text_fr' ,
    new TextAnalyzer
    (
        locale    : 'fr' ,
        case      : 'lower' ,
        accent    : false ,
        stemming  : true ,
        stopwords : [ 'le' , 'la' , 'les' ] ,
        edgeNgram : [ 'min' => 2 , 'max' => 5 , 'preserveOriginal' => true ] ,
    ) ,
    [
        AnalyzerFeature::FREQUENCY ,
        AnalyzerFeature::POSITION ,
        AnalyzerFeature::NORM ,
    ] ,
) ;
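To show what the example above roughly corresponds to on the wire, here is a sketch of an equivalent `POST /_api/analyzer` body built as a plain PHP array. The feature strings `"frequency"`, `"position"` and `"norm"` are ArangoDB's feature names; the assumption is that the AnalyzerFeature constants map to them. Any database-name prefixing the client may apply is not shown.

```php
<?php
// Hypothetical sketch: approximate JSON body for the example above.
// Assumes the AnalyzerFeature constants map to ArangoDB's feature strings.
$body = [
    'name'       => 'text_fr',
    'type'       => 'text',
    'properties' => [
        'locale'    => 'fr',
        'case'      => 'lower',
        'accent'    => false,
        'stemming'  => true,
        'stopwords' => ['le', 'la', 'les'],
        'edgeNgram' => ['min' => 2, 'max' => 5, 'preserveOriginal' => true],
    ],
    'features'   => ['frequency', 'position', 'norm'],
];

echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```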
Tags
author

Marc Alcaraz (ekameleon)

since
1.0.0

Table of Contents

Interfaces

AnalyzerOptions
Common contract for every analyzer definition consumable by \oihana\arango\clients\Database::createAnalyzer() and Analyzer::create().

Properties

$accent  : bool|null
$case  : string|null
$edgeNgram  : array<string|int, mixed>|null
$locale  : string
$stemming  : bool|null
$stopwords  : array<string|int, mixed>|null
$stopwordsPath  : string|null

Methods

__construct()  : mixed
toArray()  : array<string, mixed>
Returns the `{ type, properties }` fragment of a `POST /_api/analyzer` body corresponding to this analyzer definition.

Properties

$edgeNgram

public array<string|int, mixed>|null $edgeNgram = null

$stopwords

public array<string|int, mixed>|null $stopwords = null

Methods

__construct()

public __construct(string $locale[, string|null $case = null ][, bool|null $accent = null ][, bool|null $stemming = null ][, array<int, string>|null $stopwords = null ][, string|null $stopwordsPath = null ][, array<string, int|bool>|null $edgeNgram = null ]) : mixed
Parameters
$locale : string

BCP 47 / ICU locale tag (e.g. "en", "fr.utf-8").

$case : string|null = null

Case folding strategy ("lower", "upper", "none"). When null, the server default ("lower") applies.

$accent : bool|null = null

Whether to preserve accents (diacritics). When null, the server default (false, accents removed) applies.

$stemming : bool|null = null

Whether to apply Snowball stemming. When null, the server default (true) applies.

$stopwords : array<int, string>|null = null

Inline list of stopwords to drop from the token stream.

$stopwordsPath : string|null = null

Server-side path to stopword files. ArangoDB expects a directory containing a language sub-directory (e.g. "en") whose files list one stopword per line.
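Stopword files are read by the server, not by this class. As an illustration, the sketch below parses file contents assuming ArangoDB's documented format: one word per line, with everything after the first whitespace character ignored (so it can hold comments). The helper `parseStopwordLines` is hypothetical.

```php
<?php
// Illustrative only (hypothetical helper): parses stopword file contents,
// assuming one word per line and that text after the first whitespace
// character on a line is ignored.
function parseStopwordLines(string $contents): array
{
    $words = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        // Keep only the part before the first whitespace character.
        $word = preg_split('/\s/', $line, 2)[0];
        if ($word !== '') {
            $words[] = $word;
        }
    }
    return $words;
}

print_r(parseStopwordLines("le\nla # article\n\nles\n"));
```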

$edgeNgram : array<string, int|bool>|null = null

Edge n-gram options: min / max / preserveOriginal. Setting min > 0 enables edge n-gram emission for prefix search.
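The sketch below shows what edge n-gram emission produces for a single token, which is what makes prefix search possible. It assumes ArangoDB's documented semantics: prefixes of length `min` to `max`, plus the whole token when `preserveOriginal` is true and the token length falls outside that range. The function `edgeNgrams` is hypothetical and not part of this library.

```php
<?php
// Illustrative sketch (hypothetical helper): edge n-grams of one token.
function edgeNgrams(string $token, int $min, int $max, bool $preserveOriginal): array
{
    // Split into UTF-8 characters without relying on mbstring.
    $chars  = preg_split('//u', $token, -1, PREG_SPLIT_NO_EMPTY);
    $length = count($chars);
    $grams  = [];
    // Emit every prefix whose length lies in [min, max].
    for ($n = $min; $n <= min($max, $length); $n++) {
        $grams[] = implode('', array_slice($chars, 0, $n));
    }
    // Add the original token only if it was not already emitted as a prefix.
    if ($preserveOriginal && ($length < $min || $length > $max)) {
        $grams[] = $token;
    }
    return $grams;
}

print_r(edgeNgrams('search', 2, 5, true));
```

With `min: 2, max: 5, preserveOriginal: true`, the token "search" yields "se", "sea", "sear", "searc" and "search", so a query for the prefix "sea" can match it directly.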

toArray()

Returns the `{ type, properties }` fragment of a `POST /_api/analyzer` body corresponding to this analyzer definition.

public toArray() : array<string, mixed>
Tags
inheritDoc
Return values
array<string, mixed>
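As an illustration of the `{ type, properties }` fragment, here is a hypothetical re-implementation, not the library's code. It assumes that null options are omitted so the server defaults apply, while explicit `false` values and empty arrays are kept.

```php
<?php
// Hypothetical sketch: builds a { type, properties } fragment for a text
// analyzer, dropping null options (assumed to mean "use server default").
function textAnalyzerFragment(
    string $locale,
    ?string $case = null,
    ?bool $accent = null,
    ?bool $stemming = null,
    ?array $stopwords = null,
    ?string $stopwordsPath = null,
    ?array $edgeNgram = null
): array {
    $properties = array_filter(
        [
            'locale'        => $locale,
            'case'          => $case,
            'accent'        => $accent,
            'stemming'      => $stemming,
            'stopwords'     => $stopwords,
            'stopwordsPath' => $stopwordsPath,
            'edgeNgram'     => $edgeNgram,
        ],
        // Keep false and [] values; only null means "not set".
        static fn ($value) => $value !== null
    );
    return ['type' => 'text', 'properties' => $properties];
}

print_r(textAnalyzerFragment('fr', null, false));
```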