Oihana PHP Arango

PipelineAnalyzer implements AnalyzerOptions

Read only: Yes

Pipeline analyzer: runs an **ordered** chain of sub-analyzers, each fed the output of the previous one. It is the typed way to compose analyzers that the server otherwise only exposes individually (NormAnalyzer, NgramAnalyzer, StemAnalyzer, …); without it, the only escape hatch was an untyped `new RawAnalyzer( 'pipeline' , … )`.

Why it matters: case- and accent-insensitive autocomplete. A standalone ngram analyzer normalises neither case nor accents. Indexing a field stored in upper case ("L'ABSIE", "ANGLET") yields upper-case n-grams, while a user typing "l'ab" in lower case produces lower-case n-grams; the two token streams never meet and the autocomplete silently matches nothing. ArangoDB offers no "normalise first" switch on the ngram type itself. The clean fix is a pipeline that runs a NormAnalyzer (lower-case + accent fold) before the NgramAnalyzer, so that both the indexed values and the query are folded to the same form before the split.

The order of $pipeline is significant — norm must come before ngram, never the reverse.
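The mismatch described above can be reproduced without a database. The following is a minimal plain-PHP sketch, not ArangoDB's tokenizer: `ngrams()` and `fold()` are hypothetical helpers written for this illustration, they handle ASCII only, and `fold()` lower-cases without folding accents.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative stand-in for n-gram tokenisation (not ArangoDB's implementation):
 * returns every substring of $text whose length lies between $min and $max.
 * ASCII only, to keep the sketch free of extensions.
 */
function ngrams(string $text, int $min, int $max): array
{
    $tokens = [];
    $length = strlen($text);
    for ($size = $min; $size <= $max; $size++) {
        for ($start = 0; $start + $size <= $length; $start++) {
            $tokens[] = substr($text, $start, $size);
        }
    }
    return array_values(array_unique($tokens));
}

/** Illustrative stand-in for the norm step: lower-casing only, no accent folding. */
function fold(string $text): string
{
    return strtolower($text);
}

$stored = 'ANGLET'; // value as indexed
$typed  = 'ang';    // what the user types

// ngram alone: upper-case index tokens never equal lower-case query tokens.
$rawMatches = array_intersect(ngrams($stored, 3, 5), ngrams($typed, 3, 5));

// norm, then ngram: both sides are folded before the split, so they meet.
$foldedMatches = array_intersect(
    ngrams(fold($stored), 3, 5),
    ngrams(fold($typed), 3, 5)
);

var_dump(count($rawMatches));            // int(0)
var_dump(array_values($foldedMatches));  // one shared token: "ang"
```

With the fold applied to both sides, the query token "ang" appears among the index tokens; without it, the intersection is empty.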

Example: a norm-then-ngram autocomplete pipeline.

use oihana\arango\clients\analyzer\NgramAnalyzer ;
use oihana\arango\clients\analyzer\NormAnalyzer ;
use oihana\arango\clients\analyzer\PipelineAnalyzer ;
use oihana\arango\clients\analyzer\enums\AnalyzerFeature ;

$db->createAnalyzer
(
    'autocomplete' ,
    new PipelineAnalyzer
    ([
        new NormAnalyzer ( locale : 'fr' , case : 'lower' , accent : false ) , // 1. fold case + accents
        new NgramAnalyzer( min : 3 , max : 5 , preserveOriginal : true ) ,      // 2. then split
    ]) ,
    [
        AnalyzerFeature::FREQUENCY ,
        AnalyzerFeature::POSITION ,
    ] ,
) ;
Tags
author

Marc Alcaraz (ekameleon)

since
1.5.0

Table of Contents

Interfaces

AnalyzerOptions
Common contract for every analyzer definition consumable by `\oihana\arango\clients\Database::createAnalyzer()` and `Analyzer::create()`.

Properties

$pipeline  : array<string|int, mixed>

Methods

__construct()  : mixed
toArray()  : array<string, mixed>
Returns the `{ type, properties }` fragment of a `POST /_api/analyzer` body corresponding to this analyzer definition.

Properties

Methods

__construct()

public __construct(array<int, AnalyzerOptions> $pipeline) : mixed
Parameters
$pipeline : array<int, AnalyzerOptions>

The ordered chain of sub-analyzers; each member is itself an AnalyzerOptions value object, run in declaration order (e.g. [ new NormAnalyzer(…) , new NgramAnalyzer(…) ]).

Tags
throws
InvalidArgumentException

When the pipeline is empty, or any member is not an AnalyzerOptions.
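The two failure conditions can be pictured with a small stand-in. This is a sketch only: `SketchAnalyzerOptions` and `SketchPipeline` are hypothetical names invented here, the exception messages are made up, and the library's actual implementation is not shown in this page. It assumes PHP 8.1 or later for `readonly` properties.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the AnalyzerOptions contract (not the library's interface).
interface SketchAnalyzerOptions
{
    public function toArray(): array;
}

// Hypothetical stand-in for PipelineAnalyzer, showing only the documented checks.
final class SketchPipeline implements SketchAnalyzerOptions
{
    /** @var array<int, SketchAnalyzerOptions> */
    public readonly array $pipeline;

    public function __construct(array $pipeline)
    {
        // Documented condition 1: an empty pipeline is rejected.
        if ($pipeline === []) {
            throw new InvalidArgumentException('The pipeline must not be empty.');
        }
        // Documented condition 2: every member must be an analyzer definition.
        foreach ($pipeline as $index => $member) {
            if (!$member instanceof SketchAnalyzerOptions) {
                throw new InvalidArgumentException(
                    "Pipeline member #$index is not an analyzer definition."
                );
            }
        }
        $this->pipeline = array_values($pipeline);
    }

    public function toArray(): array
    {
        // Each member serialises itself, in declaration order.
        return [
            'type'       => 'pipeline',
            'properties' => [
                'pipeline' => array_map(
                    static fn(SketchAnalyzerOptions $member): array => $member->toArray(),
                    $this->pipeline
                ),
            ],
        ];
    }
}

// Both invalid inputs are rejected: an empty list, and a list holding a plain string.
$rejected = [];
foreach ([[], ['norm']] as $candidate) {
    try {
        new SketchPipeline($candidate);
        $rejected[] = false;
    } catch (InvalidArgumentException $e) {
        $rejected[] = true;
    }
}
var_dump($rejected);
```

Validating in the constructor means an invalid chain fails when the value object is built, before any request is sent to the server.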

toArray()

Returns the `{ type, properties }` fragment of a `POST /_api/analyzer` body corresponding to this analyzer definition.

public toArray() : array<string, mixed>
Tags
inheritDoc
Return values
array<string, mixed>
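For orientation, the fragment for the autocomplete example would plausibly have the shape below. The array is written by hand from ArangoDB's documented pipeline, norm and ngram properties; it is not output captured from the library, and the exact keys the library emits (for instance optional ngram properties) are an assumption.

```php
<?php
declare(strict_types=1);

// Hand-written illustration of a { type, properties } fragment for a pipeline analyzer.
$fragment = [
    'type'       => 'pipeline',
    'properties' => [
        'pipeline' => [
            // 1. fold case + accents
            [ 'type' => 'norm'  , 'properties' => [ 'locale' => 'fr' , 'case' => 'lower' , 'accent' => false ] ] ,
            // 2. then split
            [ 'type' => 'ngram' , 'properties' => [ 'min' => 3 , 'max' => 5 , 'preserveOriginal' => true ] ] ,
        ] ,
    ] ,
] ;

// A full POST /_api/analyzer body also carries the analyzer name and a feature list.
$body = [ 'name' => 'autocomplete' , 'features' => [ 'frequency' , 'position' ] ] + $fragment ;

echo json_encode( $body , JSON_PRETTY_PRINT ) , PHP_EOL ;
```

The nested entries keep the same order as the `$pipeline` array, which is what makes the norm step run before the ngram step on the server.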