PipelineAnalyzer implements AnalyzerOptions
Pipeline analyzer: runs an **ordered** chain of sub-analyzers, each fed the output of the previous one. It is the typed way to compose analyzers the server otherwise only exposes individually (`NormAnalyzer`, `NgramAnalyzer`, `StemAnalyzer`, …); without it, the only escape hatch was an untyped `new RawAnalyzer( 'pipeline' , … )`.
Why it matters: case- and accent-insensitive autocomplete. A standalone
`ngram` analyzer normalises neither case nor accents: indexing a
field stored in upper case ("L'ABSIE", "ANGLET") yields upper-case
n-grams, while a user typing `l'ab` in lower case produces lower-case
n-grams. The two token streams never meet, and the autocomplete silently
matches nothing. ArangoDB offers no per-type "normalise first" switch on
`ngram`; the clean fix is a pipeline that runs a NormAnalyzer
(lower-case + accent fold) before the NgramAnalyzer, so both the
indexed values and the query are folded to the same form before the split.
The order of `$pipeline` is significant: `norm` must come before
`ngram`, never the reverse.
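The mismatch described above can be reproduced in plain PHP, without a server. The sketch below is only an illustration: `ngrams()` and `fold()` are hypothetical helpers written for this example (they are not part of the library or of ArangoDB), the accent map covers just a few French characters, and the `mbstring` extension is assumed to be available. The real `norm` and `ngram` analyzers run server-side and may tokenise differently in detail.

```php
<?php
// Illustration only: plain-PHP stand-ins for the server-side analyzers.

/** Hypothetical helper: all character n-grams of $text with lengths in [$min, $max]. */
function ngrams(string $text, int $min, int $max): array
{
    $out = [];
    $len = mb_strlen($text);
    for ($n = $min; $n <= $max; $n++) {
        for ($i = 0; $i + $n <= $len; $i++) {
            $out[] = mb_substr($text, $i, $n);
        }
    }
    return $out;
}

/** Hypothetical helper: lower-case, then strip a handful of French accents. */
function fold(string $text): string
{
    $accents = ['é' => 'e', 'è' => 'e', 'ê' => 'e', 'à' => 'a', 'ç' => 'c', 'ô' => 'o'];
    return strtr(mb_strtolower($text), $accents);
}

$indexed = "L'ABSIE"; // value as stored in the collection
$typed   = "l'ab";    // what the user types

// ngram alone: upper-case and lower-case n-grams share nothing.
$rawOverlap = array_intersect(ngrams($indexed, 3, 5), ngrams($typed, 3, 5));

// norm, then ngram: both sides are folded before the split, so n-grams meet.
$foldedOverlap = array_values(
    array_intersect(ngrams(fold($indexed), 3, 5), ngrams(fold($typed), 3, 5))
);

var_dump($rawOverlap);    // empty array
var_dump($foldedOverlap); // the n-grams shared by both folded strings
```

Folding both the stored value and the typed prefix is what makes the two token sets comparable; the n-gram split itself is unchanged.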
Example — the norm → ngram autocomplete pipeline:
    use oihana\arango\clients\analyzer\NgramAnalyzer ;
    use oihana\arango\clients\analyzer\NormAnalyzer ;
    use oihana\arango\clients\analyzer\PipelineAnalyzer ;
    use oihana\arango\clients\analyzer\enums\AnalyzerFeature ;

    $db->createAnalyzer
    (
        'autocomplete' ,
        new PipelineAnalyzer
        ([
            new NormAnalyzer ( locale : 'fr' , case : 'lower' , accent : false ) , // 1. fold case + accents
            new NgramAnalyzer( min : 3 , max : 5 , preserveOriginal : true ) ,     // 2. then split
        ]) ,
        [
            AnalyzerFeature::FREQUENCY ,
            AnalyzerFeature::POSITION ,
        ] ,
    ) ;
Table of Contents
Interfaces
- AnalyzerOptions
- Common contract for every analyzer definition consumable by `\oihana\arango\clients\Database::createAnalyzer()` and `Analyzer::create()`.
Properties
- $pipeline : array<string|int, mixed>
Methods
- __construct() : mixed
- toArray() : array<string, mixed>
- Returns the `{ type, properties }` fragment of a `POST /_api/analyzer` body corresponding to this analyzer definition.
Properties
$pipeline
public array<string|int, mixed> $pipeline
Methods
__construct()
public __construct(array<int, AnalyzerOptions> $pipeline) : mixed
Parameters
- $pipeline : array<int, AnalyzerOptions>
- The ordered chain of sub-analyzers; each member is itself an AnalyzerOptions value object, run in declaration order (e.g. `[ new NormAnalyzer(…) , new NgramAnalyzer(…) ]`).
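To make the "ordered chain of value objects" contract concrete, here is a minimal sketch of how such a constructor could validate its members and keep their declaration order. The names `SketchOptions`, `SketchStep` and `SketchPipeline` are hypothetical stand-ins invented for this example; the real `AnalyzerOptions` and `PipelineAnalyzer` may be implemented differently, and whether the real constructor validates its input at all is an assumption.

```php
<?php
// Hypothetical stand-ins; the real types live in oihana\arango\clients\analyzer.
interface SketchOptions
{
    public function toArray(): array;
}

/** A single sub-analyzer definition: a type name plus its properties. */
final class SketchStep implements SketchOptions
{
    public function __construct(private string $type, private array $properties = []) {}

    public function toArray(): array
    {
        return ['type' => $this->type, 'properties' => $this->properties];
    }
}

/** An ordered chain of sub-analyzer definitions. */
final class SketchPipeline implements SketchOptions
{
    /** @var list<SketchOptions> */
    public array $pipeline;

    public function __construct(array $pipeline)
    {
        foreach ($pipeline as $member) {
            if (!$member instanceof SketchOptions) {
                throw new InvalidArgumentException('Every pipeline member must be an analyzer definition.');
            }
        }
        // Keep declaration order; discard any custom array keys.
        $this->pipeline = array_values($pipeline);
    }

    public function toArray(): array
    {
        // Serialise each member in the order it was declared.
        $steps = array_map(fn (SketchOptions $step): array => $step->toArray(), $this->pipeline);
        return ['type' => 'pipeline', 'properties' => ['pipeline' => $steps]];
    }
}

$chain = new SketchPipeline([
    new SketchStep('norm', ['case' => 'lower']),
    new SketchStep('ngram', ['min' => 3, 'max' => 5]),
]);
$types = array_column($chain->toArray()['properties']['pipeline'], 'type');
```

Rejecting non-analyzer members at construction time surfaces a malformed chain in PHP, before any request is sent to the server.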
toArray()
Returns the `{ type, properties }` fragment of a `POST /_api/analyzer` body corresponding to this analyzer definition.
public toArray() : array<string, mixed>
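For orientation, the sketch below builds by hand the kind of `{ type, properties }` fragment this method is described as returning for the norm → ngram example, then merges it into a full request body. It does not call the library: the exact keys `toArray()` emits are an assumption based on the pipeline analyzer definition format documented by ArangoDB, and the lower-case feature names are assumed to be what `AnalyzerFeature::FREQUENCY` and `AnalyzerFeature::POSITION` resolve to.

```php
<?php
// Hand-written stand-in for the fragment toArray() is assumed to produce.
$fragment = [
    'type'       => 'pipeline',
    'properties' => [
        'pipeline' => [
            // 1. fold case + accents
            ['type' => 'norm', 'properties' => ['locale' => 'fr', 'case' => 'lower', 'accent' => false]],
            // 2. then split into n-grams
            ['type' => 'ngram', 'properties' => ['min' => 3, 'max' => 5, 'preserveOriginal' => true]],
        ],
    ],
];

// A POST /_api/analyzer body adds the analyzer name and features around the fragment.
$body = ['name' => 'autocomplete'] + $fragment + ['features' => ['frequency', 'position']];

echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

The sub-analyzers appear in `properties.pipeline` in the same order as in `$pipeline`, which is why the declaration order matters.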