TextAnalyzer implements AnalyzerOptions
Full-text analyzer: it tokenises text on word boundaries and, depending on its options, folds case, removes stopwords, applies stemming and accent folding, and emits edge n-grams for prefix search.
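The first stages listed above can be pictured with the following plain-PHP sketch. It is not ArangoDB's implementation, which runs server-side with ICU word segmentation and Snowball stemmers. It only models tokenisation, case folding and stopword removal. The function name `analyzeSketch` is hypothetical, and stemming and accent folding are deliberately left out.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, simplified model of a text analyzer pipeline.
 * Stemming and accent folding are intentionally omitted.
 *
 * @param string       $text      Input text.
 * @param string       $case      'lower', 'upper' or 'none'.
 * @param list<string> $stopwords Tokens to drop (compared against the case-folded tokens).
 * @return list<string>
 */
function analyzeSketch(string $text, string $case = 'lower', array $stopwords = []): array
{
    // Split on any run of characters that are neither letters nor digits.
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Case folding (ASCII-only in this sketch: strtolower/strtoupper ignore accented letters).
    $tokens = array_map(
        static fn (string $t): string => match ($case) {
            'lower' => strtolower($t),
            'upper' => strtoupper($t),
            default => $t,
        },
        $tokens
    );

    // Drop stopwords, then reindex so the result is a plain list.
    return array_values(array_filter(
        $tokens,
        static fn (string $t): bool => !in_array($t, $stopwords, true)
    ));
}

print_r(analyzeSketch('Le chat et la souris', 'lower', ['le', 'la', 'les'])); // → ['chat', 'et', 'souris']
```

Only exact token matches are removed, so "et" survives because it is not in the stopword list.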
Combined with the AnalyzerFeature::FREQUENCY and AnalyzerFeature::POSITION features (plus AnalyzerFeature::NORM for length-normalised BM25() scoring), TextAnalyzer is the usual building block of ArangoSearch views intended for BM25() / PHRASE() queries.
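As a rough illustration of which features each function relies on, the hypothetical helper below reports the features missing from an analyzer definition. The requirement table reflects our reading of ArangoDB's analyzer-feature documentation (PHRASE() needs frequency and position; BM25() needs frequency, plus norm for length normalisation) and should be checked against the server version in use. The plain strings stand in for the AnalyzerFeature constants, whose actual values are not shown here.

```php
<?php
declare(strict_types=1);

// Assumed requirements, based on ArangoDB's analyzer-feature documentation.
// Plain strings stand in for the AnalyzerFeature::* constants.
const REQUIRED_FEATURES = [
    'PHRASE' => ['frequency', 'position'],
    'BM25'   => ['frequency', 'norm'],
];

/**
 * Hypothetical helper: list the features an analyzer lacks for a given function.
 *
 * @param list<string> $features Features declared on the analyzer.
 * @return list<string>
 */
function missingFeatures(array $features, string $function): array
{
    $required = REQUIRED_FEATURES[$function] ?? [];
    // array_diff keeps original keys, so reindex to return a plain list.
    return array_values(array_diff($required, $features));
}

print_r(missingFeatures(['frequency', 'position'], 'BM25'));           // → ['norm']
print_r(missingFeatures(['frequency', 'position', 'norm'], 'PHRASE')); // → []
```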
Example:
```php
// $db is an \oihana\arango\clients\Database instance
$db->createAnalyzer(
    'text_fr',
    new TextAnalyzer(
        locale    : 'fr',
        case      : 'lower',
        accent    : false,
        stemming  : true,
        stopwords : ['le', 'la', 'les'],
        edgeNgram : ['min' => 2, 'max' => 5, 'preserveOriginal' => true],
    ),
    [
        AnalyzerFeature::FREQUENCY,
        AnalyzerFeature::POSITION,
        AnalyzerFeature::NORM,
    ],
);
```
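The example above ultimately registers the analyzer through ArangoDB's `POST /_api/analyzer` endpoint. The sketch below builds the JSON body one would expect for it, assuming the client sends the analyzer name, the `{ type, properties }` fragment from toArray() and the feature names as lower-case strings. The exact body the library emits may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical reconstruction of the request body for the 'text_fr' example.
// Null-valued options are assumed to be omitted so the server defaults apply.
$body = [
    'name'       => 'text_fr',
    'type'       => 'text',
    'properties' => [
        'locale'    => 'fr',
        'case'      => 'lower',
        'accent'    => false,
        'stemming'  => true,
        'stopwords' => ['le', 'la', 'les'],
        'edgeNgram' => ['min' => 2, 'max' => 5, 'preserveOriginal' => true],
    ],
    'features'   => ['frequency', 'position', 'norm'],
];

// Pretty-printed JSON as it would be POSTed to /_api/analyzer.
echo json_encode($body, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR), PHP_EOL;
```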
Table of Contents
Interfaces
- AnalyzerOptions
- Common contract for every analyzer definition consumable by \oihana\arango\clients\Database::createAnalyzer() and Analyzer::create().
Properties
- $accent : bool|null
- $case : string|null
- $edgeNgram : array<string|int, mixed>|null
- $locale : string
- $stemming : bool|null
- $stopwords : array<string|int, mixed>|null
- $stopwordsPath : string|null
Methods
- __construct() : mixed
- toArray() : array<string, mixed>
- Returns the `{ type, properties }` fragment of a `POST /_api/analyzer` body corresponding to this analyzer definition.
Properties
$accent
public bool|null $accent = null
$case
public string|null $case = null
$edgeNgram
public array<string|int, mixed>|null $edgeNgram = null
$locale
public string $locale
$stemming
public bool|null $stemming = null
$stopwords
public array<string|int, mixed>|null $stopwords = null
$stopwordsPath
public string|null $stopwordsPath = null
Methods
__construct()
public __construct(string $locale[, string|null $case = null ][, bool|null $accent = null ][, bool|null $stemming = null ][, array<int, string>|null $stopwords = null ][, string|null $stopwordsPath = null ][, array<string, int|bool>|null $edgeNgram = null ]) : mixed
Parameters
- $locale : string
  BCP 47 / ICU locale tag (e.g. "en", "fr.utf-8").
- $case : string|null = null
  Case folding strategy ("lower", "upper" or "none"). Defaults to the server's "lower".
- $accent : bool|null = null
  Whether to keep diacritics. Defaults to the server's false (accents removed).
- $stemming : bool|null = null
  Whether to apply Snowball stemming. Defaults to the server's true.
- $stopwords : array<int, string>|null = null
  Inline list of stopwords to drop from the token stream.
- $stopwordsPath : string|null = null
  Path to newline-separated stopword files (resolved server-side).
- $edgeNgram : array<string, int|bool>|null = null
  Edge n-gram options: min, max and preserveOriginal. Setting min > 0 enables edge n-gram emission for prefix search.
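To make the edge n-gram options concrete, the sketch below models what min, max and preserveOriginal mean for a single token: it emits the token's prefixes whose length lies between min and max and, when preserveOriginal is true, also keeps the whole token if its length falls outside that range. This is a simplified illustration, not ArangoDB's code. The function name `edgeNgrams` is hypothetical, and it works on bytes, so it assumes ASCII tokens.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical model of edge n-gram emission for one (ASCII) token.
 *
 * @return list<string>
 */
function edgeNgrams(string $token, int $min, int $max, bool $preserveOriginal): array
{
    $length = strlen($token);
    $grams  = [];

    // Prefixes of length min..max, capped at the token length (never length 0).
    for ($n = max($min, 1); $n <= min($max, $length); $n++) {
        $grams[] = substr($token, 0, $n);
    }

    // Keep the whole token when it falls outside [min, max] and preserveOriginal is set.
    if ($preserveOriginal && ($length < $min || $length > $max)) {
        $grams[] = $token;
    }

    return $grams;
}

print_r(edgeNgrams('souris', 2, 5, true)); // → ['so', 'sou', 'sour', 'souri', 'souris']
```

A prefix query such as "sou" then matches the indexed gram directly, which is why edge n-grams are used for prefix search.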
toArray()
public toArray() : array<string, mixed>
Returns the `{ type, properties }` fragment of a `POST /_api/analyzer` body corresponding to this analyzer definition.
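A minimal sketch of what toArray() could look like, assuming that null options are simply left out so the server-side defaults listed under the constructor parameters take effect. The interface and class names are hypothetical stand-ins, not the library's actual code.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the AnalyzerOptions contract.
interface AnalyzerOptionsSketch
{
    /** @return array<string, mixed> */
    public function toArray(): array;
}

// Hypothetical stand-in for TextAnalyzer; not the library's implementation.
final class TextAnalyzerSketch implements AnalyzerOptionsSketch
{
    public function __construct(
        public string $locale,
        public ?string $case = null,
        public ?bool $accent = null,
        public ?bool $stemming = null,
        public ?array $stopwords = null,
        public ?string $stopwordsPath = null,
        public ?array $edgeNgram = null,
    ) {}

    public function toArray(): array
    {
        $properties = [
            'locale'        => $this->locale,
            'case'          => $this->case,
            'accent'        => $this->accent,
            'stemming'      => $this->stemming,
            'stopwords'     => $this->stopwords,
            'stopwordsPath' => $this->stopwordsPath,
            'edgeNgram'     => $this->edgeNgram,
        ];

        // Drop unset (null) options so the server applies its own defaults; false is kept.
        return [
            'type'       => 'text',
            'properties' => array_filter($properties, static fn ($v): bool => $v !== null),
        ];
    }
}

print_r((new TextAnalyzerSketch(locale: 'en', stemming: false))->toArray());
```

Filtering on `!== null` rather than truthiness matters here: an explicit `false` (for example `stemming: false`) must still reach the server.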