Oihana PHP Arango

AnalyzerField uses ConstantsTrait

JSON field names exchanged with the ArangoDB analyzer API (`/_api/analyzer`), on both the request side (body of `POST /_api/analyzer`) and the response side (wrapper of `GET /_api/analyzer/{name}` and entries of the list returned by `GET /_api/analyzer`).

Two families coexist:

  • Top-level fields (`name`, `type`, `features`, `properties`) that frame every analyzer payload regardless of its type,
  • Type-specific properties (`locale`, `case`, `accent`, `stemming`, `stopwords`, `stopwordsPath`, `edgeNgram`, `min`, `max`, `preserveOriginal`, `startMarker`, `endMarker`, `streamType`, `pipeline`) that nest inside the `properties` wrapper for `text`, `norm`, `stem`, `ngram` and `pipeline` analyzers.
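
The split between the two families can be sketched as follows. This is a minimal illustration, not the library's own usage: `AnalyzerFieldSketch` is a local stand-in that redeclares a few of the constants documented on this page (the real class's namespace is not shown here), and the analyzer name and option values are hypothetical.

```php
<?php
declare(strict_types=1);

// Local stand-in mirroring a few AnalyzerField constants from this page.
// In a real project, import the library class instead of redeclaring it.
final class AnalyzerFieldSketch
{
    public const NAME       = 'name';
    public const TYPE       = 'type';
    public const FEATURES   = 'features';
    public const PROPERTIES = 'properties';
    public const LOCALE     = 'locale';
    public const CASE       = 'case';
    public const ACCENT     = 'accent';
    public const STEMMING   = 'stemming';
}

// Hypothetical analyzer name and option values, chosen only for illustration.
$payload = [
    // Top-level fields: present whatever the analyzer type is.
    AnalyzerFieldSketch::NAME       => 'text_en_sketch',
    AnalyzerFieldSketch::TYPE       => 'text',
    AnalyzerFieldSketch::FEATURES   => ['frequency', 'norm', 'position'],
    // Type-specific fields: nested inside the `properties` wrapper.
    AnalyzerFieldSketch::PROPERTIES => [
        AnalyzerFieldSketch::LOCALE   => 'en',
        AnalyzerFieldSketch::CASE     => 'lower',
        AnalyzerFieldSketch::ACCENT   => false,
        AnalyzerFieldSketch::STEMMING => true,
    ],
];

// Encode the array as the JSON body one would send to POST /_api/analyzer.
$json = json_encode($payload, JSON_THROW_ON_ERROR | JSON_PRETTY_PRINT);
echo $json, PHP_EOL;
```

Building the array from constants rather than string literals keeps field names in one place, so a typo surfaces as an undefined-constant error instead of a silently ignored field.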
Tags
see
https://docs.arangodb.com/stable/develop/http-api/analyzers/
author

Marc Alcaraz (ekameleon)

since
1.0.0

Constants

ACCENT

Whether the analyzer should keep diacritics on the input (`text` / `norm` only).

public string ACCENT = 'accent'

CASE

Case folding strategy applied to the input (`text` / `norm` only). Recognised values: `"lower"`, `"upper"`, `"none"`.

public string CASE = 'case'

EDGE_NGRAM

Edge n-gram options nested inside the `properties` of a `text` analyzer — carries `min`, `max`, `preserveOriginal` sub-fields.

public string EDGE_NGRAM = 'edgeNgram'

END_MARKER

String appended to the n-grams that include the end of the input (`ngram` only), so end-of-input n-grams can be distinguished.

public string END_MARKER = 'endMarker'

Lives at the top level of the `ngram` `properties`.

FEATURES

List of analyzer feature toggles — entries of {@see AnalyzerFeature}.

public string FEATURES = 'features'

Top-level field on every analyzer payload.

LOCALE

BCP 47 / ICU locale tag (e.g. `"en"`, `"fr.utf-8"`) driving the language-aware behaviour of the analyzer (`text` / `norm` / `stem`).

public string LOCALE = 'locale'

MAX

Upper bound of the n-gram window (inclusive). Lives under the {@see self::EDGE_NGRAM} wrapper for a `text` analyzer, or at the top level of the `properties` for an `ngram` analyzer.

public string MAX = 'max'

MIN

Lower bound of the n-gram window (inclusive). Lives under the {@see self::EDGE_NGRAM} wrapper for a `text` analyzer, or at the top level of the `properties` for an `ngram` analyzer.

public string MIN = 'min'
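
The two placements of `min` / `max` described above can be compared side by side. The sketch below writes the field names as literals (they match the `EDGE_NGRAM`, `MIN`, `MAX` and `PRESERVE_ORIGINAL` constants), and `ngramWindow()` is a hypothetical helper introduced only for this example.

```php
<?php
declare(strict_types=1);

// `text` analyzer: the window lives under the `edgeNgram` wrapper.
$textProperties = [
    'locale'    => 'en',
    'edgeNgram' => ['min' => 2, 'max' => 4, 'preserveOriginal' => true],
];

// `ngram` analyzer: the same three fields sit directly in `properties`.
$ngramProperties = ['min' => 2, 'max' => 4, 'preserveOriginal' => true];

/**
 * Hypothetical helper: reads the n-gram window from either layout.
 * Returns [min, max], or null when no window is configured.
 */
function ngramWindow(array $properties): ?array
{
    // Prefer the nested wrapper when present, else read the top level.
    $source = $properties['edgeNgram'] ?? $properties;
    if (!isset($source['min'], $source['max'])) {
        return null;
    }
    return [$source['min'], $source['max']];
}

// Both layouts describe the same window.
var_dump(ngramWindow($textProperties) === ngramWindow($ngramProperties));
```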

NAME

Top-level analyzer name. Must be prefixed with the database name when shared across databases (`mydb::myanalyzer`).

public string NAME = 'name'
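
As a small illustration of the `mydb::myanalyzer` form, a caller might separate the database prefix from the analyzer name as below. `splitAnalyzerName()` is a hypothetical helper written for this example, not something the library is known to provide.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: splits "db::analyzer" into its two parts.
 * Names without a "::" separator are returned with a null database.
 */
function splitAnalyzerName(string $name): array
{
    // Limit of 2 keeps any later "::" inside the analyzer part.
    $parts = explode('::', $name, 2);

    return count($parts) === 2
        ? ['database' => $parts[0], 'analyzer' => $parts[1]]
        : ['database' => null, 'analyzer' => $name];
}

print_r(splitAnalyzerName('mydb::myanalyzer'));
print_r(splitAnalyzerName('identity'));
```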

PIPELINE

Ordered list of sub-analyzers run as a chain (`pipeline` only).

public string PIPELINE = 'pipeline'

Lives at the top level of the `pipeline` `properties`; each entry is itself a `{ type, properties }` analyzer fragment, fed the output of the previous one. See `PipelineAnalyzer`.
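
A pipeline payload built from such fragments might look like the sketch below. The analyzer name and the choice of a `norm` step followed by an `ngram` step are assumptions made for illustration; field names are written as literals matching the constants on this page.

```php
<?php
declare(strict_types=1);

// Hypothetical pipeline: normalise the input first, then cut it into trigrams.
$payload = [
    'name'       => 'norm_then_ngram_sketch',
    'type'       => 'pipeline',
    'features'   => [],
    'properties' => [
        // Order matters: each fragment receives the output of the previous one.
        'pipeline' => [
            [
                'type'       => 'norm',
                'properties' => ['locale' => 'en', 'case' => 'lower', 'accent' => false],
            ],
            [
                'type'       => 'ngram',
                'properties' => [
                    'min'              => 3,
                    'max'              => 3,
                    'preserveOriginal' => false,
                    'streamType'       => 'utf8',
                ],
            ],
        ],
    ],
];

// List the chain in execution order.
$chain = array_column($payload['properties']['pipeline'], 'type');
echo implode(' -> ', $chain), PHP_EOL;
```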

PRESERVE_ORIGINAL

Whether the n-gram emitter should also keep the original (un-trimmed) token in the output stream. Lives under the {@see self::EDGE_NGRAM} wrapper for a `text` analyzer, or at the top level of the `properties` for an `ngram` analyzer.

public string PRESERVE_ORIGINAL = 'preserveOriginal'

PROPERTIES

Wrapper field carrying the type-specific options of an analyzer. Always an object — empty (`{}`) for the {@see AnalyzerType::IDENTITY} analyzer.

public string PROPERTIES = 'properties'

RESULT

Wrapper field carrying the list of analyzers in the response of `GET /_api/analyzer`.

public string RESULT = 'result'
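
Reading the `result` wrapper can be sketched as follows. The JSON body is a hand-written sample shaped like a list response, not captured server output, and the analyzer entries in it are hypothetical.

```php
<?php
declare(strict_types=1);

// Hand-written sample body; a real response may carry more entries and fields.
$body = <<<'JSON'
{
  "error": false,
  "code": 200,
  "result": [
    { "name": "identity", "type": "identity", "properties": {}, "features": ["frequency", "norm"] },
    { "name": "mydb::myanalyzer", "type": "ngram", "properties": { "min": 2, "max": 3, "preserveOriginal": false }, "features": [] }
  ]
}
JSON;

$decoded = json_decode($body, true, 512, JSON_THROW_ON_ERROR);

// Index the analyzers found under `result` by name, keeping only their type.
$typesByName = array_column($decoded['result'], 'type', 'name');
print_r($typesByName);
```

Note that decoding with associative arrays turns the empty `properties` object of the `identity` analyzer into an empty PHP array, so code that re-encodes the entry may need to cast it back to an object to keep `{}`.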

START_MARKER

String prepended to the n-grams that include the start of the input (`ngram` only), so start-of-input n-grams can be distinguished. Lives at the top level of the `ngram` `properties`.

public string START_MARKER = 'startMarker'

STEMMING

Whether the `text` analyzer should apply Snowball-style stemming on the tokens it emits.

public string STEMMING = 'stemming'

STOPWORDS

List of stopwords to drop from the token stream (`text` only).

public string STOPWORDS = 'stopwords'

STOPWORDS_PATH

Filesystem path to a directory holding a language sub-directory (e.g. `en`) of stopword files, one word per line (`text` only). The path is resolved server-side.

public string STOPWORDS_PATH = 'stopwordsPath'

STREAM_TYPE

Input encoding the `ngram` analyzer operates on — `"binary"` (byte-wise, the server default) or `"utf8"` (codepoint-wise).

public string STREAM_TYPE = 'streamType'

Lives at the top level of the `ngram` `properties`.
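
The difference between byte-wise and codepoint-wise handling can be seen locally with plain PHP string functions. This is only an illustration of the two ways of counting characters, not a reproduction of the server's tokenizer.

```php
<?php
declare(strict_types=1);

// "été": each "é" is one codepoint but two bytes in UTF-8.
$input = "\u{00E9}t\u{00E9}";

// Byte-wise view, comparable to streamType "binary".
$bytes = str_split($input);

// Codepoint-wise view, comparable to streamType "utf8".
$codepoints = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY);

printf("bytes: %d, codepoints: %d\n", count($bytes), count($codepoints));
// prints "bytes: 5, codepoints: 3"
```

With non-ASCII input, a byte-wise window can therefore cut through the middle of a character, which is the usual reason to pick `"utf8"` for such text.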

TYPE

Analyzer type discriminator — entries of {@see AnalyzerType}.

public string TYPE = 'type'