Oihana PHP Arango

minhashMatch.php

Functions

minhashMatch()

Match documents with an approximate Jaccard similarity of at least a threshold.

minhashMatch(string $path, string $target, string $analyzer[, float|null $threshold = null ]) : string

Wraps the ArangoDB AQL function MINHASH_MATCH(path, target, threshold, analyzer). The similarity is approximated with the given minhash Analyzer, which makes this an efficient first pass for entity resolution (duplicate detection) before an exact JACCARD() computation.

Argument order notice: in AQL the optional threshold sits before the mandatory analyzer. PHP deprecates (since 8.0) declaring a required parameter after an optional one, so this helper takes the analyzer third and the optional threshold last, then re-orders the arguments when emitting the AQL expression.

Example AQL usage:

MINHASH_MATCH(doc.text, "the quick brown fox", 0.5, "myMinHash")
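
The re-ordering described in the notice above can be illustrated with a minimal sketch. This is not the library's source: `minhashMatchSketch()` is a hypothetical stand-in, and quoting the string literals with `json_encode()` is an assumption made only to reproduce the double-quoted output shown in this page's examples.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for minhashMatch(), for illustration only.
 *
 * PHP parameter order: path, target, analyzer, threshold (optional, last).
 * AQL argument order:  path, target, threshold (optional), analyzer.
 */
function minhashMatchSketch(
    string $path,
    string $target,
    string $analyzer,
    ?float $threshold = null
): string {
    // The path is kept raw; the target is emitted as a quoted string literal.
    // Assumption: json_encode() quoting is close enough to an AQL string literal here.
    $args = [$path, json_encode($target)];

    // The threshold is emitted only when provided, and before the analyzer.
    if ($threshold !== null) {
        $args[] = json_encode($threshold);
    }

    // The analyzer name always comes last in the emitted AQL.
    $args[] = json_encode($analyzer);

    return 'MINHASH_MATCH(' . implode(',', $args) . ')';
}

echo minhashMatchSketch('doc.text', 'the quick brown fox', 'myMinHash', 0.5), PHP_EOL;
// MINHASH_MATCH(doc.text,"the quick brown fox",0.5,"myMinHash")

echo minhashMatchSketch('doc.text', 'the quick brown fox', 'myMinHash'), PHP_EOL;
// MINHASH_MATCH(doc.text,"the quick brown fox","myMinHash")
```

Building the argument list incrementally keeps the optional threshold out of the output entirely when it is null, instead of emitting a placeholder value.
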
Parameters
$path : string

Attribute path expression to test (kept raw).

$target : string

String to hash and compare against (emitted as a quoted string literal).

$analyzer : string

Name of the minhash Analyzer (emitted as a quoted string literal).

$threshold : float|null = null

Optional similarity threshold in [0.0, 1.0]. When null, the threshold argument is omitted from the emitted AQL.

Tags
example
use function oihana\arango\db\functions\search\minhashMatch;

echo minhashMatch( 'doc.text' , 'the quick brown fox' , 'myMinHash' , 0.5 ) ;
// 'MINHASH_MATCH(doc.text,"the quick brown fox",0.5,"myMinHash")'

echo minhashMatch( 'doc.text' , 'the quick brown fox' , 'myMinHash' ) ;
// 'MINHASH_MATCH(doc.text,"the quick brown fox","myMinHash")'
see
https://docs.arangodb.com/stable/aql/functions/arangosearch/#minhash_match
ngramMatch()
since
1.2.0
author

Marc Alcaraz

Return values
string

The formatted AQL expression.
