String expressions in JavaScript notation, and returns an object which computes a value for each requested document. Expressions are built of literals, variables and functions. For example, in the expression 5*sqrt(votes), 5 is a literal, sqrt is a function and votes is a variable.
The JavaScript parser recognizes a handful of useful functions, such as max, min, sqrt, etc. The javadocs contain the full list of functions and operators that the parser recognizes. Variables are resolved by binding their name to a ValueSource. For example, in order to parse and execute the above expression, you need to write code similar to this:
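Expression expr = JavascriptCompiler.compile("5*sqrt(votes)");
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("votes", SortField.Type.INT));
// the compiled expression, evaluated per document against the bindings
ValueSource vs = expr.getValueSource(bindings);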
The votes variable is bound to the numeric field "votes" (usually a NumericDocValuesField). SimpleBindings lets you bind a variable to a SortField, but internally the variable is bound to a ValueSource. Expression itself returns a ValueSource, and when your application asks for the value of a document, it computes it from the formula and the ValueSources of the bound variables.
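The following toy model makes the binding step concrete without Lucene. It is not Lucene's API. A per-document "value source" is modeled as a plain java.util.function.IntToDoubleFunction that maps a document id to a double. ToyBindings and VotesExample are hypothetical names. The expression 5*sqrt(votes) is hand-written as a lambda rather than compiled from JavaScript. The model only shows how a variable name resolves to a per-document source and how the formula is then evaluated per document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

// Toy model only, NOT Lucene's API: a "value source" is a function from a
// document id to a double, similar in spirit to Lucene's ValueSource.
final class ToyBindings {
    private final Map<String, IntToDoubleFunction> sources = new HashMap<>();

    // Bind a variable name to a per-document value source.
    void add(String name, IntToDoubleFunction source) {
        sources.put(name, source);
    }

    // Resolve a variable name; unbound names are an error.
    IntToDoubleFunction get(String name) {
        IntToDoubleFunction source = sources.get(name);
        if (source == null) {
            throw new IllegalArgumentException("unbound variable: " + name);
        }
        return source;
    }
}

final class VotesExample {
    // Hand-written stand-in for the compiled expression 5*sqrt(votes):
    // 5 is a literal, sqrt is a function, votes is resolved via the bindings.
    static IntToDoubleFunction compileFiveSqrtVotes(ToyBindings bindings) {
        IntToDoubleFunction votes = bindings.get("votes");
        return doc -> 5 * Math.sqrt(votes.applyAsDouble(doc));
    }

    public static void main(String[] args) {
        int[] votesField = {9, 16, 0}; // hypothetical per-document "votes" values
        ToyBindings bindings = new ToyBindings();
        bindings.add("votes", doc -> votesField[doc]);
        IntToDoubleFunction expr = compileFiveSqrtVotes(bindings);
        for (int doc = 0; doc < votesField.length; doc++) {
            System.out.println("doc " + doc + " -> " + expr.applyAsDouble(doc));
        }
    }
}
```

The expression is "compiled" once against the bindings and then asked for values document by document, which mirrors the flow described above.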
Customizing expressions
JavascriptCompiler lets you pass a mapping of custom functions, where each function is implemented by a public static Method which takes up to 256 double parameters and returns a double. You can see a good example here. Together with variables, this makes expressions highly customizable.
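A custom function is just an ordinary Java method. The sketch below uses only the JDK. sqr is a hypothetical custom function. isValidExpressionFunction is a hypothetical helper that checks the shape described above (public, static, double parameters, double return); it is not Lucene's own validation. With Lucene on the classpath, you would pass a map like the one functions() builds to JavascriptCompiler.compile together with a ClassLoader, typically after copying JavascriptCompiler.DEFAULT_FUNCTIONS so the built-in functions stay available. Check the javadocs of your Lucene version for the exact signature.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder of custom functions. Lucene may also require the
// declaring class itself to be public, so declare it public in real code.
final class CustomFunctions {
    // Hypothetical custom function: public, static, double in, double out.
    public static double sqr(double x) {
        return x * x;
    }

    // Hypothetical helper mirroring the rules stated in the text
    // (it does not check the parameter-count limit).
    static boolean isValidExpressionFunction(Method m) {
        int mods = m.getModifiers();
        if (!Modifier.isPublic(mods) || !Modifier.isStatic(mods)) return false;
        if (m.getReturnType() != double.class) return false;
        for (Class<?> p : m.getParameterTypes()) {
            if (p != double.class) return false;
        }
        return true;
    }

    // Name-to-Method mapping of the kind described in the text.
    static Map<String, Method> functions() throws NoSuchMethodException {
        Map<String, Method> fns = new HashMap<>();
        fns.put("sqr", CustomFunctions.class.getMethod("sqr", double.class));
        return fns;
    }

    public static void main(String[] args) throws Exception {
        Method sqr = functions().get("sqr");
        System.out.println(isValidExpressionFunction(sqr));
        System.out.println(sqr.invoke(null, 3.0));
    }
}
```

Math.round(double) is an example of a method that would not qualify, because it returns a long rather than a double.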
Sometimes customizing expressions is not so straightforward. For example, someone recently asked on the Lucene user list how to use a multi-valued field in an expression in order to compute different functions on it (max, sum, etc.). At first, this looks like a job for a custom function, e.g. maxAll() (max() is already taken), which could be embedded in an expression like 5*maxAll(data). However, since data is a variable, and variables are bound to ValueSources (which return a single value per document), we cannot pass all the values of data to maxAll().
We can, however, implement a ValueSource which returns the maximum value of the field data, and bind a variable max.data to it. Since this isn't a true function, we cannot use a more natural notation like max(data). Perhaps one day Lucene will have built-in support for multi-valued numeric fields, and the expressions module will auto-detect such fields and pass all of their values to the function (you're welcome to contribute patches!). Until then, though, you need to implement a ValueSource for each such function. Fortunately, that is quite trivial. I wrote some prototype code below which demonstrates how to do that.
Multi-valued expressions demo
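Independently of that prototype, the idea can be sketched without Lucene. This toy model is not the original demo code, and its names are hypothetical. A multi-valued field is a double[] per document, and maxOf wraps it as a single-valued per-document source. That single-valued source is what a variable like max.data would be bound to. In Lucene itself this would be a ValueSource subclass returning per-document FunctionValues. How the raw values are read depends on the Lucene version and on how the field was indexed.

```java
import java.util.function.IntToDoubleFunction;

// Toy model, not Lucene code: a multi-valued numeric field is represented
// as one double[] per document (a hypothetical stand-in for doc values).
final class MultiValuedMax {
    // Single-valued view returning the per-document maximum, i.e. what a
    // variable such as max.data would be bound to. Documents without
    // values get the caller-chosen 'missing' value.
    static IntToDoubleFunction maxOf(double[][] valuesPerDoc, double missing) {
        return doc -> {
            double[] values = valuesPerDoc[doc];
            if (values.length == 0) return missing;
            double max = Double.NEGATIVE_INFINITY;
            for (double v : values) max = Math.max(max, v);
            return max;
        };
    }

    // Same pattern for sum: one single-valued source per aggregate.
    static IntToDoubleFunction sumOf(double[][] valuesPerDoc) {
        return doc -> {
            double sum = 0;
            for (double v : valuesPerDoc[doc]) sum += v;
            return sum;
        };
    }

    public static void main(String[] args) {
        double[][] data = { {3, 7, 5}, {}, {-2, -8} };
        IntToDoubleFunction maxData = maxOf(data, 0);
        // Evaluate 5*max.data by hand for each document.
        for (int doc = 0; doc < data.length; doc++) {
            System.out.println("doc " + doc + ": 5*max.data = " + 5 * maxData.applyAsDouble(doc));
        }
    }
}
```

Each aggregate (max, sum, ...) needs its own wrapper and its own variable name, which is the per-function cost mentioned above.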
Nested expressions
Suppose that your documents contain longitude and latitude fields, and you want to use them to compute a document's relevance with the haversine formula. You can easily do that by compiling an expression such as haversin(40.7143528,-74.0059731,latitude,longitude).
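For reference, the haversine formula computes the great-circle distance between two points from their latitudes and longitudes. The Java version below assumes inputs in decimal degrees, a mean Earth radius of 6371 km, and a result in kilometres. Lucene's built-in haversin may use a different radius, different units and a faster approximation. Treat this as an illustration of the formula, not a drop-in replacement.

```java
final class Haversine {
    // Assumed mean Earth radius in kilometres (not necessarily Lucene's value).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two points given in decimal degrees.
    static double haversinKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                   * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding pushing sqrt(a) above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    public static void main(String[] args) {
        // Query point from the text vs. a hypothetical document location.
        System.out.println(haversinKm(40.7143528, -74.0059731, 40.7580, -73.9855));
    }
}
```

One degree of longitude along the equator comes out at about 6371 * pi / 180, roughly 111.19 km, which is a quick sanity check for the formula.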
Now, what if you want to use the result of that expression in another expression, such as _score + 1/(1+haversin(40.7143528,-74.0059731,latitude,longitude))? It starts to become somewhat hard to read. Wouldn't it be better if we could encapsulate that in a distance variable and read the expression _score + 1/(1+distance) instead? Fortunately, we can do that easily with nested (sub-)expressions:
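Expression distance = JavascriptCompiler.compile(
    "haversin(40.7143528,-74.0059731,latitude,longitude)");
Expression expr = JavascriptCompiler.compile("_score + 1/(1+distance)");
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("_score", SortField.Type.SCORE));
// latitude/longitude hold decimal degrees, so bind them as doubles
bindings.add(new SortField("latitude", SortField.Type.DOUBLE));
bindings.add(new SortField("longitude", SortField.Type.DOUBLE));
// bind the "distance" variable to the nested expression
bindings.add("distance", distance);
// sort by the combined value, highest first
Sort sort = new Sort(expr.getSortField(bindings, true));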
Since a variable can be bound to any ValueSource, and an expression is itself exposed as a ValueSource, we can bind the result of one expression to a variable in another. Nested expressions also cache their result, in case you use them, e.g., in multiple clauses of another expression. This is useful if you have a very expensive expression which needs to be evaluated multiple times per document.
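One simple way to get such caching is to remember the value computed for the most recently requested document. The sketch below is a toy model under that assumption, not Lucene's implementation. CachedSource and NestedExpressionDemo are hypothetical names, and sources are plain IntToDoubleFunctions. The outer formula is a made-up variant of _score + 1/(1+distance) that references distance twice, so the cache has something to do.

```java
import java.util.function.IntToDoubleFunction;

// Toy model, not Lucene's implementation: wraps a per-document source and
// remembers the value for the most recently requested document, so several
// references to the same sub-expression for one document evaluate it once.
final class CachedSource implements IntToDoubleFunction {
    private final IntToDoubleFunction delegate;
    private int cachedDoc = -1;   // no document cached yet (doc ids are >= 0)
    private double cachedValue;
    int evaluations = 0;          // counts delegate calls, for illustration only

    CachedSource(IntToDoubleFunction delegate) {
        this.delegate = delegate;
    }

    @Override
    public double applyAsDouble(int doc) {
        if (doc != cachedDoc) {
            cachedValue = delegate.applyAsDouble(doc);
            cachedDoc = doc;
            evaluations++;
        }
        return cachedValue;
    }
}

final class NestedExpressionDemo {
    public static void main(String[] args) {
        double[] score = {1.0, 2.0};       // hypothetical _score per document
        double[] distanceKm = {0.0, 3.0};  // hypothetical sub-expression results
        CachedSource distance = new CachedSource(doc -> distanceKm[doc]);
        // Hypothetical outer expression: _score + 1/(1+distance) + distance/10
        IntToDoubleFunction expr = doc ->
            score[doc] + 1 / (1 + distance.applyAsDouble(doc)) + distance.applyAsDouble(doc) / 10;
        for (int doc = 0; doc < score.length; doc++) {
            System.out.println("doc " + doc + " -> " + expr.applyAsDouble(doc));
        }
        System.out.println("distance evaluations: " + distance.evaluations); // 2
    }
}
```

Remembering only the last document suits the usual access pattern, where all clauses of the outer expression are evaluated for one document before moving on to the next. Jumping back and forth between documents would make this cache miss.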
The expressions module is very powerful and lets you code custom ranking, sorting and faceting formulas easily. With custom functions, variables and ValueSources, it is trivial to extend and build upon even further. If you implement useful functions, you're welcome to contribute them!