`String`

expressions in JavaScript notation, and returns an object which computes a value for each requested document. Expressions are built of literals, variables and functions. For example, in the expression `5*sqrt(votes)`

, `5`

is a literal, `sqrt`

is a function and `votes`

is a variable.
The `JavaScript`

parser recognizes a handful of useful `functions`

, such as `max`

, `min`

, `sqrt`

etc. The javadocs contain the full list of functions as well as operators that the parser recognizes. The `variables`

are resolved by binding their name to a `ValueSource`

. For example, in order to parse and execute the above expression, you need to write code similar to this:

```
Expression expr = JavascriptCompiler.compile("5*sqrt(votes)");
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("votes", SortField.Type.INT));
```

The `votes`

variable is bound to the numeric field "votes" (usually, a `NumericDocValuesField`

). `SimpleBindings`

let you bind a variable to a `SortField`

, however internally it is bound to a `ValueSource`

. `Expression`

itself returns a `ValueSource`

, and when your application asks for the value of a document, it computes it based on the formula and the bounded variables' `ValueSource`

s.
**Customizing expressions**

`JavascriptCompiler`

lets you pass a mapping of custom functions, where each function is implemented by a public and static Method, which takes up to 256 `double`

parameters and returns a `double`

. You can see a good example here. That, together with `variables`

, provides great customization capabilities of expressions.
Sometimes customizing expressions is not so straightforward. For example, someone recently asked on the Lucene user-list how to use a multi-valued field in an expression, for the purpose of computing different functions on it (`max`

, `sum`

etc.). At first, it looks like a custom function, e.g. `maxAll()`

(`max()`

is already taken), which can be embedded in an expression like `5*maxVal(data)`

. However, since `data`

is a variable, and variables are bound to `ValueSource`

s (which return a single value for a document), we cannot pass all values of `data`

to `maxAll()`

.

We can implement a `ValueSource`

though, which returns the maximum value of the field `data`

, and bind a variable `max.data`

to it. Since this isn't a true `function`

, we cannot use a more natural notation like `max(data)`

. Perhaps one day Lucene will have built-in support for multi-valued numeric fields, and the expressions module will auto-detect such fields and pass all values to the function (you're welcome to contribute patches!). Until then though, you need to implement a `ValueSource`

for each such function, but fortunately it is quite trivial. I wrote some prototype code below which demonstrates how to do that.

▶ Multi-valued expressions demo

**Nested expressions**

`longitude`

and `latitude`

fields and you want to use them to compute a document's relevance, by computing the haversine formula. You can easily do that by compiling an expression such as `haversin(40.7143528,-74.0059731,latitude,longitude)`

.
Now, what if you want to use the result of that expression in another expression, such as `_score + 1/(1+haversin(40.7143528,-74.0059731,latitude,longitude))`

? It starts to become somewhat longish to read. Wouldn't it be better if we can encapsulate that in a `distance`

variable and read the expression `_score + 1/(1+distance)`

instead? Fortunately, we can do that easily with nested/sub expressions:

```
Expression distance = JavascriptCompiler.compile(
"haversin(40.7143528,-74.0059731,latitude,longitude)");
Expression expr = JavascriptCompiler.compile("_score + 1/(1+distance)");
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("_score", SortField.Type.SCORE));
bindings.add(new SortField("latitude", SortField.Type.LONG));
bindings.add(new SortField("longitude", SortField.Type.LONG));
bindings.add("distance", distance);
```

Since a variable can be bound to any `ValueSource`

, and expressions are in fact `ValueSource`

s, we can bind the result of an expression to a variable in another expression. Nested expressions also cache their result in case you use them e.g. in multiple clauses of another expression. This could be useful if you have a very expensive expression which needs to be evaluated multiple times per document.
The expressions module is very powerful and lets you code custom ranking/sorting/faceting formulas easily. With custom `functions`

, `variables`

and `ValueSource`

s, it makes it trivial to extend and build upon even further. If you implement useful `functions`

, you're welcome to contribute them!

Nice post. I'm wondering what's the performance involved in using a Lucene Function? Is Lucene doing a full-scan over all results and applying the Function to each value?

ReplyDelete