The biggest contribution to the spike came from cutting over facets to DocValues (LUCENE-4602), which got us nearly 70% improvements over term payloads. Not only are DocValues more efficient than payloads, they also let you cache their information in-memory, for faster access, which we now take advantage by default during faceted search.
Additional improvements came from specializing the decode logic (LUCENE-4686), as well as moving various components that take part during faceted search from an iterator-style API to a bulk-read (decode, aggregate) API (LUCENE-4620). The latter got us an additional ~20% improvements, which is amazing given that decoding and aggregation logic weren't changed - it was all just different software design. This is an important lesson for "hot" code - sometimes in order to make it efficient, you need to let go of some Object Oriented best practices!
While that's a major step forward, there's more to come. There's already work going on (LUCENE-4600 and LUCENE-4610), which will bring even more improvements for the common faceted search use cases (facet counting, no associations, no partitions etc.). Also, I plan to address two points that Mike raised in his blog: (1) making
TotalFacetCounts(aka complements) segment-aware (to make it more NRT-friendly and reduce its reloading cost), and (2) experiment with a more efficient category ordinals cache, to reduce its RAM consumption as well as loading speed. There's also fun work going on here, which explores an alternative encoding for category ordinals. So stay tuned for more updates!
NOTE: cutting over facets to DocValues breaks existing indexes. You need to either rebuild them, or use
FacetsPayloadMigrationReaderto do a one-time migration of the payload information to DocValues.