Drupal SearchAPI and result grouping

In this blog post I will present how, in a recent e-Commerce project built on top of Drupal7 (the former version of the Drupal CMS), we make Drupal7, SearchAPI and Commerce play together to efficiently retrieve grouped results from Solr in SearchAPI, with no indexed data duplication.

We used the SearchAPI and the FacetAPI modules to build a search index for products, so far so good: available products and product-variations can be searched and filtered also by using a set of pre-defined facets. In a subsequent request, a new need arose from our project owner: provide a list of products where the results should include, in addition to the product details, a picture of one of the available product variations, while keep the ability to apply facets on products for the listing. Furthermore, the product variation picture displayed in the list must also match the filter applied by the user: this with the aim of not confusing users, and to provide a better user experience.

An example use case here is simple: allow users to get the list of available products and be able to filter them by the color/size/etc field of the available product variations, while displaying a picture of the available variations, and not a sample picture.

For the sake of simplicity and consistency with Drupal’s Commerce module terminology, I will use the term “Product” to refer to any product-variation, while the term “Model” will be used to refer to a product.

Solr Result Grouping

We decided to use Solr (the well-known, fast and efficient search engine built on top of the Apache Lucene library) as the backend of the eCommerce platform: the reason lies not only in its full-text search features, but also in the possibility to build a fast retrieval system for the huge number of products we were expecting to be available online.

To solve the request about the display of product models, facets and available products, I intended to use the feature offered by Solr called Result-Grouping as it seemed to be suitable for our case: Solr is able to return just a subset of results by grouping them given an “single value” field (previously indexed, of course). The Facets can then be configured to be computed from: the grouped set of results, the ungrouped items or just from the first result of each group.

Such handy feature of Solr can be used in combination with the SearchAPI module by installing the SearchAPI Grouping module. The module allows to return results grouped by a single-valued field, while keeping the building process of the facets on all the results matched by the query, this behavior is configurable.

That allowed us to:

  • group the available products by the referenced model and return just one model;
  • compute the attribute’s facets on the entire collection of available products;
  • reuse the data in the product index for multiple views based on different grouping settings.

Result Grouping in SearchAPI

Due to some limitations of the SearchAPI module and its query building components, such plan was not doable with the current configuration as it would require us to create a copy of the product index just to apply the specific Result Grouping feature for each view.

The reason is that the features implemented by the SearchAPI Grouping are implemented on top of the “Alterations and Processors” functions of SearchAPI. Those are a set of specific functions that can be configured and invoked both at indexing-time and at querying-time by the SearchAPI module. In particular Alterations allows to programmatically alter the contents sent to the underlying index, while the Processors code is executed when a search query is built, executed and the results returned.
Those functions can be defined and configured only per-index.

As visible in the following picture, the SearchAPI Grouping module configuration could be done solely in the Index configuration, but not per-query.

SearchAPI: processor settings

Image 1: SearchAPI configuration for the Grouping Processor.

As the SearchAPI Grouping module is implemented as a SearchAPI Processor (as it needs to be able to alter the query sent to Solr and to handle the returned results), it would force us to create a new index for each different configuration of the result grouping.

Such limitation requires to introduce a lot of (useless) data duplication in the index, with a consequent decrease of performance when products are saved and later indexed in multiple indexes.
In particular, the duplication is more evident as the changes performed by the Processor are merely an alteration of:

  1. the query sent to Solr;
  2. the handling of the raw data returned by Solr.

This shows that there would be no need to index multiple times the same data.

Since the the possibility to define per-query processor sounded really promising and such feature could be used extensively in the same project, a new module has been implemented and published on Drupal.org: the SearchAPI Extended Processors module. (thanks to SearchAPI’s maintainer, DrunkenMonkey, for the help and review :) ).

The Drupal SearchAPI Extended Processor

The new module allows to extend the standard SearchAPI behavior for Processors and lets admins configure the execution of SearchAPI Processors per query and not only per-index.

By using the new module, any index can now be used with multiple and different Processors configurations, no new indexes are needed, thus avoiding data duplication.

The new configuration is exposed, as visible in the following picture, while editing a SearchAPI view under “Advanced > Query options”.
The SearchAPI processors can be altered and re-defined for the given view, a checkbox allows to completely override the current index setting rather than providing additional processors.

Drupal SearchAPI: view's extended processor settings

Image 2: View’s “Query options” with the SearchAPI Extended Processors module.


Conclusion: the new SearchAPI Extended Processors module has now been used for a few months in a complex eCommerce project at Liip and allowed us to easily implement new search features without the need to create multiple and separated indexes.
We are able to index Products data in one single (and compact) Solr index, and use it with different grouping strategies to build both product listings, model listings and model-category navigation pages without duplicating any data.
Since all those listings leverages the Solr FilterQuery query parameter to filter the correct set of products to be displayed, Solr can make use of its internal set of caches and specifically the filterCache to speed up subsequent searches and facets. This aspect, in addition to the usage of only one index, allows caches to be shared among multiple listings, and that would not be possible if separate indexes were used.

For further information, questions or curiosity drop me a line, I will be happy to help you configuring Drupal SearchAPI and Solr for your needs.

Leave a Reply