Hackday: FOQElasticaBundle and Elastica
The goal of this hackday, done by Lukas Smith and me, was to adapt the FOQElasticaBundle and the Elastica library so that we can use them for one of our current projects. The two main problems were the following:
- Missing support in the FOQElasticaBundle for nested mapping definitions
- Missing support in the Elastica library for processing objects for indexing
In the following I'm going to explain in more detail what the problems were and how we tried to fix them.
Nested mapping definitions
Elasticsearch supports the nested field types 'object', 'array' and 'nested' (www.elasticsearch.org/guide/reference/mapping/). As long as we did not want to configure the fields inside these nested types, i.e. define a specific mapping for them, this was no problem at all: if a field is not defined in the mapping, Elasticsearch guesses its type and configuration based on the data it gets when creating the index. But in order to get substring matching and fuzzy search we needed to be able to add a specific mapping for fields in nested types.
There was already a pull request adding the possibility to define a mapping for fields in objects and nested types, and it has been merged into master in the meantime. But the changes made there are still not enough for what we need. First of all, the configuration still does not allow adding the 'fields' option to the mapping of nested fields. We need this option to define fields of type 'multi_field' inside the nested types (www.elasticsearch.org/guide/reference/mapping/multi-field-type.html). To be able to configure the index the way our project needs it as quickly as possible, we created a first pull request that simply adds 'fields' to the accepted configuration options for fields in nested types.
Secondly, we think that adding just one more level of configuration is only a quick fix for our current problem. What if at some point we need a nested type inside a nested type? So, in a second pull request we changed the bundle's Configuration class to accept an unlimited number of nesting levels. We're not entirely happy with that solution though. For example, we had to duplicate the code that verifies the type of the value assigned to a configuration option, i.e. whether it is a scalar or not. We also couldn't yet figure out how to display the full path of a wrongly typed configuration value in the error message, because we're missing the context of the node we're currently looking at.
There is already quite some discussion going on on GitHub and hopefully we'll find a better solution for this problem.
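To illustrate the idea of validating a mapping with any number of nesting levels, here is a minimal sketch in Python. It is not the bundle's Configuration class (which is PHP and built on the Symfony Config component). The function name validate_mapping, the list of accepted options, the 'substring' analyzer name and the sample mapping are all assumptions made for this example. The sketch passes the path along during recursion, which is one possible way to name the full path of a wrongly typed value in the error message.

```python
# Illustrative sketch only, not the FOQElasticaBundle Configuration class.
# The option names below are assumptions chosen for this example.
SCALAR_OPTIONS = {"type", "analyzer", "boost", "index", "store"}
NESTING_OPTIONS = {"properties", "fields"}  # their children are field definitions again


def validate_mapping(fields, path="mappings"):
    """Recursively check field definitions, carrying the path for error messages."""
    if not isinstance(fields, dict):
        raise ValueError(f"{path}: expected a map of field names to definitions")
    for name, definition in fields.items():
        field_path = f"{path}.{name}"
        if not isinstance(definition, dict):
            raise ValueError(f"{field_path}: expected a field definition")
        for option, value in definition.items():
            option_path = f"{field_path}.{option}"
            if option in NESTING_OPTIONS:
                # Same function for every level, so nesting depth is unlimited.
                validate_mapping(value, option_path)
            elif option in SCALAR_OPTIONS:
                # One shared scalar check instead of one copy per level.
                if isinstance(value, (dict, list)):
                    raise ValueError(f"{option_path}: expected a scalar value")
            else:
                raise ValueError(f"{option_path}: unknown option")


# A 'multi_field' inside a 'nested' type; "substring" is a hypothetical analyzer name.
mapping = {
    "title": {"type": "string"},
    "comments": {
        "type": "nested",
        "properties": {
            "author": {
                "type": "multi_field",
                "fields": {
                    "author": {"type": "string"},
                    "partial": {"type": "string", "analyzer": "substring"},
                },
            },
        },
    },
}

validate_mapping(mapping)  # a valid mapping passes without raising
```

Because the same function handles every level, the scalar check exists only once, and a wrong value deep inside the tree is reported with its full dotted path.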
Object processing for indexing
So far, the Elastica library only accepts objects of type Elastica_Document to be added to the index. The data property of this object needs to be an array.
For us this meant that we had to do all the work of putting the properties of our Doctrine entities into an array ourselves. So we thought it would be an improvement to the library if you could configure it to use a serializer. You can then pass in whatever object you want to have indexed, and the library takes over the job of serializing the object into the right format, creating the Elastica_Document and adding it to the index. We also added support for serializer groups, as used by the JMSSerializerBundle, to be able to define which properties of an object should be serialized or excluded from serialization. This pull request is also waiting to be merged at the moment.
If you have any feedback or ideas on any of these pull requests, please don't hesitate to join the discussion on GitHub.
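The flow described above can be sketched roughly as follows. This is a Python illustration of the idea only, not the code from the pull request and not the Elastica or JMSSerializerBundle API. Document stands in for Elastica_Document, and GroupSerializer, Index, add_object and the Article class are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Stand-in for Elastica_Document: an id plus an array (here a dict) of data."""
    id: str
    data: dict


class GroupSerializer:
    """Hypothetical serializer that keeps only properties exposed in the requested groups."""

    def __init__(self, exposed):
        # exposed: {class: {property name: tuple of group names}}
        self.exposed = exposed

    def serialize(self, obj, groups):
        wanted = set(groups)
        properties = self.exposed[type(obj)]
        return {
            name: getattr(obj, name)
            for name, prop_groups in properties.items()
            if wanted & set(prop_groups)
        }


class Index:
    """Hypothetical index that can take plain objects once a serializer is configured."""

    def __init__(self, serializer=None, groups=()):
        self.serializer = serializer
        self.groups = groups
        self.documents = {}

    def add_document(self, document):
        self.documents[document.id] = document.data

    def add_object(self, doc_id, obj):
        if self.serializer is None:
            raise RuntimeError("no serializer configured")
        # Serialize the object, wrap the result in a document, then index it.
        data = self.serializer.serialize(obj, self.groups)
        self.add_document(Document(doc_id, data))


@dataclass
class Article:
    title: str
    body: str
    internal_note: str


serializer = GroupSerializer({
    Article: {
        "title": ("search",),
        "body": ("search",),
        "internal_note": ("admin",),
    }
})
index = Index(serializer=serializer, groups=("search",))
index.add_object("1", Article("Hackday", "Elastica and serializers", "do not index"))
print(index.documents["1"])
```

Only the properties in the "search" group end up in the indexed document, so the calling code no longer has to build the data array by hand.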