Luces is Spanish for "lights." It's also the name of a small piece of software that's helping us migrate from Lucene to Elasticsearch, and now the world can see it.
Today, search in Lithium communities is powered by Apache Lucene, a powerful, open-source Java library. Over the years, we found edge cases and shortcomings in the library's earlier versions, which prompted our engineering team to modify and extend many of its components to suit our needs. Lithium made a lot of improvements to Lucene but kept them internal; ironically, Apache implemented some of the same improvements in later Lucene releases.
As more features were added to augment the search experience, parts of the Lithium community platform (LIA) became tightly coupled with Lucene. Eventually, our version of Lucene diverged so far from Apache's that our attempts to upgrade it failed. This divergence was one of the main reasons we embarked on the Elasticsearch migration project.
Migrating from Lucene to Elasticsearch posed a lot of challenges. Today, LIA takes data from a community, runs it through a slew of logic that builds a Lucene Document (a "simple" Java object), and then hands that document to Lucene for indexing. So how do we redirect that data to Elasticsearch? Initially, we considered three approaches:
Build a JSON document from the community data and send it to the Elasticsearch API directly. The problem with this approach is that community data passes through logic representing hundreds, if not thousands, of hours of engineering before it becomes a Lucene document. Replicating all of that to build a JSON document from scratch would take considerable effort, and since we were trying to move as quickly as possible, this was not a feasible option.
Take the generated document and send it straight to Elasticsearch. Unfortunately, even though Elasticsearch is written in Java and uses Lucene internally, its API accepts documents as JSON, not as (serialized) Java objects. (There is a native Java client, but there were technical reasons we didn't want to go that route.)
Copy the existing Lucene index data directly into Elasticsearch's internal Lucene index. This would also be a huge engineering challenge, because Elasticsearch applies its own logic to the data it ingests before writing it to its internal index. Going this route would mean replicating whatever Elasticsearch does to the data internally (which sounds like a bad time).
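To make the obstacle in the second approach concrete: a document reaches Elasticsearch's REST API as a JSON request body sent over HTTP, so a serialized Java object has no direct way in. The sketch below builds (without sending) such an index request using the JDK's `java.net.http` client. It assumes the endpoint form used by recent Elasticsearch releases (7.x and later), `PUT /<index>/_doc/<id>`; older releases used `/<index>/<type>/<id>`. The class name `EsIndexRequest`, the index name, and the local URL are illustrative placeholders, not part of Luces.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative helper (hypothetical name): builds, but does not send, an index request.
final class EsIndexRequest {
    private EsIndexRequest() {}

    // Assumes the Elasticsearch 7.x+ form PUT /<index>/_doc/<id>.
    // The id is concatenated as-is, so it must already be URL-safe.
    static HttpRequest build(String baseUrl, String index, String id, String json) {
        URI uri = URI.create(baseUrl + "/" + index + "/_doc/" + id);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

For example, `EsIndexRequest.build("http://localhost:9200", "community-messages", "42", json)` yields a `PUT` to `http://localhost:9200/community-messages/_doc/42`, which could then be passed to `HttpClient.send`. Whatever produces `json` is exactly the missing piece: something has to turn the generated document into that string.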
Oddly, we could find no existing solution to this problem (at least, not a publicly available one). Either few people were trying to migrate from Lucene to Elasticsearch (yet), or the way we use Lucene is radically different from everyone else's, or possibly both. In the end, we wrote our own utility to convert our generated Lucene documents into JSON.
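The core idea of such a conversion can be sketched in a few lines. This is not the Luces source: it assumes a hypothetical `SimpleField` stand-in (a name plus a string value) in place of Lucene's field classes, so the example compiles without Lucene on the classpath. In recent Lucene versions, the equivalent input would come from `Document.getFields()`, reading each field's `name()` and `stringValue()`. The one structural wrinkle the sketch handles is that a Lucene document may repeat a field name, whereas JSON object keys should be unique, so repeated fields are collapsed into a JSON array.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Lucene field (name + string value); not the Lucene API.
final class SimpleField {
    final String name;
    final String value;

    SimpleField(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// Illustrative converter (hypothetical name), not the actual Luces implementation.
final class LuceneToJson {
    private LuceneToJson() {}

    // Groups values by field name, preserving first-seen order, because a Lucene
    // document may contain the same field name more than once.
    static Map<String, List<String>> group(List<SimpleField> fields) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (SimpleField f : fields) {
            grouped.computeIfAbsent(f.name, k -> new ArrayList<>()).add(f.value);
        }
        return grouped;
    }

    // Serializes to a JSON object: a single value becomes a JSON string,
    // a repeated field becomes a JSON array of strings.
    static String toJson(List<SimpleField> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, List<String>> e : group(fields).entrySet()) {
            if (!first) sb.append(',');
            first = false;
            appendString(sb, e.getKey());
            sb.append(':');
            List<String> values = e.getValue();
            if (values.size() == 1) {
                appendString(sb, values.get(0));
            } else {
                sb.append('[');
                for (int i = 0; i < values.size(); i++) {
                    if (i > 0) sb.append(',');
                    appendString(sb, values.get(i));
                }
                sb.append(']');
            }
        }
        return sb.append('}').toString();
    }

    // Writes a JSON string literal, escaping quotes, backslashes and control characters.
    private static void appendString(StringBuilder sb, String s) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        sb.append('"');
    }
}
```

Given the fields `subject=Hello`, `tag=a` and `tag=b`, `toJson` produces `{"subject":"Hello","tag":["a","b"]}`. Collapsing repeats into arrays fits Elasticsearch's model, where any field may hold either a single value or an array of values of the same type. A real converter would also need to deal with numeric and binary fields, which this sketch deliberately ignores.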
On July 29th, 2015, the Lithium Gravity team released the conversion tool as an open source project.