Helpful API or Speedy Lucene 2.9

(Based on an illustration from 1936)

Since version 3.3 (June 2007), Eclipse has used Apache Lucene 1.9.1, released in March 2006, as the search framework for its help system. Unfortunately, org.eclipse.help.base reexports Lucene packages and exposes Lucene classes in an extension point, which makes it impossible to update to Lucene 2.x or 3.x without breaking the API. Nevertheless, in Eclipse 3.7 M3 Lucene 1.9.1 has been replaced by 2.9.1. What are the benefits? Was it worth breaking the API? What do you need to know when you deliver help content?

Cheating with Prebuilt Indexes

On my two-year-old laptop with Eclipse Classic 3.6.1 it takes about 4 seconds to get the search results of the very first query. Most of these 4 seconds is spent merging the prebuilt indexes contained in the 5 documentation plug-ins (plugins/*.doc.*.jar) into one index and storing it in the configuration area for later queries. Without these prebuilt indexes I would have to wait more than 30 seconds for the first search results. Even though this happens only once per installation, a 30-second wait is too long, and this is why Eclipse lets you ship help content with a prebuilt index.

While merging prebuilt indexes the progress bar shows 0%

Faster Indexing

To my surprise, in Eclipse 3.7 M3 with Lucene 2.9.1 the first query takes about 7 seconds instead of 4. Is Eclipse Help with Lucene 2.9.1 slower than with Lucene 1.9.1? No, it isn’t. The Lucene version with which an index was created must match the runtime version, and Eclipse 3.7 M3 is still shipped (probably by mistake) with Lucene 1.9.1 indexes, so the prebuilt indexes are ignored. The 7 seconds are therefore the time needed to create the index from scratch, which used to take more than 30 seconds. Thanks to the improvements in Apache Lucene, creating an index is now more than 3 times faster than with the 5-year-old Lucene 1.9.1. Without the prebuilt indexes the Eclipse Classic download would be 2.6 MB smaller, but the first query would take about 7 seconds (more than 30 seconds with Eclipse 3.6 and Lucene 1.9.1) instead of 3 or 4 seconds with prebuilt indexes.

Testing the Limits

I also tested this with three huge help plug-ins: about 470 MB in total, more than 15,000 HTML files, and prebuilt indexes with a total size of about 20 MB. In Eclipse 3.6 (Lucene 1.9.1) the first query takes about 10 seconds with prebuilt indexes and about 4.5 minutes without them. In Eclipse 3.7 M3 (Lucene 2.9.1) it takes about 8 seconds with prebuilt indexes and less than 2 minutes without them.

Help Content for Eclipse 3.6 and 3.7

As a plug-in developer you may want to support different versions of Eclipse. Especially if you have a huge help plug-in, you should ship the content with both a Lucene 1.9.1 and a Lucene 2.9.1 index. Contrary to what the documentation says, it seems to be possible to register more than one index. Eclipse ignores indexes that were prebuilt with a Lucene version other than the runtime version. Of course, a plug-in that contains multiple indexes is always larger than a plug-in with only one index or none, but waiting too long for search results will annoy users and make some of them stop using the help.
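The selection rule described above can be sketched in a few lines of Java. This is only an illustration of the observed behavior, not the actual Eclipse help code: the class PrebuiltIndexSelector, the PrebuiltIndex type, the index paths, and the assumption that versions are compared by exact string equality are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class PrebuiltIndexSelector {

    /** Hypothetical description of one prebuilt index shipped in a help plug-in. */
    static final class PrebuiltIndex {
        final String path;          // location inside the plug-in (made up for this sketch)
        final String luceneVersion; // Lucene version the index was built with

        PrebuiltIndex(String path, String luceneVersion) {
            this.path = path;
            this.luceneVersion = luceneVersion;
        }
    }

    /**
     * Returns the first shipped index built with the runtime Lucene version.
     * Assumption: versions must match exactly; all other indexes are ignored.
     */
    static Optional<PrebuiltIndex> selectCompatible(List<PrebuiltIndex> shipped,
                                                    String runtimeLuceneVersion) {
        for (PrebuiltIndex index : shipped) {
            if (index.luceneVersion.equals(runtimeLuceneVersion)) {
                return Optional.of(index);
            }
        }
        // No match: the help system would have to build the index from scratch.
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<PrebuiltIndex> shipped = new ArrayList<>();
        shipped.add(new PrebuiltIndex("index_lucene_1_9_1", "1.9.1"));
        shipped.add(new PrebuiltIndex("index_lucene_2_9_1", "2.9.1"));

        // Runtime with Lucene 2.9.1 (as in Eclipse 3.7 M3): the second index is chosen.
        System.out.println(selectCompatible(shipped, "2.9.1").map(i -> i.path).orElse("none"));
        // Runtime with an unsupported version: no prebuilt index can be used.
        System.out.println(selectCompatible(shipped, "3.0.0").map(i -> i.path).orElse("none"));
    }
}
```

A plug-in that ships both indexes pays with a larger download but gets a usable prebuilt index on either runtime.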

My 2.9 cents

Now that the underlying search framework is so extremely fast, I think it would make sense to stop merging the prebuilt search indexes into one and instead to query the prebuilt indexes directly and merge the search results of all of them.
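To make the idea concrete, here is a minimal sketch in plain Java, without any Lucene dependency, of merging per-index result lists into one ranked list. The class SearchResultMerger, the Hit type, and the sample hrefs are hypothetical. The sketch also assumes that scores from different indexes are directly comparable, which is not necessarily true for separately built Lucene indexes; Lucene 2.9 ships a MultiSearcher class that may be a better starting point for a real implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SearchResultMerger {

    /** Hypothetical search hit: a help topic and its relevance score. */
    static final class Hit {
        final String href;
        final float score;

        Hit(String href, float score) {
            this.href = href;
            this.score = score;
        }
    }

    /**
     * Merges the hit lists of several indexes into one list, best score first,
     * truncated to at most 'limit' entries.
     * Assumption: scores of different indexes can be compared directly.
     */
    static List<Hit> mergeTopHits(List<List<Hit>> perIndexHits, int limit) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perIndexHits) {
            all.addAll(hits);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }

    public static void main(String[] args) {
        List<Hit> platformDocs = Arrays.asList(new Hit("platform/a.html", 0.9f),
                                               new Hit("platform/b.html", 0.4f));
        List<Hit> jdtDocs = Arrays.asList(new Hit("jdt/c.html", 0.7f));

        for (Hit hit : mergeTopHits(Arrays.asList(platformDocs, jdtDocs), 2)) {
            System.out.println(hit.href + " " + hit.score);
        }
    }
}
```

With this approach the one-time merge step at the first query would disappear, at the cost of querying several indexes on every search.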

Why not increase the major version number of the org.eclipse.help.base bundle instead of sticking to 3.x just because the version number of the Eclipse distribution is 3.x, which is a marketing version number and not an OSGi one? As it stands, bundles that require the Eclipse help bundle 3.x can be installed and may then fail at runtime. I would prefer the installation of such bundles to be rejected. I think OSGi versioning is helpful and we should not cheat it. In my view, developing a large framework like Eclipse in small steps with some backwards compatibility (e.g. 2 or 3 years) is still one of the biggest challenges in software development, and one that has not yet been met.
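The effect of a major version bump can be illustrated with a simplified version range check. This is a sketch only: the class VersionRangeCheck is hypothetical, it ignores version qualifiers, and the bundle versions 3.6.0 and 4.0.0 are made-up examples. A real OSGi framework uses its own classes such as org.osgi.framework.Version for this.

```java
public class VersionRangeCheck {

    /** Parses "major.minor.micro" into three numbers; missing parts count as 0. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] numbers = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    /** Compares two versions segment by segment (qualifiers are ignored). */
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < 3; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }

    /** True if version lies in [lowerInclusive, upperExclusive), e.g. [3.0.0,4.0.0). */
    static boolean inRange(String version, String lowerInclusive, String upperExclusive) {
        return compare(version, lowerInclusive) >= 0
            && compare(version, upperExclusive) < 0;
    }

    public static void main(String[] args) {
        // A client bundle that requires the help bundle in the range [3.0.0,4.0.0):
        // a help bundle still labeled 3.x resolves, even if its API has changed.
        System.out.println(inRange("3.6.0", "3.0.0", "4.0.0")); // prints true
        // A help bundle labeled 4.0.0 would be rejected at install time.
        System.out.println(inRange("4.0.0", "3.0.0", "4.0.0")); // prints false
    }
}
```

In other words, the rejection I would prefer comes for free once the breaking change is reflected in the major version number.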

One Response to “Helpful API or Speedy Lucene 2.9”

  1. Chris Aniszczyk Says:

    Cool… that’s good to know that the indexer in Lucene is fast now 😉
