Add new instances to your Jackrabbit cluster - the non-time-consuming way

And here's another blog post about Jackrabbit clusters and how to make your life better.

Adding a new instance to a Jackrabbit cluster is very easy. In the beginning. Just provide a proper repository.xml which points to the central sources, add a new cluster id and start. Everything is taken care of from then on. The problems start when your data grows larger and larger.
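
To illustrate the "add a new cluster id" part, here is a minimal Python sketch that swaps the id in an existing repository.xml. It assumes the id is set as the `id` attribute of the `<Cluster>` element directly in the file. The function name `set_cluster_id` and the shortened sample config are made up for this example. It works on the raw text on purpose, so the rest of the file (DOCTYPE, comments, formatting) stays untouched.

```python
import re

# Matches the id attribute inside the first <Cluster ...> start tag.
_CLUSTER_ID = re.compile(r'(<Cluster\b[^>]*?\bid=")[^"]*(")')
# Conservative whitelist, so the new id never needs XML escaping.
_SAFE_ID = re.compile(r"[A-Za-z0-9_.-]+")


def set_cluster_id(repository_xml: str, new_id: str) -> str:
    """Return repository_xml with the <Cluster> id replaced by new_id."""
    if not _SAFE_ID.fullmatch(new_id):
        raise ValueError(f"unsafe cluster id: {new_id!r}")
    updated, count = _CLUSTER_ID.subn(
        lambda m: m.group(1) + new_id + m.group(2), repository_xml, count=1
    )
    if count != 1:
        raise ValueError("no <Cluster id=...> element found")
    return updated


# Hypothetical, heavily shortened config, for illustration only.
SAMPLE = """<Repository>
  <Cluster id="node1" syncDelay="2000">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal"/>
  </Cluster>
</Repository>"""

if __name__ == "__main__":
    print(set_cluster_id(SAMPLE, "node2"))
```

The function raises instead of silently returning the input when no `<Cluster>` element is found, because starting two instances with the same id is exactly the kind of mistake you want to catch early.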

If you add a new instance to a Jackrabbit cluster, the new instance starts up and begins to read all the content to build up its Lucene search index. It also reads the journal and rebuilds everything it needs from there. You can imagine that this can take quite some time if a lot of content has built up.

Furthermore, as the journal can get huge pretty fast, Jackrabbit introduced a janitor, which cleans the journal log daily. That's great if you have the same instances running all the time (they don't need the log from days or months ago), but not so great if you want to add new instances (and the wiki entry linked above warns you about exactly this).

But there's a solution to this very problem, and it's not that complicated:

  • Shut down one of your instances
  • Get the revision number that instance is currently at from your database
  • Copy your whole Jackrabbit repository directory to another server/location
  • Start your original Jackrabbit instance again
  • In the copy, change repository.xml to use a new node name in your cluster config
  • Add that node name to JOURNAL_LOCAL_REVISIONS in your DB, with the revision number from the original instance
  • Start your new Jackrabbit instance (or keep it for backup purposes)
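
The two database steps in the list above (reading the revision of the original instance and registering the clone with the same number) can be sketched in Python roughly like this. Treat it as an illustration under assumptions, not as what the scripts do: the column names `JOURNAL_ID` and `REVISION_ID` are assumed, `register_clone` and `demo` are names invented for this example, and an in-memory SQLite database stands in for MySQL so that the snippet runs on its own.

```python
import sqlite3

# Table name as used in this post; the column names are an assumption.
TABLE = "JOURNAL_LOCAL_REVISIONS"


def register_clone(conn, source_id: str, clone_id: str) -> int:
    """Give clone_id the same local revision as source_id and return it."""
    cur = conn.cursor()
    cur.execute(
        f"SELECT REVISION_ID FROM {TABLE} WHERE JOURNAL_ID = ?", (source_id,)
    )
    row = cur.fetchone()
    if row is None:
        raise LookupError(f"no revision recorded for {source_id!r}")

    # Refuse to touch a node name that is already registered.
    cur.execute(f"SELECT 1 FROM {TABLE} WHERE JOURNAL_ID = ?", (clone_id,))
    if cur.fetchone() is not None:
        raise ValueError(f"{clone_id!r} is already registered")

    cur.execute(
        f"INSERT INTO {TABLE} (JOURNAL_ID, REVISION_ID) VALUES (?, ?)",
        (clone_id, row[0]),
    )
    conn.commit()
    return row[0]


def demo() -> int:
    """Run the sketch against a throwaway in-memory stand-in database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {TABLE} "
        "(JOURNAL_ID TEXT NOT NULL, REVISION_ID INTEGER NOT NULL)"
    )
    conn.execute(f"INSERT INTO {TABLE} VALUES ('node1', 4711)")
    return register_clone(conn, "node1", "node2")


if __name__ == "__main__":
    print(demo())
```

Against a real MySQL database you would open the connection with your MySQL driver of choice, and most of those drivers expect `%s` instead of `?` as the placeholder. The important part is the order: read the revision while the original instance is shut down, so the number matches the state of the copied repository directory.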

With this approach, we can be sure that everything is in a consistent state (the Lucene indexes, for example), and we can safely start the copy of this instance in another place. It should pick up where the original left off without losing anything (as long as the janitor didn't run between the backup and the start of the new clone).

As a little proof of concept I wrote two little scripts which do exactly what I described above. They can be found on GitHub at https://github.com/chregu/Jackrabbit-clone-scripts/. They are not used in production (yet) and handle one specific setup (we use MySQL as the persistent store, for example), but it should be easy to adjust them to your needs. They include some checks to avoid mistakes and stop when one of them fails, but I'm sure I missed some not-so-obvious ones. They will help us a lot in adding new instances to a cluster in a decent amount of time. I'm sure some of you out there can make use of them, too (even if only to learn how this works in Jackrabbit). The README has some more info.

Comments [1]

Luis Nivel, 29.08.2014 01:43 CEST

Hi Christian, great post.

I have a question for you. My Jackrabbit is configured to work with MySQL and two cluster nodes. I've configured the janitor to remove old revisions from the journal table. Although the process runs successfully, the journal table is not being cleaned. The logs show:
[Jackrabbit-ClusterRevisionJanitor] - org.apache.jackrabbit.core.journal.DatabaseJournal$RevisionTableJanitor.run(DatabaseJournal.java:1215) - Next clean-up run scheduled at Thu Aug 28 23:33:00 UTC 2014

Additional information:
- Both cluster nodes have the same revision number (local_revisions table)
- That revision number is the same as the one in global_revisions.

Any ideas?

Thanks in advance.

Luis
