Here’s another blog post about Jackrabbit clusters and how to make your life with them a little easier.
Adding a new instance to a Jackrabbit cluster is very easy. In the beginning. Just provide a proper repository.xml which points to the central sources, add a new cluster id and start the instance. Everything is taken care of from then on. The problems start when your data grows and gets larger and larger.
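For reference, the cluster section of such a repository.xml might look roughly like this (the node id, host and credentials here are placeholders, not from an actual setup):

```xml
<Cluster id="node2" syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <param name="revision" value="${rep.home}/revision.log"/>
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" value="jdbc:mysql://dbhost:3306/jackrabbit"/>
    <param name="user" value="jackrabbit"/>
    <param name="password" value="secret"/>
    <param name="databaseType" value="mysql"/>
  </Journal>
</Cluster>
```

Every node gets its own cluster id, while all nodes point at the same journal database (the “central sources”).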
If you add a new instance to a Jackrabbit cluster, the new instance starts up and reads all the content to build up its Lucene search index. It also reads the journal and rebuilds everything it needs from there. You can imagine that this can take quite some time if you have built up a lot of content lately.
Furthermore, as the journal can get huge pretty fast, Jackrabbit introduced a janitor, which cleans the journal log daily. Great if you have the same instances running all the time (they don’t need the log from days or months ago), but not so great if you want to add new instances (and the wiki entry linked above warns you about exactly this).
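The janitor is switched on in the DatabaseJournal configuration; a typical setup (the values here are just examples) looks like this:

```xml
<Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
  <!-- database connection params as above -->
  <param name="janitorEnabled" value="true"/>
  <param name="janitorSleep" value="86400"/>
  <param name="janitorFirstRunHourOfDay" value="3"/>
</Journal>
```

janitorSleep is in seconds, so 86400 means one clean-up run per day.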
But there’s a solution to this very problem, and it’s not that complicated: stop an existing instance (or take a consistent backup of it), copy its whole repository home (Lucene search indexes and revision file included), give the copy a new cluster id and start it somewhere else.
With this approach we can be sure that everything is in a consistent state (the Lucene indexes, for example), and the clone should take up where the original left off without losing anything (as long as the janitor didn’t run between the backup and starting the new clone).
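The copy part can be sketched in a few lines of shell. The paths and the assumption that the cluster id sits in an id attribute on the <Cluster> element of repository.xml are mine, not taken from the actual scripts:

```shell
#!/bin/sh
# Clone a stopped Jackrabbit instance's repository home and give the
# copy its own cluster id. Run this only while the source instance is
# down, so the Lucene indexes and the revision file are consistent.
clone_repo() {
    src=$1     # repository home of the stopped source instance
    dst=$2     # repository home of the new clone
    new_id=$3  # cluster id for the clone

    # Copy everything, Lucene search indexes included.
    cp -a "$src" "$dst"

    # Replace the cluster id in the clone's repository.xml.
    sed -i 's/<Cluster id="[^"]*"/<Cluster id="'"$new_id"'"/' \
        "$dst/repository.xml"
}

# Example: clone_repo /srv/jackrabbit/node1 /srv/jackrabbit/node2 node2
```

After that, starting the clone lets it continue reading the journal from the revision recorded in the copied revision file, instead of reindexing everything from scratch.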
As a little proof of concept I wrote two little scripts that do exactly what I described above. They can be found on GitHub at https://github.com/chregu/Jackrabbit-clone-scripts/. They are not used in production (yet) and handle one specific setup (we use MySQL as the persistence store, for example), but it should be easy to adjust them to your needs. They include some checks to avoid mistakes and stop if one fails, but I’m sure I missed some not-so-obvious ones. They will help us a lot in adding new instances to a cluster in a decent amount of time, and I’m sure some of you out there can make use of them too (if only to learn how this works in Jackrabbit). The README has some more info.
Hi Christian, great post.
I have a question for you. My Jackrabbit is configured to work with MySQL and two cluster nodes. I’ve configured the janitor to remove old revisions from the journal table. Although the process executes successfully, the journal table is not being cleaned. The logs show:
[Jackrabbit-ClusterRevisionJanitor] – org.apache.jackrabbit.core.journal.DatabaseJournal$RevisionTableJanitor.run(DatabaseJournal.java:1215) – Next clean-up run scheduled at Thu Aug 28 23:33:00 UTC 2014
- Both cluster nodes have the same revision number (local_revisions table).
- That revision number is the same as the one in global_revisions.
Thanks in advance.