When we launched the Symfony CMF initiative back in 2010, one of the first decisions we made was to adopt JCR as the basis for our work, as we felt that one of the biggest shortcomings of CMSs at the time was the hard coupling between the storage and business layers. However, JCR only defines language-level interfaces and APIs. It doesn't define a remoting API, let alone a REST API. Thankfully, the reference implementation of JCR, called Jackrabbit, did provide a WebDAV-inspired HTTP API with some JSON mixed in. We submitted several patches to improve its performance and reduce round trips. We also actively participated in defining version 2.1 of the JCR spec to make it more useful in client-server scenarios. On top of that, we have invested a lot of time in creating Jackalope, a reference implementation of PHPCR, which is a port of the JCR spec to PHP. In fact, liip.ch runs on Jackrabbit using Symfony CMF.

Now, however, Jackrabbit Oak has been released, which is a from-scratch rewrite. Jackrabbit is an Apache project, but a lot of the development is done by Adobe. This is especially true for its successor, Jackrabbit Oak, which essentially aims to be the "git of content repositories", both in terms of market share and in terms of internal architecture. Along with that, there are now plans to provide a new, cleaner remoting API. Adobe invited us to their offices in Basel earlier this month to discuss the API and to ensure that it fits our needs with PHPCR. Of course, nowadays we also do projects with AEM, Adobe's Jackrabbit-based CMS. All the more reason to take Adobe up on its offer.

REST is about resources

Our "delegation" consisted of David and myself along with Alfu, whom David and Angela mentored during his thesis on adding ACL support to the old Jackrabbit HTTP API. We met up with the Adobe developers Angela, one of the long-time lead developers of Jackrabbit, and Francesco, who is leading the initiative for the new remoting API. The first point of discussion was what granularity we want to expose as resources. The initial idea was to expose nodes (i.e. individual documents with their properties) as resources. But while doing the initial design work, Francesco realized that in many use cases remote clients will want to modify multiple nodes at once. If nodes were exposed as individual resources, it would become necessary to provide some kind of session/transaction mechanism to submit those changes and have them applied atomically. But such mechanisms complicate client-side use and, more importantly, hurt horizontal scalability on the server side.

As such, he proposes not to expose individual nodes as resources, but instead to expose the repository as a whole as the smallest granular unit. This also fits our experience well: not only do most writes affect multiple nodes, in most cases we also wanted to read multiple nodes. For example, one of the first things we added to the old Jackrabbit remoting API was the ability to fetch multiple nodes at once. Furthermore, one feature we used heavily was the ability to automatically fetch the children of nodes up to a given depth.
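To make the read side concrete, here is a minimal Python sketch of such a combined fetch, assuming a hypothetical in-memory tree of nodes (`make_node`, `resolve`, `serialize` and `fetch_many` are illustrative names, not Oak or Jackrabbit APIs). One call answers several paths at once and expands each of them to a given depth, instead of costing one request per node.

```python
from typing import Any, Dict, Iterable, Optional

# Hypothetical in-memory node model: a dict of properties plus named children.
Node = Dict[str, Any]


def make_node(props: Optional[Dict[str, Any]] = None,
              children: Optional[Dict[str, Node]] = None) -> Node:
    return {"props": dict(props or {}), "children": dict(children or {})}


def resolve(root: Node, path: str) -> Node:
    """Walk an absolute path such as '/content/news' down from the root node."""
    node = root
    for name in filter(None, path.split("/")):
        node = node["children"][name]
    return node


def serialize(node: Node, depth: int) -> Dict[str, Any]:
    """Return the node's properties and, while depth > 0, its children recursively."""
    children = {}
    if depth > 0:
        children = {name: serialize(child, depth - 1)
                    for name, child in node["children"].items()}
    return {"props": dict(node["props"]), "children": children}


def fetch_many(root: Node, paths: Iterable[str], depth: int = 0) -> Dict[str, Any]:
    """Answer a single request for several paths, each expanded to `depth` levels."""
    return {path: serialize(resolve(root, path), depth) for path in paths}
```

For example, `fetch_many(repo, ["/content", "/content/news"], depth=1)` returns both subtrees in one response, each including its direct children.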

Protocol format

Basically, a write would then be a POST consisting of a series of operations. Following this logic, deleting a node would also be sent as part of such a POST. As repositories (though this might evolve to actually mean workspaces) are the smallest granular unit, a DELETE would then only be used to delete an entire repository. One of the concerns here is how to make the format compact yet readable. Jackrabbit previously used JSOP, but Jackrabbit Oak will likely use a JSON Patch-inspired format. That being said, the implementation inside Oak will keep the serialization logic separate, so it should be possible to implement different protocols.
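The wire format was not settled at the meeting, so the following is only an illustration of the idea, assuming a small JSON Patch (RFC 6902) style subset (`add`, `remove`, `replace`) applied to a plain nested JSON document rather than to real JCR nodes. `apply_batch` and `PatchError` are hypothetical names. The point is that the whole batch, which a client would send as the body of a single POST, is applied to a copy and only takes effect if every operation succeeds, so atomicity needs no server-side session.

```python
import copy
from typing import Any, Dict, List, Tuple


class PatchError(Exception):
    """Raised when an operation in the batch cannot be applied."""


def _split(path: str) -> List[str]:
    # JSON Pointer (RFC 6901): '/a/b' -> ['a', 'b'], unescaping '~1' before '~0'.
    if not path.startswith("/"):
        raise PatchError(f"invalid path: {path!r}")
    return [p.replace("~1", "/").replace("~0", "~") for p in path[1:].split("/")]


def _locate(doc: Dict[str, Any], path: str) -> Tuple[Dict[str, Any], str]:
    # Return the object that contains the target key, plus the key itself.
    parts = _split(path)
    parent: Any = doc
    for key in parts[:-1]:
        if not isinstance(parent, dict) or key not in parent:
            raise PatchError(f"missing parent for {path!r}")
        parent = parent[key]
    if not isinstance(parent, dict):
        raise PatchError(f"parent of {path!r} is not an object")
    return parent, parts[-1]


def apply_batch(doc: Dict[str, Any], ops: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply every operation or none: mutate a deep copy and return it on success."""
    work = copy.deepcopy(doc)
    for op in ops:
        parent, key = _locate(work, op["path"])
        kind = op["op"]
        if kind == "add":
            parent[key] = copy.deepcopy(op["value"])
        elif kind in ("remove", "replace"):
            if key not in parent:
                raise PatchError(f"{kind} of missing {op['path']!r}")
            if kind == "remove":
                del parent[key]
            else:
                parent[key] = copy.deepcopy(op["value"])
        else:
            raise PatchError(f"unsupported op: {kind!r}")
    return work  # the caller swaps this in as the new state; `doc` is untouched
```

A node deletion is then just one more `{"op": "remove", "path": ...}` entry in the batch, which is why a plain HTTP DELETE is left for removing a whole repository.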

Reducing round trips

As stated above, the decision to make repositories the smallest granular unit has a lot to do with reducing network round trips. In a way, the REST architectural style essentially disregards such concerns entirely. But the reality is that even with caching, it is often necessary to reduce network round trips. As such, Francesco is planning the following:

  • many ways to hint to (or configure) Oak what additional information to return (for example, it will be possible to filter the child nodes that are returned automatically by node name and node type)
  • the possibility to set a limit on (and maybe even page through) the list of child nodes
  • the possibility to specify up to what byte size binaries should be inlined in the response
  • the possibility to filter the properties that should be returned
  • the ability to either inline binaries or send/receive them in separate requests, for full flexibility

These are all very important for PHPCR and go beyond the features currently available to us in the Jackrabbit HTTP API.
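How these hints will be expressed in the actual API is still open. As an illustration only, here is a Python sketch that bundles them into a hypothetical `Hints` object and applies them while rendering a node, assuming a simple dict-based node model in which binaries are `bytes` values. Binaries up to `inline_binary_max` bytes are inlined as base64; larger ones are replaced by a reference the client can fetch in a separate request. All names here are made up for the example.

```python
import base64
from dataclasses import dataclass, replace
from fnmatch import fnmatchcase
from typing import Any, Dict, FrozenSet, Optional, Tuple

Node = Dict[str, Any]  # hypothetical: {"type": str, "props": dict, "children": dict}


@dataclass(frozen=True)
class Hints:
    depth: int = 0                                    # levels of children to include
    name_patterns: Optional[Tuple[str, ...]] = None   # glob patterns for child names
    node_types: Optional[FrozenSet[str]] = None       # allowed child node types
    limit: Optional[int] = None                       # max children per node
    offset: int = 0                                   # paging offset, top-level children only
    properties: Optional[FrozenSet[str]] = None       # property names to return (None = all)
    inline_binary_max: int = 0                        # largest binary (in bytes) to inline


def _wanted(name: str, child: Node, hints: Hints) -> bool:
    if hints.name_patterns is not None and not any(
            fnmatchcase(name, p) for p in hints.name_patterns):
        return False
    return hints.node_types is None or child["type"] in hints.node_types


def render(node: Node, hints: Hints, path: str = "/") -> Dict[str, Any]:
    props: Dict[str, Any] = {}
    for key, value in node["props"].items():
        if hints.properties is not None and key not in hints.properties:
            continue
        if isinstance(value, bytes) and len(value) > hints.inline_binary_max:
            # Too large to inline: return a reference for a separate request.
            props[key] = {"binary_ref": f"{path.rstrip('/')}/{key}", "length": len(value)}
        elif isinstance(value, bytes):
            props[key] = {"binary": base64.b64encode(value).decode("ascii")}
        else:
            props[key] = value

    children: Dict[str, Any] = {}
    if hints.depth > 0:
        matching = [(n, c) for n, c in node["children"].items() if _wanted(n, c, hints)]
        end = None if hints.limit is None else hints.offset + hints.limit
        child_hints = replace(hints, depth=hints.depth - 1, offset=0)
        for name, child in matching[hints.offset:end]:
            children[name] = render(child, child_hints, f"{path.rstrip('/')}/{name}")
    return {"type": node["type"], "props": props, "children": children}
```

In this sketch the filters are applied before paging, so `limit` and `offset` count only the children that actually match the name and type hints.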

Git inspiration

In order to make it easier for applications to manage concurrent writes, the remoting API will also expose some of the git-inspired internals of Oak, specifically the revision ID. One of the outcomes of the meeting was that the current revision of the server state should be returned in essentially every response, again to reduce round trips and to make clients aware of concurrent writes. That being said, Oak will in general always attempt a 3-way merge when writing to the repository. Conflicts will therefore only occur when writing to the same property or when writing to a node that has been removed.
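To illustrate why conflicts become rare, here is a minimal Python sketch of such a 3-way merge. It is not Oak's implementation: it assumes a hypothetical flat model in which each repository state maps node paths to property dicts (a missing path means the node does not exist) and it ignores parent/child consistency. `base` is the revision the client last saw (which is where returning the current revision in every response helps), `ours` is the client's change and `theirs` is the current head. Changes to different nodes or different properties merge cleanly; only concurrent writes to the same property, or a write to a node the other side removed, raise the hypothetical `MergeConflict`.

```python
from typing import Dict, List, Tuple

Props = Dict[str, object]
State = Dict[str, Props]  # hypothetical: node path -> properties; absent path = no node

_MISSING = object()  # marks a property that does not exist on one side


class MergeConflict(Exception):
    def __init__(self, conflicts: List[str]):
        super().__init__("; ".join(conflicts))
        self.conflicts = conflicts


def _pick(base: object, ours: object, theirs: object) -> Tuple[object, bool]:
    """Classic 3-way rule: keep the side that changed; clash if both changed differently."""
    if ours == theirs or theirs == base:
        return ours, False
    if ours == base:
        return theirs, False
    return None, True


def merge3(base: State, ours: State, theirs: State) -> State:
    merged: State = {}
    conflicts: List[str] = []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        node, clash = _pick(b, o, t)
        if clash and (o is None or t is None):
            conflicts.append(f"{path}: modified on one side, removed on the other")
            continue
        if clash:
            # Both sides touched this node: merge it property by property.
            node = {}
            b_props = b or {}
            for key in sorted(b_props.keys() | o.keys() | t.keys()):
                value, prop_clash = _pick(b_props.get(key, _MISSING),
                                          o.get(key, _MISSING), t.get(key, _MISSING))
                if prop_clash:
                    conflicts.append(f"{path}/{key}: concurrent writes")
                elif value is not _MISSING:
                    node[key] = value
        if node is not None:
            merged[path] = dict(node)
    if conflicts:
        raise MergeConflict(conflicts)
    return merged
```

For instance, if the client changes `title` on a node while someone else changed `x` on the same node and added a sibling, the merge keeps all three changes without involving the client.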

Getting to work

Francesco will soon begin the implementation while maturing the current state of the API spec. The effort is being organized through subtickets attached to this Jira ticket. Contributions are explicitly encouraged, and maybe the new AEM squad here at Liip and some members of the Symfony CMF community will participate.