Content storage done right

Jackalope and PHPCR have been a recurring topic on this blog. Back in 2009 we here at Liip began exploring the possibility of integrating Jackrabbit, the reference implementation of the Java Content Repository (JCR) specification, with PHP. The vision was twofold. First, we wanted to make it possible to interact directly with content stored in Adobe CQ (called Day Communiqué at the time) or Magnolia. Second, we felt it would be a great asset to the PHP CMS world to be able to leverage all the power of JCR from PHP, hence PHPCR. The initial attempts used the Zend Java Bridge to communicate directly from PHP to Java. Eventually, however, we realized that it would be more feasible to use the native HTTP API provided by Jackrabbit. But things only really took off when the Symfony CMF initiative decided to adopt our work. Now, four years later, we finally have the first stable releases of PHPCR, Jackalope and the Hibernate-inspired object mapper PHPCR ODM.

It is PHP

The gut reaction of many PHP developers when faced with Java is usually worry about factory factories, endless XML configuration and deep class hierarchies. This sentiment is perhaps best illustrated by the well-known quip: “Java is a DSL for taking large XML files and converting them to stack traces.” At the same time, perceptions have changed. Lucene is the basis for Solr and ElasticSearch, the go-to full-text search engines for most PHP developers. In fact, pretty much every PHP CMS defers to one of these solutions for search once data sets get larger. By the way, Lucene is integrated into Jackrabbit right out of the box. Furthermore, many PHP developers have realized that there is value in decoupled architectures and design patterns, even though they tend to result in more complex class structures. The benefits are reusability, testability and the fact that each unit of code is much more approachable on its own.

Of course, this does not mean that the PHP platform has no value beyond its ubiquity (in October 2013, PHP was used on 80% of all domains). The conclusion is that there is no need to jump ship from PHP, but there is value in bringing outside ideas to the platform. Obviously, porting JCR to PHPCR cannot be done as a one-to-one mapping. PHPCR therefore leverages the fact that PHP provides associative arrays where JCR has to rely on more unwieldy object structures. And, most importantly for making PHPCR relevant in the real world, there is also an implementation written purely in PHP that connects to an RDBMS using Doctrine DBAL, i.e. it's possible to use PHPCR without running any Java at all.

It is community

Throughout this entire effort, Liip has done a significant portion of the work. However, people from the community have always been involved as well, and this was the goal from the very start. At this point one can therefore state with confidence that this is a community effort and that development is no longer driven entirely by Liip. Community contributions include not only reporting bugs but also fixing them, improving or adding features, writing tests and, of course, documentation. I want to briefly highlight some key contributors:

  • Karsten for his initial work on the PHPCR interfaces which we adopted
  • Uwe and Johannes for becoming the first non-Liipers to make significant improvements to Jackalope and PHPCR ODM
  • Benjamin for laying the groundwork for Jackalope Doctrine DBAL
  • Dan for his work on the PHPCR ODM Query Builder

What is next?

This release of PHPCR provides compatibility with JCR 2.1. Jackalope, the reference implementation of PHPCR, integrates with Jackrabbit and Doctrine DBAL. The next steps are completing some of the optional features and further improving performance. For example, it would be interesting to be able to use Solr/ElasticSearch in combination with the Doctrine DBAL implementation. We are also looking forward to improved logging and caching capabilities, and to work picking up again on the MongoDB implementation. We are also keeping an eye on the next major version of Jackrabbit, code name Oak. In fact, we have already tested compatibility with the current releases together with the Adobe engineers. But above all, we are looking forward to people adding PHPCR to their applications wherever they can benefit from a storage solution that provides unstructured content in a tree structure with support for node types, binaries, versioning and full-text search. A case in point is the imminent release of Symfony CMF.

Symfony2 CMF: One hackday closer to the 1.0 release!

Last Friday, a full dozen Symfony2 Content Management Framework developers gathered at the Liip office in Zurich, Switzerland, to discuss the state of the project. We had people from England, the Netherlands, France, Germany, Austria and Switzerland.

We went over all bundles and sorted issues and pull requests into those for 1.0 and those for later releases. We also discussed many open questions and made decisions on them. We have now effectively entered a scope freeze, meaning we don’t want to add any new features to the 1.0 goals. However, pull requests for open 1.0 issues as well as bugfixes are very welcome.

Release plan

For a couple of weeks now, there has been a “beta1” version of the CMF out. We still have a couple of BC breaks planned, but since beta1 they must be documented in the CHANGELOG.md file. Unfortunately, a stable release will still take some time. We have postponed our target release date to the end of July. According to the dashboard tool, there are about 60 issues to close until then, so we have to close 12 issues per week. This sounds like a lot; on the other hand, there were 12 of us at the meeting, so it would mean everybody has to solve one issue per week.

We also sketched the release process for after 1.0 is out. An intermediate 1.1 release is planned for autumn; after that, we will follow the core Symfony release cycle with a delay of about one month.

How can you help?

If you would like to contribute, please look at the issues in the 1.0 milestone of each CMF bundle. While we all love great new features, we have to focus on cleaning up the existing features and making sure they are future-proof. If you don’t know where to start, please ask on the mailing list or in #symfony-cmf on irc.freenode.org.

Jackalope-jackrabbit 1.0.0 Beta 1 released

Yesterday I tagged Beta 1 of jackalope-jackrabbit, our PHP Content Repository (PHPCR) implementation. PHPCR is an API to manage tree-structured data, modelled after the Java Content Repository (JCR) specification. Time to summarize what Jackalope can do today. For people already familiar with Jackalope, I summarize the recent changes at the bottom.

State of Jackalope-Jackrabbit

The basic node API is fully implemented and working. You can access nodes directly by path, walk the tree hierarchy, and read and write data. All data types, including binary streams, are supported.
Both workspace and session write operations are supported, including cross-workspace synchronization of nodes.
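To illustrate what direct path access and tree walking mean, here is a minimal sketch in plain PHP. It does not use Jackalope: `SketchNode`, `getNodeByPath()` and `walk()` are hypothetical stand-ins that only mirror the idea behind PHPCR's `SessionInterface::getNode()` and `NodeInterface::getNodes()`.

```php
<?php
// Hypothetical in-memory stand-in for a PHPCR node; not the Jackalope API.
final class SketchNode
{
    /** @var array<string, mixed> property name => value */
    public array $properties;
    /** @var array<string, SketchNode> child nodes keyed by name */
    public array $children = [];

    public function __construct(array $properties = [])
    {
        $this->properties = $properties;
    }

    // Create a child node and return it, so calls can be chained.
    public function addNode(string $name, array $properties = []): SketchNode
    {
        return $this->children[$name] = new SketchNode($properties);
    }
}

// Resolve an absolute path like "/cms/pages/home" by walking down from the root.
function getNodeByPath(SketchNode $root, string $absPath): ?SketchNode
{
    $node = $root;
    foreach (array_filter(explode('/', $absPath), 'strlen') as $name) {
        if (!isset($node->children[$name])) {
            return null; // a real PHPCR session throws PathNotFoundException instead
        }
        $node = $node->children[$name];
    }
    return $node;
}

// Depth-first walk collecting the absolute path of every node below $node.
function walk(SketchNode $node, string $path = ''): array
{
    $paths = [$path === '' ? '/' : $path];
    foreach ($node->children as $name => $child) {
        $paths = array_merge($paths, walk($child, $path . '/' . $name));
    }
    return $paths;
}

$root = new SketchNode();
$pages = $root->addNode('cms')->addNode('pages');
$pages->addNode('home', ['title' => 'Welcome']);
$pages->addNode('about', ['title' => 'About us']);

echo getNodeByPath($root, '/cms/pages/home')->properties['title'], "\n";
```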

Queries can be expressed in the JCR-SQL2 language as well as through the query object model (QOM). Additionally, Jackalope provides a fluent query builder on top of the query object model. Most query features are supported; the exceptions are query parameters and persisting a query as a stored query.
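To give an idea of what a fluent builder on top of JCR-SQL2 looks like, the sketch below assembles a query string step by step. `Sql2Sketch` and its methods are invented for this illustration; Jackalope's actual query builder works on the query object model rather than on strings, and its API is not reproduced here.

```php
<?php
// Hypothetical fluent builder that emits a JCR-SQL2 string; not Jackalope's query builder.
final class Sql2Sketch
{
    private string $nodeType = 'nt:base';
    /** @var string[] constraints joined with AND */
    private array $constraints = [];
    private ?string $orderBy = null;

    public function from(string $nodeType): self
    {
        $this->nodeType = $nodeType;
        return $this;
    }

    // Restrict results to the subtree below $path (bracketed path form of JCR-SQL2).
    public function descendantOf(string $path): self
    {
        $this->constraints[] = sprintf('ISDESCENDANTNODE([%s])', $path);
        return $this;
    }

    public function propertyEquals(string $property, string $value): self
    {
        $this->constraints[] = sprintf("[%s] = '%s'", $property, self::escape($value));
        return $this;
    }

    public function orderBy(string $property): self
    {
        $this->orderBy = $property;
        return $this;
    }

    public function getSql2(): string
    {
        $sql = sprintf('SELECT * FROM [%s]', $this->nodeType);
        if ($this->constraints) {
            $sql .= ' WHERE ' . implode(' AND ', $this->constraints);
        }
        if ($this->orderBy !== null) {
            $sql .= sprintf(' ORDER BY [%s]', $this->orderBy);
        }
        return $sql;
    }

    // Escape a single quote inside a string literal by doubling it.
    private static function escape(string $value): string
    {
        return str_replace("'", "''", $value);
    }
}

$sql2 = (new Sql2Sketch())
    ->from('nt:unstructured')
    ->descendantOf('/cms/pages')
    ->propertyEquals('title', "Liip's blog")
    ->orderBy('title')
    ->getSql2();
echo $sql2, "\n";
```

Note the doubled quote in the value: to our understanding of the JCR-SQL2 grammar, string literals escape a single quote the same way SQL does. As query parameters are not supported yet, values that end up in a query string have to be escaped along these lines.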

Exporting the repository to the standard “system view” and “document view” formats works, as does re-importing any repository dump (including those generated by Java JCR implementations).

Node type definitions can be inspected and are enforced when data is written. Both the CND file format and the object model for defining custom node types are supported.
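As a rough illustration of how a node type definition restricts the data a node may hold, here is a hypothetical PHP sketch. The `PAGE_TYPE` array loosely corresponds to a CND declaration of a type with a mandatory STRING property `title`, an optional LONG property `position` and no residual property definition; the array layout and `validateNode()` are invented for this example and are not Jackalope's node type API.

```php
<?php
// Hypothetical, simplified node type: property name => [type, mandatory].
// Loosely modelled on CND lines like "- title (STRING) mandatory" and "- position (LONG)".
const PAGE_TYPE = [
    'title'    => ['type' => 'STRING', 'mandatory' => true],
    'position' => ['type' => 'LONG',   'mandatory' => false],
];

// Return a list of violations; an empty list means the properties fit the type.
function validateNode(array $definition, array $properties): array
{
    $errors = [];
    foreach ($definition as $name => $rule) {
        if ($rule['mandatory'] && !array_key_exists($name, $properties)) {
            $errors[] = "missing mandatory property '$name'";
        }
    }
    foreach ($properties as $name => $value) {
        if (!isset($definition[$name])) {
            // No residual property definition, so unknown properties are rejected.
            $errors[] = "property '$name' is not allowed";
            continue;
        }
        // This sketch only distinguishes STRING and LONG.
        $ok = $definition[$name]['type'] === 'STRING' ? is_string($value) : is_int($value);
        if (!$ok) {
            $errors[] = "property '$name' must be {$definition[$name]['type']}";
        }
    }
    return $errors;
}

print_r(validateNode(PAGE_TYPE, ['position' => 'first', 'color' => 'red']));
```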

The observation journal is a log of everything that happened in the repository since a specified timestamp. It can be filtered to show only the events an application is interested in, for example by repository path and by type of event, such as adding, changing, moving or removing.
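To make the filtering concrete, the sketch below filters a small in-memory journal by timestamp, path and event type. The bit values mirror the event type constants of JCR/PHPCR (NODE_ADDED is 1, NODE_REMOVED is 2 and so on, so types can be combined with `|`), but the array journal and `filterJournal()` are hypothetical. With Jackalope you would work with the journal object and its `skipTo()` method instead.

```php
<?php
// Event type bits, using the same values as the JCR/PHPCR event constants.
const NODE_ADDED = 1;
const NODE_REMOVED = 2;
const PROPERTY_ADDED = 4;
const PROPERTY_REMOVED = 8;
const PROPERTY_CHANGED = 16;
const NODE_MOVED = 32;

// Hypothetical journal entries: [timestamp in ms, event type bit, path].
$journal = [
    [1000, NODE_ADDED, '/cms/news/a'],
    [2000, PROPERTY_CHANGED, '/cms/news/a/title'],
    [3000, NODE_ADDED, '/cms/pages/contact'],
    [4000, NODE_REMOVED, '/cms/news/a'],
];

// Keep events at or after $since (like skipTo), below $basePath, whose type is in $typeMask.
function filterJournal(array $journal, int $since, string $basePath, int $typeMask): array
{
    $prefix = rtrim($basePath, '/') . '/';
    return array_values(array_filter(
        $journal,
        fn (array $e) => $e[0] >= $since
            && ($e[1] & $typeMask) !== 0
            && strncmp($e[2], $prefix, strlen($prefix)) === 0
    ));
}

$events = filterJournal($journal, 1500, '/cms/news', NODE_ADDED | NODE_REMOVED);
print_r($events);
```

A consumer that remembers the timestamp of the last event it processed and passes it as `$since` on the next run gets a simple queue-like behaviour.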

Versioning is implemented and working. You can make nodes versionable, create new versions, inspect the version history to see old versions, and restore old versions. However, activities and version labels are not yet implemented.
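The versioning cycle can be pictured with a small in-memory sketch. `VersionableSketch` is hypothetical; in PHPCR these operations live on the version manager (for example `checkin()`, `checkout()` and `restore()`), whose signatures are not reproduced here. The version names `1.0`, `1.1` are just this sketch's naming scheme.

```php
<?php
// Hypothetical in-memory versionable node; not the PHPCR version manager API.
final class VersionableSketch
{
    /** @var array<string, mixed> current (working) properties */
    public array $properties = [];
    /** @var array<string, array<string, mixed>> version name => frozen properties */
    private array $history = [];
    private bool $checkedOut = true;

    public function setProperty(string $name, $value): void
    {
        if (!$this->checkedOut) {
            throw new LogicException('node is checked in; check it out before modifying');
        }
        $this->properties[$name] = $value;
    }

    // Freeze the current state as a new version and make the node read-only.
    public function checkin(): string
    {
        $versionName = '1.' . count($this->history);
        $this->history[$versionName] = $this->properties;
        $this->checkedOut = false;
        return $versionName;
    }

    public function checkout(): void
    {
        $this->checkedOut = true;
    }

    /** @return string[] names of all versions, oldest first */
    public function getVersionHistory(): array
    {
        return array_keys($this->history);
    }

    // Replace the working state with a frozen version; this sketch leaves the node checked in.
    public function restore(string $versionName): void
    {
        $this->properties = $this->history[$versionName];
        $this->checkedOut = false;
    }
}

$page = new VersionableSketch();
$page->setProperty('title', 'Draft');
$v1 = $page->checkin();
$page->checkout();
$page->setProperty('title', 'Final');
$v2 = $page->checkin();
$page->restore($v1);
echo $page->properties['title'], "\n";
```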

Session-scoped locks are implemented. They allow you to lock a node, and optionally its subtree, to synchronize operations.
We have not yet implemented open-scoped locks, but that should not be too hard to do.
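The difference between locking only a node and locking its whole subtree (a deep lock) can be shown with a hypothetical lock table. `LockTableSketch` is invented for this example: it ignores session scoping, lock tokens and timeouts, and it does not check for existing locks further down the tree.

```php
<?php
// Hypothetical lock table; not the PHPCR lock manager API.
final class LockTableSketch
{
    /** @var array<string, bool> locked path => isDeep */
    private array $locks = [];

    public function lock(string $path, bool $isDeep): void
    {
        if ($this->isLocked($path)) {
            throw new RuntimeException("$path is already locked");
        }
        $this->locks[$path] = $isDeep;
    }

    public function unlock(string $path): void
    {
        unset($this->locks[$path]);
    }

    // A path is locked if it holds a lock itself or if an ancestor holds a deep lock.
    public function isLocked(string $path): bool
    {
        foreach ($this->locks as $lockedPath => $isDeep) {
            if ($lockedPath === $path) {
                return true;
            }
            $prefix = rtrim($lockedPath, '/') . '/';
            if ($isDeep && strncmp($path, $prefix, strlen($prefix)) === 0) {
                return true;
            }
        }
        return false;
    }
}

$locks = new LockTableSketch();
$locks->lock('/cms/news', false); // shallow: only the node itself
$locks->lock('/cms/pages', true); // deep: the node and its whole subtree
```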

Besides building PHP content management, Jackalope-Jackrabbit can also be used to connect PHP applications to the Java, Jackrabbit-based systems Magnolia CMS and Adobe CQ (currently called Adobe Experience Manager).

A couple of optional PHPCR features are not (yet) implemented:

  • Permissions and capabilities: Checks whether the current user is allowed to do an operation
  • Same Name Siblings: multiple children of the same node each having the same name
  • Shareable nodes: would allow a node to have more than one parent
  • Access Control Lists (ACL), because Jackrabbit does not expose this feature over the remoting protocol jackalope uses
  • Lifecycle management
  • Retention and Hold
  • Transactions, because Jackrabbit does not expose this feature.

Not supporting these features is not a design decision. We would welcome contributions for them, though right now we are focusing on stabilizing and improving the features already supported.

Changes

Bootstrapping

The RepositoryFactoryInterface used to define static methods. This is not legal PHP, but the interpreter accepted it. We changed them to instance methods and now require the factory to have a no-argument constructor. Instead of

\Jackalope\RepositoryFactoryJackrabbit::getRepository();

you now need to do

$factory = new \Jackalope\RepositoryFactoryJackrabbit();
$repository = $factory->getRepository();
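One benefit of instance methods plus a no-argument constructor is that calling code can receive the factory class name as configuration and instantiate it generically. The sketch below shows this pattern with hypothetical stand-ins (`RepositoryFactorySketch`, `InMemoryFactory`, `InMemoryRepository` and a made-up `sketch.memory_name` parameter) so that it runs without Jackalope. As in JCR, a factory returns null when it does not understand the parameters it is given, so several factories can be tried in turn.

```php
<?php
// Hypothetical stand-ins that mirror the shape of the factory pattern; not PHPCR's interfaces.
interface RepositoryFactorySketch
{
    // Return a repository, or null if this factory does not understand $parameters.
    public function getRepository(?array $parameters = null): ?object;
}

final class InMemoryRepository
{
    public string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

final class InMemoryFactory implements RepositoryFactorySketch
{
    public function getRepository(?array $parameters = null): ?object
    {
        if (!isset($parameters['sketch.memory_name'])) {
            return null; // not our parameters
        }
        return new InMemoryRepository($parameters['sketch.memory_name']);
    }
}

// Because factories have a no-argument constructor, the class names can come from configuration.
function createRepository(array $factoryClasses, array $parameters): ?object
{
    foreach ($factoryClasses as $class) {
        $factory = new $class();
        $repository = $factory->getRepository($parameters);
        if ($repository !== null) {
            return $repository;
        }
    }
    return null;
}

$repository = createRepository([InMemoryFactory::class], ['sketch.memory_name' => 'demo']);
echo $repository->name, "\n";
```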

Workflows

Jackalope now supports the WorkspaceInterface::cloneFrom and NodeInterface::update methods. These allow you to copy a node into a different workspace, modify it there and then synchronize the changes back. This makes it possible to build workflows.
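The copy, modify and synchronize-back cycle can be sketched with plain PHP arrays. In this hypothetical code each workspace is a map from path to properties; `cloneFrom()` and `updateFrom()` are invented helpers that only imitate the effect of the PHPCR methods named above, and they ignore subtrees and node identifiers, which the real operations take into account.

```php
<?php
// Hypothetical repository: workspace name => [node path => properties].
$repository = [
    'live'  => ['/cms/home' => ['title' => 'Welcome', 'state' => 'published']],
    'draft' => [],
];

// Copy a node from $srcWorkspace into $destWorkspace at the same path.
function cloneFrom(array &$repository, string $destWorkspace, string $srcWorkspace, string $path, bool $removeExisting = false): void
{
    if (isset($repository[$destWorkspace][$path]) && !$removeExisting) {
        throw new RuntimeException("$path already exists in $destWorkspace");
    }
    $repository[$destWorkspace][$path] = $repository[$srcWorkspace][$path];
}

// Replace the node in $workspace with the state of its counterpart in $srcWorkspace.
function updateFrom(array &$repository, string $workspace, string $srcWorkspace, string $path): void
{
    if (!isset($repository[$srcWorkspace][$path])) {
        return; // no corresponding node: nothing to update
    }
    $repository[$workspace][$path] = $repository[$srcWorkspace][$path];
}

// 1. Clone the live page into the draft workspace.
cloneFrom($repository, 'draft', 'live', '/cms/home');
// 2. Edit it there; the live copy is unaffected because PHP arrays are copied by value.
$repository['draft']['/cms/home']['title'] = 'Welcome back';
// 3. After review, pull the draft state into live.
updateFrom($repository, 'live', 'draft', '/cms/home');
echo $repository['live']['/cms/home']['title'], "\n";
```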

Observation

The observation journal is now usable. We figured out how to get only the events you actually care about when using the skipTo method. You can now even use jackalope-jackrabbit as a message queue.

CLI commands

There are now commands to move a node, to touch (= create) nodes and properties, and to list the node types present in your repository.
The CLI commands have been reorganized into a logical naming scheme, grouped by what they affect: phpcr:node:*, phpcr:workspace:* and phpcr:node-type:*. We now differentiate between workspace:purge, which deletes the whole repository, and node:remove, which can only delete a subtree but never “/”.

Performance

Besides some performance tweaks, we implemented the new PHPCR NodeInterface::getNodeNames method. It returns the names of a node's children without fetching the child nodes themselves from the backend, which is perfect for showing a list of nodes without the overhead.
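The saving comes from not loading the child nodes' data. The hypothetical sketch below counts fetches against a fake backend; `FakeBackend` and `ChildListing` are invented for illustration, and only the method name `getNodeNames()` is taken from PHPCR.

```php
<?php
// Hypothetical backend that counts how often full node data is fetched.
final class FakeBackend
{
    public int $nodeFetches = 0;
    /** @var array<string, array<string, array>> parent path => [child name => properties] */
    private array $tree;

    public function __construct(array $tree)
    {
        $this->tree = $tree;
    }

    /** @return string[] child names only, treated as a cheap listing in this sketch */
    public function fetchChildNames(string $path): array
    {
        return array_keys($this->tree[$path] ?? []);
    }

    public function fetchNode(string $parent, string $name): array
    {
        $this->nodeFetches++;
        return $this->tree[$parent][$name];
    }
}

final class ChildListing
{
    private FakeBackend $backend;
    private string $path;

    public function __construct(FakeBackend $backend, string $path)
    {
        $this->backend = $backend;
        $this->path = $path;
    }

    // Names only: no child node data is loaded.
    public function getNodeNames(): array
    {
        return $this->backend->fetchChildNames($this->path);
    }

    // Full nodes: one fetch per child.
    public function getNodes(): array
    {
        $nodes = [];
        foreach ($this->backend->fetchChildNames($this->path) as $name) {
            $nodes[$name] = $this->backend->fetchNode($this->path, $name);
        }
        return $nodes;
    }
}

$backend = new FakeBackend(['/cms' => ['home' => ['title' => 'Home'], 'about' => ['title' => 'About']]]);
$listing = new ChildListing($backend, '/cms');
$names = $listing->getNodeNames();
$fetchesAfterNames = $backend->nodeFetches;
$listing->getNodes();
```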

Node Type Definition

We can now convert from the CND file format to node type objects and back. This means you can register node types in jackalope-jackrabbit using the node type API.

Stability tweaks

Import and export have been improved to work more reliably, and some bugs have been fixed. Jackalope-jackrabbit now takes more care to prevent same name siblings from being created. If they still happen to be created, deleting them now works. We also fixed issues with the types of values returned by queries.

Preliminary Adobe CQ support

We have had Magnolia CMS support since last year. Adobe CQ is another CMS, built on top of CRX, the commercial distribution of Jackrabbit. We tested compatibility with CRX, and Jackalope can now read and write data from that content repository.

For the very first time: Hello UK!

I have been on the conference tour for quite some time now, and on top of that I travel to a frisbee tournament somewhere around the globe about every second weekend. Yet I had never visited the UK. So far, the most I had seen of it was a direct bus transfer from one London airport to another. So I was quite thrilled when my talk about the Symfony2 CMF was accepted for PHPNE in Newcastle. I was also keen to learn more about the PHP community over there. I flew in on Monday evening and made my way to the hotel in a light drizzle, which perfectly matched my image of UK weather. But even in the dark one could make out the historic feel of the architecture in the city center. I cut the sightseeing short and crashed into bed. The next day I made my way to the conference venue, a movie theatre. There were countless busy bees from the organization team, and in general the organization was excellent. Quite an impressive achievement given that this was the very first PHPNE. The theatre also provided top-notch projectors and, of course, comfy seats.

Rowan kicks off the show

Rowan did an awesome opening keynote about how to make better developers, touching on practical aspects such as the technical setup and continuing on to useful tips about self-education and human interaction.

API Design by Alex

The next talk I listened to was by Alex, about API design. It was a combination of general tips and lessons learned from introducing an open API at a university. I considered raising my hand when the presenter flat out recommended always going with JSON over XML for REST APIs. Personally, JSON is my preferred format when the client is a browser, as it integrates so naturally there. For server-to-server interaction I much prefer XML, since it is simply a much better format when the data format needs to evolve: one can get away with many changes without requiring a new API version.

Bastian on logging and metrics

The next talk I attended was delivered by Bastian, about logging and metrics. This is a topic I find quite important, but unfortunately I have found it's hard to convince clients to invest in it, as they tend to focus more on features end users can see. However, logging is key to ensuring a stable environment, which is the basis for making sure customers can enjoy your site. Furthermore, only with metrics is it possible to figure out which features customers actually use, where they might get stuck and what could easily be dropped. It was nice to see that the tools Bastian chose to introduce are exactly the ones I already had on my radar. He specifically mentioned graylog, logstash, statsd and graphite.

Lunch time chat on knowledge exchange

Over lunch I ended up chatting with Rowan about internal knowledge exchange. Rowan works for Invica. Just like Liip, they have multiple offices and deliver projects using various frameworks and applications. Interestingly, they seem to have several teams that span offices. At Liip we of course also sometimes collaborate between teams in multiple offices, but for the most part project teams come together in a single office.

Fabrice showing the path to Symfony2

After lunch the sessions resumed. I had seen Fabrice’s talk about migrating legacy applications to Symfony2 before, but it has evolved quite a bit since then. I will definitely apply many of his tips the next time I have to migrate a legacy application, and I look forward to Theodo releasing some of their solutions as open source.

Intro to the CMF

Right after Fabrice’s talk I delivered my CMF presentation to a noticeably smaller crowd. I heard that the talk running in parallel, about dealing with failures, was very good. I must admit I have seen this pattern before at other conferences: I tend to get a fair number of questions during my talk and even more afterwards. So I figure that the market for people willing to explore a new approach to content management is simply a bit smaller than for other topics, yet for those who do have issues with the current solutions it's quite an important topic. At the start of my talk I asked which frameworks people were using, and the answers confirmed a feeling I had gotten in the previous talks: there was no clear go-to framework among the attendees. The preferred frameworks were all over the map.

Theijs on the cloud

The closing keynote was held by Theijs, who gave a very energetic talk driving home the point that the cloud is really nothing more than... the internet! He also provided several good hints about what to consider when using SaaS.

Bar chatting

After the closing ceremony, with lots of prizes, we moved on to dinner, and finally everybody met up again at a local bar where we had free drinks all night. However, instead of simply getting drunk, people had many lively discussions. I talked quite a lot with people about Symfony2 and the need for RAD layers on top of the core. I also chatted a bit about e-commerce solutions in the PHP scene, specifically Magento, Oxid, FoxyCart as well as Vespolina and Sylius. People were so busy talking that the free drinks money only ran out just before the bar closed anyway.

CMF workshop

For me, however, the conference wasn’t quite over yet: thanks to a late-afternoon return flight, I had offered to give a CMF introduction workshop the next day. Anthony had managed to secure funds for a room large enough that I could give 10 people a 4-hour rundown of the CMF, in much more depth than my talk at the conference. Some people had already played with the CMF, while others had only read about it in a blog post. I let the attendees decide how best to spend the time. In the end, all attendees installed the Symfony2 CMF Standard Edition, and I answered several specific questions from those who had already played with the CMF. Aside from that, I went through the PHPCR and Routing/Menu slides and showed a bit more about the SonataAdminBundle and various other pieces. After the workshop I grabbed lunch with Lars and Marc. Lars had been posting various questions on the mailing list about how to implement publishing workflows, and is now implementing some missing functionality in the Jackrabbit transport layer of Jackalope, which was just merged.

Wrapping up

Overall it was a great conference, and I was especially grateful that I could do the workshop. As we are approaching the first stable release, it's important to have as many people as possible double-checking our concepts. Secretly, I am also hoping that the attendees will take what they have learned and spread the word in their cities and organizations. Even in the age of the internet, frameworks and applications tend to be strongest close to their core developers. So having an opportunity to make the Symfony2 CMF known outside of continental Europe will hopefully help grow the user base.

So when is the Symfony CMF release?

In January, Lukas wrote a collection of things left to do. Later I wrote a tentative release schedule that turned out to be too optimistic. I just updated that document with new dates. Sorry about this.

There are two actually quite cool reasons for the delays. One is that we had two projects at Liip where we had to integrate the CMF into existing projects. It was fun, but we found quite a lot of issues and missing features in Doctrine PHPCR-ODM, which we fixed or implemented respectively. (The Symfony2 Form component is incredibly powerful, but requires the persistence layer to work very precisely, and we did not want any more workarounds and hacks to achieve functionality.) The other reason is that many other people started using the CMF too. Some found issues that needed to be fixed; others even managed to contribute fixes themselves, which took time to review and comment on. Also, a lot of new features have been built or are currently being built.

One feature I am particularly happy with is that PHPCR-ODM can now cascade persist referrers. This is quite useful for example to embed route and menu editing into the edit form for static content, as shown in the screenshot. (This is already in the ContentBundle and should come into the sandbox with the next update.)

Getting to a release

As soon as the two client projects mentioned above have stabilized, Lukas and I plan to focus on cleaning up the base layers, PHPCR and Jackalope, to a point where we can tag release candidates, and then to bring Doctrine PHPCR-ODM to a state that can be called 1.0. We might have to postpone new features, but we will try to find all the interface changes and similar things, to avoid BC breaks once 1.0 is stable.

Getting things out also depends on contributions from the community. While it is great to get new features, we now need to focus on cleaning up those we already have, on improving the documentation and on investigating bugs. The release plan links to the important missing pieces, if you want to dig into something.

We are considering a hackday in May or June. Please contact me or Lukas if you are interested. If there is enough interest, it is much more likely that we actually organize something.

I will present PHPCR at the Symfony Live conference in Paris at the beginning of April. I will also be at the hackday on Saturday if anybody is interested in meeting me there. Additionally, Thomas and I have planned a (very focused) hackday on April 3rd to integrate SonataPagesBundle with PHPCR. If you are interested in helping with that integration, please contact me or Thomas.

Documentation hackday

This week we also did a hackday at Liip, focusing on documenting some of the new features people built. I tried to collect what we achieved that day:

Jackalope and Magnolia CMS: Recording online, questions and answers

Last Thursday, I gave the webinar about PHPCR and Magnolia CMS. You can download the slides or watch the recorded presentation (you need to register to see it). Thanks to all the attendees, I hope you enjoyed it.
There were some questions that I want to answer here on the blog, so that the answers are available to everybody.

Question: How reliable is Jackalope? Can I trust my data to it?

Answer: Jackalope is still on the young side, but it is running successfully in a couple of real-life applications. We also built a PHPCR API test suite that checks many features. The PHPCR test suite is by no means as complete as the JCR technology compatibility kit (TCK), though. However, with jackalope-jackrabbit the data is stored in Jackrabbit, which is really mature and validates the data coming in over DavEx. In the very worst case, if Jackalope totally fails, you could still write Java code or use Java tools to interact with Jackrabbit directly. So far I have never needed to resort to that.

Question: You mentioned that same name siblings are not supported by Jackalope. Is there a particular reason Jackalope does not support them? Is support coming, or is this a design choice?

Answer: Same name siblings are one of the optional chapters of JCR. Jackrabbit supports them, so it would be doable to implement them in Jackalope as well. It would be rather tricky, though, as having unique child names made a lot of the code easier. But if somebody wants to make the effort to implement it, I would be happy to provide support and help with understanding the implementation details.
The main reason to have them, by the way, is that JCR / PHPCR can import arbitrary XML documents, and such documents tend to have same name siblings: just think of the many <p> elements in an XHTML document.
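To see how imported XML turns into same name siblings, the sketch below lists the element paths of a small XHTML-like tree in JCR path notation, where siblings with the same name are told apart by a 1-based index such as `p[2]`. Only the path notation comes from JCR; the nested-array input (standing in for an already parsed document) and `jcrPaths()` are hypothetical and do not perform a PHPCR import.

```php
<?php
// Hypothetical element tree as [name, children], e.g. extracted from an XHTML body.
$body = ['body', [
    ['h1', []],
    ['p', []],
    ['p', [['em', []]]],
    ['p', []],
]];

// List every element path in JCR notation; same name siblings get a 1-based index.
function jcrPaths(array $element, string $path): array
{
    $paths = [$path];
    $seen = []; // element name => number of siblings with that name so far
    foreach ($element[1] as $child) {
        $name = $child[0];
        $seen[$name] = ($seen[$name] ?? 0) + 1;
        // The first sibling needs no index: "p" and "p[1]" address the same node.
        $segment = $seen[$name] === 1 ? $name : sprintf('%s[%d]', $name, $seen[$name]);
        $paths = array_merge($paths, jcrPaths($child, $path . '/' . $segment));
    }
    return $paths;
}

$paths = jcrPaths($body, '/body');
print_r($paths);
```

Without same name siblings, an importer would have to rename or reject the second and third <p> element, which is why documents like this need the feature.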

Question: What if I need a particular PHPCR feature not yet implemented in Jackalope?

Answer: Jackalope is an open source project. Contributions are always welcome, and the community will help. Liip is also open to consulting work for such tasks. Note that there are a few chapters that will be impossible to implement with jackalope-jackrabbit until Jackrabbit exposes them over DavEx. Adding that is not impossible, but considerably more effort. The next major version of Jackrabbit, codename “Oak”, will focus more on feature completeness over the remoting protocol.

Question: Where are queries executed? Is there any way to measure the impact on the Magnolia repository? (I wouldn’t want my main site's performance to degrade dramatically.) Is there some sort of cache or index on the PHPCR side?

Answer: Normal requests for nodes are cached by Jackalope for the lifetime of the session. SQL2 or QOM queries, however, are not cached. The impact will not be higher than when implementing the functionality in Java in a custom module inside Magnolia CMS. The load should even be slightly lower, as the DavEx remoting is more low-level than the full JCR API. And the PHP application can live on its own separate server. Nonetheless, it of course makes sense to cache rendered pages or page fragments on the PHP side, but that is an application-level task.
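The caching behaviour described in this answer can be pictured as follows: node reads go through a per-session map, while every query reaches the backend. `CachingSessionSketch` and the two callables standing in for the backend are hypothetical and not Jackalope classes.

```php
<?php
// Hypothetical session wrapper; not Jackalope's implementation.
final class CachingSessionSketch
{
    /** @var callable(string): array loads one node from the backend */
    private $loadNode;
    /** @var callable(string): array runs one query on the backend */
    private $runQuery;
    /** @var array<string, array> path => node data already loaded in this session */
    private array $nodes = [];
    public int $backendCalls = 0;

    public function __construct(callable $loadNode, callable $runQuery)
    {
        $this->loadNode = $loadNode;
        $this->runQuery = $runQuery;
    }

    // Nodes are loaded once per session and then served from the cache.
    public function getNode(string $path): array
    {
        if (!isset($this->nodes[$path])) {
            $this->backendCalls++;
            $this->nodes[$path] = ($this->loadNode)($path);
        }
        return $this->nodes[$path];
    }

    // Queries are never cached: every call goes to the backend.
    public function query(string $sql2): array
    {
        $this->backendCalls++;
        return ($this->runQuery)($sql2);
    }
}

$session = new CachingSessionSketch(
    fn (string $path) => ['path' => $path],
    fn (string $sql2) => ['/cms/home']
);
$session->getNode('/cms/home');
$session->getNode('/cms/home'); // served from the session cache
$session->query('SELECT * FROM [nt:unstructured]');
$session->query('SELECT * FROM [nt:unstructured]'); // hits the backend again
```

Because PHP objects are discarded at the end of each request, a cache like this only lives as long as the request; caching rendered output across requests remains an application-level task.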

Question: DavEx provides access to the Jackrabbit used in Magnolia CMS. Is there a way to connect to different repositories?

Answer: DavEx is an implementation-specific protocol invented for Jackrabbit to do remoting. Jackalope can thus talk to any Jackrabbit instance, and it has no problem opening several sessions with different servers. But connecting to a non-Jackrabbit JCR implementation is not supported: JCR is an API, not a protocol. Note that there are ideas to generalize the Jackrabbit protocol into another standard called JSOP.

Question: Something is unclear to me: Are calls from Jackalope to the Jackrabbit repository bypassing Magnolia?

Answer: Yes, exactly. The Magnolia CMS DavEx module just provides a servlet that receives the requests and pipes them through to the jackrabbit DavEx layer. This is needed because Magnolia CMS runs Jackrabbit in its own process and does not use the DavEx remoting.
The following diagram shows the interaction between Jackalope, Jackrabbit and Magnolia CMS. The PHP application talks to the PHPCR interfaces, implemented by Jackalope. Jackalope uses the HTTP-based DavEx protocol to talk to the Jackrabbit DavEx handler, which is exposed by the Magnolia servlet plugin. Magnolia CMS is accessed through web requests and uses its in-process connection to Jackrabbit to read data. If Magnolia CMS needs to know about data changes made by the PHP application, it would have to use the observation feature of Jackrabbit to listen for changes to the content.


When not using Magnolia CMS, you could also run Jackrabbit standalone or deploy the .war archive in any servlet container, such as Tomcat.

Question: Is there also support for the other “direction”? A case where Magnolia would use a PHPCR-backed repo/app instead of being the master?

Answer: This is partially answered above: JCR and PHPCR are APIs for code, not a protocol. A PHPCR server implementation would need to expose its content over DavEx for Jackrabbit to connect to it. Besides the effort to build that, the scenario does not make much sense. PHP is typically used in a request-response setup where all objects are lost after a request. Such a server would be extremely slow, as it would spend most of its time bootstrapping. While there are solutions for building PHP application servers, like react, I don’t see a real benefit.
What can make sense is to configure Magnolia CMS to talk to a remote Jackrabbit rather than use the in-process Jackrabbit bindings. Then Magnolia CMS and the PHP client would both connect to the same repository outside of Magnolia CMS.

Question: Will the Magnolia DavEx Module be available as a Maven dependency?

Answer: It is available in the maven repository of Magnolia CMS. Here’s the dependency snippet:

<dependency>
    <groupId>info.magnolia.davex</groupId>
    <artifactId>magnolia-module-jackrabbit-davex</artifactId>
    <version>0.2</version>
</dependency>

You can use the http://nexus.magnolia-cms.com/content/groups/public/ repository, which contains all public artifacts from Magnolia, CE and Forge, releases and snapshots.

Or – for just the Magnolia Forge projects, either of these, depending on what you’re after:

  • http://nexus.magnolia-cms.com/content/repositories/magnolia.forge.releases/
  • http://nexus.magnolia-cms.com/content/repositories/magnolia.forge.snapshots/

Announcement: PHPCR and Magnolia CMS: Bridging the PHP and Java Worlds

Liip is a PHP company, but we do not ignore what happens in other fields. And sometimes we need to integrate with other systems, like a Java-based CMS. Rather than using something radical like Quercus, a Java implementation of the PHP language, or the rather fiddly PHP/Java Bridge, we wanted something less intrusive and more general-purpose.

Taking inspiration from the proven Java Content Repository (JCR) standard, we created PHPCR by simplifying the Java interfaces. The content repository combines the best of document-oriented databases and XML databases, providing developers with a well-defined API to access and manage content. Ebi, Chregu and I started implementing the PHPCR interfaces in Jackalope, which can talk to Jackrabbit, the JCR reference implementation. The neat thing about this is that you can reuse Java content from a normal, simple PHP environment through a clean and powerful API.

Last year we had a hackday where we integrated PHP Jackalope with the Java MagnoliaCMS. On 28 Feb 2013, I will present the topic as a MagnoliaCMS webinar entitled “Connect PHP Applications with Magnolia CMS through PHPCR”.

If you’re a PHP developer interested in getting your feet wet in the Java world, or a Java developer interested in seeing PHP code talk to your favorite Java CMS, this webinar is a must-attend for you. You’ll find more information and a registration form for the webinar on the Magnolia website. Please register and attend the presentation online!

Symfony CMF: what is left to do?

Just as Fabien did in his “Symfony 2.2 Schedule Update”, I would first like to wish everyone a happy 2013. But, like Fabien, I also want to get back to business now.

Over the holidays several people in the community have been quite busy. Emmanuel and Daniel in particular have been pushing things forward. As a result, the SonataAdminBundle integration has been improved considerably. Daniel has also made several contributions to PHPCR ODM. Henri just published an update on the work going on around create.js, which we use to provide inline editing. At the same time, several new contributors each month now bring a stream of code and documentation improvements. I especially want to mention Tiago here, as he made a lot of very important additions and corrections to the documentation.

At the same time, Liipers have also worked on several topics. David is putting the final touches on updating PHPCR and Jackalope to match the recent changes in the JCR 2.1 (i.e. JSR-333) spec. Many of the changes there were a result of feedback from the PHPCR team, so this update will bring some important improvements. Adrien has brought the integration of create.js for content authoring to a usable state. I have been busy with some smaller improvements to Jackalope Doctrine DBAL and some bigger changes to PHPCR ODM. At the same time, we are quite excited that both ezPublish5 and Drupal 8 will leverage the CMF Routing component to handle their dynamic routing needs.

So things are progressing, and for people willing to live a bit on the edge, all components are quite ready for production use even today. However, we are still stuck in alpha state, which means we try not to break BC unnecessarily, but it still happens frequently. The next step is going to beta, at which point we will become more hesitant to break BC, and we will always provide upgrade documentation and, if possible, an update script. In my humble opinion, there is really no reason not to move all parts of the stack to beta more or less right now. Here is what I believe to be part of our stack:

I added a (*) to PHPCR, since there will in fact not be a stable release before JSR-333 is ratified, which I expect to happen sometime in 2013. However, I expect nothing beyond minor tweaks, if any.

As for the (**), I mean that we also need stable releases of dependencies such as FOSRestBundle and createphp. Also note that I specifically omitted SearchBundle, as I am not sure it will be ready in time and it is not really core functionality per se.

Since our goal is to make a stable release around the time of the Symfony 2.2/2.3 core releases, we have to move past the beta label quickly so that we can provide a reliable development platform. To get there, I think we need to work on the following:

1) Jackalope (MUST)

  • Merge the JSR-333 updates mentioned above.
  • Fix the left-over issues from the initial work to provide XML import/export
  • Wrap up the open PRs in the Doctrine DBAL implementation

2) Create.js (MUST)

  • Fix the open issues in our integration
  • Create an example for inline block creation/ordering in the sandbox and standard edition
  • Improve support for non-visible metadata (like tags, dates, publish states etc.)

3) Documentation (MUST)

  • Continue to make improvements to the documentation

4) KnpMenu (SHOULD)

5) Sonata (SHOULD)

  • Add more standard blocks (e.g. slideshow)
  • Add support for reordering child nodes in the tree UI
  • Add support for SonataCacheBundle to allow for caching of block content
  • Add support for SonataMediaBundle to allow for more flexible media asset management

Anything I missed?


Jackrabbit and its two SQL languages – some findings

PHPCR is an important technology for us at Liip. The most mature content storage backend for PHPCR is Jackrabbit, which we use, for example, on liip.ch.

Jackrabbit provides, among many other features, a pretty advanced search engine based on Lucene. It supports several query languages, called SQL, SQL2 and XPath, which come from different generations of the JCR specs. The fact that two of them are named SQL often leads to misunderstandings about how Jackrabbit works.

This article is about those languages and the differences between them. We had to learn the hard way in some of our projects that they are not handled the same way deep inside Jackrabbit. Unfortunately, the documentation is pretty sparse in that regard, so I hope this blog post helps clear things up a little. (We won’t talk much about XPath, but it behaves the same way as the older SQL implementation in Jackrabbit.)

But first some background:

Jackrabbit isn’t a traditional relational database like MySQL or PostgreSQL. In fact, they don’t have very much in common. Jackrabbit is closer to a NoSQL database (which happens to use an SQL dialect for searching), a key-value store or a document database than to an RDBMS. Jackrabbit can use an RDBMS to store its data; we usually use MySQL or PostgreSQL. But the database is only used to store the nodes in a rather flat fashion. Jackrabbit even stores them in a compressed, optimized form, so the contents of those tables won’t make much sense if you look at them. Jackrabbit doesn’t really need an RDBMS for that; it can also store its data on a file system or in other “flat” storage layers. Adobe, for example, uses a tar file in its commercial Jackrabbit-based CRX. What this all means: it doesn’t really matter where Jackrabbit stores its data, and the storage choice has no bearing on which features the API offers to developers using Jackrabbit.

Because the storage layer is completely hidden away and the data is stored flat and non-relationally, it is not possible to search the data directly in the storage layer. That is where the Lucene index comes into play. When we save something in Jackrabbit, it gets stored in the storage layer, but it is also indexed by Lucene. Later, when we search for something, the Lucene index is used to find all nodes that meet the search criteria, and the matched nodes are then fetched from the storage layer (unless they are already in the cache).

Jackrabbit can’t use the Lucene results directly; for example, it has to check via its ACL mechanism whether you are allowed to access each node. That is also one reason why there is no fast, reliable COUNT() in Jackrabbit. What happens is that Jackrabbit fetches all the node IDs from the Lucene index (this part is fast) and then checks for each node whether you can access it. This can take quite some time for a large result set. Small result sets are no problem, no matter how big your total data is; Lucene is quite good at that.
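To make the flow above concrete (write to storage, index, look up IDs in the index, then check each hit against the ACL), here is a toy Python model. It is a hypothetical illustration, not Jackrabbit code: `ToyRepository`, its dict-based index and storage, and the `can_read` callback are invented stand-ins for Lucene, the persistence layer and the ACL mechanism. The point it shows is that an exact count costs one access check per hit, independent of how many nodes the repository holds in total.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyRepository:
    """Hypothetical model of the store/index/search flow; not Jackrabbit code."""
    index: Dict[str, List[str]] = field(default_factory=dict)  # term -> node ids (stands in for Lucene)
    storage: Dict[str, dict] = field(default_factory=dict)     # node id -> node data (stands in for the storage layer)
    acl_checks: int = 0                                        # how many per-node access checks have run

    def save(self, node_id: str, node: dict, terms: List[str]) -> None:
        self.storage[node_id] = node              # 1. write the node to the storage layer
        for term in terms:                        # 2. index it so it can be searched later
            self.index.setdefault(term, []).append(node_id)

    def query(self, term: str, can_read: Callable[[str], bool]) -> List[dict]:
        results = []
        for node_id in self.index.get(term, []):  # fast: ids come straight from the index
            self.acl_checks += 1
            if can_read(node_id):                 # slow part: one access check per hit
                results.append(self.storage[node_id])  # fetch only readable nodes
        return results

    def count(self, term: str, can_read: Callable[[str], bool]) -> int:
        # No shortcut: an exact count runs the same per-node check as query().
        return len(self.query(term, can_read))
```

For example, with 1,000 stored nodes of which only 10 match a term, counting that term performs 10 access checks, not 1,000: the cost follows the size of the result set, not the size of the repository.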

So what does all this have to do with the SQL1 and SQL2 talk at the beginning?

There are two major JCR (Java Content Repository) specifications: 1.0 (JSR-170 from 2005) and 2.0 (JSR-283 from 2009). JCR 1.0 defined two query languages: XPath and SQL. XPath was the main one (it made sense for a hierarchical structure that is quite similar to one big XML document), and an SQL dialect was derived from it. It was called SQL because it had the well-known “SELECT … FROM … WHERE … ORDER BY” syntax, but it has nothing to do with relational databases. Nor is it related to ANSI SQL1 from 1986/1989 (depending on how you look at it; it is better known as SQL-89). It wasn’t even called SQL1 in JSR-170, just SQL, but from now on I will refer to it as SQL1, or better JCR-SQL1, to make clear that it is not ANSI SQL1. Whenever I write just SQL1, I mean JCR-SQL1. JCR-SQL1 didn’t have JOIN capabilities or anything else “fancy” (like LENGTH).

Those two query languages were rather poorly defined in the specs and lacked features, so the JCR spec authors built something completely new for JCR 2.0: an Abstract Query Model (AQM) that clearly defines the semantics of a search. Two concrete language bindings are specified for this AQM: JCR-SQL2 and JCR-JQOM (JCR Java Query Object Model, a mapping of the AQM to Java objects and methods). Again, JCR-SQL2 has nothing to do with ANSI SQL2 from 1992 (aka SQL-92); they just share some syntax.

The new query model in JCR 2.0 added many more features, most notably JOINs. But this also made it much harder to implement, and much harder to map such searches efficiently onto Lucene queries (more about that below).

The XPath syntax was deprecated in JCR 2.0, because people understood the SQL syntax much better than the XPath syntax (it is still available in Jackrabbit).

Jackalope, the PHP implementation, has supported JCR-SQL2 almost from the beginning, and the QOM for a few months now. Until recently we didn’t use the QOM in our projects, only SQL2.

As mentioned above, because JCR-SQL2 queries can be more complex, Jackrabbit has to do more work and can’t do everything in the Lucene index. For example, JOINs make proper ordering much more complicated. Again, this is mainly a problem for large result sets, but there it makes queries really slow (for example: “Give me the 10 latest articles out of tens of thousands of articles”). Unfortunately, Jackrabbit takes the slow path even if you don’t use JOINs at all.

What we found out while analyzing this is that the older query engine for JCR-SQL1 and XPath is much more mature and indeed handles queries like “Give me the 10 latest articles” quickly, even with hundreds of thousands of articles in Jackrabbit. Thanks to the simpler nature of those queries and the maturity of the code, they are tuned pretty well. For example, sorting happens before the nodes are fetched from the storage layer. Unfortunately, this fact is not well advertised by the Jackrabbit community.

We then invested quite some time in getting SQL1 running in Jackalope and in making the QOM smart enough to choose between SQL1 and SQL2 automatically, depending on the complexity of the query.
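The idea of choosing the language automatically can be illustrated with a minimal Python sketch. It is a hypothetical, heavily simplified query model, not Jackalope’s QOM or its actual selection logic: `Query`, `Join`, `to_sql1`, `to_sql2` and `choose_language` are invented names, and the only feature treated as requiring SQL2 here is a join. The language identifiers `"sql"` and `"JCR-SQL2"` are the values of the JCR constants `Query.SQL` and `Query.JCR_SQL2`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Join:
    """Hypothetical equi-join: selector a's left_prop must equal selector b's right_prop."""
    node_type: str
    left_prop: str
    right_prop: str


@dataclass
class Query:
    """Hypothetical, heavily simplified query model (not Jackalope's QOM)."""
    node_type: str
    under_path: Optional[str] = None             # only return descendants of this path
    order_by: Optional[Tuple[str, bool]] = None  # (property name, descending?)
    join: Optional[Join] = None                  # the one feature this sketch treats as SQL2-only


def to_sql1(q: Query) -> str:
    """Render a join-free query in JCR-SQL1 (JSR-170) syntax."""
    stmt = f"SELECT * FROM {q.node_type}"
    if q.under_path:
        stmt += f" WHERE jcr:path LIKE '{q.under_path.rstrip('/')}/%'"
    if q.order_by:
        prop, desc = q.order_by
        stmt += f" ORDER BY {prop} {'DESC' if desc else 'ASC'}"
    return stmt


def to_sql2(q: Query) -> str:
    """Render the query in JCR-SQL2 (JSR-283) syntax with explicit selectors."""
    stmt = f"SELECT * FROM [{q.node_type}] AS a"
    if q.join:
        j = q.join
        stmt += (f" INNER JOIN [{j.node_type}] AS b"
                 f" ON a.[{j.left_prop}] = b.[{j.right_prop}]")
    if q.under_path:
        stmt += f" WHERE ISDESCENDANTNODE(a, [{q.under_path}])"
    if q.order_by:
        prop, desc = q.order_by
        stmt += f" ORDER BY a.[{prop}] {'DESC' if desc else 'ASC'}"
    return stmt


def choose_language(q: Query) -> Tuple[str, str]:
    """Prefer the older, faster SQL1 engine unless the query needs SQL2 features."""
    if q.join is not None:
        return "JCR-SQL2", to_sql2(q)
    return "sql", to_sql1(q)
```

With this sketch, a join-free “latest articles” query such as `Query("nt:unstructured", under_path="/articles", order_by=("date", True))` is rendered as SQL1, `SELECT * FROM nt:unstructured WHERE jcr:path LIKE '/articles/%' ORDER BY date DESC`, while adding a join switches the output to SQL2 with bracketed names and explicit selectors.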

We also rewrote some of our handcrafted JCR-SQL2 queries as JCR-SQL1 queries, because some of them could potentially return many results. For example, full-text queries for common words easily match many nodes, and with JCR-SQL2 all those nodes were loaded into memory and then sorted. This was slow and trashed all the internal Jackrabbit caches. With JCR-SQL1, Jackrabbit sorts the results directly in Lucene and fetches only the actual top 10 nodes from the storage layer. This is always fast.
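The difference between the two sorting strategies can be sketched in Python by counting storage reads. This is a simplified model under stated assumptions, not Jackrabbit internals: each index hit is assumed to carry its sort value (here a numeric `date`), access checks are left out, and both function names are hypothetical. `top_k_load_then_sort` mimics the path described for JCR-SQL2; `top_k_sort_in_index` mimics the JCR-SQL1 path.

```python
import heapq
from typing import Dict, List, Tuple

Node = Dict[str, int]


def top_k_load_then_sort(hits: List[str], storage: Dict[str, Node], k: int,
                         fetched: List[str]) -> List[Node]:
    """Slow path: load every matching node from storage, then sort in memory."""
    nodes = []
    for node_id in hits:
        fetched.append(node_id)  # one storage read per hit
        nodes.append(storage[node_id])
    nodes.sort(key=lambda n: n["date"], reverse=True)
    return nodes[:k]


def top_k_sort_in_index(hits: List[Tuple[str, int]], storage: Dict[str, Node], k: int,
                        fetched: List[str]) -> List[Node]:
    """Fast path: pick the top k on the sort value kept in the index, then read only those nodes."""
    top = heapq.nlargest(k, hits, key=lambda hit: hit[1])  # descending by date
    result = []
    for node_id, _date in top:
        fetched.append(node_id)  # only k storage reads
        result.append(storage[node_id])
    return result
```

With 10,000 matching nodes and k = 10, both functions return the same ten articles, but the first reads all 10,000 nodes from storage and the second reads only 10. A real implementation would also have to handle hits that fail the access check, which this sketch ignores.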

That’s why it’s really important to know when to use SQL1 and when to use SQL2. You can’t go wrong with SQL1; only switch to SQL2 if you need more complex queries. Or just use the QOM, and you don’t even have to think about it (though you should still know what the performance impact of complex queries is).

I hope JCR-SQL2 queries will one day be as fast as SQL1 queries in Jackrabbit, but having looked a little into the code and the corresponding Jira issues, I don’t think this will be an easy task.

Even with that shortcoming, the good thing is that Jackrabbit has a powerful built-in search based on the industry-standard Lucene. If you need more flexibility than this solution provides, you have to take the path common to many other setups and look into an external Solr or Elastic Search service (both use Lucene internally as well, but are much more configurable). With the upcoming “Jackrabbit 3” (working title “Oak”), integrating such a service should be easy to do directly on the JCR layer.


JSDay & PHPDay 2012 Verona

From May 16th to May 19th the latest edition of jsDay and phpDay took place in Verona, Italy. Both are two-day conferences, the first centered on JavaScript, the second on PHP (obviously). They are organised by the community (Grusp), which means they are much more focused on technology than on marketing. A number of Liipers attended one or both conferences, and some even gave talks.

jsDay

jsDay kicked off with a mind-blowing talk by Mark Boas (the slides are available online). He demonstrated his technique called “hyperaudio”, which he uses to enhance audio content on the web with semantic information. This makes the audio content both crawlable by bots and accessible to visually impaired people. The technique also opens up great possibilities for user interaction: it becomes possible to highlight the words currently being spoken, to jump to an exact word, or to switch the language of an audio file while it is playing.

Another clear highlight of jsDay was the keynote by JavaScript messiah Douglas Crockford. He spoke about how our brains work and why that makes it so hard for us to set aside subjective criteria when it comes to programming style.

In general, the trends in the JavaScript world revolve around emerging technologies like in-browser data storage, 3D CSS transformations and the still immature native audio/video handling. Node.js was a big topic, with a handful of talks devoted to it. Another trend in the JS microcosm is the move from simple libraries to full-fledged JavaScript frameworks, as client-heavy applications grow more and more complex.

phpDay

There were a lot of interesting talks by people from the PHP community. They covered a lot of ground, from highly technical topics like building PHP extensions to social and organisational ones like agile workflows and team communication. Below you will find a list of the most remarkable talks:

Sum-Up

As always at conferences, a lot of interesting conversations also happened on the infamous ‘hallway track’ and at the social events, where we met many interesting people from all over Europe and the States. The conference was really well organised by Grusp and the talks were well balanced.

team phpDay

At the phpDay social event we took a picture of all attending Liipers. Unfortunately Mirco wasn’t around, so Derick Rethans (!!) kindly agreed to be his body double :-)

The next jsDay/phpDay will take place next year, again from May 16th to 19th. We really recommend attending at least one of the conferences, if not both. If you go, you’d do well to set aside some time to visit the beautiful city of Verona.

Notice

Unfortunately, toward the end of our stay two major incidents occurred in Italy that shocked the whole country. Our thoughts go out to all the people affected by those tragedies.