Table Inheritance with Doctrine

Introduction

Lately we have had several projects in which we had to store very different items that share some common state in a database.

As an example take the RocketLab website you are reading: Events and BlogPosts are aggregated in the LabLog list as if they were similar items. And indeed they all have a Title, a Date and a Description.

But if you open the detail page of an Event or a BlogPost you can see that they actually don't contain the same information: a BlogPost essentially contains formatted text, whereas an Event contains more structured information such as the place where the event will be held, the type of event it is, whether people need to register to attend, etc.

Still, we sometimes have to access those entities as similar items (in the LabLog list) and sometimes as different items (in the events list and in the blog posts list).

Naïve database model

Our first idea (and it was not that bad, Drupal does just the same) was to have a single database table with the common fields, a field containing the type of item (either an event or a blog post) and a data field in which we serialized the corresponding PHP object. This approach was OK until we had to filter or search LabLog items based on fields that were contained in the serialized data.

SQL does not know anything about PHP serialized data, so you cannot use any of its features on that data.

So how do you get all the LabLog items that are Events, happen in April 2012 and are "techtalks"? The only way is to go through all the Event records of April, unserialize the data and check whether each one is a techtalk event. With a proper schema you would need only a single SQL query to find those items.
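To make the problem concrete, here is a minimal sketch of that filtering loop. The row layout and the shape of the serialized payload are assumptions made for the example (we use a plain array instead of a real object to keep it short).

```php
<?php
// Rows as the naive table might return them for "events in April 2012".
// The column names and the serialized payload are assumptions for this sketch.
$aprilEvents = [
    ['id' => 1, 'date' => '2012-04-10', 'type' => 'event', 'data' => serialize(['eventType' => 'techtalk'])],
    ['id' => 2, 'date' => '2012-04-20', 'type' => 'event', 'data' => serialize(['eventType' => 'workshop'])],
];

$techtalkIds = [];
foreach ($aprilEvents as $row) {
    // SQL cannot look inside "data", so every row has to be decoded in PHP.
    $data = unserialize($row['data']);
    if ($data['eventType'] === 'techtalk') {
        $techtalkIds[] = $row['id'];
    }
}
```

Every candidate row is fetched and decoded just to test one field, which is exactly the work a WHERE clause would normally do for us.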

A better database model

There is a better way to model this in a database: table inheritance. It exists in two forms, single table inheritance and multiple table inheritance (which the Doctrine documentation calls class table inheritance).

Multiple table inheritance

Multiple table inheritance requires three tables instead of a single one. The idea is to keep the common data in a "parent" table, which references items either in the Event table or in the BlogPost table. The type column (called the discriminator) tells you whether the related item should be looked up in the Event table or in the BlogPost table. This is called multiple table inheritance because it models the same problem as object inheritance using multiple database tables.

[Figure: multiple table inheritance]

When you have a LabLogItem, you check the type field to know in which table to find the related item, then you look for the item whose ID equals related_id.
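The lookup described above can be sketched in a few lines of plain PHP. This is only an illustration under assumptions: the table and column names, the sample rows and the helper loadItem() are made up for the example, and an in-memory SQLite database (through PDO) stands in for the real server.

```php
<?php
// Sketch only: schema, sample data and loadItem() are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// One parent table with the common fields, one table per concrete type.
$db->exec('CREATE TABLE lablog (id INTEGER PRIMARY KEY, date TEXT, title TEXT,
    description TEXT, type TEXT, related_id INTEGER)');
$db->exec('CREATE TABLE event (id INTEGER PRIMARY KEY, event_type TEXT,
    location TEXT, requires_registration INTEGER)');
$db->exec('CREATE TABLE blog_post (id INTEGER PRIMARY KEY, content TEXT)');

$db->exec("INSERT INTO event VALUES (1, 'techtalk', 'Zurich', 1)");
$db->exec("INSERT INTO blog_post VALUES (1, 'Some formatted text')");
$db->exec("INSERT INTO lablog VALUES (1, '2012-04-10', 'A techtalk', 'Teaser', 'event', 1)");
$db->exec("INSERT INTO lablog VALUES (2, '2012-03-27', 'A blog post', 'Teaser', 'blogpost', 1)");

function loadItem(PDO $db, int $id): array
{
    $stmt = $db->prepare('SELECT * FROM lablog WHERE id = ?');
    $stmt->execute([$id]);
    $item = $stmt->fetch(PDO::FETCH_ASSOC);

    // The discriminator decides which child table holds the remaining fields.
    $childTable = $item['type'] === 'event' ? 'event' : 'blog_post';
    $stmt = $db->prepare("SELECT * FROM $childTable WHERE id = ?");
    $stmt->execute([$item['related_id']]);
    $child = $stmt->fetch(PDO::FETCH_ASSOC);
    unset($child['id']); // keep the parent's id as the item id

    return $item + $child;
}

$event = loadItem($db, 1);
$post = loadItem($db, 2);
```

Here $event carries the event columns and $post carries the content column, while both share the common fields of the parent table.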

Single table inheritance

Alternatively, the same can be modelled in a single table. All the fields are present for all the types of LabLogItem, but the ones that do not pertain to a particular type of item are left empty. This is called single table inheritance.

[Figure: single table inheritance]
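As a rough sketch of the single-table layout, the example below puts everything in one table and then answers the earlier question (techtalk events in April 2012) with one query. The column names and sample rows are assumptions, and SQLite in memory (through PDO) stands in for the real database.

```php
<?php
// Sketch only: schema and sample rows are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Common columns are mandatory; type-specific columns must allow NULL.
$db->exec('CREATE TABLE lablog (
    id INTEGER PRIMARY KEY,
    date TEXT NOT NULL, title TEXT NOT NULL, description TEXT NOT NULL,
    type TEXT NOT NULL,
    content TEXT,
    event_type TEXT, location TEXT, requires_registration INTEGER)');

$db->exec("INSERT INTO lablog VALUES
    (1, '2012-04-10', 'Techtalk A',   'Teaser', 'event',    NULL, 'techtalk', 'Zurich', 1),
    (2, '2012-04-20', 'Workshop',     'Teaser', 'event',    NULL, 'workshop', 'Bern',   0),
    (3, '2012-04-12', 'A blog post',  'Teaser', 'blogpost', 'Some formatted text', NULL, NULL, NULL),
    (4, '2012-05-02', 'Techtalk May', 'Teaser', 'event',    NULL, 'techtalk', 'Zurich', 1)");

// One query is enough: the type-specific fields are ordinary columns now.
$stmt = $db->prepare("SELECT id, title FROM lablog
    WHERE type = 'event' AND event_type = :eventType
      AND date >= :start AND date < :end
    ORDER BY date");
$stmt->execute([':eventType' => 'techtalk', ':start' => '2012-04-01', ':end' => '2012-05-01']);
$techtalks = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

Note the NULL values in the rows: those are the fields that do not pertain to the row's type.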

Single or multiple table inheritance

The difference is really only in how the data is stored in the database. On the PHP side nothing changes. Single table inheritance tends to be faster to read, because everything is in a single table and no joins are needed to get all the information. Multiple table inheritance, on the other hand, gives a cleaner separation of the data and does not introduce "dead data fields", i.e. fields that remain NULL most of the time.

Table inheritance with Symfony and Doctrine

Symfony and Doctrine make it extremely easy to use table inheritance. All you need to do is to model your entities as PHP classes and then create the correct database mapping. Doctrine will take care of the hassle of implementing the inheritance in the database server.

Please note that the code I present here is not exactly what we use in RocketLab; we are developers and as such we always have to make things harder. But the idea is there...

The parent entity

In the case of RocketLab we created a parent (abstract) entity, called LabLogItem, that contains the common properties.

use Doctrine\ORM\Mapping as ORM;

/**
 * This class represents a LabLog item, either a BlogPost or an Event.
 * It is abstract because we never have a bare LabLog item: it is always either an event or a blog post.
 * @ORM\Entity
 * @ORM\Table(name="lablog")
 * @ORM\InheritanceType("SINGLE_TABLE")
 * @ORM\DiscriminatorColumn(name="type", type="string")
 * @ORM\DiscriminatorMap( {"event" = "Event", "blogpost" = "BlogPost"} )
 */
abstract class LabLogItem
{
    /**
     * @ORM\Id
     * @ORM\Column(type="integer")
     * @ORM\GeneratedValue(strategy="AUTO")
     */
    protected $id;

    /**
     * @ORM\Column(type="date")
     */
    protected $date;

    /**
     * @ORM\Column(type="string")
     */
    protected $title;

    /**
     * @ORM\Column(type="text")
     */
    protected $description;

    /***** Getters and setters *****/

    public function getId()
    {
        return $this->id;
    }

    public function setDate($date)
    {
        $this->date = $date;
    }

    public function getDate()
    {
        return $this->date;
    }

    // And so on...
}

There are several things to note about the mapping:

  • @ORM\InheritanceType: indicates that this entity is used as the parent class in the table inheritance. This example uses single table inheritance, but using multiple table inheritance is as easy as setting the parameter to "JOINED". Doctrine will create and manage the single table or the multiple tables for you!
  • @ORM\DiscriminatorColumn: indicates which column will be used as the discriminator (i.e. to store the type of item). You don't have to define this column in the entity; it will be automagically created by Doctrine.
  • @ORM\DiscriminatorMap: defines the possible values of the discriminator column and associates a specific entity class with each type of item. Here the discriminator column may contain the string "event" or "blogpost". When its value is "event" the class Event will be used; when its value is "blogpost", the class BlogPost will be used.
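For comparison, this is roughly what the parent mapping would look like with multiple table inheritance. Only the InheritanceType line differs from the listing above; the body of the class is elided here on purpose.

```php
<?php
use Doctrine\ORM\Mapping as ORM;

/**
 * Same parent entity, mapped with one table per class instead of a single table.
 *
 * @ORM\Entity
 * @ORM\Table(name="lablog")
 * @ORM\InheritanceType("JOINED")
 * @ORM\DiscriminatorColumn(name="type", type="string")
 * @ORM\DiscriminatorMap( {"event" = "Event", "blogpost" = "BlogPost"} )
 */
abstract class LabLogItem
{
    // Properties, getters and setters stay exactly as in the single table version.
}
```

As far as we know, in this mode Doctrine links each child table to the parent table through a shared primary key rather than through a separate related_id column, so the generated schema differs slightly from the diagram shown earlier.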

That is basically all the parent needs for table inheritance, but let's have a look at the child entities.

The child entities

We have two regular entities to model the events and blog posts. Those entities extend LabLogItem.

/**
 * Represents a blog post item.
 * Note that this class extends LabLogItem and is declared as an entity itself.
 * @ORM\Entity
 */
class BlogPost extends LabLogItem
{
    /**
     * @ORM\Column(type="text")
     */
    protected $content;

    /***** Getters and setters *****/

    public function getContent()
    {
        return $this->content;
    }

    public function setContent($content)
    {
        $this->content = $content;
    }
}

/**
 * Represents an event item.
 * Note that this class extends LabLogItem and is declared as an entity itself.
 * @ORM\Entity
 */
class Event extends LabLogItem
{
    /**
     * @ORM\Column(type="string")
     */
    protected $eventType;

    /**
     * @ORM\Column(type="string")
     */
    protected $location;

    /**
     * @ORM\Column(type="boolean")
     */
    protected $requiresRegistration;

    /***** Getters and setters *****/

    public function getEventType()
    {
        return $this->eventType;
    }

    public function setEventType($type)
    {
        $this->eventType = $type;
    }

    // And so on...
}

There is not much special in the child entities. An important thing to note is that the common fields defined in the parent entity LabLogItem SHOULD NOT be repeated here. The class names must match the ones listed in the DiscriminatorMap of the parent. Each child also needs its own @ORM\Entity annotation: Doctrine does not inherit it from LabLogItem, and without it you get a MappingException complaining that the subclass "is not a valid entity or mapped super class". The inheritance annotations themselves (InheritanceType, DiscriminatorColumn, DiscriminatorMap) only go on the parent.

From now on, when you create a PHP object of type Event and ask the entity manager to persist it, Doctrine will automatically do the complex work for you. From the developer's point of view, Events and BlogPosts are just entities like any other.

It's easy to do operations on items whose exact type you don't know:
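A minimal sketch of what persisting an Event might look like. The helper name createTechtalk() is hypothetical, and setTitle(), setDescription(), setLocation() and setRequiresRegistration() are assumed to be among the setters elided with "And so on..." in the listings above.

```php
<?php
use Doctrine\ORM\EntityManagerInterface;

/**
 * Hypothetical helper: creates and stores a techtalk event.
 * Relies on the Event entity from this article and on setters assumed to exist.
 */
function createTechtalk(EntityManagerInterface $entityManager, \DateTime $date, $title)
{
    $event = new Event();

    // Common fields, declared on LabLogItem.
    $event->setDate($date);
    $event->setTitle($title);
    $event->setDescription('Short teaser shown in the LabLog list');

    // Fields that only exist on Event.
    $event->setEventType('techtalk');
    $event->setLocation('To be announced');
    $event->setRequiresRegistration(true);

    // Doctrine writes the row(s) and fills in the discriminator column itself.
    $entityManager->persist($event);
    $entityManager->flush();

    return $event;
}
```

Nothing in this code refers to the discriminator or to the inheritance type, which is the point: the same code should work whether the mapping uses a single table or joined tables.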

$item = $entityManager->getRepository('RocketLabBundle:LabLogItem')->findOneByDate($someDate);

// Here we don't know exactly whether $item contains a blog post or an event...

if ($item instanceof Event) {
    // Then it's an Event
    echo $item->getEventType();
} else {
    // Otherwise it's a BlogPost
    echo $item->getContent();
}

But if you know the type of item you are dealing with, you can still query it like a regular entity:

$item = $entityManager->getRepository('RocketLabBundle:Event')->findOneByDate($someDate);

//  We have searched the Event entity repository so what we get in $item MUST BE an Event
echo $item->getEventType();
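Coming back to the question that started all this (which events are techtalks taking place in April 2012?), here is a sketch of how it could be asked with Doctrine's QueryBuilder. The function name is hypothetical, and the RocketLabBundle:Event alias simply follows the notation used above.

```php
<?php
use Doctrine\ORM\EntityManagerInterface;

/**
 * Hypothetical helper: returns the Event entities of type "techtalk" in April 2012.
 */
function findTechtalksInApril2012(EntityManagerInterface $entityManager)
{
    return $entityManager->createQueryBuilder()
        ->select('e')
        ->from('RocketLabBundle:Event', 'e')
        // eventType is a mapped field of Event, so it can be filtered like any other.
        ->where('e.eventType = :eventType')
        // date comes from the parent LabLogItem but is queried the same way.
        ->andWhere('e.date >= :start')
        ->andWhere('e.date < :end')
        ->setParameter('eventType', 'techtalk')
        ->setParameter('start', new \DateTime('2012-04-01'))
        ->setParameter('end', new \DateTime('2012-05-01'))
        ->getQuery()
        ->getResult();
}
```

Doctrine should translate this into a single SQL query and add the condition on the discriminator column on its own, so no unserializing or PHP-side filtering is needed anymore.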

Conclusion

As you can see above, using table inheritance with Symfony and Doctrine is very easy. It's just a matter of creating the parent class and the correct mapping. Furthermore, you can switch from single to multiple table inheritance by modifying one line of code.

This technique should be used whenever you need to store items with a common state but that are very different in their nature.

Comments [11]

Stefan K., 28.03.2012 00:25 CEST

You get this type of table inheritance with PostgreSQL out-of-the box. See http://www.postgresql.org/docs/current/static/ddl-inherit.html . Although, I don't know how adaptive Symfony and Doctrine are, to cope with this.

Khepin, 28.03.2012 02:31 CEST

Something you might be interested in is how FriendFeed was using MySQL to store schema-less data (article here: http://backchannel.org/blog/friendfeed-schemaless-mysql)

They basically only ever stored serialized objects and then created tables with just the fields they wanted to index and query on. A side benefit was that they could create or remove indexes and fields on indexes on the fly.

Or maybe using a search engine rather than directly querying objects on your data store could be interesting as well.

Not saying that these directly apply to your particular case. But interesting approaches for similar issues.

lxwebsolutions, 28.03.2012 12:00 CEST

really nice tutorial just shared with a colleague

david, 28.03.2012 12:10 CEST

@khepin: there are use cases for this, yes. but for a simple case like we encountered, it was a really bad choice. the jcr implementation jackrabbit for example does that: serialize the data into table cells and using lucene to search. in the jackalope-doctrine-dbal implementation of PHPCR we do it similar, but for now without lucene and thus without full search.

Jory Geerts, 29.03.2012 11:02 CEST

Nice article.
I would personally go with multiple table inheritance (which the Doctrine documentation confusingly calls 'class table inheritance') since it is a cleaner setup and the performance differences shouldn't be that big. (For small-ish datasets anyway.)

I worked on a project where we had a bigger inheritance tree, I think 6 entities with one common parent, and one of those 6 had two child-entities itself.
Do you guys have experience with that kind of a setup? (Either in Doctrine, or in general - for the project I worked on we just had 6 tables and a buckload of duplicate fields. :) )
I'd say the number of "dead data fields" gets pretty high pretty fast.

Ernst, 04.01.2013 11:23 CEST

Thanks for the informative post. I think it explains the topic very well.

There's one mistake regarding drupal: "..., Drupal does just the same, ... and a data field where we serialized the corresponding PHP object."

Actually Drupal core (d6/d7) does implement a kind of multi table inheritance and it does not store any serialized data for nodes in the node, node_revision and field_data_body (d7) tables. If you are using CCK (d6) / Fields (d7) it actually creates tables for your subtypes.

To be precise: Drupal is not using any of the OO features of PHP for node types. But custom node types can be seen as a subtype of the general node type.

Contributed modules can of course extend the node/node_revision tables and store serialized data. Your post correctly highlights the problems of this approach.

HTH
Ernst

Damien, 10.07.2013 17:25 CEST

Do you know how to create relations on the children entities ?
I have a parent entity and 15 children entities, some of them have relations with other entities.

I did everything fine, but Doctrine keeps telling me that my mapping is inconsistent :(

And do you know how to query multiple children in the parent repository.
For instance : where type in (1,2,5,9) ?

Thanks

sensi, 07.09.2013 20:07 CEST

Nice tutorial, really helpful.
Is it possible to use this method with a "mappedSuperclass"?

Jeroen Schouten, 14.10.2013 20:59 CEST

I got the following exception:

[Doctrine\ORM\Mapping\MappingException]
Class "App\Entity\Child" sub class of "App\Entitiy\Parent" is not a valid entity or mapped super class.

The fix is easy: The children entities must have a `@ORM\Entity` annotation in order to work!

Damien, 16.10.2013 10:26 CEST

Of course. Each entity must have the annotation @Entity.
Besides you can either set a unique repository for each child entity or set the same repository (the parent one for example) for all child entities.
Services and factories are very helpful for that.

ALF, 04.07.2014 15:36 CEST

$item = $entityManager->getRepository('RocketLabBundle:LabLogItem')->findOneByDate($someDate);

// Here we don't know exactly whether $item contains a blog post or an event...

if ($item instanceof Event) {
    // Then it's an Event
    echo $item->getEventType();
} else {
    // Otherwise it's a BlogPost
    echo $item->getContent();
}

I don't understand. Can you explain to me how you can get the event type on a LabLogItem object? Since you fetch LabLogItem and you don't load LabLogItemEvent, I don't know how you can call the method getEventType and get a result... Or is Doctrine dealing with that and automatically loading the right instance?
