Last week, some Liipers and I attended JSDay and PHPDay in Verona, Italy. We saw many interesting talks and took home lots of new ideas and knowledge.
In this blog post I want to share what I learned about HTTP caching and RESTful applications during the workshop by Fabien Potencier and the talk by David Zuelke.
HTTP caching
Workshop "Caching on the Edge" by Fabien Potencier: phpday.it/2011/session/caching-edge
Fabien started his workshop with the question: "Who in this room has read the HTTP specifications?" As you might already guess, very few could answer "Yes". To be exact: nobody but himself.
He did not seem very surprised, but he was not pleased to hear people joking about it. He meant the question seriously, and over the next few hours we came to understand why.
Some facts about HTTP caching
HTTP/1.1 allows caching anything by default. That simply means that every HTTP response without a "Cache-Control" header may be cached.
In practice, caching servers already avoid caching responses involving "Cache-Control", "Cookie / Set-Cookie", "WWW-Authenticate", "POST / PUT" or certain status codes. It is important to know that they are not required to do so.
Cache headers only work with "safe" HTTP methods such as GET and HEAD. It is therefore important not to change the state of the server on a GET request; change it only when you really have to, using a method such as POST, PUT or DELETE.
Cookies also prevent a page from being cached.
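To make these rules of thumb concrete, here is a minimal Python sketch of how a conservative reverse proxy might decide whether to store a response. It is a simplification, not the full HTTP/1.1 caching algorithm: the function name `may_store_in_shared_cache` is hypothetical, and caching only `200 OK` responses and treating an `Authorization` request header like a cookie are assumptions chosen for illustration.

```python
# Hypothetical, conservative storage policy for a reverse proxy.
# This is a simplification, not the complete HTTP/1.1 caching algorithm.
SAFE_METHODS = {"GET", "HEAD"}
CACHEABLE_STATUSES = {200}  # assumption: this sketch only caches 200 OK


def may_store_in_shared_cache(method: str, status: int,
                              request_headers: dict, response_headers: dict) -> bool:
    """Return True if a cautious shared cache would store this response."""
    # Header names are case-insensitive, so normalise them first.
    req = {k.lower(): v for k, v in request_headers.items()}
    res = {k.lower(): v for k, v in response_headers.items()}

    # Only safe methods such as GET and HEAD are cached.
    if method.upper() not in SAFE_METHODS or status not in CACHEABLE_STATUSES:
        return False
    # Cookies and authentication usually mean the response is user-specific.
    # (Treating Authorization like a cookie is an assumption of this sketch.)
    if {"cookie", "authorization"} & set(req) or {"set-cookie", "www-authenticate"} & set(res):
        return False
    # Respect explicit opt-outs in Cache-Control.
    directives = {d.strip().lower() for d in res.get("cache-control", "").split(",")}
    if {"no-store", "private"} & directives:
        return False
    # Otherwise HTTP/1.1 allows the response to be cached.
    return True
```

For example, a plain `GET` answered with `200` and no headers may be stored, while the same request carrying a `Cookie` header may not.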
The cache headers
It's not always easy to decide which caching header to use in which case. Here are some important facts about caching headers that you should know:
- The Expires header is risky to use, because there is no way to make sure that the clocks of all the machines a response passes through are in sync. You should usually prefer the Cache-Control header with the "max-age" directive.
- If you use the "max-age" directive of the Cache-Control header, every reverse proxy automatically adds an "Age" header that accounts for the time the response has already spent in caches on its way.
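- For both Expires and Cache-Control, once a stored response has become stale, the client can revalidate it by sending an "If-Modified-Since" request header based on the response's "Last-Modified" date.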
- To prevent reverse proxies from caching, you can add the "private" directive. With it, only the browser is allowed to cache. Conversely, the "s-maxage" directive sets a lifetime that applies only to shared caches such as proxies. This is used with Edge Side Includes (ESI).
- To give a page a fingerprint, such as a hash of its content, you can use the ETag header. The client can check it later by sending the value back in an "If-None-Match" request header.
- If a page has not been modified, the server can respond with status code 304 "Not Modified" and an empty body.
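The following Python sketch illustrates two of the points above: how a cache could compute the remaining lifetime of a stored response from "max-age" / "s-maxage" and the "Age" header, and how a server could use an ETag to answer a revalidation request with 304. All function names are hypothetical, the ETag is assumed to be a SHA-1 hash of the body, the `max-age=60` value is arbitrary, and weak ETags (`W/"..."`) and case-insensitive header lookup are left out for brevity.

```python
import hashlib


def parse_cache_control(value: str) -> dict:
    """Parse e.g. 'public, max-age=60' into {'public': None, 'max-age': '60'}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def remaining_freshness(cache_control: str, age: int, shared: bool) -> int:
    """Seconds a stored response is still fresh, given the value of its Age header.

    Shared caches (proxies) use s-maxage if present; browsers ignore it.
    """
    directives = parse_cache_control(cache_control)
    lifetime = directives.get("s-maxage") if shared else None
    if lifetime is None:
        lifetime = directives.get("max-age")
    if lifetime is None:
        return 0  # no explicit lifetime: treated as stale in this sketch
    return max(0, int(lifetime) - age)


def conditional_get(body: bytes, request_headers: dict):
    """Return (status, headers, body), using 304 when the client's ETag still matches."""
    # Assumption: the ETag is a hash of the body; any stable fingerprint works.
    etag = '"%s"' % hashlib.sha1(body).hexdigest()
    headers = {"ETag": etag, "Cache-Control": "public, max-age=60"}
    candidates = [t.strip() for t in request_headers.get("If-None-Match", "").split(",")]
    if etag in candidates or "*" in candidates:
        return 304, headers, b""  # a 304 response never carries a body
    return 200, headers, body
```

A proxy holding a response with `max-age=60, s-maxage=300` and `Age: 100` would still serve it for 200 seconds, whereas a browser would consider the same copy stale.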
As we use PHP, we should be aware that PHP automatically adds a "Cache-Control: no-store, no-cache" header whenever a session is started (session_start()).
Edge side includes (ESI)
Edge Side Includes make it possible to cache individual parts of a page separately, and each include can have its own cache settings.
To use ESI, two headers are involved: the reverse proxy sends a "Surrogate-Capability" request header to announce that it understands ESI, and the application answers with a "Surrogate-Control" response header to tell the proxy that the response contains ESI tags.
You can then use the ESI-tags as follows:
<esi:include src="/path/to/content" />
You can find more information about ESI at www.w3.org/TR/esi-lang
Several reverse proxies support Edge Side Includes, but the most common one at this time is Varnish. If you are interested in reading more about Varnish, you can check out the project page: www.varnish-cache.org
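As a rough illustration of the header exchange, the following Python sketch shows an application that emits an `<esi:include>` tag plus a "Surrogate-Control" header only when the incoming request announces ESI support via "Surrogate-Capability", and otherwise renders the fragment inline. The function `render_page`, the `sidebar` markup and the fragment URL are hypothetical; a proxy such as Varnish may need to be configured to send the "Surrogate-Capability" header.

```python
def render_page(request_headers: dict, fragment_url: str, render_fragment) -> tuple:
    """Return (response_headers, html) for a page with one separately cached fragment."""
    capability = request_headers.get("Surrogate-Capability", "")
    if "ESI/1.0" in capability:
        # A proxy in front understands ESI: let it fetch and cache the fragment
        # itself, and tell it that this response contains ESI tags.
        html = '<div id="sidebar"><esi:include src="%s" /></div>' % fragment_url
        return {"Surrogate-Control": 'content="ESI/1.0"'}, html
    # No ESI-capable proxy: render the fragment inline so the page stays complete.
    html = '<div id="sidebar">%s</div>' % render_fragment(fragment_url)
    return {}, html
```

The fallback keeps the application usable without a proxy, while the proxy, when present, can cache the page and the fragment with different lifetimes.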
RESTful Applications
Talk "Designing HTTP Interfaces and RESTful Web Services" by David Zuelke: joind.in/3013
Another topic in most of our projects is their API. I attended this talk because I had already started implementing the internal API for the Migipedia project, and it was very interesting to see how RESTful web services are supposed to look.
HTTP interface design
To start, here are some bad URLs that we will use as examples to correct later on:
- liip.ch/api/1.0/products.format
- liip.ch/api/1.0/product/show/333.format
- liip.ch/api/1.0/products/filter/chocolate/sort/desc.format
- liip.ch/api/1.0/photos/filter/product/333.format
- liip.ch/api/1.0/photo/show/1234.format
REST: DEFINED BY ROY THOMAS FIELDING
To see what exactly could be done better, you have to know the definition of REST. The REST approach could be used anywhere; it just fits HTTP very well. REST is meant to have the following properties:
- There has to be a Client / Server connection
- It must be stateless
- Responses should be cacheable
- It has to be a layered system
- The system must have a uniform interface
- A URL identifies a resource
- Sub URLs are sub resources (HTTP specification)
- URLs have an implicit hierarchy
- Methods perform operations on resources
- The operation is implicit and is not part of the URL
- Hypermedia formats are used to represent the data
- Link relations are used to navigate a service
URL problems solved
liip.ch/api/1.0/products/show.format
- Versions should be handled by defining a hypermedia format (media type)
- Formats should also be requested through that hypermedia format in the Accept header instead of a file extension
- The actions (show, add) should be accessed by using methods (GET, POST)
Solution:
liip.ch/api/products
Accept: text/vnd.ch.liip.api.v1+html, application/vnd.ch.liip.api.v1+xml
Methods: GET, POST
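With versions and formats moved into the Accept header, the server has to pick a representation from it. Here is a minimal Python sketch of that content negotiation for the vendor media types shown above; the regular expression, the supported versions and formats, and the decision to ignore `q` values are assumptions made for illustration.

```python
import re

# Hypothetical pattern for the vendor media types used in this post, e.g.
# "application/vnd.ch.liip.api.v1+xml" -> version "v1", format "xml".
VENDOR_TYPE = re.compile(r"^(?:text|application)/vnd\.ch\.liip\.api\.(v\d+)\+(\w+)$")


def negotiate(accept_header: str, versions=("v1",), formats=("html", "xml")):
    """Return the first acceptable (version, format) pair from the Accept header.

    Returns None if nothing matches; a server would then answer
    406 Not Acceptable. q-values are ignored here: the client's order decides.
    """
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip()
        match = VENDOR_TYPE.match(media_type)
        if match and match.group(1) in versions and match.group(2) in formats:
            return match.group(1), match.group(2)
    return None
```

A client asking for `application/vnd.ch.liip.api.v2+xml, text/vnd.ch.liip.api.v1+html` from a server that only knows v1 would get the v1 HTML representation.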
liip.ch/api/1.0/product/show/333.format
- The actions (show, delete) should be accessed by using methods (GET, DELETE)
- Resource names should always be plural (the name should not change between single and multiple entries)
Solution:
liip.ch/api/products/333
Accept: text/vnd.ch.liip.api.v1+html, application/vnd.ch.liip.api.v1+xml
Methods: GET, DELETE, …
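The idea that the operation comes from the HTTP method rather than from a verb in the URL can be sketched as a small dispatch table. In this Python sketch the in-memory `PRODUCTS` store and the handler names are hypothetical; unsupported methods get 405 "Method Not Allowed" together with an Allow header, which HTTP requires for that status.

```python
# Hypothetical in-memory store standing in for the product database.
PRODUCTS = {333: {"id": 333, "name": "Chocolate"}}


def get_product(product_id):
    product = PRODUCTS.get(product_id)
    return (200, product) if product else (404, None)


def delete_product(product_id):
    return (204, None) if PRODUCTS.pop(product_id, None) else (404, None)


# The operation is chosen by the HTTP method, never by a verb in the URL.
PRODUCT_HANDLERS = {"GET": get_product, "DELETE": delete_product}


def handle_product(method: str, path: str):
    """Dispatch requests for /api/products/<id>; returns (status, headers, body)."""
    prefix = "/api/products/"
    product_id = path[len(prefix):]
    if not path.startswith(prefix) or not product_id.isdigit():
        return 404, {}, None
    handler = PRODUCT_HANDLERS.get(method.upper())
    if handler is None:
        # A 405 response has to list the allowed methods in an Allow header.
        return 405, {"Allow": ", ".join(sorted(PRODUCT_HANDLERS))}, None
    status, body = handler(int(product_id))
    return status, {}, body
```

Note that the old style URL `/api/products/show/333` simply does not match here: there is no "show" action, only a resource and a method.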
liip.ch/api/products/filter/chocolate/sort/desc
- Filters should be defined as GET parameters
Solution:
liip.ch/api/products?filter=chocolate&sort=desc
Accept: text/vnd.ch.liip.api.v1+html, application/vnd.ch.liip.api.v1+xml
Method: GET
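Building and reading such filter parameters needs nothing beyond Python's standard `urllib.parse` module. The helper names and the default sort order `asc` below are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_product_query(base: str, **params) -> str:
    """Build e.g. .../api/products?filter=chocolate&sort=desc from keyword arguments."""
    return "%s?%s" % (base, urlencode(params)) if params else base


def read_product_query(url: str) -> dict:
    """Extract the filter and sort parameters (defaults are assumptions)."""
    query = parse_qs(urlsplit(url).query)
    return {
        "filter": query.get("filter", [None])[0],
        "sort": query.get("sort", ["asc"])[0],
    }
```

Because filters live in the query string, `/api/products` stays the one URL of the resource, and every combination of filters is just a different query on it.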
liip.ch/api/photos/filter/product/333
- Sub resources have to be sub URLs
- Retrieve (GET), add (POST) or delete (DELETE) sub resources using the corresponding methods, but make sure to protect your resources from being changed or deleted by unauthorized users
Solution:
liip.ch/api/products/333/photos
Accept: text/vnd.ch.liip.api.v1+html, application/vnd.ch.liip.api.v1+xml
Methods: GET, POST, DELETE
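A sketch of the sub resource URL together with the authorization check mentioned above could look like this in Python. The in-memory `PHOTOS` store, the `EDITORS` set, passing the new photo id as a function argument instead of a request body, and treating DELETE on the collection as "remove all photos of this product" are all assumptions for illustration.

```python
import re
from typing import Optional

PHOTOS = {333: [1234]}  # hypothetical store: product id -> photo ids
EDITORS = {"alice"}     # hypothetical: users allowed to change photos

# The photos of a product are a sub resource: /api/products/<id>/photos
PHOTOS_PATH = re.compile(r"^/api/products/(\d+)/photos$")


def handle_photos(method: str, path: str, user: Optional[str], photo_id: Optional[int] = None):
    """Return (status, body); photo_id stands in for the body of a POST request."""
    match = PHOTOS_PATH.match(path)
    if not match or int(match.group(1)) not in PHOTOS:
        return 404, None
    photos = PHOTOS[int(match.group(1))]
    method = method.upper()

    if method == "GET":
        return 200, list(photos)
    if method not in ("POST", "DELETE"):
        return 405, None
    # Only authenticated editors may add or delete photos.
    if user is None:
        return 401, None
    if user not in EDITORS:
        return 403, None
    if method == "POST":
        if photo_id is None:
            return 400, None
        photos.append(photo_id)
        return 201, list(photos)
    photos.clear()  # DELETE on the collection removes all photos of the product
    return 204, None
```

Reads stay open to everyone, while any change first has to pass the 401 (not authenticated) and 403 (not allowed) checks.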
Uncommon HTTP methods
Besides the commonly used HTTP methods, there are a number of often forgotten but useful ones worth keeping in mind. To keep it brief, I picked the two that are, in my opinion, the most useful:
- OPTIONS
A client sending a request with this method should receive the HTTP methods available for this specific resource and user, for example in an Allow header.
- PATCH
This method signals a partial update of the given resource. It can come in handy for keeping a history of changes, and it reduces the amount of data to be sent.
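Both methods are easy to sketch. In the following Python example, `options` builds an Allow header from a hypothetical per-role permission table, and `patch` applies only the fields sent by the client while recording old and new values in a history list. The simple field-merge format is an assumption; it is similar in spirit to JSON Merge Patch (RFC 7396), and JSON Patch (RFC 6902) is a more expressive standard alternative.

```python
import copy

# Hypothetical permission model: which methods each role may use on a product.
ROLE_METHODS = {
    "guest": {"GET", "HEAD", "OPTIONS"},
    "editor": {"GET", "HEAD", "OPTIONS", "PATCH", "DELETE"},
}


def options(role: str) -> dict:
    """Answer OPTIONS with the methods available for this resource and user."""
    return {"Allow": ", ".join(sorted(ROLE_METHODS.get(role, {"OPTIONS"})))}


def patch(resource: dict, changes: dict, history: list) -> dict:
    """Apply a partial update and record which fields actually changed.

    `changes` only contains the fields the client wants to modify, so far
    less data is sent than when replacing the whole resource with PUT.
    """
    updated = copy.deepcopy(resource)
    diff = {}
    for key, value in changes.items():
        if updated.get(key) != value:
            diff[key] = (updated.get(key), value)  # (old value, new value)
            updated[key] = value
    if diff:
        history.append(diff)
    return updated
```

Because each PATCH carries only the changed fields, the recorded `(old, new)` pairs form a compact change history almost for free.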
Hyperlinks
Another problem is that most APIs on the web do not give you the thing that makes the Web tick: hyperlinks.
Every response should contain hyperlinks to related or further content, so that the consumer of the API can find those resources by parsing the response. In an XML response, for example, you can add related hyperlinks as Atom link elements (XLinks are an alternative) in the following format:
<atom:link rel="product" type="application/vnd.ch.liip.api.v1+xml" href="http://liip.ch/api/products/333" />
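On the producing and consuming side, such links can be handled with Python's standard `xml.etree.ElementTree`. In this sketch the `<photo>` element and its `id` attribute are hypothetical; the `atom` prefix is bound to the Atom namespace `http://www.w3.org/2005/Atom` so that the output is well-formed XML.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)  # serialise the namespace with the "atom" prefix


def photo_xml(photo_id: int, product_id: int) -> str:
    """Serialise a photo with a link to its product (element names are hypothetical)."""
    photo = ET.Element("photo", {"id": str(photo_id)})
    ET.SubElement(photo, "{%s}link" % ATOM_NS, {
        "rel": "product",
        "type": "application/vnd.ch.liip.api.v1+xml",
        "href": "http://liip.ch/api/products/%d" % product_id,
    })
    return ET.tostring(photo, encoding="unicode")


def find_links(xml_text: str) -> dict:
    """Let a client discover related resources by their link relation."""
    root = ET.fromstring(xml_text)
    return {link.get("rel"): link.get("href") for link in root.iter("{%s}link" % ATOM_NS)}
```

The client never builds the product URL itself: it follows the link with `rel="product"`, so the server stays free to change its URL layout.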
Conclusion
As you can see, both topics have HTTP headers in common. As a web engineer, you need to understand those headers to build web applications that follow the HTTP specification. That's why you should at least read up on the HTTP specification, for example on the headers and the status codes.
I hope this post has brought these topics a bit closer to you and given you an idea of how to use HTTP headers for caching and what RESTful applications should look like. If you want to learn more, I recommend the following pages:
HTTP headers: net.tutsplus.com/tutorials/other/http-headers-for-dummies
REST: en.wikipedia.org/wiki/Representational_State_Transfer
XLinks: en.wikipedia.org/wiki/XLink
Lovefilm.com API (suggestion by D. Zuelke)
Thank you very much for your attention.