Inspired by a post on Bertrand’s blog, and especially by a comment Stephanie left there, I implemented language detection for blog entries on Planet Switzerland. If you’re only interested in, e.g., German posts, you can search for lang:de (or French or Italian or English) from now on.
I used the PEAR Text_LanguageDetect class for this feature and so far it works pretty well, as long as detection is limited to those four languages. If you take all 51 available languages into consideration, maybe 10% of the posts get “funny” languages assigned (like Azeri, Cebuano, Hausa, Hawaiian, Tagalog, Pidgin or some other European language). It’s still not perfect when limited to those four languages, but it’s mainly short texts that are assigned wrongly, and the error rate is way below 5%.
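To illustrate why limiting the candidates helps, here is a minimal sketch in Python. It is not the PEAR API and not the code running on Planet Switzerland: the `pick_language` function, the `min_chars` threshold and the score values are all made up for the example. It assumes some detector has already produced a similarity score per language, and it simply ignores everything outside the four languages and refuses to guess on very short texts.

```python
from typing import Dict, Iterable, Optional

# Assumed ISO codes for the four languages the detection is limited to.
ALLOWED = ("de", "fr", "it", "en")


def pick_language(scores: Dict[str, float],
                  text: str,
                  allowed: Iterable[str] = ALLOWED,
                  min_chars: int = 50) -> Optional[str]:
    """Return the best-scoring allowed language, or None if undecidable.

    `scores` maps a language code to a similarity score (higher is better),
    as some detector might return it. `min_chars` is an arbitrary cut-off.
    """
    # Short texts are the main source of wrong assignments, so skip them.
    if len(text.strip()) < min_chars:
        return None

    # Drop every candidate that is not in the allowed set.
    allowed_set = set(allowed)
    candidates = {lang: s for lang, s in scores.items() if lang in allowed_set}
    if not candidates:
        return None

    # Pick the language with the highest remaining score.
    return max(candidates, key=candidates.get)


# Hypothetical scores where Tagalog ("tl") narrowly beats German.
example_scores = {"tl": 0.41, "de": 0.39, "fr": 0.10}
long_text = "x" * 60

restricted = pick_language(example_scores, long_text)
unrestricted = pick_language(example_scores, long_text, allowed=example_scores)
too_short = pick_language(example_scores, "Hallo")
```

With the restriction in place the narrowly winning “funny” language is never considered, so the post ends up as German, while the unrestricted call would pick Tagalog and the very short text gets no language at all.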
And because some people just like statistics, here’s the breakdown of how many posts are written in which language:
With the many search options that are available on Planet Switzerland by now, it was time to collect them and write them down. So here it is: Search Options for Planet Switzerland.