Blog web aggregation and problems with it

With the recent rise of web-based feed aggregators like Planet PHP, some problems have come up as well. Not with Planet PHP, of course, but with other sites that do similar things without asking the content producers.

I publish my blog content under the Creative Commons Attribution-NonCommercial-ShareAlike license. This means you can use my content on your site, as long as you give proper attribution, don’t use it for commercial purposes and publish it under the same or a similar license. Fair enough, IMHO. It’s of course difficult to draw the line between “commercial” and “non-commercial”, and also to say what proper attribution is. E.g., does putting Google Ads on a page already make it commercial? Is a link to the original article enough attribution? Unfortunately, most of the blogs aggregated on Planet PHP do not have any license statement on their pages (I didn’t look through all of them), but let’s assume that most of them at least agree to an Attribution-like license, ’cause that’s basically what the PHP license is, too.

But what can be found on the net (it doesn’t really matter whether the feed is from Planet PHP or from my blog) sometimes really crosses the line. First case: LiveJournal. They have a user called planet_php, which just aggregates the Planet PHP feed and shows the headlines in an overview. Fine (even though the non-commercial aspect is debatable for LiveJournal). But if you click on one of those headlines, you don’t get to the original article, but to a separate page with the full content (and, at the very end, a link to the original article). And there’s even a comment function for that article on LiveJournal. WTH :) I want the comments on my post, not on LiveJournal’s, quite apart from the “spamming” problem Aaron once described.

LiveJournal is a borderline case, and so is phpn.org, which does basically the same as LiveJournal sans commenting, but with Google Ads.

But the one really going over the top is phpclasses.org with their newsletter. They aggregate their “Latest news” from phpn.org, but again put the full content on a separate page for each post. The link to the original article then goes to phpn.org, again to their own page with the content, and only there do I finally get a link to the actual original article. So I have to click 3 times until I get to the real article. No proper attribution, and I don’t have to tell you how many ads are on the phpclasses pages…

What’s the difference to what Planet PHP does?

  • We don’t put single pages online with the full content of the blog posts; we link directly to the original article, also in the RSS feed. So if you click on a link in your RSS reader, you get directly to the original post and not to Planet PHP. Even though we put the full content online, it’s on an overview page with 9 other posts which changes constantly, so Google spamming isn’t really an issue here IMHO. And the log stats also tell me that not many people come from Google with generic search queries (they do come with stuff like “php news”, “php planet”, etc., which is what it is :) )
  • We usually get asked by the blog authors whether we want to add their feed to Planet PHP. I assume they then agree with the way we use their content. We didn’t ask all the authors when we started, but I assume they would have complained by now if they didn’t agree (I don’t think they could have missed Planet PHP until now..)
  • We don’t have ads, it’s clearly a non-commercial project (yeah yeah, it’s somehow sponsored by Bitflux and Netzwirt, but I certainly don’t make money with it :) )

What’s the solution?

As I said, I don’t much mind if someone uses my content on other sites, as long as the license is respected. Maybe LiveJournal and phpn.org are respecting the license, but what they are doing with their “each post on a separate page without any added value” approach is IMHO not being a nice net-citizen.

The other problem is that most blogs don’t have a clear license for their published content, so it’s hard to tell what’s allowed. If they all used one of the Creative Commons licenses (or any similar one) and put that into their RSS feed as proposed by userland.com, I could add them to the Planet PHP feed and everything would be a lot clearer :)
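To make that a bit more concrete, here is a rough Python sketch of how an aggregator could read such a license out of a feed. It is only an illustration and not how Planet PHP works: the sample feed, the example.org URLs and the function name item_licenses are made up, and I’m assuming the creativeCommons RSS module as published on backend.userland.com, where a license on an item takes precedence over the one on the channel.

```python
import xml.etree.ElementTree as ET

# Namespace of the creativeCommons RSS module (assumed from the userland.com proposal).
CC_NS = "http://backend.userland.com/creativeCommonsRssModule"
CC_LICENSE = "{%s}license" % CC_NS

# Hypothetical feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">
  <channel>
    <title>Example blog</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed</description>
    <creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/2.0/</creativeCommons:license>
    <item>
      <title>Post without its own license</title>
      <link>http://example.org/1</link>
    </item>
    <item>
      <title>Post with its own license</title>
      <link>http://example.org/2</link>
      <creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license>
    </item>
  </channel>
</rss>"""


def _text(element, tag):
    """Return the stripped text of a child element, or None if missing or empty."""
    value = element.findtext(tag)
    if value is None or not value.strip():
        return None
    return value.strip()


def item_licenses(feed_xml):
    """Return a list of (item link, license URL or None) for an RSS 2.0 feed.

    An item-level license wins; otherwise the channel-level one is used.
    """
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return []
    channel_license = _text(channel, CC_LICENSE)
    result = []
    for item in channel.findall("item"):
        license_url = _text(item, CC_LICENSE) or channel_license
        result.append((_text(item, "link"), license_url))
    return result


if __name__ == "__main__":
    for link, license_url in item_licenses(SAMPLE_FEED):
        print(link, "->", license_url or "no license stated")
```

An aggregator could then skip (or only link to) every post where the license comes back as None, instead of guessing what the author allows.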

Anyway, way too long post and no more coffee in the office :)

Update: Just to prevent misunderstandings: I’m absolutely fine with stuff like this from php-mag.net. It’s edited and approved by a human (even though 95% is copy&paste from my post and it’s basically a commercial site, but that’s fair use IMHO) and they don’t just take each and every post from my feed without adding at least some content of their own…

About LJ, I received an email from them shortly after I complained about my blog being listed.

They explained that they syndicate news from other sources, but the data is only stored on their servers for 2 weeks, and it is done for the convenience of the LJ users. They found my idea of adding tags so that Google doesn’t index the pages interesting.

I thought it was a fairly acceptable solution. Aggregate for your users’ benefit, but then don’t store the pages and clutter up the internet.