The Data Stack – Download the most complete overview of the data centric landscape.

This blog post offers an overview and PDF download of the data stack, thus all tools that might be needed for data collection, processing, storage, analysis and finally integrated business intelligence solutions.

(Web)-Developers are used to stacks, most prominent among them probably the LAMP Stack or more current the MEAN stack. On the other hand, I have not heard too many data scientists talking about so much about data stacks – may it because we think, that in a lot of cases all you need is some python a CSV, pandas, and scikit-learn to do the job.

But when we sat down recently with our team, I realized that we indeed use a myriad of different tools, frameworks, and SaaS solutions. I thought it would be useful to organize them in a meaningful data stack. I have not only included the tools we are using, but I sat down and started researching. It turned out into an extensive list aka. the data stack PDF. This poster will:

  • provide an overview of solutions available in the 5 layers (Sources, Processing, Storage, Analysis, Visualization)
  • offer you a way to discover new tools and
  • offer orientation in a very densely populated area

So without further ado, here is my data stack overview (Click to open PDF). Feel free to share it with your friends too.

Liip Data stack version 1.0

Liip data stack version 1.0

Click here to get notified by email when I release version 2.0 of the data stack.

Let me lay out some of the questions that guided me in researching each area and throw in my 5 cents while researching each one of them:

Continue reading about The Data Stack – Download the most complete overview of the data centric landscape.

Tags: , , , , , ,

Predicting how long the böögg is going to burn this year with a bit of eyeballing and machine learning.

So apparently there is the tradition of the böögg in Zürich. It is a little snowman made out of straw that you put up on top of a pole, stuff with explosives and then light up. Eventually the explosives inside the head of the snowman will catch fire and then blow up with a big bang. The tradition demands it that if the böögg explodes after a short time, there will be a lot of summer days, if it takes longer then we will have more rainy days. It reminds me a bit of the groundhog day. If you want to know more about the böögg, you should check out the wikipedia page https://de.wikipedia.org/wiki/Sechseläuten.

Now people have started to bet on how long it will take for the böögg to explode this year. There is even a website  that lets you bet on it and you can win something. In my first instinct I inserted a random number (13 min 06 seconds) but then thought – isn’t there a way to predict it better than with our guts feeling? Well it turns out there is – since we live in 2016 and have open data on all kinds of things. Using this data, what is the prediction for this year?

590 seconds – approximately 10 minutes.

We will have to see on Monday to see if this prediction was right – but I can offer you to show now how I got to this prediction with a bit of eyeballing and machine learning. (Actually our dataset is so small that we wouldn’t have to use any of the tools that I will show you, but its still fun.)

Continue reading about Predicting how long the böögg is going to burn this year with a bit of eyeballing and machine learning.

Tags: ,

Why Piwik matters now

Piwik is an open-source web analytics solution that has been around for quite some years now and has seen a recent revival with the advent of Piwik 2.

It proposes all the necessary tools to capture, collect, process and analyse traffic data. Yes it has an API, yes fancy reports, segments, dashboards and goals, yes also to custom variables, …

Although I have immense respect for the product team behind Google Analytics, I must admit that Piwik brings three features that are unmet in Google Analytics.

Continue reading about Why Piwik matters now

Tags: , ,