To JASP or not to JASP? Bayesian statistical methods for free.

There are two ways of doing statistical analysis: one I call the Excel approach and the other I call the Experts approach.

You use the former if you are a mouse user who likes to point and click at things without understanding what is happening. You also give up control.

You use the latter for finer control over the outputs. It requires some understanding of the theory behind the analysis, and a lot more keyboard punching.

Introducing JASP

I’ve been experimenting with a statistics package called JASP, which aims to be a lightweight statistics tool: ‘an alternative to SPSS’. It sits in the realm of Excel users but has the power of the Experts approach underneath. What? Is THAT POSSIBLE?

JASP has only a few, well… FOUR, categories of analysis (t-tests, ANOVA, regression, and frequencies), and this is fine for most use cases. I find it particularly interesting for EDA (Exploratory Data Analysis), where you just want to look at the data you’re working with.

Everything is visual and done with the mouse, from loading data to clicking through the options of the analysis you want to perform.

The output is beautiful but limited to HTML, with the images embedded as PNG data URIs. It would be great to export the results for use in academic papers; EPS or PDF files would be ideal for LaTeX. This makes JASP somewhat limited.

What about Mondrian?

In any case, JASP is still in its infancy and will evolve rapidly. As for exploratory data analysis software, there are alternatives. One of my favourites is Mondrian. Mondrian is old: it started in 1997, its latest stable release is from 2011 and its latest beta is from 2013. Nor does it have the beautiful graphics of JASP. But sometimes simple things are best, and Mondrian just works.

But isn’t JASP just a GUI for R?

The cool thing about JASP is that all the clicks and clanks are just a way to pass parameters to R functions. The backend is R, so you can do everything the hard way if you really need to improve on the existing functions. For example, sometimes you need logarithmic scales on your plots. There is no way to do that in JASP, but if you use the underlying R function you solve your problem, and you can export PDFs for paper production.

The R scripts that power JASP are available on GitHub. If you want to use them in your own advanced R work, just go ahead and read them.

Data Analysis of 2015 Tourism in the EU: Why raw numbers are worthless (tl;dr)

2015 EU Tourism

Eurostat released a short statistical note on the number of tourism nights spent in the EU in 2015. Curiously, Spain shows up as the top tourist destination in Europe. Hm… right at the moment when everyone is talking about Spanish influence everywhere. Top. What does that really mean? And what is the point of all these rankings in absolute terms, which are not comparable across countries in the first place?

Looking at the data, we can see that there is no “normalisation”. All comparisons are made in absolute terms. This is like comparing apples to oranges, or China’s population to the Vatican’s. It doesn’t make any sense to me.

The data also contains the number of tourism nights of non-residents and of residents for each country.

This in itself is very interesting, because we can take different strategies to compare the two types of tourism: non-residents visiting the country vs residents visiting other parts of their own country.

The issue with this type of raw data is that it gives the impression that things are very different when sometimes they aren’t. Portugal and Spain appear very different in this chart, but it doesn’t take into consideration the sizes of the two countries.

We need some normalisation of the data so we can compare the numbers between different countries. Comparing Malta and France, or Spain and Sweden, doesn’t mean much if one doesn’t take into consideration other things like population, GDP, area, etc. In summary, we need some variable that acts as a normaliser.

I fetched the population and area of each country in the report and normalised the data by population and by area. Here are some interesting results (full results at the bottom of the post).
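The two normalisations boil down to two divisions per country. Below is a minimal sketch in Python (the original analysis was done in R); the field names `resident_nights`, `nonresident_nights`, `population` and `area_km2` are hypothetical, and the countries "Big" and "Small" with their figures are made-up placeholders, not Eurostat data.

```python
from dataclasses import dataclass


@dataclass
class Country:
    name: str
    resident_nights: float     # nights spent by residents inside their own country
    nonresident_nights: float  # nights spent by non-residents (foreign tourists)
    population: float          # number of inhabitants
    area_km2: float            # area in square kilometres


def normalise(c: Country) -> dict:
    """Turn raw night counts into the two comparable measures."""
    return {
        "name": c.name,
        # average number of domestic tourism nights per resident
        "resident_nights_per_capita": c.resident_nights / c.population,
        # foreign tourism nights per square kilometre of territory
        "nonresident_nights_per_km2": c.nonresident_nights / c.area_km2,
    }


# Hypothetical figures for illustration only (NOT real Eurostat data).
countries = [
    Country("Big", resident_nights=300e6, nonresident_nights=250e6,
            population=100e6, area_km2=500_000),
    Country("Small", resident_nights=2e6, nonresident_nights=8e6,
            population=0.5e6, area_km2=1_000),
]

# Rank by foreign nights per km2 instead of by raw foreign nights.
ranked = sorted(map(normalise, countries),
                key=lambda r: r["nonresident_nights_per_km2"], reverse=True)
for row in ranked:
    print(f'{row["name"]:>6}: {row["resident_nights_per_capita"]:.1f} nights/resident, '
          f'{row["nonresident_nights_per_km2"]:.0f} foreign nights/km2')
```

With these made-up numbers, "Small" comes out on top in both measures even though "Big" has over 30 times more foreign nights in raw terms, which is the effect the post describes.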

Residents tourism

Normalising by population gives us a comparable measure between countries of the tourism each country had in 2015 from its own residents. By doing so we are simply getting the average number of nights each person spent doing tourism in their own country.

Spain is clearly no longer the top country. Norway, France, Sweden and the Netherlands take the spotlight: their nationals are the ones who do the most tourism inside their own countries. On average a Norwegian spends almost 4.5 nights doing tourism in Norway, whereas in Spain this number drops to 3.3. Residents of rich countries clearly do more tourism at home than residents of poorer countries, which hints that analysing this data against GDP might be an interesting approach.

Non-Residents tourism

On the other hand, if we want to see how attractive a country is to tourists, we shouldn’t normalise by the country’s population (not entirely true), but we can instead normalise by its area. The reasoning is that tourist interest is roughly proportional to the features a country offers to tourists (beaches, monuments, cities, etc.), and those are roughly proportional to the country’s area. Obviously, some places have a higher density of tourist spots than others.

The results are surprising. Malta, a very small island, is fully dedicated to tourism. It is clearly the outlier here (it forced me to use a logarithmic scale), but it shows that tourism cannot be ranked by raw numbers alone. Performance and efficiency require comparable measures, not raw data.
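Why does one outlier force a logarithmic scale? A small sketch, assuming made-up nights-per-km2 values (not the Eurostat figures) and hypothetical country labels, computes where each value would sit on a linear axis and on a log10 axis, with 0 at the bottom and 1 at the top.

```python
import math

# Hypothetical foreign-nights-per-km2 values, for illustration only
# (NOT real data): one tiny, tourism-dense outlier and four ordinary countries.
per_km2 = {"Outlier": 40_000.0, "P": 600.0, "Q": 450.0, "R": 120.0, "S": 60.0}


def axis_positions(values, transform=lambda x: x):
    """Relative position of each value on an axis (0 = bottom, 1 = top)."""
    t = {k: transform(v) for k, v in values.items()}
    lo, hi = min(t.values()), max(t.values())
    return {k: (v - lo) / (hi - lo) for k, v in t.items()}


linear = axis_positions(per_km2)              # ordinary linear axis
logged = axis_positions(per_km2, math.log10)  # log10 axis: equal steps = equal ratios

for name in per_km2:
    print(f"{name:>7}  linear={linear[name]:.3f}  log10={logged[name]:.3f}")
```

With these numbers, country P sits at about 1% of the height of a linear axis but about a third of the way up a log10 axis, so the ordinary countries stop being squashed together at the bottom.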

In this case Spain and Portugal are very close together. Wouldn’t that be expected? Portugal and Spain both have strong summer beach tourism, a purely geographic factor. Historically, Portugal shares a common past of wars and family ties with Spain, so monuments and historic cities should be relatively similar in their attractiveness to tourists. Both countries have excellent food, and their cultural heritage is second to none. Why would they attract tourists as differently as the first chart suggested?

Spain might be slightly more efficient at attracting tourists: it is closer to the centre of Europe, and there are probably some superlinear effects. But in the end the two Iberian countries are very similar.

Conclusion

What European agencies publish should sometimes be taken with a pinch of salt. Not because their data is wrong, but because the way the data is read can be misleading.

Yes, Spain has the largest number of tourism nights in Europe. Does it mean that Spain is the most efficient country in terms of tourism? Does it mean that Spain has the largest number of tourist attractions to draw foreigners? Clearly not.

Croatia seems impressive at taking advantage of what it has. Malta and Cyprus are also very effective because they are islands. If you look at domestic tourism by nationals, rich northern countries such as Norway, France, Sweden and the Netherlands seem to have more nights per person than any others.

What this shows is that many narratives can be written about the same data. Raw numbers alone are misleading. Portugal and Spain are not that different once you correct for the size of the countries.

The full analysis is available in R Markdown format, and you can play with the script yourself in RStudio.

This post is available in PDF format

World Top Coffee Drinkers

Who are the world’s top coffee drinkers?

Coffee Drinkers

World's Top 30 Coffee Import - Are We Coffee Drinkers?

I’ve been playing a bit with R and experimenting with different data sets. The plot above shows the Top 30 coffee-importing countries, ordered by their imports relative to their population. I found some facts a bit surprising:

  • The US is the biggest coffee importer in the world, but once we divide coffee imports by population it ranks low, even below Portugal.
  • Belgium! Why is Belgium first? It has a population similar to Portugal’s, but it imports almost 10 times as much coffee.
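The reordering in the plot comes from two steps: keep the top importers by total volume, then sort them by imports per inhabitant. A minimal Python sketch of that idea (the original was done in R), where the countries "Big", "Mid" and "Small" and all their figures are hypothetical placeholders, not the Fusion Tables or Wikipedia data:

```python
# Hypothetical figures for illustration only (NOT the real data):
# coffee imports in tonnes, population in people.
imports_t = {"Big": 1_500_000, "Mid": 120_000, "Small": 20_000}
population = {"Big": 330_000_000, "Mid": 11_000_000, "Small": 10_000_000}


def kg_per_capita(country):
    """Coffee imports per inhabitant, in kilograms (1 tonne = 1000 kg)."""
    return imports_t[country] * 1000 / population[country]


def rank_per_capita(top_n=30):
    """Keep the top_n importers by total volume, then re-rank them per capita."""
    top = sorted(imports_t, key=imports_t.get, reverse=True)[:top_n]
    return sorted(top, key=kg_per_capita, reverse=True)


for c in rank_per_capita():
    print(f"{c:>5}: {kg_per_capita(c):.1f} kg per person")
```

Here "Big" is the largest importer in absolute terms but falls to second place per capita, the same kind of reshuffle that pushes the US down the ranking.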

This is obviously just an exercise in data manipulation, but some curiosities arise. Wikipedia has a page on coffee consumption per capita that orders these countries differently: Belgium comes 8th and the top spot goes to Finland (Finns drink 4x the coffee we drink here in Portugal!), so don’t take this plot too seriously.

Data sources: coffee imports by country (Google Fusion Tables); world population (Wikipedia).