5 essential tricks for R users

R is very powerful and is becoming a language of choice for data scientists, but some things take a bit of learning and are not obvious to the newcomer. Here are five useful tips if you are just starting out:

  1. Sometimes data is not in the right format and you need to reshape it before using it in R. Instead of using external software, you can do it directly in R.
  2. Plotting multiple series in the same figure. This can be done with the ggplot2 library, which also produces better-looking graphics.
  3. If you do network analysis, you’ll need to partition the graph into communities. Finding communities in R is easy with the igraph package.
  4. Still playing with graphs, you can colour nodes according to some property of the data. Check how to colour graph nodes in R.
  5. ggplot2 lets you accept most of the defaults and still get great plots, but sometimes you might want to customise them further. Check how to customise ggplot2 axis labels.
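
As a minimal sketch of tips 1, 2 and 5, the snippet below reshapes a small wide table into long format with base R's reshape() and, if ggplot2 happens to be installed, plots both series in one figure with custom axis labels. The data frame and its column names (year, sales_a, sales_b) are made up for illustration and do not come from the linked posts.

```r
# Hypothetical wide-format data: one row per year, one column per series.
wide <- data.frame(
  year    = c(2010, 2011, 2012),
  sales_a = c(10, 12, 15),
  sales_b = c(7, 9, 8)
)

# Tip 1: reshape to long format (one row per year and series) in base R.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("sales_a", "sales_b"),
  v.names   = "sales",
  timevar   = "series",
  times     = c("a", "b"),
  idvar     = "year"
)
rownames(long) <- NULL

# Tips 2 and 5: one line per series, with customised axis labels.
# Guarded so the reshaping part still runs without ggplot2.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = year, y = sales, colour = series)) +
    ggplot2::geom_line() +
    ggplot2::labs(x = "Year", y = "Sales", colour = "Series")
  # print(p) draws the figure
}
```

Long format is what ggplot2 expects when several series share one figure, which is why the reshaping step comes first.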

Extra: If you use Sweave to automate your reports with live data in R, you might sometimes want to extract the R snippets to a new R file. Instead of copying and pasting, try this:

Stangle("big_sweave_doc.Rnw", output="big_sweave_code.R")
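
To try this without a real report, the sketch below writes a tiny, hypothetical Sweave file to a temporary location, extracts its code with Stangle(), and reads the result back. The file contents and the chunk name are illustrative assumptions.

```r
# Write a minimal, made-up Sweave document to a temporary file.
rnw <- tempfile(fileext = ".Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<chunk1>>=",
  "x <- 1 + 1",
  "@",
  "\\end{document}"
), rnw)

# Extract only the R code chunks into a separate .R file.
out <- tempfile(fileext = ".R")
utils::Stangle(rnw, output = out, quiet = TRUE)

# Read the extracted code back to check what was written.
code <- readLines(out)
```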

R and iGraph: Colouring Community Nodes by attributes

When plotting the results of community detection on networks, one is sometimes interested in more than the connections between nodes. Network relations are usually multidimensional, and you might want to represent aspects other than the links between nodes.

Plotting node attributes in R with igraph can be tricky, as the documentation is not always clear. In this example we’ll use the Davis dataset from the igraph repository. This dataset was collected by Davis et al. [1] in the 1930s and represents the observed attendance at 14 social events by 18 Southern women.

Load Data in R

Let’s start by loading the required igraph library and then the network data. igraph is a graph manipulation library that makes it very easy to create, load, analyse, and plot graphs in R. I’ve provided many examples of using igraph, but one should really invest some time in learning it for serious network and graph analysis.

library(igraph)
graph <- nexus.get("Davis")

plot of Davis et al graph without Colouring Community Nodes

It is obvious that the plot of the graph is very bland and it isn’t easy to see any natural graph partition. It would be nice to have clusters of people in different colours and shapes. Let’s try to improve the plot by identifying communities in the graph.

Finding communities

Let’s ignore for now that this graph is bipartite and just try to partition it into communities. Is there a natural division, that is, a division of the nodes in which there are more connections inside each community than to other communities? Let’s colour the nodes according to the community they belong to:

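# Detect communities with the fast greedy modularity algorithm
graph.com <- fastgreedy.community(graph)
# Colour each node by community; + 1 shifts past the first palette colour
V(graph)$color <- graph.com$membership + 1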

To find the communities in the graph, we first use the greedy algorithm of Clauset et al. [2], which maximises the modularity of the graph through agglomerative hierarchical clustering. Modularity compares the density of connections inside communities with a null model in which connections between nodes are random. Then we set the color attribute of each node according to its community membership.
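
To make the modularity idea concrete, here is a small base R sketch that computes Q = (1/2m) * sum over i,j of (A_ij - k_i * k_j / 2m) for pairs of nodes in the same community, where A is the adjacency matrix, k the node degrees and m the number of edges. The toy graph (two triangles joined by a single edge) and the function name modularity_q are assumptions for illustration; this is not meant to mirror how igraph computes it internally.

```r
# Hypothetical toy graph: triangle 1-2-3, triangle 4-5-6, bridge 3-4.
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
n <- 6

# Build a symmetric adjacency matrix from the edge list.
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Modularity of an undirected, unweighted graph for a given partition.
modularity_q <- function(A, membership) {
  k <- rowSums(A)                                # node degrees
  two_m <- sum(k)                                # twice the number of edges
  same <- outer(membership, membership, "==")    # TRUE if same community
  expected <- outer(k, k) / two_m                # null model: random wiring
  sum((A - expected) * same) / two_m
}

q_split <- modularity_q(A, c(1, 1, 1, 2, 2, 2))  # one community per triangle
q_single <- modularity_q(A, rep(1, n))           # everything in one community
```

Splitting the graph along the bridge gives Q = 5/14 (about 0.357), while putting every node in a single community gives Q = 0, so the split is the better partition by this measure.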

plot of Davis communities with Colouring Community Nodes

Change graph node’s shape

As we can see, the algorithm found 2 communities, and at first inspection the division seems reasonable. In any case, remember that there’s another natural division that the algorithm couldn’t find: the bipartite relation Event / Woman. Each node has another property, type, that says “Is Woman” or “Is Event”. Let’s use it to characterise the graph in a different way. We have 14 events; for these we’ll change the shape.

V(graph)[V(graph)$type == 1]$shape <- "square"
V(graph)[V(graph)$type == 0]$shape <- "circle"

plot of Davis communities with changed shapes in R

As we can see, we change the look of the graph in R by changing attributes of the nodes or of the edges. Almost every visual aspect can be changed, from the layout of the graph to the size of its elements. This example illustrated the basic mechanisms for changing how graphs are plotted to make them more informative.
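
As a hedged illustration of those mechanisms, the sketch below builds a tiny bipartite graph by hand (hypothetical women W1 to W3 and events E1 and E2, not the Davis data) and sets shape, colour, size, and edge width before plotting. It uses the newer igraph function names (graph_from_data_frame, layout_in_circle), which assumes igraph 1.0 or later.

```r
library(igraph)

# Hypothetical attendance data: which woman attended which event.
attendance <- data.frame(
  woman = c("W1", "W1", "W2", "W3"),
  event = c("E1", "E2", "E1", "E2")
)
g <- graph_from_data_frame(attendance, directed = FALSE)

# Mark events (names starting with "E") as the second node type.
V(g)$type <- grepl("^E", V(g)$name)

# Node attributes control the look of each vertex.
V(g)$shape <- ifelse(V(g)$type, "square", "circle")
V(g)$color <- ifelse(V(g)$type, "orange", "lightblue")
V(g)$size  <- 10 + 5 * degree(g)   # better-connected nodes are drawn larger

# Edge attributes work the same way.
E(g)$width <- 2

# The layout is passed at plot time.
plot(g, layout = layout_in_circle(g))
```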

  1. A. Davis, B. B. Gardner, M. R. Gardner, Deep South: A Social Anthropological Study of Caste and Class, University of Chicago Press (1941).

  2. A. Clauset, M. E. J. Newman, C. Moore, Finding community structure in very large networks, http://www.arxiv.org/abs/cond-mat/0408187

Social Network Analysis in R and some housekeeping

Social Network Analysis is increasingly topical, in science and beyond. In 2010 I taught a Winter School course on Software for Social Network Analysis, in which I gave an introduction to using R for network analysis. R is not only useful for social network analysis: it also serves for producing documents with graphics in an automatic and reproducible way, for varied statistical analysis, for fast manipulation of big data, and so on. In truth, R is a real workhorse that lends itself to several stages of data manipulation and analysis.

In the field of Social Network Analysis (SNA), R offers several packages that are worth a look. One of them is the igraph package, which has much of the functionality needed to study networks, from generating graphs according to given models to analysing their properties and detecting communities. The igraph site itself has an online book about igraph that can help those starting out with the package. Those studying SNA for the first time can also look at Hanneman’s tutorials, although in some cases they use not R but other software such as Ucinet or Pajek.

For those just starting out with R, however, there are other tutorials and presentations that will help you get into the language. If you need an introduction in Portuguese, see these PDFs produced at IST here and here.

Some Graph/Network libraries for Python

Whether for social network analysis, multi-agent simulation, or text mining, networks are everywhere and everyone seems to be producing their own library. Here’s a quick collection of some popular libraries that integrate well with Python (among other things).

NetworkX (1.4) – http://networkx.lanl.gov/ – Python. Pure Python, and slow for large numbers of nodes.

igraph (0.5.4) – http://igraph.sourceforge.net/index.html – Python extension module, R package, Ruby gem, or C library. I find it very useful with R, since it integrates perfectly with the other things I do in R.

boost – http://www.boost.org/ – C++