5 essential tricks for R users

R is very powerful and is becoming the language of data scientists. But some things require a bit of learning and are not obvious to the R newcomer. Here are five useful tips if you are just starting out:

  1. Sometimes data is not in the correct format and you need to reshape data to use it in R. Instead of using external software you can do it directly in R.
  2. Plotting multiple series in the same figure. This can be accomplished using R ggplot2 library producing better looking graphics.
  3. If you do Network Analysis, you’ll need to partition the graph into communities. Finding communities in R is easy with the igraph package.
  4. Still playing with graphs, you can colour different nodes according to some data property. Check how to colour graph nodes in R.
  5. R ggplot2 allows you to accept most of the defaults and have great plots, but sometimes you might want to customise them further. Check how to customise ggplot2 axes labels.

Extra: If you use Sweave to automate your reports with live data in R, you might sometimes want to extract the R snippets to a new R file. Instead of copying and pasting, try this:

Stangle("big_sweave_doc.Rnw", output="big_sweave_code.R")

R Tip: define ggplot axis labels

Formatting text and labels in ggplot or ggplot2 axis is easy. A common task when producing plots for publication is to replace default labels. Default labels in axes tend to reflect the name of variables used and sometimes these are not the most descriptive labels. At least not when you are publishing the plots in a scientific journal. So let’s try to break down some ways to personalise ggplot plot axes.

Quick Navigation:

For this formatting example I’ll use the movies dataset that is available in R. First thing we need to do is to load ggplot2 library and then the movies dataset

library(ggplot2)
data(movies)

The default ggplot axis labels

Traditionally the labels are set in the axis directly by ggplot from the aesthetics selected e.g.:

p0<-ggplot(data=movies, aes(x=year))
p0<-p0+geom_point(aes(y=rating))+geom_smooth(aes(y=rating))
p0

plot of define x and y axis ggplot

To make ggplot axes’ labels different we can use xlab and ylab. This defines x and y axis in ggplot easily.

p0+xlab('The glorious years of the movies')+ylab('The public ratings')

Setting axes labels in ggplot with scales

p0+
  scale_x_continuous('The glorious years of the movies (with scales)')+
  scale_y_continuous('The public ratings (with scales)')

Also worth investigating is the labs function that allow the change of the axes and the title e.g.:

p0+labs(
  x='The glorious years of the movies (with labs)',
  y='The public ratings (with labs)'
  )

Formatting labels text for size and rotation?

Ggplot can change axis label orientation, size and colour. To rotate the axes in ggplot you just add the angle property. To change size ou use size and for colour you uses color (Notice that a ggplot uses US-english spelling). Finally, note that you can use the face property to define if the font is bold or italic.

p0 + xlab('The Years of Cinema')+
  ylab('Public Ratings')+
  theme(
    axis.text.x=element_text(angle=90, size=8),
    axis.title.x=element_text(angle=10, color='red'),
    axis.title.y=element_text(angle=80, color='blue', face='bold', size=14)
    )

The formatting of the text in the labels is a bit counter intuitive because it uses a slightly different nomenclature. The formatting is done with the theme function and by defining element_text’s with the wanted format. In the example above the axis.text.x defines the ticks format and the axis.title.? define the labels format.

A good way to learn all the elements that a ggplot theme can format can be obtained from the help menu by entering ?theme. These examples are just scrapping the surface of what you can do but hope they can get you started in formatting text size and orientation inside ggplot plots.

Side Note: Did you noticed how crappy the movies from the 70s, 80s and 90s were?

Lorenz Attractor in R.

Lorenz Attractor

The Lorenz System is one of the most famous system of equations in the realm of chaotic systems first studied by Edward Lorenz. The double lob remembering a butterfly wing is on the imagination of any complex systems enthusiast.

Lorenz Attractor Equations

The three equations that govern its behavior are:

where usually

So let's define the parameters, the initial state of the system and the equations in the form of a function called Lorenz

parameters <- c(s = 10, r = 28, b = 8/3)
state <- c(X = 0, Y = 1, Z = 1)
 
Lorenz <- function(t, state, parameters) {
    with(as.list(c(state, parameters)), {
        dX <- s * (Y - X)
        dY <- X * (r - Z) - Y
        dZ <- X * Y - b * Z
        list(c(dX, dY, dZ))
    })
}

We can now start processing this and plotting it.

times <- seq(0, 50, by = 0.01)
library(deSolve)
out <- ode(y = state, times = times, func = Lorenz, parms = parameters)
 
par(oma = c(0, 0, 3, 0))
plot(out, xlab = "time", ylab = "-")
plot(out[, "Y"], out[, "Z"], pch = ".", type = "l")
mtext(outer = TRUE, side = 3, "Lorenz model", cex = 1.5)

The above example is mainly copied from the the deSolve package documentation that uses the same example. What I'd like to point out in this is how simple it is to solve and plot a system of differential equations in R. The language is simple and clear and it is much more practical than implementing everything from scratch.

If you want to play with a couple of Lorenz attractors synchronizing please visit this Java implementation.

Up next: Animating the Lorenz Attractor in R.