R Tip: define ggplot axis labels

Formatting text and labels in ggplot or ggplot2 axis is easy. A common task when producing plots for publication is to replace default labels. Default labels in axes tend to reflect the name of variables used and sometimes these are not the most descriptive labels. At least not when you are publishing the plots in a scientific journal. So let’s try to break down some ways to personalise ggplot plot axes.

Quick Navigation:

For this formatting example I’ll use the movies dataset that is available in R. First thing we need to do is to load ggplot2 library and then the movies dataset

library(ggplot2)
data(movies)

The default ggplot axis labels

Traditionally the labels are set in the axis directly by ggplot from the aesthetics selected e.g.:

p0<-ggplot(data=movies, aes(x=year))
p0<-p0+geom_point(aes(y=rating))+geom_smooth(aes(y=rating))
p0

plot of define x and y axis ggplot

To make ggplot axes’ labels different we can use xlab and ylab. This defines x and y axis in ggplot easily.

p0+xlab('The glorious years of the movies')+ylab('The public ratings')

Setting axes labels in ggplot with scales

p0+
  scale_x_continuous('The glorious years of the movies (with scales)')+
  scale_y_continuous('The public ratings (with scales)')

Also worth investigating is the labs function that allow the change of the axes and the title e.g.:

p0+labs(
  x='The glorious years of the movies (with labs)',
  y='The public ratings (with labs)'
  )

Formatting labels text for size and rotation?

Ggplot can change axis label orientation, size and colour. To rotate the axes in ggplot you just add the angle property. To change size ou use size and for colour you uses color (Notice that a ggplot uses US-english spelling). Finally, note that you can use the face property to define if the font is bold or italic.

p0 + xlab('The Years of Cinema')+
  ylab('Public Ratings')+
  theme(
    axis.text.x=element_text(angle=90, size=8),
    axis.title.x=element_text(angle=10, color='red'),
    axis.title.y=element_text(angle=80, color='blue', face='bold', size=14)
    )

The formatting of the text in the labels is a bit counter intuitive because it uses a slightly different nomenclature. The formatting is done with the theme function and by defining element_text’s with the wanted format. In the example above the axis.text.x defines the ticks format and the axis.title.? define the labels format.

A good way to learn all the elements that a ggplot theme can format can be obtained from the help menu by entering ?theme. These examples are just scrapping the surface of what you can do but hope they can get you started in formatting text size and orientation inside ggplot plots.

Side Note: Did you noticed how crappy the movies from the 70s, 80s and 90s were?

How to plot multiple data series in ggplot for quality graphs?

plot multiple data series with ggplot

I've already shown how to plot multiple data series in R with a traditional plot by using the par(new=T), par(new=F) trick. Now I'll show how to do it within ggplot2.

First let's generate two data series y1 and y2 and plot them with the traditional points methods

x <- seq(0, 4 * pi, 0.1)
n <- length(x)
y1 <- 0.5 * runif(n) + sin(x)
y2 <- 0.5 * runif(n) + cos(x) - sin(x)
plot(x, y1, col = "blue", pch = 20)
points(x, y2, col = "red", pch = 20)

This is exactly the R code that produced the above plot. It is just a simple plot and points functions to plot multiple data series. It is not really the greatest, smart looking R code you want to use. Better plots can be done in R with ggplot.

Plotting with Ggplot2

Now, let's try this with ggplot2.

First we need to create a data.frame with our series.

If we have very few series we can just plot adding geom_point as needed.

library(ggplot2)
df <- data.frame(x, y1, y2)
ggplot(df, aes(x, y = value, color = variable)) + 
    geom_point(aes(y = y1, col = "y1")) + 
    geom_point(aes(y = y2, col = "y2"))

But if we have many series to plot an alternative is using melt to reshape the data.frame and with this plot an arbitrary number of rows. For example:

library(reshape)
# This creates a new data frame with columns x, variable and value
# x is the id, variable holds each of our timeseries designation
df.melted <- melt(df, id = "x")
 
ggplot(data = df.melted, aes(x = x, y = value, color = variable)) +
  geom_point()

And thats how to plot multiple data series using ggplot. The basic trick is that you need to melt your data into a new data.frame. Remember, in data.frames each row represents an observation.

update:

Another option, pointed to me in the comments by Cosmin Saveanu (Thanks!), it to plot the multiple data series with facets (good for B&W):

library(reshape)
ggplot(data = df.melted, aes(x = x, y = value)) +
geom_point() + facet_grid(variable ~ .)