Monday, December 2, 2013

Plotting Geologic Time in R

I'm going to be plotting some data in R, with geologic age along the y-axis, so I thought it would be nice to have a version of the timescale to go with it.  In order to plot everything with the same y-axis, I need the Cenozoic timescale as a stacked bar chart R.

The first step was creating a table with the information I needed. I created a comma-separated file that I loaded into R. My file had four columns: Time (a text column where I listed the Epochs), AgeStart (the beginning of each epoch), AgeEnd (the end of each epoch) and Duration (the length of each epoch).

In R, I'm using the ggplot2 library (and for what it's worth, I use RStudio). To begin with:
# Script created by Tannis McCartney
library(ggplot2)
# Read in time scale file (Ages from 2013 International Time Scale)
Cenozoic <- read.csv(file='Cenozoic.csv')
The next step was to calculate the midpoint of each epoch.
midpoint <- Cenozoic$Age.End + Cenozoic$Duration/2
I also needed to tell R which order to put my epochs in. For a larger dataset, I would probably create a sequential numerical column, but with only seven epochs in the Cenozoic, I could spell it out:
Cenozoic$Time <- factor(Cenozoic$Time, levels = c("Holocene", "Pleistocene", "Pliocene", "Miocene", "Oligocene", "Eocene", "Paleocene"))
With ggplot2, the plot is generated by a series of commands added together:
ggplot(data = Cenozoic, aes(x= "Epoch", y = Duration, fill = Time)) + ylab("Ma") + xlab(NULL) + geom_bar(stat="identity") + geom_text(label = Cenozoic$Time, aes(y=midpoint, ymax=65.5)) + theme(legend.position="none")  + scale_fill_manual(values=c("#FEF2E0", "#FFF2AE", "#FFFF99", "#FFFF00", "#FDC07A", "#F3B46C", "#FDA75F"))
Let me break that down a bit:

ggplot(data = Cenozoic, aes(x= "Epoch", y = Duration, fill = Time))
This plots the data in the Cenozoic table with "Epoch" on the x-axis, Duration on the y-axis, and using Time to fill the bars with different colours. Epoch is not something I had in the input table - to create a single stacked bar chart you want only one x value and it has to be assigned a value/string. I called it Epoch. By using Duration along the y-axis, it creates the stacked bars of the appropriate widths, and it fills the colours according to the Time column.

ylab("Ma") + xlab(NULL)
This sets the labels for the x- and y- axes. I didn't need an x-axis label since the tick will be labeled with Epoch based on my inputs above.

geom_bar(stat="identity")
This tells R to create bars with heights that represent specific values (in this case, the duration of the epoch).

geom_text(label = Cenozoic$Time, aes(y=midpoint, ymax=65.5))
This puts the epoch labels in the center of each bar. The midpoint values calculated earlier are used here.

theme(legend.position="none")
This turns off the legend, since the labels have been applied to the bars directly.

scale_fill_manual(values=c("#FEF2E0", "#FFF2AE", "#FFFF99", "#FFFF00", "#FDC07A", "#F3B46C", "#FDA75F"))
This sets the hex colour values for each bar. I decided to go all out here - I used the "official" colours for the International Time Scale. I got the RGB values for each epoch from this handy resource at Purdue. I'm not sure if R can handle RGB values, but I know it can take hex colours, so I converted the RGB values for the epochs to hex using an online converter. There may be a more efficient way to do this, and if I was working with more than seven rows of data, I would have spent a bit of time finding it. However, this works for the Cenozoic epochs. This command creates a column that assigns the colours to the bars in the same order they are plotted.

Since my plan is to tile this with other data I'm plotting, I'm leaving off the title for now. Eventually I will make this narrower too. Once I have everything else scripted in R, I'll decide whether or not to flip the y-axis or rotate the whole plot, but for now I'm pretty pleased with the results: