Jeromy Anglim's Blog: Psychology and Statistics


Thursday, May 3, 2012

How to plot three categorical variables and one continuous variable using ggplot2

This post shows how to produce a plot involving three categorical variables and one continuous variable using ggplot2 in R.

The following code is also available as a gist on GitHub.

1. Create Data

First, let's load ggplot2 and create some data to work with:

library(ggplot2)

set.seed(4444)
Data <- expand.grid(group=c("Apples", "Bananas", "Carrots", "Durians", 
            "Eggplants"),
            year=c("2000", "2001", "2002"),
            quality=c("Grade A", "Grade B", "Grade C", "Grade D", 
            "Grade E"))
Group.Weight <- data.frame(
    group=c("Apples", "Bananas", "Carrots", "Durians", "Eggplants"),
    group.weight=c(1,1,-1,0.5, 0))
Quality.Weight <- data.frame(
    quality=c("Grade A", "Grade B", "Grade C", "Grade D", "Grade E"),
    quality.weight = c(1,0.5,0,-0.5,-1))
Data <- merge(Data, Group.Weight)
Data <- merge(Data, Quality.Weight)
Data$score <- Data$group.weight + Data$quality.weight + 
    rnorm(nrow(Data), 0, 0.2)
Data$proportion.tasty <- exp(Data$score)/(1 + exp(Data$score))
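As a quick sanity check on the two key steps above, here is a minimal sketch. It assumes nothing beyond base R. The `design` and `score` objects are illustrative names that are not part of the original code, and the score values are made up to span the range implied by the weights.

```r
# expand.grid() builds the full crossing of the three categorical variables:
# 5 groups x 3 years x 5 quality grades.
design <- expand.grid(
    group = c("Apples", "Bananas", "Carrots", "Durians", "Eggplants"),
    year = c("2000", "2001", "2002"),
    quality = c("Grade A", "Grade B", "Grade C", "Grade D", "Grade E"))
nrow(design)  # 75 rows, one per combination

# Illustrative scores (hypothetical values): the group and quality weights
# each lie in [-1, 1], so their sum lies in [-2, 2] before noise is added.
score <- seq(-2, 2, by = 0.5)

# The transform used for proportion.tasty is the inverse logit ...
manual <- exp(score) / (1 + exp(score))
# ... which base R also provides as plogis().
builtin <- plogis(score)

all.equal(manual, builtin)  # TRUE
range(manual)               # every value lies strictly between 0 and 1
```

The inverse logit is what keeps `proportion.tasty` inside the 0 to 1 range, whatever the underlying score.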

2. Produce Plot

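And here's the code to produce the plot. Year goes on the x-axis, group is mapped to line colour and point shape, and each quality grade gets its own panel.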

# Year on the x-axis; one line (colour and shape) per group.
ggplot(data=Data, 
       aes(x=factor(year), y=proportion.tasty, 
           group=group,
           shape=group,
           color=group)) + 
    geom_line() + 
    geom_point() +
    # Plot title
    ggtitle("Proportion Tasty by Year, Quality, and Group") +
    scale_x_discrete("Year") +
    scale_y_continuous("Proportion Tasty") + 
    # One panel per quality grade, laid out left to right
    facet_grid(. ~ quality)
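The same three-way layout can be rearranged by swapping which categorical variable goes where. The following is only a sketch of one possible variation, not the plot from this post: it puts group in the panels and quality in the colour and shape. It rebuilds the data so that it runs on its own, uses `plogis()` for the inverse logit, and stores the plot in `p`, a name introduced here for illustration.

```r
library(ggplot2)

# Rebuild the example data so this block is self-contained.
set.seed(4444)
groups <- c("Apples", "Bananas", "Carrots", "Durians", "Eggplants")
grades <- c("Grade A", "Grade B", "Grade C", "Grade D", "Grade E")
Data <- expand.grid(group = groups,
                    year = c("2000", "2001", "2002"),
                    quality = grades)
Data <- merge(Data, data.frame(group = groups,
                               group.weight = c(1, 1, -1, 0.5, 0)))
Data <- merge(Data, data.frame(quality = grades,
                               quality.weight = c(1, 0.5, 0, -0.5, -1)))
Data$score <- Data$group.weight + Data$quality.weight +
    rnorm(nrow(Data), 0, 0.2)
Data$proportion.tasty <- plogis(Data$score)  # inverse logit

# Variation: one panel per group, one line per quality grade.
p <- ggplot(Data,
            aes(x = factor(year), y = proportion.tasty,
                group = quality,
                shape = quality,
                colour = quality)) +
    geom_line() +
    geom_point() +
    labs(title = "Proportion Tasty by Year, Group, and Quality",
         x = "Year", y = "Proportion Tasty") +
    facet_grid(. ~ group)

print(p)
```

Which variable goes in the facets is a judgment call. Lines within a panel are the easiest to compare, so it usually makes sense to put the comparison you care most about on colour and leave the facets for the variable you mainly want to condition on.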

And here's what it looks like:

[Figure: three categorical variables ggplot2. Proportion Tasty by Year, Quality, and Group, with one panel per quality grade]