Easy cell statistics for factorial designs

A common task when analyzing multi-group designs is obtaining descriptive statistics for various cells and cell combinations.

There are many functions that can help you accomplish this, including aggregate() and by() in the base installation, summaryBy() in the doBy package, and describe.by() in the psych package. However, I find it easiest to use the melt() and cast() functions in the reshape package.

As an example, consider the mtcars dataframe (included in the base installation) containing road test information on automobiles assessed in 1974. Suppose that you want to obtain the means, standard deviations, and sample sizes for the variables miles per gallon (mpg), horsepower (hp), and weight (wt). You want these statistics for all cars in the dataset, separately by transmission type (am) and number of gears (gear), and for the cells formed by crossing these two variables.

You can accomplish this with the following code:

options(digits = 3)
library(reshape)

# define and name the statistics of interest
stats <- function(x)(c(N = length(x), Mean = mean(x), SD = sd(x)))

# label the levels of the classification variables (optional)
mtcars$am   <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
mtcars$gear <- factor(mtcars$gear, levels = c(3, 4, 5),
                      labels = c("3-Cyl", "4-Cyl", "5-Cyl"))

# melt the dataset
dfm   <- melt(mtcars,
              # outcome variables
              measure.vars = c("mpg", "hp", "wt"),
              # classification variables
              id.vars = c("am", "gear"))

# statistics for the entire sample
cast(dfm, variable ~ ., stats)

# statistics for cells defined by transmission type
cast(dfm, am + variable ~ ., stats)

# statistics for cells defined by number of gears
cast(dfm, gear + variable ~ ., stats)

# statistics for cells defined by each am x gear combination
cast(dfm, am + gear + variable ~ ., stats)

The output is given below:

  variable  N   Mean     SD
1      mpg 32  20.09  6.027
2       hp 32 146.69 68.563
3       wt 32   3.22  0.978

         am variable  N   Mean     SD
1 Automatic      mpg 19  17.15  3.834
2 Automatic       hp 19 160.26 53.908
3 Automatic       wt 19   3.77  0.777
4    Manual      mpg 13  24.39  6.167
5    Manual       hp 13 126.85 84.062
6    Manual       wt 13   2.41  0.617

  gear  variable  N   Mean      SD

1 3-Cyl      mpg 15  16.11   3.372
2 3-Cyl       hp 15 176.13  47.689
3 3-Cyl       wt 15   3.89   0.833
4 4-Cyl      mpg 12  24.53   5.277
5 4-Cyl       hp 12  89.50  25.893
6 4-Cyl       wt 12   2.62   0.633
7 5-Cyl      mpg  5  21.38   6.659
8 5-Cyl       hp  5 195.60 102.834
9 5-Cyl       wt  5   2.63   0.819

          am  gear variable  N   Mean      SD
1  Automatic 3-Cyl      mpg 15  16.11   3.372
2  Automatic 3-Cyl       hp 15 176.13  47.689
3  Automatic 3-Cyl       wt 15   3.89   0.833
4  Automatic 4-Cyl      mpg  4  21.05   3.070
5  Automatic 4-Cyl       hp  4 100.75  29.010
6  Automatic 4-Cyl       wt  4   3.30   0.157
7     Manual 4-Cyl      mpg  8  26.27   5.414
8     Manual 4-Cyl       hp  8  83.88  24.175
9     Manual 4-Cyl       wt  8   2.27   0.461
10    Manual 5-Cyl      mpg  5  21.38   6.659
11    Manual 5-Cyl       hp  5 195.60 102.834
12    Manual 5-Cyl       wt  5   2.63   0.819

The approach is easily generalized to any number of grouping variables (factors), dependent/outcome variables, and statistics, and gives you a powerful tool for slicing and dicing data.

About these ads
This entry was posted in R Programming. Bookmark the permalink.

One Response to Easy cell statistics for factorial designs

  1. Wiltrud says:

    Hadley has updated the package to reshape 2, which has changed things somewhat. The package plyr also does some of this functionality.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s