Converting the Shape of an R Data Frame

The goal of a statistical technique called ANOVA is to infer whether the means (averages) of three or more groups of data are all equal when you only have a sample from each group. Suppose the data was laid out for Excel, is stored in a file named “data1.txt”, and looks like:

Group1,Group2,Group3
7,9,3
6,9,5
8,7,6
6,8,5

But in order to directly use the R language aov() function, you really want the data to have a structure like this:

Score,Group
7,G1
6,G1
8,G1
6,G1
9,G2
9,G2
7,G2
8,G2
3,G3
5,G3
6,G3
5,G3

This type of conversion is very common and mildly annoying to perform. I usually write a small R function that accepts a data frame in the original wide format (one column per group) and converts it to a data frame in the long format (one row per score, with a second column naming the group). For example, this code reads a data file into an R data frame, loads the custom convert function, and creates a new data frame suitable for use by the aov() function:

> mydf1 <- read.table("data1.txt", header=TRUE, sep=",")
> source("myconvert.R")
> mydf2 <- myconvert(mydf1)
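Once the data is in the long shape, the ANOVA call itself is short. The following is only a minimal sketch of that next step: the long-format values are typed directly into a data frame (the names scores and model are made up for this example) so the snippet runs on its own, and Group is converted to a factor explicitly rather than relying on aov() to do it.

```r
# Long-format data entered by hand so the sketch is self-contained.
# "scores" and "model" are illustrative names, not from the original post.
scores <- data.frame(
  Score = c(7, 6, 8, 6, 9, 9, 7, 8, 3, 5, 6, 5),
  Group = factor(rep(c("G1", "G2", "G3"), each = 4))
)

# One-way ANOVA: does the mean Score differ across the levels of Group?
model <- aov(Score ~ Group, data = scores)

# The summary table lists degrees of freedom, sums of squares,
# the F statistic and its p-value.
print(summary(model))
```

With three groups of four scores each, the summary table should show 2 degrees of freedom for Group and 9 for the residuals.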


The custom myconvert() R function is:

myconvert <- function(originalDF) {
  df <- na.omit(originalDF)  # drop rows that contain missing values
  nr <- nrow(df)  # data rows only; the header is not counted
  nc <- ncol(df)
  tot <- nr * nc

  newlabels <- c("G1", "G2", "G3")  # hard-coded: assumes three columns
  result <- data.frame(Score=numeric(tot),
    Group=character(tot),
    stringsAsFactors=FALSE)

  k <- 1
  for (j in seq_len(nc)) {  # walk the columns left to right
    for (i in seq_len(nr)) {  # then down the rows of each column
      result$Score[k] <- df[i,j]
      result$Group[k] <- newlabels[j]
      k <- k + 1
    }
  }

  return(result)
}

For simplicity, the function hard-codes several values, such as the three group labels and the output column names, rather than taking parameters that would make it more general.
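If you want to avoid the hard-coded labels, one alternative is the stack() function that ships with base R, which does a wide-to-long conversion for any number of columns. This is a sketch of a different technique, not the custom function above; the data frame names wide and long are made up for the example, and the data is typed in directly so the snippet runs without the data file.

```r
# Wide-format data matching the sample file, entered by hand.
# "wide" and "long" are illustrative names.
wide <- data.frame(
  Group1 = c(7, 6, 8, 6),
  Group2 = c(9, 9, 7, 8),
  Group3 = c(3, 5, 6, 5)
)

# stack() puts every value into one column named "values" and records
# the source column name in a factor column named "ind".
long <- stack(wide)

# Rename the columns to match the Score/Group layout used above.
names(long) <- c("Score", "Group")

# Optional: shorten the labels from Group1, Group2, ... to G1, G2, ...
levels(long$Group) <- sub("Group", "G", levels(long$Group))

print(long)
```

Because stack() returns the group column as a factor, the result can be passed straight to aov() as aov(Score ~ Group, data = long).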
