Restructuring Data for Intensive Longitudinal Analysis

I’ve manually restructured data for intensive longitudinal analyses one too many times… So I wrote a function to do this for me, and posted it here so I’d remember that I had done so.

Specifically, this code parses intensive longitudinal data (ILD, e.g., daily diary, EMA) variables into within-person and between-person components. Between-person variables are grand-mean centered averages; i.e., an individual’s average score across the ILD period, centered around the total sample mean. Within-person variables represent a single observation for a given individual, centered around that person’s mean. Parsing the data this way lets you tell whether it is someone’s disposition (between-person) or departures from their own norm (within-person) that matters in your analysis.
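
To make the two components concrete, here is a minimal worked example in base R, so it runs without any packages. The data frame toy and its column names are made up for illustration and are not part of the diary data used below.

```r
# Hypothetical toy data: two people, three daily mood ratings each
toy <- data.frame(
  id   = c("a", "a", "a", "b", "b", "b"),
  mood = c(1, 2, 3, 4, 6, 8)
)

# Person mean, repeated on every row for that person (a = 2, b = 6)
toy$mood_pm <- ave(toy$mood, toy$id)

# Within-person: each day's score relative to that person's own mean
# a: -1, 0, 1   b: -2, 0, 2
toy$mood_w <- toy$mood - toy$mood_pm

# Between-person: the person mean relative to the mean of the person means (4)
# a: -2 on every row   b: 2 on every row
person_means <- tapply(toy$mood, toy$id, mean)
toy$mood_b <- toy$mood_pm - mean(person_means)

toy
```

Person b is higher than the sample average overall (mood_b = 2), while mood_w shows which of b's days were above or below b's own typical level.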

As usual, we’ll use the tidyverse packages.

library(tidyverse)

We’ll use some sample data from a two-week daily diary period to illustrate this. Here, we have daily measures of participants’ mood, including happiness, depressed mood, anxiety, and anger.

head(ddData)

## # A tibble: 6 × 5
##   id    happy  depr   anx anger
##   <fct> <dbl> <dbl> <dbl> <dbl>
## 1 1      2.67  1.33   3    1   
## 2 1      3     1.33   2    1   
## 3 1      3.33  1      2    1   
## 4 1      3     1.33   2    1   
## 5 1      3     1.67   2    1   
## 6 1      2.33  3      3.5  2.67

We have 200 participants:

ddData %>% 
  distinct(id) %>% 
  nrow()

## [1] 200

As you can probably gather, the actual data manipulation here is quite straightforward. The general workflow is:

1. Use group_by() to group your data by person (or whatever your level 2 variable is).
2. Calculate each person’s mean (the person-mean).
3. Calculate the within-person variable by subtracting the person-mean from the daily measure.
4. Ungroup your data.
5. Calculate the grand mean (across the whole sample). Note that this should be the mean of the person-means, so that you don’t inadvertently weight the grand mean based on who responded to the most surveys.
6. Calculate the between-person variable by subtracting the grand mean from the person-mean.

For a single variable, say happy, the workflow looks like this:

happyData <- ddData %>%
  #group by individual (or level 2 variable)
  group_by(id) %>%
  #calculate (person) mean
  mutate(happy_pm = mean(happy, na.rm = TRUE)) %>%
  #calculate within-person variable
  mutate(happy_w = happy - happy_pm) %>%
  #Need to ungroup here: future operations should be for whole dataset,
  #not per person/unit
  ungroup() %>%
  #calculate grand mean; this averages happy_pm over all rows, which equals
  #the mean of the person-means when every person has the same number of rows
  mutate(happy_gm = mean(happy_pm, na.rm = TRUE)) %>%
  #calculate between-person variable
  #this is the difference between an individual's person-mean and the grand mean
  mutate(happy_b = happy_pm - happy_gm)

head(happyData)

## # A tibble: 6 × 9
##   id    happy  depr   anx anger happy_pm   happy_w happy_gm happy_b
##   <fct> <dbl> <dbl> <dbl> <dbl>    <dbl>     <dbl>    <dbl>   <dbl>
## 1 1      2.67  1.33   3    1        2.33  0.334        3.65   -1.32
## 2 1      3     1.33   2    1        2.33  0.667        3.65   -1.32
## 3 1      3.33  1      2    1        2.33  1.00         3.65   -1.32
## 4 1      3     1.33   2    1        2.33  0.667        3.65   -1.32
## 5 1      3     1.67   2    1        2.33  0.667        3.65   -1.32
## 6 1      2.33  3      3.5  2.67     2.33 -0.000250     3.65   -1.32
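
One thing to watch in the grand-mean step: once the data are ungrouped, taking the mean of the person-mean column averages over rows, so people with more rows count more. The sketch below uses made-up unbalanced data (toy, score, and the other names are hypothetical) to show the difference, along with one possible way to compute the grand mean from a single row per person.

```r
library(dplyr)

# Hypothetical unbalanced data: person "a" has 4 rows, person "b" has 1
toy <- tibble(
  id    = c("a", "a", "a", "a", "b"),
  score = c(1, 1, 1, 1, 6)
)

toy_pm <- toy %>%
  group_by(id) %>%
  mutate(score_pm = mean(score, na.rm = TRUE)) %>%
  ungroup()

# Row-level mean of the person-mean column: "a" counts four times
row_weighted_gm <- mean(toy_pm$score_pm)   # (1 + 1 + 1 + 1 + 6) / 5 = 2

# Mean of one person-mean per person: each person counts once
person_level_gm <- toy_pm %>%
  distinct(id, score_pm) %>%
  summarise(gm = mean(score_pm)) %>%
  pull(gm)                                 # (1 + 6) / 2 = 3.5

# Between-person variable centered on the person-level grand mean
toy_pm %>% mutate(score_b = score_pm - person_level_gm)
```

When every participant has the same number of rows, the two grand means are identical, so this only matters if the number of rows differs across people.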

With one variable, this is pretty straightforward. But when you have multiple predictor variables that you need to parse, it gets to be a lot of copy-and-pasting (i.e., a lot of room for human error). Instead, we can create a function that takes a vector of variables and performs this operation on all of them.

Writing the Function

First, we’ll just dump the code above into a helper function. The main difference here is that we have to do a bit of maneuvering to access variable names correctly within the function. We input the names of our grouping variable and the variable we want to parse as strings. Then, in the body of the function, we use the sym() function from the rlang package to turn these strings into symbols. The !! operator then injects each symbol into the dplyr call, so that R looks for a column with that name in our data frame instead of treating the string as a literal value (fun fact: !! is called the “bang bang” operator!).
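
To see what !! and sym() do on their own, here is a small sketch on made-up data (toy and col are hypothetical names). It also shows the .data pronoun, another way dplyr offers to refer to a column by a string name, purely for comparison.

```r
library(dplyr)
library(rlang)

# Hypothetical toy data and a column name stored as a string
toy <- tibble(id = c("a", "a", "b"), mood = c(1, 3, 5))
col <- "mood"

# sym() turns the string into a symbol; !! injects it into the expression,
# so mean() is applied to the mood column
via_sym <- toy %>% summarise(m = mean(!!sym(col))) %>% pull(m)

# The .data pronoun looks up the column named by the string
via_pronoun <- toy %>% summarise(m = mean(.data[[col]])) %>% pull(m)

# Both match plain base R indexing: all three values are 3
c(via_sym, via_pronoun, mean(toy[[col]]))
```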

In this function, grouping is your grouping variable (typically participant ID), and variable is the variable you want to parse into within- and between-person components. Again, this function will only parse one variable at a time.

parseOne <- function(data, grouping, variable) {
  require(rlang)
  #this is (almost) the same as above:
  data <- data %>%
    #here we use !! and sym()
    group_by(!!sym(grouping)) %>%
    mutate(var_pm = mean(!!sym(variable), na.rm = TRUE),
           var_w = !!sym(variable) - var_pm) %>%
    ungroup() %>%
    mutate(var_gm = mean(var_pm, na.rm = TRUE),
           var_b = var_pm - var_gm) %>%
    select(-c(var_pm, var_gm))

  #here we rename our variables using the original variable name as stem
  names(data)[names(data) == "var_w"] <- paste0(variable, "_w")
  names(data)[names(data) == "var_b"] <- paste0(variable, "_b")

  return(data)
}

This gives us an updated data frame with one variable parsed into its within- and between-person components:

newDat <- parseOne(ddData, grouping = "id", variable = "happy")
head(newDat)

## # A tibble: 6 × 7
##   id    happy  depr   anx anger   happy_w happy_b
##   <fct> <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl>
## 1 1      2.67  1.33   3    1     0.334      -1.32
## 2 1      3     1.33   2    1     0.667      -1.32
## 3 1      3.33  1      2    1     1.00       -1.32
## 4 1      3     1.33   2    1     0.667      -1.32
## 5 1      3     1.67   2    1     0.667      -1.32
## 6 1      2.33  3      3.5  2.67 -0.000250   -1.32

Once we have that helper function, we can create a wrapper function that takes a vector of variables that we want to parse. All we need is a for loop that applies our helper function to each variable in the vector in turn:

parseILD <- function(data, grouping, variables) {
  for (i in variables) {
    data <- parseOne(data, grouping, i)
  }
  return(data)
}

Let’s try it out:

newDat <- parseILD(ddData, grouping = "id", variables = c("happy", "depr"))
head(newDat)

## # A tibble: 6 × 9
##   id    happy  depr   anx anger   happy_w happy_b  depr_w  depr_b
##   <fct> <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
## 1 1      2.67  1.33   3    1     0.334      -1.32 -0.0557 -0.0981
## 2 1      3     1.33   2    1     0.667      -1.32 -0.0557 -0.0981
## 3 1      3.33  1      2    1     1.00       -1.32 -0.389  -0.0981
## 4 1      3     1.33   2    1     0.667      -1.32 -0.0557 -0.0981
## 5 1      3     1.67   2    1     0.667      -1.32  0.278  -0.0981
## 6 1      2.33  3      3.5  2.67 -0.000250   -1.32  1.61   -0.0981

We can add as many variables as we want to our variables vector:

newDat <- parseILD(ddData, grouping = "id", 
                   variables = c("happy", "depr", "anx", "anger"))
head(newDat)

## # A tibble: 6 × 13
##   id    happy  depr   anx anger   happy_w happy_b  depr_w  depr_b  anx_w anx_b
##   <fct> <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl>   <dbl>   <dbl>  <dbl> <dbl>
## 1 1      2.67  1.33   3    1     0.334      -1.32 -0.0557 -0.0981  0.792 0.624
## 2 1      3     1.33   2    1     0.667      -1.32 -0.0557 -0.0981 -0.208 0.624
## 3 1      3.33  1      2    1     1.00       -1.32 -0.389  -0.0981 -0.208 0.624
## 4 1      3     1.33   2    1     0.667      -1.32 -0.0557 -0.0981 -0.208 0.624
## 5 1      3     1.67   2    1     0.667      -1.32  0.278  -0.0981 -0.208 0.624
## 6 1      2.33  3      3.5  2.67 -0.000250   -1.32  1.61   -0.0981  1.29  0.624
## # ℹ 2 more variables: anger_w <dbl>, anger_b <dbl>
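
If you would rather avoid the loop, the same parsing can be sketched with dplyr's across(), which applies a function to several columns in one call. This is a hypothetical alternative rather than the function above: parseAcross and the toy data are made-up names. Like the code above, it takes the grand mean over rows.

```r
library(dplyr)

# Hypothetical loop-free variant: parseAcross is not part of the code above
parseAcross <- function(data, grouping, variables) {
  data %>%
    group_by(across(all_of(grouping))) %>%
    mutate(
      # within-person: each observation minus that person's mean
      across(all_of(variables), ~ .x - mean(.x, na.rm = TRUE),
             .names = "{.col}_w"),
      # person mean, stored under the final _b name for now
      across(all_of(variables), ~ mean(.x, na.rm = TRUE),
             .names = "{.col}_b")
    ) %>%
    ungroup() %>%
    # between-person: person mean minus the grand mean (averaged over rows)
    mutate(across(all_of(paste0(variables, "_b")),
                  ~ .x - mean(.x, na.rm = TRUE)))
}

# Made-up data: two people, three days each
toy <- tibble(
  id    = c("a", "a", "a", "b", "b", "b"),
  happy = c(1, 2, 3, 4, 6, 8),
  depr  = c(2, 2, 2, 4, 4, 4)
)

toy_parsed <- parseAcross(toy, grouping = "id", variables = c("happy", "depr"))
toy_parsed
```

The new columns come out ordered as all of the _w variables followed by all of the _b variables, rather than in the pairs that the loop produces.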

And now you have parsed data for your ILD analyses!