I’ve manually restructured data for intensive longitudinal analyses one too many times… So I wrote a function to do this for me, and posted it here so I’d remember that I had done so.
Specifically, this code parses intensive longitudinal data (ILD, e.g., daily diary, EMA) variables into within-person and between-person components. Between-person variables are grand-mean centered averages; i.e., an individual’s average score across the ILD period centered around the total sample mean. Within-person variables represent a single observation for a given individual centered around that person’s mean. Parsing the data in this way allows you to better understand whether it’s someone’s disposition (between-person) or departures from one’s norm (within-person) that matter in your analysis.
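Written as formulas, the decomposition looks like the following. The notation is introduced here purely for illustration: $y_{ti}$ is person $i$'s score on day $t$, $T_i$ is the number of observed days for person $i$, and $N$ is the number of people.

```latex
\bar{y}_{i} = \frac{1}{T_i} \sum_{t=1}^{T_i} y_{ti}
\qquad
\bar{y} = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_{i}

y^{w}_{ti} = y_{ti} - \bar{y}_{i}
\qquad
y^{b}_{i} = \bar{y}_{i} - \bar{y}

y_{ti} = \bar{y} + y^{b}_{i} + y^{w}_{ti}
```

The last line is just the first two definitions rearranged: every observation is the grand mean, plus how far that person sits from the grand mean, plus how far that day sits from the person's own mean.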
As usual, we’ll use the tidyverse packages.
library(tidyverse)
We’ll use some sample data from a two-week daily diary period to illustrate this. Here, we have daily measures of participants’ mood, including happiness, depressed mood, anxiety, and anger.
head(ddData)
## # A tibble: 6 × 5
## id happy depr anx anger
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 1 2.67 1.33 3 1
## 2 1 3 1.33 2 1
## 3 1 3.33 1 2 1
## 4 1 3 1.33 2 1
## 5 1 3 1.67 2 1
## 6 1 2.33 3 3.5 2.67
We have 200 participants:
ddData %>%
  group_by(id) %>%
  summarise() %>%
  nrow()
## [1] 200
As you can probably gather, the actual data manipulation here is quite straightforward. The general workflow is:
Use group_by() to group your data by person (or whatever your level 2 variable is)
Calculate each person’s person-mean
Calculate the within-person variable by subtracting the person mean from the daily measure.
Ungroup your data
Calculate the grand mean (across the whole sample). Note that this should be the mean of the person-means, so that you don’t inadvertently weight the grand mean based on who responded to the most surveys.
Calculate the between-person variable by subtracting the grand mean from the person-mean
For a single variable, say happy, this looks like this:
happyData <- ddData %>%
  # group by individual (or level 2 variable)
  group_by(id) %>%
  # calculate (person) mean
  mutate(happy_pm = mean(happy, na.rm = TRUE)) %>%
  # calculate within-person variable
  mutate(happy_w = happy - happy_pm) %>%
  # need to ungroup here: future operations should be for the whole dataset,
  # not per person/unit
  ungroup() %>%
  # calculate grand mean; mean across whole sample
  mutate(happy_gm = mean(happy_pm, na.rm = TRUE)) %>%
  # calculate between-person variable:
  # the difference between an individual's person-mean and the grand mean
  mutate(happy_b = happy_pm - happy_gm)
head(happyData)
## # A tibble: 6 × 9
## id happy depr anx anger happy_pm happy_w happy_gm happy_b
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2.67 1.33 3 1 2.33 0.334 3.65 -1.32
## 2 1 3 1.33 2 1 2.33 0.667 3.65 -1.32
## 3 1 3.33 1 2 1 2.33 1.00 3.65 -1.32
## 4 1 3 1.33 2 1 2.33 0.667 3.65 -1.32
## 5 1 3 1.67 2 1 2.33 0.667 3.65 -1.32
## 6 1 2.33 3 3.5 2.67 2.33 -0.000250 3.65 -1.32
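One caveat about the grand-mean step: after ungroup(), mean(happy_pm) averages over rows, so it equals the mean of the person-means only when every participant contributes the same number of rows (for example, one row per scheduled diary day, with missed days kept as NA). If people have different numbers of rows, it is worth taking the mean over one row per person instead. Here is a minimal sketch of the difference; the toy tibble and its values are hypothetical, not from ddData.

```r
library(dplyr)

# Hypothetical toy data: person 1 has three rows, person 2 has only one
toy <- tibble(
  id    = factor(c(1, 1, 1, 2)),
  happy = c(2, 2, 2, 4)
)

# Person-means attached to every row, as in the workflow above
toy_pm <- toy %>%
  group_by(id) %>%
  mutate(happy_pm = mean(happy, na.rm = TRUE)) %>%
  ungroup()

# Row-level mean of the person-means: person 1 is counted three times
gm_rows <- mean(toy_pm$happy_pm, na.rm = TRUE)

# Mean over one row per person: each person is counted once
gm_persons <- toy_pm %>%
  distinct(id, happy_pm) %>%
  summarise(gm = mean(happy_pm, na.rm = TRUE)) %>%
  pull(gm)

gm_rows     # 2.5
gm_persons  # 3
```

With balanced data the two numbers coincide, so this only matters when the number of rows varies across people.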
With one variable, this is pretty straightforward. But when you have multiple predictor variables that you need to parse, it gets to be a lot of copy-and-pasting (i.e., a lot of room for human error). Instead, we can create a function that takes a vector of variables and performs this operation on all of them.
Writing the Function
First, we’ll just dump the code above into a helper function. The main difference here is that we have to do a bit of maneuvering in order to accurately access variable names within the function. We input the names for our grouping variable and the variable we want to parse as strings. Then, in the body of the function, we use the sym() function from the rlang package to turn these variable names into symbols. The !! operator then unquotes each symbol, so dplyr looks up the column with that name in our data frame instead of treating the input as a plain string (fun fact, !! is called the “bang bang” operator!).
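To see what sym() and !! are doing in isolation, here is a minimal sketch on a made-up three-row tibble (toy and its values are hypothetical). It also shows the .data pronoun, an alternative that is not used in this post, which looks up a column by its string name and gives the same result here.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(id = c(1, 1, 2), happy = c(1, 3, 5))

# The column name arrives as a string, as it does inside the helper function
variable <- "happy"

# sym() turns the string into a symbol; !! injects that symbol into the call,
# so summarise() sees mean(happy) and finds the column in the data frame
via_sym <- toy %>%
  summarise(m = mean(!!rlang::sym(variable))) %>%
  pull(m)

# Alternative (an assumption on my part that you may prefer it):
# the .data pronoun indexes the current data frame by column name
via_pronoun <- toy %>%
  summarise(m = mean(.data[[variable]])) %>%
  pull(m)

via_sym      # 3
via_pronoun  # 3
```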
In this function, grouping is your grouping variable (typically participant ID), and variable is the variable you want to parse into within- and between-person components. Again, this function will only parse one variable at a time.
parseOne <- function(data, grouping, variable) {
  require(rlang)
  # this is (almost) the same as above:
  data <- data %>%
    # here we use !! and sym() to refer to columns by their string names
    group_by(!!sym(grouping)) %>%
    mutate(var_pm = mean(!!sym(variable), na.rm = TRUE),
           var_w = !!sym(variable) - var_pm) %>%
    ungroup() %>%
    mutate(var_gm = mean(var_pm, na.rm = TRUE),
           var_b = var_pm - var_gm) %>%
    select(-c(var_pm, var_gm))
  # here we rename our variables using the original variable name as stem
  names(data)[names(data) == "var_w"] <- paste0(variable, "_w")
  names(data)[names(data) == "var_b"] <- paste0(variable, "_b")
  return(data)
}
This gives us an updated data frame with one variable parsed into its within- and between-person components:
newDat <- parseOne(ddData, grouping = "id", variable = "happy")
head(newDat)
## # A tibble: 6 × 7
## id happy depr anx anger happy_w happy_b
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2.67 1.33 3 1 0.334 -1.32
## 2 1 3 1.33 2 1 0.667 -1.32
## 3 1 3.33 1 2 1 1.00 -1.32
## 4 1 3 1.33 2 1 0.667 -1.32
## 5 1 3 1.67 2 1 0.667 -1.32
## 6 1 2.33 3 3.5 2.67 -0.000250 -1.32
Once we have that helper function, we can create a wrapper function that takes a vector of variables that we want to parse. All we need is a for loop that iteratively applies our new function to each variable in the vector:
parseILD <- function(data, grouping, variables) {
  for (i in variables) {
    data <- parseOne(data, grouping, i)
  }
  return(data)
}
Let’s try it out:
newDat <- parseILD(ddData, grouping = "id", variables = c("happy", "depr"))
head(newDat)
## # A tibble: 6 × 9
## id happy depr anx anger happy_w happy_b depr_w depr_b
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2.67 1.33 3 1 0.334 -1.32 -0.0557 -0.0981
## 2 1 3 1.33 2 1 0.667 -1.32 -0.0557 -0.0981
## 3 1 3.33 1 2 1 1.00 -1.32 -0.389 -0.0981
## 4 1 3 1.33 2 1 0.667 -1.32 -0.0557 -0.0981
## 5 1 3 1.67 2 1 0.667 -1.32 0.278 -0.0981
## 6 1 2.33 3 3.5 2.67 -0.000250 -1.32 1.61 -0.0981
We can add as many variables as we want to our variables vector:
newDat <- parseILD(ddData, grouping = "id",
                   variables = c("happy", "depr", "anx", "anger"))
head(newDat)
## # A tibble: 6 × 13
## id happy depr anx anger happy_w happy_b depr_w depr_b anx_w anx_b
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2.67 1.33 3 1 0.334 -1.32 -0.0557 -0.0981 0.792 0.624
## 2 1 3 1.33 2 1 0.667 -1.32 -0.0557 -0.0981 -0.208 0.624
## 3 1 3.33 1 2 1 1.00 -1.32 -0.389 -0.0981 -0.208 0.624
## 4 1 3 1.33 2 1 0.667 -1.32 -0.0557 -0.0981 -0.208 0.624
## 5 1 3 1.67 2 1 0.667 -1.32 0.278 -0.0981 -0.208 0.624
## 6 1 2.33 3 3.5 2.67 -0.000250 -1.32 1.61 -0.0981 1.29 0.624
## # ℹ 2 more variables: anger_w <dbl>, anger_b <dbl>
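If you would rather avoid the loop, the same parsing can be sketched in a single pass with dplyr's across(). This is an alternative under stated assumptions, not the function above: the toy tibble, its values, and the vars vector are hypothetical. It computes the between-person part from a one-row-per-person table, so each person counts once in the grand mean.

```r
library(dplyr)

# Hypothetical toy data: two people with different numbers of rows
toy <- tibble(
  id    = factor(c(1, 1, 1, 2, 2)),
  happy = c(1, 2, 3, 4, 6),
  depr  = c(2, 2, 5, 1, 3)
)
vars <- c("happy", "depr")

# Within-person part: each observation minus that person's mean
within <- toy %>%
  group_by(id) %>%
  mutate(across(all_of(vars), ~ .x - mean(.x, na.rm = TRUE),
                .names = "{.col}_w")) %>%
  ungroup()

# Between-person part: person-means (one row per person),
# centered on the mean of those person-means
between <- toy %>%
  group_by(id) %>%
  summarise(across(all_of(vars), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop") %>%
  mutate(across(all_of(vars), ~ .x - mean(.x, na.rm = TRUE),
                .names = "{.col}_b")) %>%
  select(id, ends_with("_b"))

# Attach the between-person columns back onto the daily rows
parsed <- left_join(within, between, by = "id")
parsed
```

On balanced data this should produce the same _w and _b values as parseILD(); the column order differs, since all _w columns come before the _b columns.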
And now you have parsed data for your ILD analyses!