1 The Motivating Study
Our motivating study is the ADHD SMART (PI: Pelham), a prototypical SMART. A schematic of the study design is also available in the handout V2_handout_motivating_study.pdf.
2 Set up
# The R programming language provides basic capabilities, but much of what
# people do in R is based on add-on 'packages', which we load into R using the
# `library` command
# Load package for data cleaning
library(dplyr)
# Load package for plotting
library(ggplot2)
# Load package for data analysis
library(geepack)
# Sometimes, we may write our own custom functions in R (similar to macros in
# SAS). This line here loads a custom function for estimating contrasts and
# mimics the ESTIMATE statement in SAS.
source("R/estimate.R")
2.1 Examine simulated data
| ID | odd | severity | priormed | race | Y0 | A1 | R | NRtime | adherence | Y1 | A2 | Y2 | cell |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 2.88 | 0 | 1 | 2.32 | -1 | 0 | 4 | 0 | 2.79 | 1 | 0.598 | C |
| 2 | 0 | 4.13 | 0 | 0 | 2.07 | 1 | 1 | NA | 1 | 2.20 | NA | 4.267 | D |
| 3 | 1 | 5.57 | 0 | 1 | 1.00 | -1 | 1 | NA | 0 | 2.29 | NA | 1.454 | A |
| 4 | 0 | 4.93 | 0 | 1 | 3.23 | 1 | 0 | 4 | 0 | 3.05 | -1 | 6.762 | E |
| 5 | 1 | 5.50 | 0 | 1 | 1.48 | 1 | 0 | 6 | 0 | 1.73 | -1 | 3.580 | E |
| 6 | 0 | 5.50 | 0 | 1 | 1.72 | 1 | 0 | 3 | 0 | 2.40 | 1 | 2.075 | F |
3 Analysis code for Aim Type 1
3.1 Regression model
\[ E\left[Y_2(A_1) | \mathbf{X}\right] = \beta_0 + \beta_1 A_{1} + \beta_2 X_{1c} + \beta_3 X_{2c} + \beta_4 X_{3c} + \beta_5 X_{4c} \]
3.2 Step-by-step workflow and R code syntax
Step 1. Contrast code stage-1 and stage-2 treatment indicators (done for you) and center each baseline covariate by its own grand mean
dat_smart <- dat_smart %>%
# Center each baseline covariate by its own grand mean
mutate(odd_c = odd - mean(odd), # this creates X_1c
severity_c = severity - mean(severity), # this creates X_2c
priormed_c = priormed - mean(priormed), # this creates X_3c
race_c = race - mean(race)) # this creates X_4c
Step 2. Estimate parameters in regression model
We use generalized estimating equations (GEE) to estimate the model for the mean outcome under each first-stage intervention option. Obtaining the robust (sandwich) standard error is simple: we only need to set std.err = "san.se" in the geeglm function.
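The `Call:` echo below is produced by a `geeglm` fit. A minimal, self-contained sketch of that fitting step follows; the simulated stand-in data and the object name `model1` are assumptions for illustration (the workshop code would use the real `dat_smart`):

```r
library(geepack)

# Stand-in data so the sketch runs end-to-end (illustrative only)
set.seed(1)
n <- 150
dat_demo <- data.frame(
  ID = 1:n,
  A1 = sample(c(-1, 1), n, replace = TRUE),
  odd_c = rnorm(n), severity_c = rnorm(n),
  priormed_c = rnorm(n), race_c = rnorm(n)
)
dat_demo$Y2 <- 3 + 0.4 * dat_demo$A1 + rnorm(n)

# Working-independence GEE with robust (sandwich) standard errors
model1 <- geeglm(Y2 ~ A1 + odd_c + severity_c + priormed_c + race_c,
                 id = ID, data = dat_demo, std.err = "san.se")
summary(model1)
```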
Call:
geeglm(formula = Y2 ~ A1 + odd_c + severity_c + priormed_c +
race_c, data = dat_smart, id = ID, std.err = "san.se")
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 2.8894 0.1365 447.77 <2e-16 ***
A1 0.3758 0.1471 6.53 0.011 *
odd_c -0.6609 0.2903 5.18 0.023 *
severity_c -0.0695 0.0688 1.02 0.312
priormed_c -0.1827 0.3469 0.28 0.598
race_c 0.5632 0.3926 2.06 0.151
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation structure = independence
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 2.79 0.289
Number of clusters: 150 Maximum cluster size: 1
\[ \begin{split} \widehat{E}\left[Y_2(A_1) | \mathbf{X}\right] &= \widehat{\beta_0} + \widehat{\beta_1} A_1 \\ &+ \widehat{\beta_2} X_{1c} + \widehat{\beta_3} X_{2c} + \widehat{\beta_4} X_{3c} + \widehat{\beta_5} X_{4c} \end{split} \]
Correspondence between output and equation
| Quantity | Value |
|---|---|
| \(\widehat{\beta_0}\) | 2.8894 |
| \(\widehat{\beta_1}\) | 0.3758 |
| \(\widehat{\beta_2}\) | -0.6609 |
| \(\widehat{\beta_3}\) | -0.0695 |
| \(\widehat{\beta_4}\) | -0.1827 |
| \(\widehat{\beta_5}\) | 0.5632 |
Step 3. Estimate key quantities of interest
We typically have the mean of \(Y_2\) under each first-stage intervention option and the main effect of first-stage intervention options as our key quantities of interest.
# The code below creates a matrix L with three rows and six columns.
L <- rbind(
# The 1st line can be thought of as a set of instructions
# to tell the computer to calculate b0+b1
"Mean Y2 under A1=+1 (BMOD)" = c(1, 1, 0, 0, 0, 0),
# The 2nd line can be thought of as a set of instructions
# to tell the computer to calculate b0-b1
"Mean Y2 under A1=-1 (MED)" = c(1, -1, 0, 0, 0, 0),
# The 3rd line can be thought of as a set of instructions
# to tell the computer to calculate 2*b1
"Main effect using full sample" = c(0, 2, 0, 0, 0, 0))
# If one wishes to estimate more quantities, one may simply add a new row to L
# before triggering the execution of the estimation step (i.e., the next code
# snippet). This new row will have to have exactly six columns since our
# regression model of interest has exactly six parameters (including the
# intercept term).
print(L)
                              [,1] [,2] [,3] [,4] [,5] [,6]
Mean Y2 under A1=+1 (BMOD) 1 1 0 0 0 0
Mean Y2 under A1=-1 (MED) 1 -1 0 0 0 0
Main effect using full sample 0 2 0 0 0 0
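The table below is produced by the custom `estimate()` function loaded from R/estimate.R. Its core computation can be sketched as follows; `estimate_sketch` and its internals are illustrative names, not the actual workshop implementation:

```r
# Minimal sketch of a contrast estimator: L %*% beta with SEs from the
# model's variance-covariance matrix
estimate_sketch <- function(fit, combos) {
  b   <- coef(fit)                              # estimated regression parameters
  V   <- vcov(fit)                              # (robust) variance-covariance matrix
  est <- as.vector(combos %*% b)                # point estimates of the contrasts
  se  <- sqrt(diag(combos %*% V %*% t(combos))) # standard errors of the contrasts
  data.frame(Estimate  = est,
             `95% LCL` = est - 1.96 * se,
             `95% UCL` = est + 1.96 * se,
             SE        = se,
             `p-value` = 2 * pnorm(-abs(est / se)),
             row.names = rownames(combos), check.names = FALSE)
}
```

For example, `estimate_sketch(fit = some_geeglm_fit, combos = L)` would return one row per contrast in L, in the same layout as the output shown next.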
Estimate 95% LCL 95% UCL SE p-value
Mean Y2 under A1=+1 (BMOD) 3.2653 2.8793 3.6512 0.197 <1e-04 ***
Mean Y2 under A1=-1 (MED) 2.5136 2.1128 2.9143 0.204 <1e-04 ***
Main effect using full sample 0.7517 0.1750 1.3284 0.294 0.0106 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Workflow for Aim Type 1 is completed.
This output says that:
- The estimated mean of \(Y_2\) under \(A_1=+1\) (BMOD) is 3.2653
- The estimated mean of \(Y_2\) under \(A_1=-1\) (MED) is 2.5136
- The estimated main effect of first-stage intervention options is 0.7517
4 Analysis code for Aim Type 2
4.1 Regression model
\[ E\left[Y_2(A_2) | \mathbf{X}, R = 0\right] = \beta_0 + \beta_1 A_{2} + \beta_2 X_{1cNR} + \beta_3 X_{2cNR} + \beta_4 X_{3cNR} + \beta_5 X_{4cNR} \]
4.2 Step-by-step workflow and R code syntax
Step 1. Contrast code stage-1 and stage-2 treatment indicators (done for you) and center each baseline covariate by its own mean among non-responders
dat_smart_nonresponders <- dat_smart %>%
# In all subsequent steps,
# we only retain non-responders' observations
filter(R == 0) %>%
# Center each baseline covariate by its own
# mean conditional on response
mutate(odd_cNR = odd - mean(odd), # this creates X_1cNR
severity_cNR = severity - mean(severity), # this creates X_2cNR
priormed_cNR = priormed - mean(priormed), # this creates X_3cNR
race_cNR = race - mean(race)) # this creates X_4cNR
Step 2. Estimate parameters in regression model
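The fit whose `Call:` echo is shown below can be sketched analogously to Aim Type 1; the stand-in data and the name `model2` are illustrative (the real analysis uses `dat_smart_nonresponders` from Step 1):

```r
library(geepack)

# Stand-in non-responder data so the sketch runs (illustrative only)
set.seed(2)
n <- 101
dat_nr_demo <- data.frame(
  ID = 1:n,
  A2 = sample(c(-1, 1), n, replace = TRUE),
  odd_cNR = rnorm(n), severity_cNR = rnorm(n),
  priormed_cNR = rnorm(n), race_cNR = rnorm(n)
)
dat_nr_demo$Y2 <- 2.7 - 0.4 * dat_nr_demo$A2 + rnorm(n)

# Same GEE recipe as before, now restricted to non-responders
model2 <- geeglm(Y2 ~ A2 + odd_cNR + severity_cNR + priormed_cNR + race_cNR,
                 id = ID, data = dat_nr_demo, std.err = "san.se")
```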
Call:
geeglm(formula = Y2 ~ A2 + odd_cNR + severity_cNR + priormed_cNR +
race_cNR, data = dat_smart_nonresponders, id = ID, std.err = "san.se")
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 2.7151 0.1643 273.06 <2e-16 ***
A2 -0.4200 0.1692 6.16 0.013 *
odd_cNR -0.8406 0.3486 5.81 0.016 *
severity_cNR 0.0454 0.0893 0.26 0.611
priormed_cNR -0.2806 0.3578 0.62 0.433
race_cNR 0.4007 0.5242 0.58 0.445
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation structure = independence
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 2.75 0.297
Number of clusters: 101 Maximum cluster size: 1
\[ \begin{split} \widehat{E}\left[Y_2(A_2) | \mathbf{X}, R = 0\right] &= \widehat{\beta_0} + \widehat{\beta_1} A_2 \\ &+ \widehat{\beta_2} X_{1cNR} + \widehat{\beta_3} X_{2cNR} + \widehat{\beta_4} X_{3cNR} + \widehat{\beta_5} X_{4cNR} \end{split} \]
Correspondence between output and equation
| Quantity | Value |
|---|---|
| \(\widehat{\beta_0}\) | 2.7151 |
| \(\widehat{\beta_1}\) | -0.4200 |
| \(\widehat{\beta_2}\) | -0.8406 |
| \(\widehat{\beta_3}\) | 0.0454 |
| \(\widehat{\beta_4}\) | -0.2806 |
| \(\widehat{\beta_5}\) | 0.4007 |
Step 3. Estimate key quantities of interest
We typically have the mean of \(Y_2\) among non-responders under each second-stage intervention option and the main effect of second-stage intervention options among non-responders as our key quantities of interest.
L <- rbind(
# The 1st line can be thought of as a set of instructions to tell the
# computer to calculate b0+b1
"Mean Y2 under A2=+1 (Intensify)" = c(1, 1, 0, 0, 0, 0),
# The 2nd line can be thought of as a set of instructions to tell the
# computer to calculate b0-b1
"Mean Y2 under A2=-1 (Augment)" = c(1, -1, 0, 0, 0, 0),
# The 3rd line can be thought of as a set of instructions to tell the
# computer to calculate 2*b1
"Main effect using non-responders" = c(0, 2, 0, 0, 0, 0))
                                 [,1] [,2] [,3] [,4] [,5] [,6]
Mean Y2 under A2=+1 (Intensify) 1 1 0 0 0 0
Mean Y2 under A2=-1 (Augment) 1 -1 0 0 0 0
Main effect using non-responders 0 2 0 0 0 0
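The contrast estimates in the next table can be sanity-checked by hand from the coefficient table above. (The coefficients are rounded to 4 decimal places, so agreement is only approximate.)

```r
b0 <- 2.7151   # intercept estimate from the output above
b1 <- -0.4200  # A2 coefficient estimate from the output above

c(mean_intensify = b0 + b1,  # mean Y2 under A2 = +1, approx. 2.295
  mean_augment   = b0 - b1,  # mean Y2 under A2 = -1, approx. 3.135
  main_effect    = 2 * b1)   # main effect, approx. -0.840
```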
Estimate 95% LCL 95% UCL SE p-value
Mean Y2 under A2=+1 (Intensify) 2.2952 1.8181 2.7722 0.243 <1e-04 ***
Mean Y2 under A2=-1 (Augment) 3.1351 2.6881 3.5821 0.228 <1e-04 ***
Main effect using non-responders -0.8399 -1.5032 -0.1767 0.338 0.0131 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Workflow for Aim Type 2 is completed.
This output says that:
- The estimated mean of \(Y_2\) under \(A_2=+1\) (Intensify) is 2.2952
- The estimated mean of \(Y_2\) under \(A_2=-1\) (Augment) is 3.1351
- The estimated main effect of second-stage intervention options among non-responders is -0.8399
5 Analysis code for Aim Type 3
5.1 Regression model
\[ E\left[Y_2(A_1, A_{2NR}) | \mathbf{X}\right] = \beta_0 + \beta_1 A_{1} + \beta_2 A_2 + \beta_3 A_1 A_2 + \beta_4 X_{1c} + \beta_5 X_{2c} + \beta_6 X_{3c} + \beta_7 X_{4c} \]
5.2 Step-by-step workflow and R code syntax
Step 1. Contrast code stage-1 and stage-2 treatment indicators (done for you) and center each baseline covariate by its own grand mean
dat_smart <- dat_smart %>%
# Center each baseline covariate by its own grand mean
mutate(odd_c = odd - mean(odd), # this creates X_1c
severity_c = severity - mean(severity), # this creates X_2c
priormed_c = priormed - mean(priormed), # this creates X_3c
race_c = race - mean(race)) # this creates X_4c
Step 2. Create weights
The actual numeric value of the weights (i.e., 2 and 4) can be obtained “by hand” by calculating the inverse (reciprocal) of the probability of being assigned to a particular adaptive intervention.
For responders, \(W_i = \dfrac{1}{\mathbb{P}\left[A1=A1_i\right]\mathbb{P}\left[A2=A2_i\mid A1=A1_i, R=1\right]} = \dfrac{1}{\left(\frac{1}{2}\right)\left(1\right)} =2\)
For non-responders, \(W_i = \dfrac{1}{\mathbb{P}\left[A1=A1_i\right]\mathbb{P}\left[A2=A2_i\mid A1=A1_i, R=0\right]} = \dfrac{1}{\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)} = 4\)
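In code, Step 2 can be implemented with a single mutate(). The sketch below uses a toy data frame; the column name design_weights matches the name used later in this handout, while everything else is illustrative:

```r
library(dplyr)

# Toy stand-in: two responders (R = 1) and two non-responders (R = 0)
demo <- tibble(ID = 1:4, R = c(1, 0, 1, 0))

# W = 2 for responders, W = 4 for non-responders
demo <- demo %>%
  mutate(design_weights = if_else(R == 1, 2, 4))

demo$design_weights  # -> 2 4 2 4
```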
Step 3. Create new dataset with replicated rows for responders
We restructure the original dataset such that instead of one observation per responder, the new dataset includes two copies of each responder’s observation. Non-responders’ observations will remain intact (preserved) in the replication step.
We first save non-responders’ observations into a dataset of their own.
Restructure the dataset: STEP 1
We next save responders’ observations into a dataset of their own.
Restructure the dataset: STEP 2
Recall that A2 was coded as NA in the original dataset. The key in the replication step is to replace NA in A2 by +1 in the first copy and NA in A2 by -1 in the second copy.
# In the next two lines of code, we create two copies of rows_to_replicate.
# For the first copy (see next line) assign a value of +1 to A2.
dat_plus_one_pseudodata <- dat_rows_to_replicate %>%
mutate(A2 = +1)
# For the second copy (see next line) assign a value of -1 to A2.
dat_minus_one_pseudodata <- dat_rows_to_replicate %>%
mutate(A2 = -1)
Restructure the dataset: STEP 3
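Putting the three restructuring steps together on a toy dataset (object names mirror the steps described above, but the toy data themselves are an assumption):

```r
library(dplyr)

# Toy data: ID 1 is a non-responder (A2 observed), ID 2 a responder (A2 is NA)
dat_toy <- tibble(ID = c(1, 2), R = c(0, 1), A2 = c(1, NA))

# STEP 1: save non-responders' observations into a dataset of their own
dat_non_responders <- dat_toy %>% filter(R == 0)

# STEP 2: save responders' observations into a dataset of their own
dat_rows_to_replicate <- dat_toy %>% filter(R == 1)

# Replace NA in A2 by +1 in the first copy and by -1 in the second copy
dat_plus_one_pseudodata  <- dat_rows_to_replicate %>% mutate(A2 = +1)
dat_minus_one_pseudodata <- dat_rows_to_replicate %>% mutate(A2 = -1)

# STEP 3: stack the pieces; each responder now contributes two rows
dat_smart_replicated <- bind_rows(dat_non_responders,
                                  dat_plus_one_pseudodata,
                                  dat_minus_one_pseudodata)
nrow(dat_smart_replicated)  # -> 3
```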
# Inspect a couple of rows to verify that each non-responder has one row and a
# weight of 4
dat_smart_replicated %>%
filter(R == 0) %>%
arrange(ID) %>%
select(ID, odd_c, severity_c, priormed_c, race_c, A1, R, A2, design_weights,
Y2, cell) %>%
head(., n = 4)
| ID | odd_c | severity_c | priormed_c | race_c | A1 | R | A2 | design_weights | Y2 | cell |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.593 | -1.891 | -0.3 | 0.153 | -1 | 0 | 1 | 4 | 0.598 | C |
| 4 | -0.407 | 0.161 | -0.3 | 0.153 | 1 | 0 | -1 | 4 | 6.762 | E |
| 5 | 0.593 | 0.732 | -0.3 | 0.153 | 1 | 0 | -1 | 4 | 3.580 | E |
| 6 | -0.407 | 0.727 | -0.3 | 0.153 | 1 | 0 | 1 | 4 | 2.075 | F |
Done restructuring
Sanity check: Non-responders
# Inspect a couple of rows to verify that each responder has two rows (which
# are exact copies of each other, except for the value of A2!) and a weight of
# 2
dat_smart_replicated %>%
filter(R == 1) %>%
arrange(ID) %>%
select(ID, odd_c, severity_c, priormed_c, race_c, A1, R, A2, design_weights,
Y2, cell) %>%
head(., n = 4)
| ID | odd_c | severity_c | priormed_c | race_c | A1 | R | A2 | design_weights | Y2 | cell |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | -0.407 | -0.638 | -0.3 | -0.847 | 1 | 1 | 1 | 2 | 4.27 | D |
| 2 | -0.407 | -0.638 | -0.3 | -0.847 | 1 | 1 | -1 | 2 | 4.27 | D |
| 3 | 0.593 | 0.798 | -0.3 | 0.153 | -1 | 1 | 1 | 2 | 1.45 | A |
| 3 | 0.593 | 0.798 | -0.3 | 0.153 | -1 | 1 | -1 | 2 | 1.45 | A |
Sanity check: Responders
Center covariates before weighting and replicating, not after. In other words, if Step 1 were performed after Steps 2 and 3, estimates of the treatment effect would not necessarily be correct!
Step 4. Estimate parameters in regression model
Reminder: regression model \[ \begin{split} E\left[Y_2(A_1, A_2) | \mathbf{X}\right] &= \beta_0 + \beta_1 A_{1} + \beta_2 A_2 + \beta_3 A_1 A_2 \\ &+ \beta_4 X_{1c} + \beta_5 X_{2c} + \beta_6 X_{3c} + \beta_7 X_{4c} \end{split} \]
# We want to account for the fact that replicated responders refer to the same
# person. To let geeglm know that the replicated observations belong to the
# same person, we need to make sure that they are next to each other in the
# dataset.
# It's easy to forget this, and (if you do forget) geeglm won't necessarily
# warn you. If you forget, then this may affect your variance estimation
# Arrange our new (replicated) dataset by ID
dat_smart_replicated <- dat_smart_replicated %>%
arrange(ID)

model <- geeglm(Y2 ~ A1 + A2 + I(A1*A2)
+ odd_c + severity_c + priormed_c + race_c,
# specify which column contains the participant ID's
id = ID,
# remember to use the weighted and replicated dataset
data = dat_smart_replicated,
# remember to weight each row appropriately
weights = design_weights,
# ask the computer to treat replicates as independent units
corstr = "independence",
# ask the computer to give you robust standard errors
std.err = "san.se")
Call:
geeglm(formula = Y2 ~ A1 + A2 + I(A1 * A2) + odd_c + severity_c +
priormed_c + race_c, data = dat_smart_replicated, weights = design_weights,
id = ID, corstr = "independence", std.err = "san.se")
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 2.9142 0.1326 483.31 <2e-16 ***
A1 0.4209 0.1415 8.85 0.0029 **
A2 -0.3473 0.1110 9.79 0.0018 **
I(A1 * A2) -0.1070 0.1111 0.93 0.3352
odd_c -0.6989 0.2871 5.92 0.0149 *
severity_c -0.0694 0.0662 1.10 0.2950
priormed_c -0.1278 0.3400 0.14 0.7069
race_c 0.5673 0.3739 2.30 0.1292
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation structure = independence
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 2.66 0.262
Number of clusters: 150 Maximum cluster size: 2
\[ \begin{split} \widehat{E}\left[Y_2(A_1, A_2) | \mathbf{X}\right] &= \widehat{\beta_0} + \widehat{\beta_1} A_{1} + \widehat{\beta_2} A_2 + \widehat{\beta_3} A_1 A_2 \\ &+ \widehat{\beta_4} X_{1c} + \widehat{\beta_5} X_{2c} + \widehat{\beta_6} X_{3c} + \widehat{\beta_7} X_{4c} \end{split} \]
Correspondence between output and equation
| Quantity | Value |
|---|---|
| \(\widehat{\beta_0}\) | 2.9142 |
| \(\widehat{\beta_1}\) | 0.4209 |
| \(\widehat{\beta_2}\) | -0.3473 |
| \(\widehat{\beta_3}\) | -0.1070 |
| \(\widehat{\beta_4}\) | -0.6989 |
| \(\widehat{\beta_5}\) | -0.0694 |
| \(\widehat{\beta_6}\) | -0.1278 |
| \(\widehat{\beta_7}\) | 0.5673 |
Step 5. Obtain estimated means under each embedded AI
Reminder: regression model \[ \begin{split} E\left[Y_2(A_1, A_2) | \mathbf{X}\right] &= \beta_0 + \beta_1 A_{1} + \beta_2 A_2 + \beta_3 A_1 A_2 \\ &+ \beta_4 X_{1c} + \beta_5 X_{2c} + \beta_6 X_{3c} + \beta_7 X_{4c} \end{split} \]
Reminder: how variables were coded
- \(A_1 = 1\) (BMOD), \(A_1 = -1\) (MED)
- \(A_2 = 1\) (INTENSIFY), \(A_2 = -1\) (AUGMENT)
Contrast coding logic: Mean under (MED, AUGMENT) \[ \begin{split} E\left[Y_2({\color{seagreen}{-1}}, {\color{seagreen}{-1}}) | \mathbf{X}\right] &= \beta_0 {\color{royalblue}{(1)}} + \beta_1 {\color{royalblue}{(-1)}} + \beta_2 {\color{royalblue}{(-1)}} + \beta_3 {\color{royalblue}{(1)}} \\ &+ \beta_4 {\color{royalblue}{(0)}} + \beta_5 {\color{royalblue}{(0)}} + \beta_6 {\color{royalblue}{(0)}} + \beta_7 {\color{royalblue}{(0)}} \end{split} \]
Contrast coding logic: Mean under (BMOD, AUGMENT) \[ \begin{split} E\left[Y_2({\color{seagreen}{1}}, {\color{seagreen}{-1}}) | \mathbf{X}\right] &= \beta_0 {\color{royalblue}{(1)}} + \beta_1 {\color{royalblue}{(1)}} + \beta_2 {\color{royalblue}{(-1)}} + \beta_3 {\color{royalblue}{(-1)}} \\ &+ \beta_4 {\color{royalblue}{(0)}} + \beta_5 {\color{royalblue}{(0)}} + \beta_6 {\color{royalblue}{(0)}} + \beta_7 {\color{royalblue}{(0)}} \end{split} \]
Correspondence between output and equation
| Quantity | Value |
|---|---|
| \[ \widehat{E}\left[Y_2({\color{seagreen}{-1}}, {\color{seagreen}{-1}}) | \mathbf{X}\right] \] | 2.734 |
| \[ \widehat{E}\left[Y_2({\color{seagreen}{1}}, {\color{seagreen}{-1}}) | \mathbf{X}\right] \] | 3.789 |
| \[ \widehat{E}\left[Y_2({\color{seagreen}{-1}}, {\color{seagreen}{1}}) | \mathbf{X}\right] \] | 2.253 |
| \[ \widehat{E}\left[Y_2({\color{seagreen}{1}}, {\color{seagreen}{1}}) | \mathbf{X}\right] \] | 2.881 |
# L is user-specified with
# -- number of rows = number of AI's
# -- number of columns = number of parameters in regression model
L <- rbind(
# These statements get the contrast corresponding to
# the mean end-of-study outcome under each embedded AI
"AI#1 (MED, AUGMENT)" = c(1, -1, -1, 1, rep(0,4)),
"AI#2 (BMOD, AUGMENT)" = c(1, 1, -1, -1, rep(0,4)),
"AI#3 (MED, INTENSIFY)" = c(1, -1, 1, -1, rep(0,4)),
"AI#4 (BMOD, INTENSIFY)" = c(1, 1, 1, 1, rep(0,4)))
# The function `estimate` has two user-specified arguments:
# -- fit: an output of geeglm, the function we used to estimate the
#    parameters of our regression model of interest
# -- combos: contrasts of interest, encoded into a matrix
est_contrasts <- estimate(fit = model, combos = L)
print(est_contrasts)
                        Estimate 95% LCL 95% UCL    SE p-value
AI#1 (MED, AUGMENT) 2.734 2.303 3.164 0.220 <1e-04 ***
AI#2 (BMOD, AUGMENT) 3.789 3.348 4.231 0.225 <1e-04 ***
AI#3 (MED, INTENSIFY) 2.253 1.694 2.812 0.285 <1e-04 ***
AI#4 (BMOD, INTENSIFY) 2.881 2.367 3.394 0.262 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Step 6. Obtain estimated effect
Let’s suppose that the primary pairwise comparison of interest is the causal effect of (BMOD, AUGMENT) versus (MED, AUGMENT).
Correspondence between output and equation
| Quantity | Value |
|---|---|
| \[ \begin{split}&\widehat{E}\left[Y_2({\color{seagreen}{1}}, {\color{seagreen}{-1}}) | \mathbf{X}\right] - \widehat{E}\left[Y_2({\color{seagreen}{-1}}, {\color{seagreen}{-1}}) | \mathbf{X}\right] \\&= \widehat{\beta_1} {\color{royalblue}{(2)}} + \widehat{\beta_3} {\color{royalblue}{(-2)}} \\\end{split} \] | 1.056 |
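The value 1.056 equals \(2\widehat{\beta}_1 - 2\widehat{\beta}_3\): subtracting the AI#1 contrast row from the AI#2 contrast row gives a difference contrast that could be added to L. The sketch below checks the arithmetic using the rounded coefficients from the Step 4 output (L_effect and beta_hat are illustrative names):

```r
# Contrast row for AI#2 (BMOD, AUGMENT) minus AI#1 (MED, AUGMENT):
# c(1, 1, -1, -1, 0, 0, 0, 0) - c(1, -1, -1, 1, 0, 0, 0, 0)
L_effect <- rbind("AI#2 - AI#1" = c(0, 2, 0, -2, rep(0, 4)))

# Rounded coefficient estimates transcribed from the Step 4 output
beta_hat <- c(2.9142, 0.4209, -0.3473, -0.1070,
              -0.6989, -0.0694, -0.1278, 0.5673)

round(as.vector(L_effect %*% beta_hat), 3)  # -> 1.056
```

In the actual workflow, one would pass this row to the estimate() helper (e.g., estimate(fit = model, combos = L_effect)) to obtain a standard error and confidence interval alongside the point estimate.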
5.3 Visualize 95% CI’s
Visual check: does the 95% CI cross zero (horizontal dashed blue line)?
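A sketch of such a plot, using the AI means and 95% CIs transcribed from the Step 5 output (ggplot2 was loaded in the Set up section; all object names here are illustrative):

```r
library(ggplot2)

# Estimates and confidence limits copied from the Step 5 output
ci_df <- data.frame(
  AI  = c("(MED, AUGMENT)", "(BMOD, AUGMENT)",
          "(MED, INTENSIFY)", "(BMOD, INTENSIFY)"),
  est = c(2.734, 3.789, 2.253, 2.881),
  lcl = c(2.303, 3.348, 1.694, 2.367),
  ucl = c(3.164, 4.231, 2.812, 3.394)
)

# Point estimates with error bars; dashed blue line marks zero
p <- ggplot(ci_df, aes(x = AI, y = est)) +
  geom_point() +
  geom_errorbar(aes(ymin = lcl, ymax = ucl), width = 0.2) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "blue") +
  labs(x = "Embedded adaptive intervention", y = "Estimated mean of Y2")
```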
Workflow for Aim Type 3 is completed.