--- title: "Utilities for Improving Estimation Efficiency via Augmentation and for Propensity Score Estimation" author: "Jared Huling" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: fig_width: 7 fig_height: 5 toc: true toc_depth: 3 number_sections: true self_contained: true preamble: > \usepackage{amsmath,amssymb,amsfonts,bm,bbm,amsthm} \usepackage{longtable} \usepackage{booktabs} \usepackage{bbm, bm} \def\bsx {\boldsymbol x} \def\bsX {\boldsymbol X} \def\bsT {\boldsymbol T} \def\bsbeta {\boldsymbol \beta} \def\bsgamma {\boldsymbol \gamma} \def\bsW {\boldsymbol W} \def\bsy {\boldsymbol y} \def\bsY {\boldsymbol Y} \def\bsM {\boldsymbol M} \def\bfx {\mathbf x} \def\bfX {\mathbf X} \def\bfT {\mathbf T} \def\bfW {\mathbf W} \def\bfy {\mathbf y} \def\bfY {\mathbf Y} \def\bfM {\mathbf M} \def\bfU {\mathbf U} vignette: > %\VignetteIndexEntry{Utilities for Improving Estimation Efficiency via Augmentation and for Propensity Score Estimation} %\VignetteEngine{knitr::rmarkdown} --- # Efficiency augmentation To demonstrate how to use efficiency augmentation and the propensity score utilities available in the `personalized` package, we simulate data with two treatments. The treatment assignments are based on covariates and hence mimic an observational setting with no unmeasured confounders. ```{r loadpkg, message=FALSE, warning=FALSE} library(personalized) ``` In this simulation, the treatment assignment depends on covariates and hence we must model the propensity score $\pi(x) = Pr(T = 1 | X = x)$. In this simulation we will assume that larger values of the outcome are better. ```{r sim_data_1, message = FALSE, warning = FALSE} library(personalized) set.seed(1) n.obs <- 500 n.vars <- 10 x <- matrix(rnorm(n.obs * n.vars, sd = 3), n.obs, n.vars) # simulate non-randomized treatment xbetat <- 0.5 + 0.25 * x[,9] - 0.25 * x[,1] trt.prob <- exp(xbetat) / (1 + exp(xbetat)) trt <- rbinom(n.obs, 1, prob = trt.prob) # simulate delta delta <- (0.5 + x[,2] - 0.5 * x[,3] - 1 * x[,1] + 1 * x[,1] * x[,4] ) # simulate main effects g(X) xbeta <- 2 * x[,1] + 3 * x[,4] - 0.25 * x[,2]^2 + 2 * x[,3] + 0.25 * x[,5] ^ 2 xbeta <- xbeta + delta * (2 * trt - 1) # simulate continuous outcomes y <- drop(xbeta) + rnorm(n.obs, sd = 3) ``` # Propensity score utilities Estimation of the propensity score is a fundamental aspect of the estimation of individualized treatment rules (ITRs). The `personalized` package offers support tools for construction of the propensity score function used by the `fit.subgroup()` function. The support is via the `create.propensity.function()` function. This tool allows for estimation of the propensity score in high dimensions via `glmnet`. In high dimensions it can be important to account for regularization bias via cross-fitting (); the `create.propensity.function()` offers a cross-fitting approach for high-dimensional propensity score estimation. A basic usage of this function with cross-fitting (with 4 folds; normally we would set this larger, but have reduced it to make computation time shorter) is as follows: ```{r} # arguments can be passed to cv.glmnet via `cv.glmnet.args` prop.func <- create.propensity.function(crossfit = TRUE, nfolds.crossfit = 4, cv.glmnet.args = list(type.measure = "auc", nfolds = 3)) ``` `prop.func` can then be passed to `fit.subgroup()` as follows: We have set `nfolds` to 3 for computational reasons; it should generally be higher, such as 10. ```{r} subgrp.model <- fit.subgroup(x = x, y = y, trt = trt, propensity.func = prop.func, loss = "sq_loss_lasso", nfolds = 3) # option for cv.glmnet (for ITR estimation) summary(subgrp.model) ``` # Augmentation utilities Efficiency in estimating ITRs can be improved by including an augmentation term. The optimal augmentation term generally is a function of the main effects of the full outcome regression model marginalized over the treatment. Especially in high dimensions, regularization bias can potentially have a negative impact on performance. Cross-fitting is again another reasonable approach to circumventing this issue. Augmentation functions can be constructed (with cross-fitting as an option) via the `create.augmentation.function()` function, which works similarly as the `create.propensity.function()` function. The basic usage is as follows: ```{r} aug.func <- create.augmentation.function(family = "gaussian", crossfit = TRUE, nfolds.crossfit = 4, cv.glmnet.args = list(type.measure = "mae", nfolds = 3)) ``` We have set `nfolds` to 3 for computational reasons; it should generally be higher, such as 10. `aug.func` can be used for augmentation by passing it to `fit.subgroup()` like: ```{r} subgrp.model.aug <- fit.subgroup(x = x, y = y, trt = trt, propensity.func = prop.func, augment.func = aug.func, loss = "sq_loss_lasso", nfolds = 3) # option for cv.glmnet (for ITR estimation) summary(subgrp.model.aug) ``` # Comparing performance with augmentation We first run the training/testing procedure to assess the performance of the non-augmented estimator: ```{r} valmod <- validate.subgroup(subgrp.model, B = 3, method = "training_test", train.fraction = 0.75) valmod ``` Then we compare with the augmented estimator. Although this is based on just 3 replications, we can see that the augmented estimator is better at discriminating between benefitting and non-benefitting patients, as evidenced by the large treatment effect among those predicted to benefit (and smaller standard error) by the augmented estimator versus the smaller conditional treatment effect above. ```{r} valmod.aug <- validate.subgroup(subgrp.model.aug, B = 3, method = "training_test", train.fraction = 0.75) valmod.aug ```