Package 'independenceWeights' reference manual

Title:	Estimates Weights for Confounding Control for Continuous-Valued Exposures
Description:	Estimates weights to make a continuous-valued exposure statistically independent of a vector of pre-treatment covariates using the method proposed in Huling, Greifer, and Chen (2021) <arXiv:2107.07086>.
Authors:	Jared Huling [aut, cre], Noah Greifer [aut]
Maintainer:	Jared Huling <[email protected]>
License:	MIT + file LICENSE
Version:	0.0.2
Built:	2025-02-24 02:46:23 UTC
Source:	https://github.com/jaredhuling/independenceweights

Construction of distance covariance optimal weights

Description

Constructs independence-inducing weights (distance covariance optimal weights) for estimating causal quantities with continuous-valued treatments.

Usage

independence_weights(
  A,
  X,
  lambda = 0,
  decorrelate_moments = FALSE,
  preserve_means = FALSE,
  dimension_adj = TRUE
)

Arguments

`A`	vector indicating the value of the treatment or exposure variable. Should be a numeric vector.
`X`	matrix of covariates with number of rows equal to the length of `A`; each column is a pre-treatment covariate that the weights should make independent of the treatment.
`lambda`	tuning parameter for the penalty on the sum of squares of the weights
`decorrelate_moments`	logical scalar. Whether or not to add constraints that result in exact decorrelation of the weighted first-order moments of `X` and `A`. Defaults to `FALSE`.
`preserve_means`	logical scalar. Whether or not to add constraints that result in exact preservation of the weighted first-order moments (means) of `X` and `A`. Defaults to `FALSE`.
`dimension_adj`	logical scalar. Whether or not to add an adjustment to the energy distance terms that accounts for the dimensionality of `X`. Defaults to `TRUE`.
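To check the first-order moment conditions described for `decorrelate_moments` and `preserve_means`, one can compare weighted and unweighted moments directly. The sketch below is a hypothetical base-R helper, `weighted_moment_check()`, that is not part of the package; it assumes `w` is any nonnegative weight vector, such as the `weights` element of a fitted object.

```r
# Hypothetical helper for illustration; it is NOT part of independenceWeights.
# Reports the first-order moment conditions that `decorrelate_moments` and
# `preserve_means` are documented to enforce, for any weight vector `w`.
weighted_moment_check <- function(A, X, w) {
  X <- as.matrix(X)
  p <- w / sum(w)                       # normalize weights to sum to 1
  # weighted Pearson correlation between A and each column of X
  weighted_cor <- apply(X, 2, function(x) {
    cov.wt(cbind(A, x), wt = p, cor = TRUE)$cor[1, 2]
  })
  # weighted minus unweighted means of A and of each column of X
  AX <- cbind(A = A, X)
  mean_diff <- colSums(p * AX) - colMeans(AX)
  list(weighted_cor = weighted_cor, mean_diff = mean_diff)
}
```

With weights from `independence_weights(A, X, decorrelate_moments = TRUE)`, one would expect `weighted_cor` to be close to zero (up to solver tolerance), and with `preserve_means = TRUE`, `mean_diff` should be close to zero; with all weights equal, `weighted_cor` reduces to the ordinary `cor(A, X)`.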

Value

An object of class "independence_weights" with elements:

`weights`	A vector of length `nrow(X)` containing the estimated sample weights, i.e., the distance covariance optimal weights (DCOWs)
`A`	Treatment vector
`opt`	The optimization object returned by `osqp::solve_osqp()`
`objective`	The value of the objective function at the optimum. This is the weighted dependence statistic plus any ridge penalty on the weights.
`D_unweighted`	The value of the weighted dependence distance with all weights equal to 1 (i.e., unweighted)
`D_w`	The value of the weighted dependence distance of Huling, et al. (2021) at the estimated optimal weights. This is the weighted dependence statistic without the ridge penalty on the weights.
`distcov_unweighted`	The unweighted distance covariance between treatment and covariates. This is the standard distance covariance of Szekely et al. (2007). This term is always equal to `D_unweighted`.
`distcov_weighted`	The weighted distance covariance between treatment and covariates. This term does not by itself measure weighted dependence but is a critical component of it.
`energy_A`	The weighted energy distance between `A` and its weighted version. Smaller values mean the marginal distribution of the treatment is better preserved after weighting.
`energy_X`	The weighted energy distance between `X` and its weighted version. Smaller values mean the marginal distribution of the covariates is better preserved after weighting.
`ess`	The effective sample size of the weights, computed with Kish's formula.
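Kish's effective sample size is the squared sum of the weights divided by the sum of the squared weights. The sketch below uses a hypothetical helper name, `kish_ess()`, which is not exported by the package; it only illustrates the formula.

```r
# Hypothetical helper for illustration (not exported by independenceWeights):
# Kish's effective sample size, (sum of weights)^2 / (sum of squared weights).
kish_ess <- function(w) {
  stopifnot(all(w >= 0), sum(w) > 0)
  sum(w)^2 / sum(w^2)
}

kish_ess(rep(1, 10))         # equal weights: ESS = n = 10
kish_ess(c(1, 1, 0, 0))      # zero weights drop observations: ESS = 2
kish_ess(c(4, 1, 1, 1, 1))   # one dominant weight: 64 / 20 = 3.2
```

Because the formula is invariant to rescaling the weights, it gives the same value whether the weights are normalized to sum to 1 or to `n`.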


References

Szekely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769-2794. doi:10.1214/009053607000000505

Huling, J. D., Greifer, N., & Chen, G. (2021). Independence weights for causal inference with continuous exposures. arXiv preprint arXiv:2107.07086. https://arxiv.org/abs/2107.07086

Examples


simdat <- simulate_confounded_data(seed = 999, nobs = 500)

y <- simdat$data$Y
A <- simdat$data$A
X <- as.matrix(simdat$data[c("Z1", "Z2", "Z3", "Z4", "Z5")])

dcows <- independence_weights(A, X)

print(dcows)

# distribution of response:
quantile(y)

## create grid
trt_vec <- seq(min(simdat$data$A), 50, length.out=500)

## estimate ADRF
adrf_hat <- weighted_kernel_est(A, y, dcows$weights, trt_vec)$est

## estimate naively without weights
adrf_hat_unwtd <- weighted_kernel_est(A, y, rep(1, length(y)), trt_vec)$est

ylims <- range(c(simdat$data$Y, simdat$true_adrf(trt_vec)))
plot(x = simdat$data$A, y = simdat$data$Y, ylim = ylims, xlim = c(0,50))
## true ADRF
lines(x = trt_vec, y = simdat$true_adrf(trt_vec), col = "blue", lwd=2)
## estimated ADRF
lines(x = trt_vec, y = adrf_hat, col = "red", lwd=2)
## naive estimate
lines(x = trt_vec, y = adrf_hat_unwtd, col = "green", lwd=2)

Printing results for estimated energy balancing weights

Description

Prints results for estimated independence weights (class "independence_weights") and weighted energy statistics for given weights (class "weighted_energy_terms").

Usage

## S3 method for class 'independence_weights'
print(x, digits = max(getOption("digits") - 3, 3), ...)

## S3 method for class 'weighted_energy_terms'
print(x, digits = max(getOption("digits") - 3, 3), ...)

Arguments

`x`	a fitted object of class "independence_weights" (from `independence_weights`) or "weighted_energy_terms" (from `weighted_energy_stats`)
`digits`	minimal number of significant digits to print.
`...`	further arguments passed to or from `print.default`.

Value

Nothing returned

Simulation of confounded data with a continuous treatment

Description

Simulates confounded data with a continuous treatment, based on the simulation design of Vegetabile et al. (2021).

Usage

simulate_confounded_data(
  seed = 1,
  nobs = 1000,
  MX1 = -0.5,
  MX2 = 1,
  MX3 = 0.3,
  A_effect = TRUE
)

Arguments

`seed`	random seed for reproducibility
`nobs`	number of observations
`MX1`	the mean of the first covariate. Defaults to -0.5, the value used in the simulations of Vegetabile, et al (2021).
`MX2`	the mean of the second and fourth covariates. Defaults to 1, the value used in the simulations of Vegetabile, et al (2021).
`MX3`	the probability that the fifth covariate (a binary covariate) is equal to 1. Defaults to 0.3, the value used in the simulations of Vegetabile, et al (2021).
`A_effect`	whether (`TRUE`) or not (`FALSE`) the treatment has a causal effect on the outcome. If `TRUE`, the setting used is that of the main text of Vegetabile, et al (2021). If `FALSE`, the setting is that used in the Appendix of Vegetabile, et al (2021).

Value

A list with the following elements:

`data`	A `data.frame` with `nobs` rows containing the response (`Y`), treatment (`A`), confounders (`Z1` to `Z5`), and the true average dose response function (`truth`)
`true_adrf`	A function that takes values of the treatment `A` and returns the true ADRF, E(Y(a)), of the data-generating mechanism used to generate `data`
`original_covariates`	The original, untransformed covariates of the simulation setup. Do not use them for adjustment, as they make the confounding adjustment problem substantially easier.
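Because `true_adrf` is returned as a function, an estimated ADRF on a treatment grid can be scored numerically as well as plotted. The sketch below uses a hypothetical helper, `adrf_rmse()`, and a toy stand-in for `simdat$true_adrf`; neither is part of the package.

```r
# Hypothetical helper: root mean squared error of an ADRF estimate against the
# true curve over a treatment grid (e.g., true_adrf = simdat$true_adrf).
adrf_rmse <- function(est, true_adrf, trt_grid) {
  sqrt(mean((est - true_adrf(trt_grid))^2))
}

toy_truth <- function(a) 1 + 0.5 * a    # toy stand-in for simdat$true_adrf
grid <- seq(0, 10, length.out = 101)
adrf_rmse(toy_truth(grid) + 0.2, toy_truth, grid)   # constant bias of 0.2: RMSE of about 0.2
```

With simulated data, `adrf_rmse(adrf_hat, simdat$true_adrf, trt_vec)` could compare weighted and unweighted estimates on the same grid.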

References

Vegetabile, B. G., Griffin, B. A., Coffman, D. L., Cefalu, M., Robbins, M. W., and McCaffrey, D. F. (2021). Nonparametric estimation of population average dose-response curves using entropy balancing weights for continuous exposures. Health Services and Outcomes Research Methodology, 21(1), 69-110.

Examples


simdat <- simulate_confounded_data(seed = 999, nobs = 500)

str(simdat$data)

A <- simdat$data$A
y <- simdat$data$Y

trt_vec <- seq(min(simdat$data$A), max(simdat$data$A), length.out=500)
ylims <- range(c(simdat$data$Y, simdat$true_adrf(trt_vec)))
plot(x = simdat$data$A, y = simdat$data$Y, ylim = ylims)
lines(x = trt_vec, y = simdat$true_adrf(trt_vec), col = "blue", lwd=2)

## naive estimate of ADRF without weights
adrf_hat_unwtd <- weighted_kernel_est(A, y, rep(1, length(y)), trt_vec)$est
lines(x = trt_vec, y = adrf_hat_unwtd, col = "green", lwd=2)


Calculation of weighted energy statistics for weighted dependence

Description

Calculates weighted energy statistics used to quantify weighted dependence
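The distance covariance terms reported here build on the sample distance covariance of Szekely et al. (2007), computed from double-centered distance matrices. The base-R sketch below computes the unweighted squared sample distance covariance V_n^2(A, X); the names `double_center()` and `dcov2()` are hypothetical, and the package may scale or compute its terms differently.

```r
# Squared sample distance covariance V_n^2(A, X) of Szekely, Rizzo & Bakirov (2007),
# via double-centered Euclidean distance matrices. A sketch in base R only;
# independenceWeights may scale or compute its distance covariance terms differently.
double_center <- function(D) {
  # a_kl - (row mean)_k - (column mean)_l + (grand mean)
  D - outer(rowMeans(D), colMeans(D), "+") + mean(D)
}

dcov2 <- function(A, X) {
  a <- as.matrix(dist(as.matrix(A)))   # |A_i - A_j|
  b <- as.matrix(dist(as.matrix(X)))   # ||X_i - X_j|| (Euclidean)
  mean(double_center(a) * double_center(b))
}
```

The statistic is nonnegative and is zero when either variable is constant; larger values indicate stronger dependence between `A` and `X`.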

Usage

weighted_energy_stats(A, X, weights, dimension_adj = TRUE)

Arguments

`A`	treatment vector indicating values of the treatment/exposure variable.
`X`	matrix of covariates with number of rows equal to the length of `weights`; each column is a covariate
`weights`	a vector of sample weights
`dimension_adj`	logical scalar. Whether or not to add an adjustment to the energy distance terms that accounts for the dimensionality of `X`. Defaults to `TRUE`.

Value

An object of class "weighted_energy_terms", which is a list with the following components:

`D_w`	The value of the weighted dependence distance of Huling, et al. (2021), evaluated at the supplied `weights`.
`distcov_unweighted`	The unweighted distance covariance between treatment and covariates. This is the standard distance covariance of Szekely et al. (2007).
`distcov_weighted`	The weighted distance covariance between treatment and covariates. This term does not by itself measure weighted dependence but is a critical component of it.
`energy_A`	The weighted energy distance between `A` and its weighted version. Smaller values mean the marginal distribution of the treatment is better preserved after weighting.
`energy_X`	The weighted energy distance between `X` and its weighted version. Smaller values mean the marginal distribution of the covariates is better preserved after weighting.
`ess`	The effective sample size of the weights, computed with Kish's formula.
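The `energy_A` and `energy_X` terms compare a sample with its weighted version. Both distributions sit on the same support points, so the energy distance reduces to quadratic forms in a single distance matrix: 2 p'Dq - p'Dp - q'Dq, with p the uniform weights and q the normalized weights. The sketch below uses a hypothetical function name, `weighted_energy_dist()`, omits the `dimension_adj` adjustment, and may differ from the package in scaling.

```r
# Energy distance between the empirical distribution of Z (equal weights 1/n)
# and its weighted version (weights w / sum(w)). Z may be a vector (like A)
# or a covariate matrix (like X); Euclidean distances are used.
# Sketch without the `dimension_adj` adjustment.
weighted_energy_dist <- function(Z, w) {
  D <- as.matrix(dist(as.matrix(Z)))   # pairwise distances between observations
  n <- nrow(D)
  p <- rep(1 / n, n)                   # unweighted empirical distribution
  q <- w / sum(w)                      # weighted empirical distribution
  2 * drop(t(p) %*% D %*% q) - drop(t(p) %*% D %*% p) - drop(t(q) %*% D %*% q)
}
```

The result is zero when all weights are equal and positive otherwise (for distinct observations), which matches the documented reading that smaller values mean the marginal distribution is better preserved.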

References

Szekely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769-2794. doi:10.1214/009053607000000505

Huling, J. D., Greifer, N., & Chen, G. (2021). Independence weights for causal inference with continuous exposures. arXiv preprint arXiv:2107.07086. https://arxiv.org/abs/2107.07086

Examples


simdat <- simulate_confounded_data(seed = 999, nobs = 100)

str(simdat$data)

A <- simdat$data$A
X <- as.matrix(simdat$data[c("Z1", "Z2", "Z3", "Z4", "Z5")])

wts <- runif(length(A))

weighted_energy_stats(A, X, wts)

Calculation of weighted nonparametric regression estimate of the dose response function

Description

Calculates weighted nonparametric regression estimate of the causal average dose response function

Usage

weighted_kernel_est(A, y, weights, Aseq)

Arguments

`A`	vector indicating the value of the treatment or exposure variable. Should be a numeric vector.
`y`	vector of responses
`weights`	a vector of sample weights of length equal to the length of `y`
`Aseq`	a vector of new points for which to obtain estimates of E(Y(a))

Value

A list with the following elements

`fit`	A fitted model object from the `lp` function
`estimated`	a vector of estimates of a causal ADRF at the values of the treatment specified by `Aseq`
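To show how sample weights enter a kernel regression estimate of E(Y(a)), the sketch below implements a weighted local-linear estimator with a Gaussian kernel. It is a hypothetical stand-in, `weighted_local_linear()`, and not the package's method: `weighted_kernel_est()` is documented as fitting via `lp`, with its own smoothing choices, whereas this sketch fixes the bandwidth at `bw.nrd0(A)` as an assumption.

```r
# Hypothetical weighted local-linear estimator of E(Y(a)) with a Gaussian kernel
# and a fixed bandwidth `h`. Illustration only; weighted_kernel_est() itself
# fits via `lp` and chooses its own smoothing.
weighted_local_linear <- function(A, y, weights, Aseq, h = bw.nrd0(A)) {
  vapply(Aseq, function(a0) {
    k <- dnorm((A - a0) / h) * weights          # kernel weight times sample weight
    fit <- lm.wfit(cbind(1, A - a0), y, w = k)  # weighted least squares around a0
    unname(fit$coefficients[1])                 # intercept = fitted value at a0
  }, numeric(1))
}
```

Each grid point gets its own weighted least-squares fit, so observations with larger sample weights (and closer treatment values) pull the local estimate more strongly; with all weights equal to 1, this reduces to ordinary local-linear regression, analogous to the naive estimate in the examples.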
