Title: | Fast and Stable Fitting of Generalized Linear Models using 'RcppEigen' |
---|---|
Description: | Fits generalized linear models efficiently using 'RcppEigen'. The iteratively reweighted least squares implementation utilizes the step-halving approach of Marschner (2011) <doi:10.32614/RJ-2011-012> to help safeguard against convergence issues. |
Authors: | Jared Huling [aut, cre], Douglas Bates [cph], Dirk Eddelbuettel [cph], Romain Francois [cph], Yixuan Qiu [cph] |
Maintainer: | Jared Huling <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.0.3 |
Built: | 2024-10-27 04:20:28 UTC |
Source: | https://github.com/jaredhuling/fastglm |
big.matrix prod
big.matrix prod
## S4 method for signature 'big.matrix,vector' x %*% y ## S4 method for signature 'vector,big.matrix' x %*% y
## S4 method for signature 'big.matrix,vector' x %*% y ## S4 method for signature 'vector,big.matrix' x %*% y
x |
big.matrix |
y |
numeric vector |
deviance method for fastglm fitted objects
## S3 method for class 'fastglm' deviance(object, ...)
## S3 method for class 'fastglm' deviance(object, ...)
object |
fastglm fitted object |
... |
not used |
The value of the deviance extracted from the object
family method for fastglm fitted objects
## S3 method for class 'fastglm' family(object, ...)
## S3 method for class 'fastglm' family(object, ...)
object |
fastglm fitted object |
... |
not used |
returns the family of the fitted object
fast generalized linear model fitting
bigLm default
fastglm(x, ...) ## Default S3 method: fastglm( x, y, family = gaussian(), weights = NULL, offset = NULL, start = NULL, etastart = NULL, mustart = NULL, method = 0L, tol = 1e-08, maxit = 100L, ... )
fastglm(x, ...) ## Default S3 method: fastglm( x, y, family = gaussian(), weights = NULL, offset = NULL, start = NULL, etastart = NULL, mustart = NULL, method = 0L, tol = 1e-08, maxit = 100L, ... )
x |
input model matrix. Must be a matrix object |
... |
not used |
y |
numeric response vector of length nobs. |
family |
a description of the error distribution and link function to be used in the model.
For |
weights |
an optional vector of 'prior weights' to be used in the fitting process. Should be a numeric vector. |
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be a numeric vector of length equal to the number of cases |
start |
starting values for the parameters in the linear predictor. |
etastart |
starting values for the linear predictor. |
mustart |
values for the vector of means. |
method |
an integer scalar with value 0 for the column-pivoted QR decomposition, 1 for the unpivoted QR decomposition, 2 for the LLT Cholesky, or 3 for the LDLT Cholesky |
tol |
threshold tolerance for convergence. Should be a positive real number |
maxit |
maximum number of IRLS iterations. Should be an integer |
A list with the elements
coefficients |
a vector of coefficients |
se |
a vector of the standard errors of the coefficient estimates |
rank |
a scalar denoting the computed rank of the model matrix |
df.residual |
a scalar denoting the degrees of freedom in the model |
residuals |
the vector of residuals |
s |
a numeric scalar - the root mean square for residuals |
fitted.values |
the vector of fitted values |
x <- matrix(rnorm(10000 * 100), ncol = 100) y <- 1 * (0.25 * x[,1] - 0.25 * x[,3] > rnorm(10000)) system.time(gl1 <- glm.fit(x, y, family = binomial())) system.time(gf1 <- fastglm(x, y, family = binomial())) system.time(gf2 <- fastglm(x, y, family = binomial(), method = 1)) system.time(gf3 <- fastglm(x, y, family = binomial(), method = 2)) system.time(gf4 <- fastglm(x, y, family = binomial(), method = 3)) max(abs(coef(gl1) - gf1$coef)) max(abs(coef(gl1) - gf2$coef)) max(abs(coef(gl1) - gf3$coef)) max(abs(coef(gl1) - gf4$coef)) ## Not run: nrows <- 50000 ncols <- 50 bkFile <- "bigmat2.bk" descFile <- "bigmatk2.desc" bigmat <- filebacked.big.matrix(nrow=nrows, ncol=ncols, type="double", backingfile=bkFile, backingpath=".", descriptorfile=descFile, dimnames=c(NULL,NULL)) for (i in 1:ncols) bigmat[,i] = rnorm(nrows)*i y <- 1*(rnorm(nrows) + bigmat[,1] > 0) system.time(gfb1 <- fastglm(bigmat, y, family = binomial(), method = 3)) ## End(Not run)
x <- matrix(rnorm(10000 * 100), ncol = 100) y <- 1 * (0.25 * x[,1] - 0.25 * x[,3] > rnorm(10000)) system.time(gl1 <- glm.fit(x, y, family = binomial())) system.time(gf1 <- fastglm(x, y, family = binomial())) system.time(gf2 <- fastglm(x, y, family = binomial(), method = 1)) system.time(gf3 <- fastglm(x, y, family = binomial(), method = 2)) system.time(gf4 <- fastglm(x, y, family = binomial(), method = 3)) max(abs(coef(gl1) - gf1$coef)) max(abs(coef(gl1) - gf2$coef)) max(abs(coef(gl1) - gf3$coef)) max(abs(coef(gl1) - gf4$coef)) ## Not run: nrows <- 50000 ncols <- 50 bkFile <- "bigmat2.bk" descFile <- "bigmatk2.desc" bigmat <- filebacked.big.matrix(nrow=nrows, ncol=ncols, type="double", backingfile=bkFile, backingpath=".", descriptorfile=descFile, dimnames=c(NULL,NULL)) for (i in 1:ncols) bigmat[,i] = rnorm(nrows)*i y <- 1*(rnorm(nrows) + bigmat[,1] > 0) system.time(gfb1 <- fastglm(bigmat, y, family = binomial(), method = 3)) ## End(Not run)
fast generalized linear model fitting
fastglmPure( x, y, family = gaussian(), weights = rep(1, NROW(y)), offset = rep(0, NROW(y)), start = NULL, etastart = NULL, mustart = NULL, method = 0L, tol = 1e-07, maxit = 100L )
fastglmPure( x, y, family = gaussian(), weights = rep(1, NROW(y)), offset = rep(0, NROW(y)), start = NULL, etastart = NULL, mustart = NULL, method = 0L, tol = 1e-07, maxit = 100L )
x |
input model matrix. Must be a matrix object |
y |
numeric response vector of length nobs. |
family |
a description of the error distribution and link function to be used in the model.
For |
weights |
an optional vector of 'prior weights' to be used in the fitting process. Should be a numeric vector. |
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be a numeric vector of length equal to the number of cases |
start |
starting values for the parameters in the linear predictor. |
etastart |
starting values for the linear predictor. |
mustart |
values for the vector of means. |
method |
an integer scalar with value 0 for the column-pivoted QR decomposition, 1 for the unpivoted QR decomposition, 2 for the LLT Cholesky, 3 for the LDLT Cholesky, 4 for the full pivoted QR decomposition, 5 for the Bidiagonal Divide and Conquer SVD |
tol |
threshold tolerance for convergence. Should be a positive real number |
maxit |
maximum number of IRLS iterations. Should be an integer |
A list with the elements
coefficients |
a vector of coefficients |
se |
a vector of the standard errors of the coefficient estimates |
rank |
a scalar denoting the computed rank of the model matrix |
df.residual |
a scalar denoting the degrees of freedom in the model |
residuals |
the vector of residuals |
s |
a numeric scalar - the root mean square for residuals |
fitted.values |
the vector of fitted values |
set.seed(1) x <- matrix(rnorm(1000 * 25), ncol = 25) eta <- 0.1 + 0.25 * x[,1] - 0.25 * x[,3] + 0.75 * x[,5] -0.35 * x[,6] #0.25 * x[,1] - 0.25 * x[,3] y <- 1 * (eta > rnorm(1000)) yp <- rpois(1000, eta ^ 2) yg <- rgamma(1000, exp(eta) * 1.75, 1.75) # binomial system.time(gl1 <- glm.fit(x, y, family = binomial())) system.time(gf1 <- fastglmPure(x, y, family = binomial(), tol = 1e-8)) system.time(gf2 <- fastglmPure(x, y, family = binomial(), method = 1, tol = 1e-8)) system.time(gf3 <- fastglmPure(x, y, family = binomial(), method = 2, tol = 1e-8)) system.time(gf4 <- fastglmPure(x, y, family = binomial(), method = 3, tol = 1e-8)) max(abs(coef(gl1) - gf1$coef)) max(abs(coef(gl1) - gf2$coef)) max(abs(coef(gl1) - gf3$coef)) max(abs(coef(gl1) - gf4$coef)) # poisson system.time(gl1 <- glm.fit(x, yp, family = poisson(link = "log"))) system.time(gf1 <- fastglmPure(x, yp, family = poisson(link = "log"), tol = 1e-8)) system.time(gf2 <- fastglmPure(x, yp, family = poisson(link = "log"), method = 1, tol = 1e-8)) system.time(gf3 <- fastglmPure(x, yp, family = poisson(link = "log"), method = 2, tol = 1e-8)) system.time(gf4 <- fastglmPure(x, yp, family = poisson(link = "log"), method = 3, tol = 1e-8)) max(abs(coef(gl1) - gf1$coef)) max(abs(coef(gl1) - gf2$coef)) max(abs(coef(gl1) - gf3$coef)) max(abs(coef(gl1) - gf4$coef)) # gamma system.time(gl1 <- glm.fit(x, yg, family = Gamma(link = "log"))) system.time(gf1 <- fastglmPure(x, yg, family = Gamma(link = "log"), tol = 1e-8)) system.time(gf2 <- fastglmPure(x, yg, family = Gamma(link = "log"), method = 1, tol = 1e-8)) system.time(gf3 <- fastglmPure(x, yg, family = Gamma(link = "log"), method = 2, tol = 1e-8)) system.time(gf4 <- fastglmPure(x, yg, family = Gamma(link = "log"), method = 3, tol = 1e-8)) max(abs(coef(gl1) - gf1$coef)) max(abs(coef(gl1) - gf2$coef)) max(abs(coef(gl1) - gf3$coef)) max(abs(coef(gl1) - gf4$coef))
set.seed(1) x <- matrix(rnorm(1000 * 25), ncol = 25) eta <- 0.1 + 0.25 * x[,1] - 0.25 * x[,3] + 0.75 * x[,5] -0.35 * x[,6] #0.25 * x[,1] - 0.25 * x[,3] y <- 1 * (eta > rnorm(1000)) yp <- rpois(1000, eta ^ 2) yg <- rgamma(1000, exp(eta) * 1.75, 1.75) # binomial system.time(gl1 <- glm.fit(x, y, family = binomial())) system.time(gf1 <- fastglmPure(x, y, family = binomial(), tol = 1e-8)) system.time(gf2 <- fastglmPure(x, y, family = binomial(), method = 1, tol = 1e-8)) system.time(gf3 <- fastglmPure(x, y, family = binomial(), method = 2, tol = 1e-8)) system.time(gf4 <- fastglmPure(x, y, family = binomial(), method = 3, tol = 1e-8)) max(abs(coef(gl1) - gf1$coef)) max(abs(coef(gl1) - gf2$coef)) max(abs(coef(gl1) - gf3$coef)) max(abs(coef(gl1) - gf4$coef)) # poisson system.time(gl1 <- glm.fit(x, yp, family = poisson(link = "log"))) system.time(gf1 <- fastglmPure(x, yp, family = poisson(link = "log"), tol = 1e-8)) system.time(gf2 <- fastglmPure(x, yp, family = poisson(link = "log"), method = 1, tol = 1e-8)) system.time(gf3 <- fastglmPure(x, yp, family = poisson(link = "log"), method = 2, tol = 1e-8)) system.time(gf4 <- fastglmPure(x, yp, family = poisson(link = "log"), method = 3, tol = 1e-8)) max(abs(coef(gl1) - gf1$coef)) max(abs(coef(gl1) - gf2$coef)) max(abs(coef(gl1) - gf3$coef)) max(abs(coef(gl1) - gf4$coef)) # gamma system.time(gl1 <- glm.fit(x, yg, family = Gamma(link = "log"))) system.time(gf1 <- fastglmPure(x, yg, family = Gamma(link = "log"), tol = 1e-8)) system.time(gf2 <- fastglmPure(x, yg, family = Gamma(link = "log"), method = 1, tol = 1e-8)) system.time(gf3 <- fastglmPure(x, yg, family = Gamma(link = "log"), method = 2, tol = 1e-8)) system.time(gf4 <- fastglmPure(x, yg, family = Gamma(link = "log"), method = 3, tol = 1e-8)) max(abs(coef(gl1) - gf1$coef)) max(abs(coef(gl1) - gf2$coef)) max(abs(coef(gl1) - gf3$coef)) max(abs(coef(gl1) - gf4$coef))
logLik method for fastglm fitted objects
## S3 method for class 'fastglm' logLik(object, ...)
## S3 method for class 'fastglm' logLik(object, ...)
object |
fastglm fitted object |
... |
not used |
Returns an object of class logLik
Obtains predictions and optionally estimates standard errors of those predictions from a fitted generalized linear model object.
## S3 method for class 'fastglm' predict( object, newdata = NULL, type = c("link", "response"), se.fit = FALSE, dispersion = NULL, ... )
## S3 method for class 'fastglm' predict( object, newdata = NULL, type = c("link", "response"), se.fit = FALSE, dispersion = NULL, ... )
object |
a fitted object of class inheriting from " |
newdata |
a matrix to be used for prediction |
type |
the type of prediction required. The default is on the scale of the linear predictors;
the alternative " The value of this argument can be abbreviated. |
se.fit |
logical switch indicating if standard errors are required. |
dispersion |
the dispersion of the GLM fit to be assumed in computing the standard errors.
If omitted, that returned by |
... |
further arguments passed to or from other methods. |
print method for fastglm objects
## S3 method for class 'fastglm' print(x, ...)
## S3 method for class 'fastglm' print(x, ...)
x |
object to print |
... |
not used |
residuals method for fastglm fitted objects
## S3 method for class 'fastglm' residuals( object, type = c("deviance", "pearson", "working", "response", "partial"), ... )
## S3 method for class 'fastglm' residuals( object, type = c("deviance", "pearson", "working", "response", "partial"), ... )
object |
fastglm fitted object |
type |
type of residual to be returned |
... |
not used |
a vector of residuals
summary method for fastglm fitted objects
## S3 method for class 'fastglm' summary(object, dispersion = NULL, ...)
## S3 method for class 'fastglm' summary(object, dispersion = NULL, ...)
object |
fastglm fitted object |
dispersion |
the dispersion parameter for the family used.
Either a single numerical value or |
... |
not used |
a summary.fastglm object
x <- matrix(rnorm(10000 * 10), ncol = 10) y <- 1 * (0.25 * x[,1] - 0.25 * x[,3] > rnorm(10000)) fit <- fastglm(x, y, family = binomial()) summary(fit)
x <- matrix(rnorm(10000 * 10), ncol = 10) y <- 1 * (0.25 * x[,1] - 0.25 * x[,3] > rnorm(10000)) fit <- fastglm(x, y, family = binomial()) summary(fit)