The fastglm
package
is intended to be a fast and
stable alternative to the glm()
and
glm2()
functions for fitting generalized lienar models. The
The fastglm
package is compatible with R
’s
family
objects (see ?family
). The package can
be installed via
and loaded via:
Currently, the fastglm
package does not allow for
formula-based data input and is restricted to matrices. We use the
following example to demonstrate the usage of fastglm
:
data(esoph)
x <- model.matrix(cbind(ncases, ncontrols) ~ agegp + unclass(tobgp)
+ unclass(alcgp), data = esoph)
y <- cbind(esoph$ncases, esoph$ncontrols)
gfit1 <- fastglm(x = x, y = y, family = binomial(link = "cloglog"))
summary(gfit1)
#>
#> Call:
#> fastglm.default(x = x, y = y, family = binomial(link = "cloglog"))
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -4.29199 0.29777 -14.414 < 2e-16 ***
#> agegp.L 3.30677 0.63454 5.211 1.88e-07 ***
#> agegp.Q -1.35525 0.57310 -2.365 0.018 *
#> agegp.C 0.20296 0.43333 0.468 0.640
#> agegp^4 0.15056 0.29318 0.514 0.608
#> agegp^5 -0.23425 0.17855 -1.312 0.190
#> unclass(tobgp) 0.33276 0.07285 4.568 4.93e-06 ***
#> unclass(alcgp) 0.80384 0.07538 10.664 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 61.609 on 87 degrees of freedom
#> Residual deviance: 96.950 on 80 degrees of freedom
#> AIC: 228
#>
#> Number of Fisher Scoring iterations: 6
The fastglm
package does not compromise computational
stability for speed. In fact, for many situations where
glm()
and even glm2()
do not converge,
fastglm()
does converge.
As an example, consider the following data scenario, where the
response distribution is (mildly) misspecified, but the link function is
quite badly misspecified. In such scenarios, the standard IRLS algorithm
tends to have convergence issues. The glm2()
package was
designed to handle such cases, however, it still can have convergence
issues. The fastglm()
package uses a similar step-halving
technique as glm2()
, but it starts at better initialized
values and thus tends to have better convergence properties in
practice.
set.seed(1)
x <- matrix(rnorm(10000 * 100), ncol = 100)
y <- (exp(0.25 * x[,1] - 0.25 * x[,3] + 0.5 * x[,4] - 0.5 * x[,5] + rnorm(10000)) ) + 0.1
system.time(gfit1 <- fastglm(cbind(1, x), y, family = Gamma(link = "sqrt")))
#> user system elapsed
#> 0.439 0.012 0.451
system.time(gfit2 <- glm(y~x, family = Gamma(link = "sqrt")) )
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: glm.fit: algorithm did not converge
#> user system elapsed
#> 1.724 2.850 1.180
system.time(gfit3 <- glm2::glm2(y~x, family = Gamma(link = "sqrt")) )
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated due to increasing deviance
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated due to increasing deviance
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated due to increasing deviance
#> Warning: step size truncated due to increasing deviance
#> Warning: step size truncated due to increasing deviance
#> Warning: step size truncated due to increasing deviance
#> Warning: step size truncated due to increasing deviance
#> Warning: step size truncated due to increasing deviance
#> Warning: step size truncated due to increasing deviance
#> Warning: step size truncated due to increasing deviance
#> Warning: step size truncated due to increasing deviance
#> Warning: step size truncated due to increasing deviance
#> Warning: step size truncated due to increasing deviance
#> user system elapsed
#> 1.340 2.311 0.926
## Note that fastglm() returns estimates with the
## largest likelihood
logLik(gfit1)
#> 'log Lik.' -16030.81 (df=102)
logLik(gfit2)
#> 'log Lik.' -16704.05 (df=102)
logLik(gfit3)
#> 'log Lik.' -16046.66 (df=102)
coef(gfit1)[1:5]
#> (Intercept) X1 X2 X3 X4
#> 1.429256009 0.125873599 0.005321164 -0.129389740 0.238937255
coef(gfit2)[1:5]
#> (Intercept) x1 x2 x3 x4
#> 1.431168e+00 1.251936e-01 -6.896739e-05 -1.281857e-01 2.366473e-01
coef(gfit3)[1:5]
#> (Intercept) x1 x2 x3 x4
#> 1.426864e+00 1.242616e-01 -9.860241e-05 -1.254873e-01 2.361301e-01
## check convergence of fastglm
gfit1$converged
#> [1] TRUE
## number of IRLS iterations
gfit1$iter
#> [1] 17
## now check convergence for glm()
gfit2$converged
#> [1] FALSE
gfit2$iter
#> [1] 25
## check convergence for glm2()
gfit3$converged
#> [1] TRUE
gfit3$iter
#> [1] 19
## increasing number of IRLS iterations for glm() does not help that much
system.time(gfit2 <- glm(y~x, family = Gamma(link = "sqrt"), control = list(maxit = 100)) )
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: glm.fit: algorithm did not converge
#> user system elapsed
#> 6.017 9.829 4.016
gfit2$converged
#> [1] FALSE
gfit2$iter
#> [1] 100
logLik(gfit1)
#> 'log Lik.' -16030.81 (df=102)
logLik(gfit2)
#> 'log Lik.' -16121.57 (df=102)