---
title: "Models and model comparisons"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Models and model comparisons}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Implemented model types

The risks package implements all major regression models that have been proposed for relative risks and risk differences. By default (`approach = "auto"`), `riskratio()` and `riskdiff()` use marginal standardization, as described under "Model choice". The legacy strategy (`approach = "legacy"`) instead estimates the most efficient valid model that converges; in more numerically challenging cases, it falls back to marginal standardization, which does not return parameters for covariates.

The following models are implemented in the risks package:

#^1^ | `approach =`| RR | RD | Model | Reference
--|-------------|-----|-----|-----------------------------|----------------------------------
1 | `glm` | `riskratio` | `riskdiff` | Binomial model with a log or identity link | Wacholder S. Binomial regression in GLIM: Estimating risk ratios and risk differences. [Am J Epidemiol 1986;123:174-184](https://pubmed.ncbi.nlm.nih.gov/3509965).
2 | `glm_startp` | `riskratio` | `riskdiff` | Binomial model with a log or identity link, convergence-assisted by starting values from Poisson model | Spiegelman D, Hertzmark E. Easy SAS calculations for risk or prevalence ratios and differences. [Am J Epidemiol 2005;162:199-200](https://pubmed.ncbi.nlm.nih.gov/15987728).
3 | `margstd_delta` | `riskratio` | `riskdiff` | Marginally standardized estimates using binomial models with a logit link (logistic model) with standard errors calculated via the delta method. | This package.
4 | `margstd_boot` | `riskratio` | `riskdiff` | Marginally standardized estimates using binomial models with a logit link (logistic model) with bias-corrected accelerated (BC~a~) confidence intervals from parametric bootstrapping (see [Marginal standardization](margstd.html)). | This package. For marginal standardization with *nonparametric* bootstrapping, see: Localio AR, Margolis DJ, Berlin JA. Relative risks and confidence intervals were easily computed indirectly from multivariable logistic regression. [J Clin Epidemiol 2007;60(9):874-82](https://pubmed.ncbi.nlm.nih.gov/17689803).
-- | `glm_cem` | `riskratio` | --- | Binomial model with a log link fitted via combinatorial expectation maximization instead of Fisher scoring | Donoghoe MW, Marschner IC. logbin: An R Package for Relative Risk Regression Using the Log-Binomial Model. [J Stat Softw 2018;86(9)](http://dx.doi.org/10.18637/jss.v086.i09).
-- | `glm_cem` | --- | `riskdiff` | Additive binomial model (identity link) fitted via combinatorial expectation maximization instead of Fisher scoring | Donoghoe MW, Marschner IC. Stable computational methods for additive binomial models with application to adjusted risk differences. [Comput Stat Data Anal 2014;80:184-96](https://doi.org/10.1016/j.csda.2014.06.019).
-- | `robpoisson` | `riskratio` | `riskdiff` | Log-linear (Poisson) model with robust/sandwich/empirical standard errors | Zou G. A modified Poisson regression approach to prospective studies with binary data. [Am J Epidemiol 2004;159(7):702-6](https://pubmed.ncbi.nlm.nih.gov/15033648).
-- | `duplicate` | `riskratio` | --- | Case-duplication approach, fitting a logistic model with cluster-robust standard errors | Schouten EG, Dekker JM, Kok FJ, Le Cessie S, Van Houwelingen HC, Pool J, Vandenbroucke JP. Risk ratio and rate ratio estimation in case-cohort designs: hypertension and cardiovascular mortality. [Stat Med 1993;12:1733–45](https://pubmed.ncbi.nlm.nih.gov/8248665).
-- | `glm_startd` | `riskratio` | --- | Binomial model with a log link, convergence-assisted by starting values from case-duplication logistic model | This package.
-- | `logistic` | `riskratio`, for comparison only | --- | Binomial model with logit link (*i.e.*, the logistic model), returning odds ratios | Included for comparison purposes only.

^1^ Indicates the priority with which the legacy modelling strategy (`approach = "legacy"`) attempts model fitting (`glm_startp`: only for RR).

Which model was fitted is always indicated in the first line of the output of `summary()` and in the `model` column of `tidy()`. In methods sections of manuscripts, the approach can be described in detail as follows:

"Risk ratios (or risk differences) were obtained via (method listed in the first line of `summary.risks(...)`) using the `risks` R package (reference to this package and/or the article listed in the column 'reference')."

For example:

"Risk ratios were obtained from binomial models with a log link, convergence-assisted by Poisson models (ref. Spiegelman and Hertzmark, AJE 2005), using the `risks` R package (https://stopsack.github.io/risks/)."

# Model choice

By default, automatic model fitting (`approach = "auto"`) reports results from marginal standardization using a logistic model with delta method standard errors (equivalent to `approach = "margstd_delta"`). An exception is made if interaction terms between exposure and confounders are included. In this case, confidence intervals are calculated using bootstrapping (equivalent to requesting `approach = "margstd_boot"`).

Alternatively, any of the options listed under `approach =` in the table can be requested directly. However, unlike with `approach = "auto"` (the default) or `approach = "legacy"`, the selected model may not converge.

We load the same example data as in the [Get Started vignette](risks.html#an-example-cohort-study).

```{r load, message = FALSE}
library(risks)  # Provides riskratio(), riskdiff(), postestimation functions
library(dplyr)  # For data handling
library(broom)  # For tidy() model summaries

data(breastcancer)
```

We then select a binomial model with starting values from the Poisson model:

```{r selectapproach}
riskratio(formula = death ~ stage + receptor, data = breastcancer,
          approach = "glm_startp")
```

\
However, the binomial model without starting values (`approach = "glm"`) does not converge, as expected.

# Model comparisons

With `approach = "all"`, all model types listed in the table are fitted. The fitted object (here, `fit_all`) is one of the converged models. A summary of the convergence status of all models is displayed at the beginning of `summary(fit_all)`:

```{r allmodels}
fit_all <- riskratio(formula = death ~ stage + receptor, data = breastcancer,
                     approach = "all")
summary(fit_all)
```

\
Individual models can be accessed as `fit_all$all_models[[1]]` through `fit_all$all_models[[6]]` (or `[[7]]` if fitting a risk ratio model).

`tidy()` shows coefficients and confidence intervals from all models that converged:

```{r allmodels2}
tidy(fit_all) %>%
  select(-statistic, -p.value) %>%
  print(n = 50)
```
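
As a rough illustration of what three of the approaches in the table estimate, the following sketch uses only base R and simulated data. It is not how the risks package implements these models internally. The data set `dat`, its variables, and all object names are hypothetical and exist only for this example. The sketch fits a Poisson model (point estimate only, without the robust standard errors that `robpoisson` adds), a binomial model with a log link that starts from the Poisson coefficients (the idea behind `glm_startp`), and a logistic model followed by marginal standardization (the point estimate behind `margstd_delta` and `margstd_boot`).

```{r base-r-sketch}
# Hypothetical simulated cohort (not the breastcancer data)
set.seed(42)
n <- 2000
exposure   <- rbinom(n, size = 1, prob = 0.5)
confounder <- rbinom(n, size = 1, prob = 0.4)
# True risks follow a log-linear model with a risk ratio of 2 for exposure
risk    <- 0.1 * 2^exposure * 1.5^confounder
outcome <- rbinom(n, size = 1, prob = risk)
dat <- data.frame(outcome, exposure, confounder)

# Poisson model with a log link: point estimates only;
# valid standard errors would require a robust (sandwich) variance
fit_poisson <- glm(outcome ~ exposure + confounder, data = dat,
                   family = poisson(link = "log"))

# Binomial model with a log link, started at the Poisson coefficients
fit_logbin <- glm(outcome ~ exposure + confounder, data = dat,
                  family = binomial(link = "log"),
                  start = coef(fit_poisson))

# Marginal standardization: fit a logistic model, then average the
# predicted risks with everyone set to exposed and to unexposed
fit_logistic <- glm(outcome ~ exposure + confounder, data = dat,
                    family = binomial(link = "logit"))
risk_exposed <- mean(predict(fit_logistic,
                             newdata = transform(dat, exposure = 1),
                             type = "response"))
risk_unexposed <- mean(predict(fit_logistic,
                               newdata = transform(dat, exposure = 0),
                               type = "response"))

# Risk ratio estimates for the exposure from the three sketches
rr <- c(poisson     = unname(exp(coef(fit_poisson)["exposure"])),
        logbinomial = unname(exp(coef(fit_logbin)["exposure"])),
        margstd     = risk_exposed / risk_unexposed)
# Marginally standardized risk difference
rd_margstd <- risk_exposed - risk_unexposed

round(rr, 2)
round(rd_margstd, 3)
```

Because the simulation uses a true risk ratio of 2 and risks far from 1, the three estimates should land close to each other. In real data, the approaches can differ in convergence behavior and precision, which is what `approach = "all"` helps to inspect.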