--- title: "Introduction to the risks package: Get Started" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to the risks package: Get Started} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Installation Once risks is released on CRAN, `install.packages("risks")` will be available. Currently, the development version of `risks` can be installed from [GitHub](https://github.com/) using: ```r # install.packages("remotes") # The "remotes" package needs to be installed remotes::install_github("stopsack/risks") ``` # An example cohort study We use a cohort of women with breast cancer, as used by Spiegelman and Hertzmark ([Am J Epidemiol 2005](https://pubmed.ncbi.nlm.nih.gov/15987728)) and Greenland ([Am J Epidemiol 2004](https://pubmed.ncbi.nlm.nih.gov/15286014)). The the categorical exposure is `stage`, the binary outcome is `death`, and the binary confounder is `receptor`. ```{r load, message = FALSE} library(risks) # provides riskratio(), riskdiff(), postestimation functions library(dplyr) # For data handling library(broom) # For tidy() model summaries data(breastcancer) # Load sample data breastcancer %>% # Display the sample data group_by(receptor, stage) %>% summarize(deaths = sum(death), total = n(), risk = deaths/total) ``` # Adjusted risk ratios The risk of death is higher among women with higher-stage and hormone receptor-low cancers, which also tend to be of higher stage. Using `risks` models to obtain (possibly multivariable-adjusted) risk ratios or risk differences is similar to the standard code for logistic models in R. As customary in R, log(RR) is returned; see below for how to obtain exponentiated values. No options for model `family` or `link` need to be supplied: ```{r basic} fit_rr <- riskratio(formula = death ~ stage + receptor, data = breastcancer) summary(fit_rr) ``` # Adjusted risk differences To obtain risk differences, use `riskdiff`, which has the same syntax: ```{r basic2} fit_rd <- riskdiff(formula = death ~ stage + receptor, data = breastcancer) summary(fit_rd) ``` For example, the risk of death was 57 percentage points higher in women with stage III breast cancer compared to stage I (95% confidence interval, 38 to 77 percentage points), adjusting for hormone receptor status. The model summary in `riskratio()` and `riskdiff()` includes to two additions compared to a regular `glm()` model: * In the first line of `summary()`, the type of risk regression model is displayed (in the example, "`Risk ratio model, fitted as binomial model...`"). * At the end of the output, confidence intervals for the model coefficients are printed. # Accessing model coefficients The risks package provides an interface to `broom::tidy()`, which returns a data frame of all coefficients (risk differences in this example), their standard errors, *z* statistics, and confidence intervals. ```{r basic3} tidy(fit_rd) ``` \ In accordance with `glm()` standards, coefficients for relative risks are shown on the logarithmic scale. Exponentiated coefficients for risk ratios are easily obtained via `tidy(..., exponentiate = TRUE)`: ```{r basic4} tidy(fit_rr, exponentiate = TRUE) ``` For example, the risk of death was 6 times higher in women with stage III breast cancer compared to stage I (95% confidence interval, 3 to 13 times), adjusting for hormone receptor status. # More post-estimation commands Typical R functions that build on regression models can further process fitted `risks` models. In addition to `tidy()`, examples are: * `coef(fit)` or `coefficients(fit)` return model coefficients (*i.e.*, RDs or log(RR)) as a numeric vector * `confint(fit, level = 0.9)` returns *90%* confidence intervals. * `predict(fit, type = "response")` or `fitted.values(fit)` return predicted probabilities of the binary outcome. * `residuals(fit)` returns model residuals.