---
title: "A Stargazer Cheatsheet"
date: "Updated `r format(Sys.time(), '%d %B, %Y')`"
output: 
  html_document:
    theme: spacelab
    keep_md: true
    toc: yes
    highlight: haddock
---

## Dataset: dplyr and nycflights13

Setting up a dataset for this cheatsheet allows me to spotlight two recent
R packages created by [Hadley Wickham](http://had.co.nz/). The first, 
[dplyr](https://github.com/hadley/dplyr), is a set of new tools for data 
manipulation. Using `dplyr`, I will extract flights and weather data from 
another new package called 
[nycflights13](https://github.com/hadley/nycflights13). With this data I 
will show how to estimate a couple of regression models and nicely format the 
results into tables with `stargazer`.

Note: `stargazer` v. 5.1 does not play nicely with `dplyr`'s `tbl_df` class.
As a temporary workaround I pipe the merged dataset to `data.frame`.

```{r, echo = TRUE, warning = FALSE, message = FALSE}
library("dplyr")
library("nycflights13")
library("AER") # Applied Econometrics with R
library("stargazer")

daily <- flights %>%
  filter(origin == "EWR") %>%
  group_by(year, month, day) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE))

daily_weather <- weather %>%
  filter(origin == "EWR") %>%
  group_by(year, month, day) %>%
  summarise(temp   = mean(temp, na.rm = TRUE),
            wind   = mean(wind_speed, na.rm = TRUE),
            precip = sum(precip, na.rm = TRUE))

# Merge flights with weather data frames on the shared date columns
both <- inner_join(daily, daily_weather, by = c("year", "month", "day")) %>% 
  data.frame()  # Temporary fix

# Create an indicator for quarter
both$quarter <- cut(both$month, breaks = c(0, 3, 6, 9, 12), 
                                labels = c("1", "2", "3", "4"))

# Create a vector of class logical
both$hot <- as.logical(both$temp > 85)

head(both)

```
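
The quarter indicator above relies on `cut()` with right-closed intervals, so 
months 1 to 3 land in quarter 1, months 4 to 6 in quarter 2, and so on. A quick 
sanity check on a made-up vector of the twelve month numbers (`months_demo` and 
`quarter_demo` exist only for illustration):

```{r, echo = TRUE}
# Illustrative vector, not part of the flights data
months_demo  <- 1:12
quarter_demo <- cut(months_demo, breaks = c(0, 3, 6, 9, 12), 
                                 labels = c("1", "2", "3", "4"))

# Each quarter should contain exactly three months
table(quarter_demo)
```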

\
We can use the `both` data frame to estimate a couple of linear models 
predicting the average delay out of Newark, controlling for the weather. The 
first model uses only the weather variables; the second adds dummy variables 
indicating the quarter. I also estimate a third model, using the `ivreg` 
command from package 
[AER](http://cran.r-project.org/web/packages/AER/index.html), to demonstrate 
output that mixes model types. The raw R output:

```{r, echo = TRUE}
output  <- lm(delay ~ temp + wind + precip, data = both)
output2 <- lm(delay ~ temp + wind + precip + quarter, data = both)

# Instrumental variables model 
output3 <- ivreg(delay ~ temp + wind + precip | . - temp + hot, data = both)

summary(output)
summary(output2)
summary(output3)
```

\
[Back to table of contents](#TOC)

## Quick notes

Since I'm using [knitr](http://cran.rstudio.com/web/packages/knitr/index.html) 
and [R markdown](http://rmarkdown.rstudio.com/) to create this webpage, in the 
code that follows I will include the `stargazer` option `type = "html"`. 
`stargazer` is set to produce **LaTeX** output by default. If you desire 
**LaTeX** output, just remove the type option from the code below.
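
For a quick look in the console, `type = "text"` prints a plain ASCII table 
instead. A minimal sketch, using a stand-in model on the built-in `mtcars` data 
(`demo_fit` is a hypothetical name) so the chunk runs on its own:

```{r, echo = TRUE, warning = FALSE, message = FALSE}
library("stargazer")

# Stand-in model for illustration; any fitted model would do
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Plain-text table, readable directly in the console
stargazer(demo_fit, type = "text")
```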

\
Also, while I have added an example for many of the available `stargazer` 
options, I have not included all of them. You're likely to find a relevant 
example here, but if something isn't listed, don't assume that `stargazer` 
can't do it. Check the documentation for additional features and updates to 
the package. Often an omitted argument is specific to **LaTeX** output, so I 
can't demonstrate it here.

### HTML formatting

It is possible to change the formatting of HTML tables generated with `stargazer` 
via an HTML style sheet. See the [R Markdown documentation](http://rmarkdown.rstudio.com/html_document_format.html#custom_css) 
about incorporating custom CSS. 

\
[Back to table of contents](#TOC)

## The default summary statistics table
```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(both, type = "html")
```

\
When supplied a data frame, by default `stargazer` creates a table with summary 
statistics. If the `summary` option is set to `FALSE` then stargazer 
will instead print the contents of the data frame.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
# Use only a few rows
both2 <- both %>% slice(1:6)

stargazer(both2, type = "html", summary = FALSE, rownames = FALSE)
```

### Remove row and column names

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(both2, type = "html", summary = FALSE,
          rownames = FALSE,
          colnames = FALSE)
```

### Change which statistics are displayed

To customize which summary statistics are displayed, change any of the
following (logical) parameters: `nobs`, `mean.sd`, `min.max`, `median`, and
`iqr`. 

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(both, type = "html", nobs = FALSE, mean.sd = TRUE, median = TRUE,
          iqr = TRUE)
```

### Change which statistics are displayed (a second way)

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(both, type = "html", summary.stat = c("n", "p75", "sd"))
```

### Remove logical variables in the summary statistics

`stargazer` reports summary statistics for logical variables by default 
(0 = FALSE and 1 = TRUE). To suppress the reporting of logical vectors, change 
`summary.logical` to `FALSE`. Note the stats for our vector `hot` are gone. 

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(both, type = "html", summary.logical = FALSE)
```

### Flip the table axes

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(both, type = "html", flip = TRUE)
```

\
[Back to table of contents](#TOC)

## The default regression table

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html")
```

\
[Back to table of contents](#TOC)

## Change the style

`stargazer` includes several pre-formatted styles that imitate popular
academic journals. Use the `style` argument.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", style = "qje")
```

\
[Back to table of contents](#TOC)

## Labelling the table

### Add a title; change the variable labels 

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          title            = "These are awesome results!",
          covariate.labels = c("Temperature", "Wind speed", "Rain (inches)",
                               "2nd quarter", "3rd quarter", "4th quarter"),
          dep.var.caption  = "A better caption",
          dep.var.labels   = "Flight delay (in minutes)")
```

### Exclude the dependent variable label or the model numbers

Note the dependent variable caption stays. To additionally remove the caption 
add `dep.var.caption = ""`.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          dep.var.labels.include = FALSE,
          model.numbers          = FALSE)
```

### Change the column names

To change the column names just supply a character vector with the new labels, 
as shown below. 

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", column.labels = c("Good", "Better"))
```

### Apply a label to more than one column

The option `column.separate` allows for assigning a label to more than one 
column. In this example I told `stargazer` to report each regression twice, for a
total of four columns. Using `column.separate`, `stargazer` now applies the first
label to the first two columns and the second label to the next two columns.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output, output2, output2, type = "html", 
          column.labels   = c("Good", "Better"),
          column.separate = c(2, 2))
```

### Model names

When the results from different types of regression models (e.g., "OLS", 
"probit") are displayed in the same table, `stargazer` adds a row indicating 
the model type. Remove these labels by including `model.names = FALSE` (not shown).

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, output3, type = "html")
```

### Model names (again)

The example above shows the default behavior of `stargazer` is to display only 
one model name (and dependent variable caption) for adjacent columns with the 
same model type. To repeat these labels for all of the columns, do the 
following:

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, output3, type = "html",
          multicolumn = FALSE)
```

### Add a custom row to the reported statistics

This example shows how to add one or more custom rows, such as a row reporting fixed effects.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html",
          add.lines = list(c("Fixed effects?", "No", "No"),
                           c("Results believable?", "Maybe", "Try again later")))
```

### Include R object names

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          object.names = TRUE)
```

\
[Back to table of contents](#TOC)

## Change the default output

### Report t-statistics or p-values instead of standard errors

Standard errors are reported by default. To report the t-statistics or
p-values instead, see the `report` argument. Notice that I've used this option 
to move the star characters to the t-statistics instead of being next to the
coefficients.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html",
          report = "vct*")
```
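
The chunk above shows t-statistics. For p-values, swap the `t` in the `report` 
string for a `p`. A minimal sketch on a stand-in `mtcars` model (`demo_fit` is 
a hypothetical name, used so the chunk runs on its own):

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
library("stargazer")

# Stand-in model for illustration; any fitted model would do
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)

# v = variable names, c = coefficients, p = p-values, * = stars on the p-values
stargazer(demo_fit, type = "html",
          report = "vcp*")
```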

### Report confidence intervals

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html",
          ci = TRUE)
```

### Adjust the confidence intervals
 
By default `ci.level = 0.95`. You may also change the character that separates 
the intervals with the `ci.separator` argument.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html",
          ci = TRUE, ci.level = 0.90, ci.separator = " @@ ")
```
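
These intervals can differ slightly from what `confint()` reports. As far as I 
can tell, `stargazer` builds them from the standard errors using normal 
quantiles, while `confint()` uses the t distribution for `lm` fits; treat that 
as an assumption and check it against your installed version. A minimal sketch 
of the two calculations on a stand-in `mtcars` model (`demo_fit` is a 
hypothetical name):

```{r, echo = TRUE}
# Stand-in model for illustration
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)
est      <- coef(demo_fit)
se       <- sqrt(diag(vcov(demo_fit)))

# 90% interval from normal quantiles (assumed to mirror ci = TRUE)
z         <- qnorm(1 - (1 - 0.90) / 2)
normal_ci <- cbind(lower = est - z * se, upper = est + z * se)

# 90% interval from the t distribution, as confint() computes it
t_ci <- confint(demo_fit, level = 0.90)

normal_ci
t_ci
```

With few residual degrees of freedom the t-based interval is noticeably wider; 
in large samples the two are nearly identical.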

### Robust standard errors (replicating Stata's robust option)

If you want to use robust (or clustered) standard errors, `stargazer` allows 
for replacing the default output by supplying a new vector of values to the 
option `se`. For this example I will display the same model twice and adjust the 
standard errors in the second column with the `HC1` correction from the `sandwich`
package (i.e., the same correction Stata uses). 

I also need to adjust the F statistic with the corrected variance-covariance 
matrix (matching Stata's results). Currently, this must be done manually 
(via `add.lines`) as `stargazer` does not (yet) have an option for directly 
replacing the F statistic.

Similar options exist to supply adjusted values to the coefficients, 
t-statistics, confidence intervals, and p-values. See `coef`, `t`, `ci.custom`, 
or `p`.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
library(sandwich)
library(lmtest)   # waldtest; see also coeftest.

# Adjust standard errors
cov1         <- vcovHC(output, type = "HC1")
robust_se    <- sqrt(diag(cov1))

# Adjust F statistic (the robust value is copied by hand into add.lines below)
wald_results <- waldtest(output, vcov = cov1)

stargazer(output, output, type = "html",
          se        = list(NULL, robust_se),
          omit.stat = "f",
          add.lines = list(c("F Statistic (df = 3; 360)", "12.879***", "7.73***")))
```
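
The F statistics in `add.lines` above are typed in by hand. A small helper can 
build those strings from numbers instead. This is only a sketch: `format_f()` 
is a hypothetical function, not part of `stargazer`, and it hard-codes the 
default star cutoffs (0.1, 0.05, 0.01).

```{r, echo = TRUE}
# Hypothetical helper: format an F statistic and append stars by p-value
format_f <- function(f, p, digits = 3) {
  stars <- if (p < 0.01) {
    "***"
  } else if (p < 0.05) {
    "**"
  } else if (p < 0.1) {
    "*"
  } else {
    ""
  }
  paste0(formatC(f, digits = digits, format = "f"), stars)
}

# Example with made-up inputs
format_f(7.7312, 0.00005, digits = 2)
```

With the objects from the chunk above, the robust value would come from 
`format_f(wald_results$F[2], wald_results[["Pr(>F)"]][2])`, since `waldtest()` 
returns an anova-style table whose second row holds the test.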

### Move the intercept term to the top of the table

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html",
          intercept.bottom = FALSE)
```

### Compress the table output

We can condense the table output by placing all of the output on the same row.
When `single.row` is set to `TRUE`, the argument `no.space` is automatically 
set to `TRUE`.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html",
          single.row = TRUE)
```

\
[Back to table of contents](#TOC)

## Omit parts of the default output

In fixed effect model specifications it is often undesirable to report the 
fixed effect coefficients. To omit any coefficient, supply a regular 
expression to `omit`.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", omit = "quarter")
```

### Reporting omitted variables

Add the `omit.labels` parameter to report which variables have been omitted. 
It must be the same length as the number of regular expressions supplied to 
`omit`. By default `omit.labels` reports "Yes" or "No". To change this, supply 
a new vector of length 2 to `omit.yes.no` (the default is `c("Yes", "No")`).

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          omit        = "quarter",
          omit.labels = "Quarter dummies?")
```

### Omit summary statistics

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
## Remove r-square and f-statistic
stargazer(output, output2, type = "html", 
          omit.stat = c("rsq", "f"))
```

\
See also `keep.stat`, a related argument with the opposite behavior.

### Omit whole parts of the table

If you just want to remove parts of the table, it is easier to use 
`omit.table.layout` to explicitly specify table elements. See 
`table layout characters` for a list of codes.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
# Remove statistics and notes sections completely
stargazer(output, output2, type = "html", 
          omit.table.layout = "sn")
```

### Omit whole parts of the table (a second way)

Another way to achieve the result above is through the argument `table.layout`. It 
also accepts a character string that tells `stargazer` which table elements to 
**include**.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
# Include everything except the statistics and notes sections
stargazer(output, output2, type = "html", 
          table.layout = "-ld#-t-")
```

### Omit the degrees of freedom

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          df = FALSE)
```

\
[Back to table of contents](#TOC)

## Statistical significance options

By default `stargazer` uses \*\*\*, \*\*, and \* to denote statistical 
significance at the one, five, and ten percent levels. This behavior can be 
changed by altering the `star.char` option. 

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          star.char = c("@", "@@", "@@@"))
```

### Change the cutoffs for significance

Notice that temperature, quarter3, and quarter4 have each lost a gold star 
because we made it tougher to earn them.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          star.cutoffs = c(0.05, 0.01, 0.001)) # star.cutoffs = NULL by default
```
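
To see how a change in cutoffs plays out, the sketch below maps a few made-up 
p-values to stars under the default cutoffs and under the stricter ones used 
above. `stars_for()` is a hypothetical helper for illustration, not a 
`stargazer` function, and it assumes a strict `p < cutoff` comparison.

```{r, echo = TRUE}
# Hypothetical helper: one star for every cutoff the p-value falls below
stars_for <- function(p, cutoffs) {
  strrep("*", sum(p < cutoffs))
}

# Made-up p-values, chosen to avoid the cutoff boundaries
p_demo <- c(0.2, 0.07, 0.03, 0.004, 0.0005)

default_stars <- vapply(p_demo, stars_for, character(1),
                        cutoffs = c(0.1, 0.05, 0.01))
strict_stars  <- vapply(p_demo, stars_for, character(1),
                        cutoffs = c(0.05, 0.01, 0.001))

data.frame(p = p_demo, default = default_stars, strict = strict_stars)
```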

\
[Back to table of contents](#TOC)

## Modifying table notes

### Make an addition to the existing note section

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          notes = "I make this look good!")
```

### Replace the note section

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          notes        = "Sometimes you just have to start over.", 
          notes.append = FALSE)
```

### Change note alignment

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          notes.align = "l")
```

### Change the note section label

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          notes.label = "New note label")
```

\
[Back to table of contents](#TOC)

## Table aesthetics

### Use html tags to modify table elements
```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}

# For LaTeX output you can also wrap table text in commands like \textbf{...}, 
# just remember to escape the first backslash, e.g., "A \\textbf{better} caption"

stargazer(output, output2, type = "html", 
          title = "These are <em> awesome </em> results!",  # Italics
          dep.var.caption  = "A <b> better </b> caption")   # Bold

```

### Change decimal character

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          decimal.mark = ",")
```

### Control the number of decimal places

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          digits = 1)
```

### Additional decimal controls

You may also specify the number of additional decimal places to be used if a 
number, when rounded to `digits` decimal places, is equal to zero (use the 
`digits.extra` argument).

My example models do not have any numbers in the thousands, so I won't show 
them, but `digit.separate` and `digit.separator` are also available for 
customizing the output of those characters.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          digits       = 1,
          digits.extra = 1)
```

### Drop leading zeros from decimals

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          initial.zero = FALSE)
```

### Change the order of the variables

The `order` argument will also accept a vector of regular expressions.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
stargazer(output, output2, type = "html", 
          order = c(4, 5, 6, 3, 2, 1))
```
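
Since `order` also takes regular expressions, the variables can be named rather 
than counted. A minimal sketch on a stand-in `mtcars` model (`demo_fit` is a 
hypothetical name, used so the chunk runs on its own), moving `hp` ahead of 
`wt`:

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
library("stargazer")

# Stand-in model for illustration; the formula lists wt before hp
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Anchored patterns, listed in the order the variables should appear
stargazer(demo_fit, type = "html", 
          order = c("^hp$", "^wt$"))
```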

### Select which variables to keep in the table

By default `keep = NULL` meaning all variables are included. `keep` accepts a
vector of regular expressions.

```{r, echo = TRUE, warning = FALSE, message = FALSE, results='asis'}
# Regex that keeps "precip" but not "precipitation"
stargazer(output, output2, type = "html", 
          keep = c("\\bprecip\\b"))
```
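
The word-boundary anchors are what stop a longer name from matching. Assuming 
`stargazer` applies the pattern much as `grepl()` would, a quick check on two 
made-up variable names (`vars_demo` is only for illustration) shows the 
difference:

```{r, echo = TRUE}
# Illustrative variable names; only the first exists in the models above
vars_demo <- c("precip", "precipitation")

grepl("precip", vars_demo)        # unanchored: matches both names
grepl("\\bprecip\\b", vars_demo)  # word boundaries: matches only "precip"
```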

\
[Back to table of contents](#TOC)

## Feedback

The .Rmd file for this cheatsheet [is on GitHub](https://github.com/JakeRuss/cheatsheets) 
and I welcome suggestions or pull requests.

\
[Back to table of contents](#TOC)