Monday, 16 March 2015

Publication-Quality Tables in Stata

Creating Publication-Quality Tables in Stata

Stata's tables are, in general, clear and informative. However, they are not in the format or of the aesthetic quality normally used in publications. Several Stata users have written programs that create publication-quality tables. This article will discuss esttab (think "estimates table") by Ben Jann. The esttab command takes the results of previous estimation or other commands, puts them in a publication-quality table, and then saves that table in a format you can use directly in your paper, such as RTF or LaTeX. Major topics for this article include creating tables of regression results, tables of summary statistics, and frequency tables.

The estout Package

The esttab command is just one member of a family of commands, or package, called estout. In fact, esttab is just a "wrapper" for a command called estout. The estout command gives you full control over the table to be created, but flexibility requires complexity and estout is fairly difficult to use. The esttab command runs estout for you and handles many of the details estout requires, allowing you to create the most common tables relatively easily. We will also discuss estpost, which puts results like summary statistics in a form esttab can work with. The ability to handle summary statistics and frequencies in addition to regression results is one of the reasons we elected to focus this article on esttab.

On the Workflow of Creating Tables

Keep in mind that you always have an alternative to using esttab: simply create the tables you want in Word or your favorite word processing program, copying and pasting the needed numbers from your Stata output. This is time-consuming and tedious. On the other hand, trying to figure out how to get esttab to give you the table you want can be time-consuming as well, and there's no guarantee it can make exactly the table you want. Be sure to consider the possibility that creating a particular table by hand may be quicker than using esttab. Much depends on how many tables you need to create, and how many numbers they contain. If you can get esttab to give you something close to what you want but are spending a lot of time trying to figure out how to get exactly what you want, consider just editing what you have.
Most people will find it's easier to first obtain a set of (hopefully) final results and then work on how to present them. We would not recommend running esttab until you are reasonably confident you've arrived at the results you want to publish.

Installing esttab

Since the estout package is not part of official Stata, you must install it before using it. It is available from the Statistical Software Components (SSC) archive and can be installed using the ssc install command in Stata:
ssc install estout
You only need to do this once—do not put this command in your research do files.
Check for updates periodically using adoupdate.
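As a quick check that the installation worked, something like the following can be run once from the command line. It is only a sketch: which reports where the esttab ado-file was installed, and adoupdate followed by a package name reports whether a newer version is available without installing anything.

```stata
* Show where the installed esttab ado-file lives (an error means it is not installed)
which esttab

* Report whether a newer version of the estout package is available
adoupdate estout
```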

Basics

The esttab command needs some results to act on, so load the auto data set that comes with Stata and run a basic regression:
sysuse auto
reg mpg weight foreign
You can see the basic function of esttab simply by running it without any options at all:
esttab
----------------------------
                      (1)   
                      mpg   
----------------------------
weight           -0.00659***
                 (-10.34)   

foreign            -1.650   
                  (-1.53)   

_cons               41.68***
                  (19.25)   
----------------------------
N                      74   
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
This puts the model results in a table within Stata's Results window. Viewing it in the Results window is useful for testing a table specification, but when you've got what you want you'll have esttab save it in the file format you're using for your paper. The default table contains many of the features you expect from a table of regression results in a journal article, including rounded coefficients and stars for significance. Note, however, that the numbers in parentheses are the t-statistics. Use the se option if you want to replace them with standard errors:
esttab, se
----------------------------
                      (1)   
                      mpg   
----------------------------
weight           -0.00659***
               (0.000637)   

foreign            -1.650   
                  (1.076)   

_cons               41.68***
                  (2.166)   
----------------------------
N                      74   
----------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
The esttab command uses the current contents of the e() vector (information stored by the last estimation command), not what that command displayed in the Results window. If you run a logit command with the or option Stata will display odds ratios:
logit foreign mpg, or
Logistic regression                               Number of obs   =         74
                                                  LR chi2(1)      =      11.49
                                                  Prob > chi2     =     0.0007
Log likelihood =  -39.28864                       Pseudo R2       =     0.1276

------------------------------------------------------------------------------
     foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   1.173232   .0616975     3.04   0.002      1.05833    1.300608
       _cons |   .0125396   .0151891    -3.62   0.000     .0011674    .1346911
------------------------------------------------------------------------------
However, e(b) still contains the coefficients, and by default that is what esttab will display. It also labels the test statistics as t statistics rather than z statistics like the logit output does:
esttab
----------------------------
                      (1)   
                  foreign   
----------------------------
foreign                     
mpg                 0.160** 
                   (3.04)   

_cons              -4.379***
                  (-3.62)   
----------------------------
N                      74   
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
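If it is unclear what esttab has to work with, you can inspect the e() vector directly. The sketch below uses only official Stata commands (ereturn list and matrix list) and shows that e(b) holds the coefficients on the log-odds scale even though logit displayed odds ratios.

```stata
sysuse auto, clear
logit foreign mpg, or

* List every scalar, macro, and matrix the last estimation command stored in e()
ereturn list

* e(b) contains the raw coefficients, not the odds ratios displayed by logit, or
matrix list e(b)
```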
If you want odds ratios in your table, give esttab the eform (exponentiated form) option. If you want the table to say "z statistics in parentheses" rather than t, use the z option (note that the z option does not change the numbers in any way):
esttab, eform z
----------------------------
                      (1)   
                  foreign   
----------------------------
foreign                     
mpg                 1.173** 
                   (3.04)   
----------------------------
N                      74   
----------------------------
Exponentiated coefficients; z statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
Specifying the eform option prompts esttab to drop the constant term from the table, because it doesn't make much sense to talk about the odds ratio of the constant. However, you can override this behavior by specifying the constant option.
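A minimal sketch of the override described above, assuming the same logit model of foreign on mpg:

```stata
sysuse auto, clear
logit foreign mpg

* Odds ratios, with the exponentiated constant kept in the table
esttab, eform z constant
```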

Saving the Table in the Format of Your Paper

To save a table as an RTF (Rich Text Format) file, add using filename.rtf to the command, right before the comma for options. Also add the replace option so it can overwrite previous versions of the file.
esttab using logit.rtf, replace eform z
Rich Text Format includes formatting information as well as the text itself, and can be opened directly by Word and other word processors.
The process of saving the table as a LaTeX file is identical: just replace .rtf with .tex. There are some special options that apply to LaTeX, such as fragment to create a table fragment that can be added to an existing table. HTML (.html) is another useful format option, and there are many others.
You can save the table as a comma-separated values (CSV) file that can easily be read into Excel by setting the file extension to .csv. However, consider carefully whether what you contemplate doing in Excel can't be done better (and especially more reproducibly) within Stata.
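For illustration, the same logit table could be written to each of the other formats mentioned here just by changing the file extension. The file names are arbitrary choices for this sketch.

```stata
sysuse auto, clear
logit foreign mpg

* Same table, three file formats; the extension selects the format
esttab using logit.tex, replace eform z
esttab using logit.html, replace eform z
esttab using logit.csv, replace eform z
```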

Tables with Multiple Models

To create a table containing the estimates from multiple models, the first step is to run each model and store its estimates for future use. You can store the estimates either with the official Stata command estimates store, usually abbreviated est sto, or with the variant eststo included in the estout package. The eststo variant adds a few features, but we won't use any of them in this article so it doesn't matter which command you use. The basic syntax is identical: the command, then the name you want to assign to that set of estimates. Use this to build a set of nested models:
reg mpg foreign
est sto m1
reg mpg foreign weight
est sto m2
reg mpg foreign weight displacement gear_ratio
est sto m3
To have esttab create a table based on a single set of stored estimates, simply specify the name of the estimates you want it to use:
esttab m1
But you are not limited to one set:
esttab m1 m2 m3
------------------------------------------------------------
                      (1)             (2)             (3)   
                      mpg             mpg             mpg   
------------------------------------------------------------
foreign             4.946***       -1.650          -2.246   
                   (3.63)         (-1.53)         (-1.81)   

weight                           -0.00659***     -0.00675***
                                 (-10.34)         (-5.80)   

displacement                                      0.00825   
                                                   (0.72)   

gear_ratio                                          2.058   
                                                   (1.17)   

_cons               19.83***        41.68***        34.52***
                  (26.70)         (19.25)          (5.17)   
------------------------------------------------------------
N                      74              74              74   
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
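For comparison, here is a sketch of the same three models stored with eststo instead of est sto. It uses eststo as a prefix (eststo name: command), which stores the estimates in the same step that runs the model, and eststo clear, which drops anything eststo stored earlier. The resulting table should match the one above.

```stata
sysuse auto, clear

* Drop any estimates stored earlier by eststo
eststo clear

* Run each model and store its estimates in one step
eststo m1: reg mpg foreign
eststo m2: reg mpg foreign weight
eststo m3: reg mpg foreign weight displacement gear_ratio

esttab m1 m2 m3
```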

Summary (Model-Level) Statistics

The N (number of observations) for each model is shown by default, but you can add other model-level statistics. Options include R-squared (r2), AIC (aic), and BIC (bic). Any other scalar in the e() vector can also be added using the scalar() option. For example, you could add the model's F statistic, stored as e(F), with the option scalar(F). You cannot control the order in which they are listed, but you can move N to the end with obslast. You can remove N entirely with noobs.
esttab m1 m2 m3, se aic obslast scalar(F) bic r2
------------------------------------------------------------
                      (1)             (2)             (3)   
                      mpg             mpg             mpg   
------------------------------------------------------------
foreign             4.946***       -1.650          -2.246   
                  (1.362)         (1.076)         (1.240)   

weight                           -0.00659***     -0.00675***
                               (0.000637)       (0.00116)   

displacement                                      0.00825   
                                                 (0.0114)   

gear_ratio                                          2.058   
                                                  (1.755)   

_cons               19.83***        41.68***        34.52***
                  (0.743)         (2.166)         (6.675)   
------------------------------------------------------------
R-sq                0.155           0.663           0.669   
AIC                 460.3           394.4           396.9   
BIC                 465.0           401.3           408.4   
F                   13.18           69.75           34.94   
N                      74              74              74   
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
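As another illustration of the same options, the sketch below swaps in adjusted R-squared (the ar2 option) and the root mean squared error, which regress stores as e(rmse). Both are choices made for this example rather than part of the table above.

```stata
sysuse auto, clear
quietly reg mpg foreign
est sto m1
quietly reg mpg foreign weight
est sto m2
quietly reg mpg foreign weight displacement gear_ratio
est sto m3

* Adjusted R-squared plus the e(rmse) scalar, with N removed
esttab m1 m2 m3, se ar2 scalar(rmse) noobs
```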

Cell (Variable-Level) Statistics

In addition to t statistics, z statistics, and standard errors, esttab can put p-values and confidence intervals in the parentheses with the p and ci options. You can have no secondary quantity in parentheses at all with the not (no t) option.
You can replace the main numbers as well. The beta option replaces them with standardized beta coefficients. The main() option lets you replace them with any other quantity from the e() vector.
If you prefer to have the statistic in parentheses on the same row as the coefficient, use the wide option.
esttab m1 m2 m3, wide ci noobs
---------------------------------------------------------------------------------------------------------------------------------
                      (1)                                    (2)                                    (3)                          
                      mpg                                    mpg                                    mpg                          
---------------------------------------------------------------------------------------------------------------------------------
foreign             4.946***          [2.230,7.661]       -1.650            [-3.796,0.495]       -2.246            [-4.719,0.227]
weight                                                  -0.00659***    [-0.00786,-0.00532]     -0.00675***    [-0.00907,-0.00443]
displacement                                                                                    0.00825          [-0.0145,0.0310]
gear_ratio                                                                                        2.058            [-1.444,5.559]
_cons               19.83***          [18.35,21.31]        41.68***          [37.36,46.00]        34.52***          [21.21,47.84]
---------------------------------------------------------------------------------------------------------------------------------
95% confidence intervals in brackets
* p<0.05, ** p<0.01, *** p<0.001
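The other options mentioned in this section follow the same pattern. A brief sketch, assuming the same three nested models:

```stata
sysuse auto, clear
quietly reg mpg foreign
est sto m1
quietly reg mpg foreign weight
est sto m2
quietly reg mpg foreign weight displacement gear_ratio
est sto m3

* p-values in parentheses instead of t statistics
esttab m1 m2 m3, p

* Coefficients only, with nothing in parentheses
esttab m1 m2 m3, not

* Standardized beta coefficients as the main numbers
esttab m1 m2 m3, beta
```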

Titles, Notes, and Labels

You can give the table an overall title with the title() option. Type the desired title in the parentheses.
If you want to remove the note at the bottom that explains the numbers in parentheses and the meaning of the stars, use the nonotes option. If you want to add notes, use the addnotes() option with the desired notes in the parentheses. If you want multiple lines of notes, put each line in quotes.
By default each model in a table is labeled with a number and a title. If you don't want the number to appear, use the nonumber option. The model title defaults to the name of the model's dependent variable, but you can change model titles with mtitle(). Each title goes in quotes inside the parentheses, and the order must match the order in which the stored estimates are listed in the main command.
The label option tells esttab to use the variable labels rather than the variable names. That means you can control exactly how a variable is listed by changing its label—just make sure the label provides an adequate description of the variable but is not too long. The labels below illustrate some of the potential problems.
esttab m1 m2 m3, se label nonumber title("Models of MPG") mtitle("Model 1" "Model 2" "Model 3")
Models of MPG
--------------------------------------------------------------------
                          Model 1         Model 2         Model 3   
--------------------------------------------------------------------
Car type                    4.946***       -1.650          -2.246   
                          (1.362)         (1.076)         (1.240)   

Weight (lbs.)                            -0.00659***     -0.00675***
                                       (0.000637)       (0.00116)   

Displacement .. in.)                                      0.00825   
                                                         (0.0114)   

Gear Ratio                                                  2.058   
                                                          (1.755)   

Constant                    19.83***        41.68***        34.52***
                          (0.743)         (2.166)         (6.675)   
--------------------------------------------------------------------
Observations                   74              74              74   
--------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
If you don't want to change the actual variable labels, you can override them with the coeflabel() option. Put the variable name/label pairs you want to use inside the parentheses. Any variable for which you do not specify a label will be listed with its actual name.
esttab m1 m2 m3, coeflabel(foreign "Foreign Car" displacement "Displacement" gear_ratio "Gear Ratio" _cons "Constant")
------------------------------------------------------------
                      (1)             (2)             (3)   
                      mpg             mpg             mpg   
------------------------------------------------------------
Foreign Car         4.946***       -1.650          -2.246   
                   (3.63)         (-1.53)         (-1.81)   

weight                           -0.00659***     -0.00675***
                                 (-10.34)         (-5.80)   

Displacement                                      0.00825   
                                                   (0.72)   

Gear Ratio                                          2.058   
                                                   (1.17)   

Constant            19.83***        41.68***        34.52***
                  (26.70)         (19.25)          (5.17)   
------------------------------------------------------------
N                      74              74              74   
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
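The notes and label options discussed in this section can be combined. In the sketch below, the shortened variable label and the text of the two notes are hypothetical, chosen only to show a label that fits the first column and a two-line note.

```stata
sysuse auto, clear
quietly reg mpg foreign weight
est sto m2
quietly reg mpg foreign weight displacement gear_ratio
est sto m3

* Shorten a label that is too long for the first column (hypothetical wording)
label variable displacement "Displ. (cu. in.)"

* Each quoted string in addnotes() becomes its own line beneath the table
esttab m2 m3, se label addnotes("Source: auto data set shipped with Stata" "Models estimated by OLS")
```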

Formats

In general you can change the format of a number by placing the desired format in parentheses following the option that prompts that number to be displayed. Use b() to format the betas and t() to format t statistics.
esttab m1 m2 m3, b(%9.2g) t(%9.2g) r2(%9.6g)
------------------------------------------------------------
                      (1)             (2)             (3)   
                      mpg             mpg             mpg   
------------------------------------------------------------
foreign               4.9***         -1.7            -2.2   
                    (3.6)          (-1.5)          (-1.8)   

weight                             -.0066***       -.0068***
                                    (-10)          (-5.8)   

displacement                                        .0082   
                                                    (.72)   

gear_ratio                                            2.1   
                                                    (1.2)   

_cons                  20***           42***           35***
                     (27)            (19)           (5.2)   
------------------------------------------------------------
N                      74              74              74   
R-sq              .154762         .662703         .669463   
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
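Besides full Stata display formats, esttab also accepts a plain number as shorthand for a fixed number of decimal places. A small sketch; the particular numbers of decimals are arbitrary:

```stata
sysuse auto, clear
quietly reg mpg foreign
est sto m1
quietly reg mpg foreign weight
est sto m2

* Two decimals for coefficients, one for t statistics, four for R-squared
esttab m1 m2, b(2) t(1) r2(4)
```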

Stars and Significance

The star() option lets you control when stars are used. Inside the parentheses you'll put a list of characters paired with the numeric threshold beneath which they will be applied to a coefficient. The default is equivalent to:
star(* 0.05 ** 0.01 *** 0.001)
Note that star() pays attention to both the numbers and how you format them: if you don't include the leading zeros they will not appear in the note beneath the table.
esttab m1 m2 m3, p star(+ 0.1 * 0.05 ** 0.01)
---------------------------------------------------------
                      (1)            (2)            (3)  
                      mpg            mpg            mpg  
---------------------------------------------------------
foreign             4.946**       -1.650         -2.246+ 
                  (0.001)        (0.130)        (0.074)  

weight                          -0.00659**     -0.00675**
                                 (0.000)        (0.000)  

displacement                                    0.00825  
                                                (0.472)  

gear_ratio                                        2.058  
                                                (0.245)  

_cons               19.83**        41.68**        34.52**
                  (0.000)        (0.000)        (0.000)  
---------------------------------------------------------
N                      74             74             74  
---------------------------------------------------------
p-values in parentheses
+ p<0.1, * p<0.05, ** p<0.01
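Two further variations, sketched with the same kind of stored models: a different set of thresholds (0.10, 0.05, and 0.01 are picked here only as an example), and the nostar option, which suppresses the stars altogether.

```stata
sysuse auto, clear
quietly reg mpg foreign
est sto m1
quietly reg mpg foreign weight
est sto m2

* Stars at the 10%, 5%, and 1% levels
esttab m1 m2, se star(* 0.10 ** 0.05 *** 0.01)

* No significance stars at all
esttab m1 m2, se nostar
```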

Tables of Summary Statistics

The esttab command is designed to draw information from the e() vector, which is only used by estimation commands. However, estpost will take the results from the r() vector used by other commands and post them in the e() vector. This allows esttab to create tables based on those results, but you'll generally have to give more guidance about what that table should contain.
To store the results of a command in e(), put the estpost command before it:
estpost sum price foreign mpg
The resulting table is designed to tell you the official name of each quantity. You will use those names in subsequent esttab commands.
             |  e(count)   e(sum_w)    e(mean)     e(Var)      e(sd)     e(min)     e(max)     e(sum) 
-------------+----------------------------------------------------------------------------------------
       price |        74         74   6165.257    8699526   2949.496       3291      15906     456229 
     foreign |        74         74   .2972973   .2117734   .4601885          0          1         22 
         mpg |        74         74    21.2973   33.47205   5.785503         12         41       1576 
When working with regression results, esttab knows that e(b) is the primary quantity of interest and builds the table accordingly. With summary statistics, you need to tell esttab what the table should contain using the cell() option. This is technically an option for estout rather than esttab, but esttab will pass it along to estout while still doing some of the other work for you. However, if you want to read the full documentation for the cell() option you need to type help estout rather than help esttab.
If you want a table of just means, use cell(mean):
esttab, cell(mean)
-------------------------
                      (1)
                         
                     mean
-------------------------
price            6165.257
foreign          .2972973
mpg               21.2973
-------------------------
N                      74
-------------------------
You can list multiple quantities:
esttab, cell(mean sd)
-------------------------
                      (1)
                         
                  mean/sd
-------------------------
price            6165.257
                 2949.496
foreign          .2972973
                 .4601885
mpg               21.2973
                 5.785503
-------------------------
N                      74
-------------------------
If you want quantities to appear on a single row, you can group them with either quotes or parentheses. The following commands are equivalent:
esttab, cell("mean sd")
esttab, cell((mean sd))
--------------------------------------
                      (1)             
                                      
                     mean           sd
--------------------------------------
price            6165.257     2949.496
foreign          .2972973     .4601885
mpg               21.2973     5.785503
--------------------------------------
N                      74             
--------------------------------------
Note how in this case quotes do not indicate strings!
Model numbers and model titles make little sense for this table (especially since the title is empty at this point), so consider removing them with nonumber and nomtitle:
esttab, cell((mean sd)) nonumber nomtitle 
--------------------------------------
                     mean           sd
--------------------------------------
price            6165.257     2949.496
foreign          .2972973     .4601885
mpg               21.2973     5.785503
--------------------------------------
N                      74             
--------------------------------------
We've discussed putting formats in parentheses after a quantity to control the numeric format of that quantity, but there are many other options. A useful addition to this table is par for parentheses:
esttab, cell((mean sd(par))) nonumber nomtitle 
--------------------------------------
                     mean           sd
--------------------------------------
price            6165.257   (2949.496)
foreign          .2972973   (.4601885)
mpg               21.2973   (5.785503)
--------------------------------------
N                      74             
--------------------------------------
The column heading labels also leave somewhat to be desired. You can override them with a label() option associated with each quantity in cell(). This is different from the general label option, which tells esttab to replace the variable names at the beginning of each row with the variable labels. You are welcome to use both (or use coeflabel() to set the row labels yourself):
esttab, cell((mean(label(Mean)) sd(par label(Standard Deviation)))) label nonumber nomtitle
----------------------------------------------
                             Mean Standard D~n
----------------------------------------------
Price                    6165.257   (2949.496)
Car type                 .2972973   (.4601885)
Mileage (mpg)             21.2973   (5.785503)
----------------------------------------------
Observations                   74             
----------------------------------------------
The problem now is that "Standard Deviation" had to be truncated because its column is not wide enough. You can set the width of the columns with the modelwidth() option (recall that when dealing with regression results each column is a model). If you put a single number in the parentheses the width in characters of all the columns will be set to that number. If you give a list of numbers, they will be applied to the columns in order:
esttab, modelwidth(10 20) cell((mean(label(Mean)) sd(par label(Standard Deviation)))) label nomtitle nonumber
----------------------------------------------------
                           Mean   Standard Deviation
----------------------------------------------------
Price                  6165.257           (2949.496)
Car type               .2972973           (.4601885)
Mileage (mpg)           21.2973           (5.785503)
----------------------------------------------------
Observations                 74                     
----------------------------------------------------
Admittedly this will never be publication-quality when rendered as plain text. But consider this RTF version, created by:
esttab using means.rtf, modelwidth(10 20) cell((mean(label(Mean)) sd(par label(Standard Deviation)))) label nomtitle nonumber replace
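The same approach extends to the additional quantities that summarize produces with its detail option, such as the median, which estpost stores as e(p50). The sketch below is one possible layout; the choice of variables, statistics, and formats is an assumption made for illustration.

```stata
sysuse auto, clear

* Post detailed summary statistics (including percentiles) to e()
estpost summarize price mpg weight, detail

* Mean, median, minimum, and maximum on one row per variable
esttab, cell("mean(fmt(1) label(Mean)) p50(fmt(1) label(Median)) min(label(Min)) max(label(Max))") nonumber nomtitle noobs
```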

Frequency Tables

Creating frequency tables also relies on using estpost to put the results in the e() vector:
estpost tab rep78 foreign
foreign      |                                            
       rep78 |      e(b)     e(pct)  e(colpct)  e(rowpct) 
-------------+--------------------------------------------
Domestic     |                                            
           1 |         2   2.898551   4.166667        100 
           2 |         8    11.5942   16.66667        100 
           3 |        27   39.13043      56.25         90 
           4 |         9   13.04348      18.75         50 
           5 |         2   2.898551   4.166667   18.18182 
       Total |        48   69.56522        100   69.56522 
-------------+--------------------------------------------
Foreign      |                                            
           1 |         0          0          0          0 
           2 |         0          0          0          0 
           3 |         3   4.347826   14.28571         10 
           4 |         9   13.04348   42.85714         50 
           5 |         9   13.04348   42.85714   81.81818 
       Total |        21   30.43478        100   30.43478 
-------------+--------------------------------------------
Total        |                                            
           1 |         2   2.898551   2.898551        100 
           2 |         8    11.5942    11.5942        100 
           3 |        30   43.47826   43.47826        100 
           4 |        18   26.08696   26.08696        100 
           5 |        11   15.94203   15.94203        100 
       Total |        69        100        100        100 
These are the same numbers you'd get from tab alone, just organized differently. Note that the frequencies themselves are called e(b), but we'll still use cell() because otherwise esttab will treat them like regression coefficients:
esttab, cell(b)
-------------------------
                      (1)
                         
                        b
-------------------------
Domestic                 
1                       2
2                       8
3                      27
4                       9
5                       2
Total                  48
-------------------------
Foreign                  
1                       0
2                       0
3                       3
4                       9
5                       9
Total                  21
-------------------------
Total                    
1                       2
2                       8
3                      30
4                      18
5                      11
Total                  69
-------------------------
N                      69
-------------------------
The model number, empty model title, and column label (b) are all useless here, so remove the number and title and change the label with collabels(). You could also remove the column label entirely with collabels(none).
esttab, cell(b) nonumber nomtitle collabels(Frequency)
-------------------------
                Frequency
-------------------------
Domestic                 
1                       2
2                       8
3                      27
4                       9
5                       2
Total                  48
-------------------------
Foreign                  
1                       0
2                       0
3                       3
4                       9
5                       9
Total                  21
-------------------------
Total                    
1                       2
2                       8
3                      30
4                      18
5                      11
Total                  69
-------------------------
N                      69
-------------------------
The unstack option converts the three sections into columns:
esttab, cell(b) unstack nonumber nomtitle collabels(none)
---------------------------------------------------
                 Domestic      Foreign        Total
---------------------------------------------------
1                       2            0            2
2                       8            0            8
3                      27            3           30
4                       9            9           18
5                       2            9           11
Total                  48           21           69
---------------------------------------------------
N                      69                          
---------------------------------------------------
To control the label for the row variable use eqlabels(), but esttab thinks of it as being the left-hand-side of an equation (remember esttab was built for models). Thus you have to use the lhs() suboption within eqlabels(). You can adjust the amount of space available to the label with varwidth():
esttab, cell(b) eqlabels(, lhs("Repair Record")) varwidth(15) unstack nonumber nomtitle collabels(none)
------------------------------------------------------
Repair Record       Domestic      Foreign        Total
------------------------------------------------------
1                          2            0            2
2                          8            0            8
3                         27            3           30
4                          9            9           18
5                          2            9           11
Total                     48           21           69
------------------------------------------------------
N                         69                          
------------------------------------------------------
You can add additional quantities to cell() and control their appearance and structure using all the tools we discussed in the section on summary statistics. Consider adding a note to explain what each number represents with the note() option:
esttab, cell(b rowpct(fmt(%5.1f) par)) note(Row Percentages in Parentheses) unstack nonumber nomtitle collabels(none) eqlabels(, lhs("Repair Record")) varwidth(15)
------------------------------------------------------
Repair Record       Domestic      Foreign        Total
------------------------------------------------------
1                          2            0            2
                     (100.0)        (0.0)      (100.0)
2                          8            0            8
                     (100.0)        (0.0)      (100.0)
3                         27            3           30
                      (90.0)       (10.0)      (100.0)
4                          9            9           18
                      (50.0)       (50.0)      (100.0)
5                          2            9           11
                      (18.2)       (81.8)      (100.0)
Total                     48           21           69
                      (69.6)       (30.4)      (100.0)
------------------------------------------------------
N                         69                          
------------------------------------------------------
Row Percentages in Parentheses
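A one-way frequency table works the same way and needs less restructuring. In this sketch, estpost tabulate with a single variable posts the frequencies as e(b) and the percentages as e(pct); the column labels are arbitrary.

```stata
sysuse auto, clear

* Post a one-way tabulation of repair record to e()
estpost tabulate rep78

* Frequencies and percentages side by side
esttab, cell("b(label(Frequency)) pct(fmt(1) label(Percent))") nonumber nomtitle noobs
```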
This is just a fraction of what esttab (let alone estout) can do. To learn more, we suggest reading the Stata Journal article that introduced it. For syntax details, type help esttab and/or help estout.