Creating Publication-Quality Tables in Stata
Stata's tables are, in general, clear and informative. However, they are not in the format or of the aesthetic quality normally used in publications. Several Stata users have written programs that create publication-quality tables. This article will discuss esttab (think "estimates table") by Ben Jann. The esttab command takes the results of previous estimation or other commands, puts them in a publication-quality table, and then saves that table in a format you can use directly in your paper, such as RTF or LaTeX. Major topics for this article include creating tables of regression results, tables of summary statistics, and frequency tables.
The estout Package
The esttab command is just one member of a family of commands, or package, called estout. In fact, esttab is just a "wrapper" for a command called estout. The estout command gives you full control over the table to be created, but flexibility requires complexity and estout is fairly difficult to use. The esttab command runs estout for you and handles many of the details estout requires, allowing you to create the most common tables relatively easily. We will also discuss estpost, which puts results like summary statistics in a form esttab can work with. The ability to handle summary statistics and frequencies in addition to regression results is one of the reasons we elected to focus this article on esttab.
On the Workflow of Creating Tables
Keep in mind that you always have an alternative to using esttab: simply create the tables you want in Word or your favorite word processing program, copying and pasting the needed numbers from your Stata output. This is time-consuming and tedious. On the other hand, trying to figure out how to get esttab to give you the table you want can be time-consuming as well, and there's no guarantee it can make exactly the table you want. Be sure to consider the possibility that creating a particular table by hand may be quicker than using esttab. Much depends on how many tables you need to create, and how many numbers they contain. If you can get esttab to give you something close to what you want but are spending a lot of time trying to figure out how to get exactly what you want, consider just editing what you have.
Most people will find it's easier to first obtain a set of (hopefully) final results and then work on how to present them. We would not recommend running esttab until you are reasonably confident you've arrived at the results you want to publish.
Installing esttab
Since the estout package is not part of official Stata, you must install it before using it. It is available from the Statistical Software Components (SSC) archive and can be installed using the ssc install command in Stata:
ssc install estout
You only need to do this once—do not put this command in your research do files.
Check for updates periodically using adoupdate.
Basics
The esttab command needs some results to act on, so load the auto data set that comes with Stata and run a basic regression:
sysuse auto
reg mpg weight foreign
You can see the basic function of esttab simply by running it without any options at all:
esttab
----------------------------
(1)
mpg
----------------------------
weight -0.00659***
(-10.34)
foreign -1.650
(-1.53)
_cons 41.68***
(19.25)
----------------------------
N 74
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
This puts the model results in a table within Stata's Results window. Viewing it in the Results window is useful for testing a table specification, but when you've got what you want you'll have esttab save it in the file format you're using for your paper. The default table contains many of the features you expect from a table of regression results in a journal article, including rounded coefficients and stars for significance. Note, however, that the numbers in parentheses are the t-statistics. Use the se option if you want to replace them with standard errors:
esttab, se
----------------------------
(1)
mpg
----------------------------
weight -0.00659***
(0.000637)
foreign -1.650
(1.076)
_cons 41.68***
(2.166)
----------------------------
N 74
----------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
The esttab command uses the current contents of the e() vector (information about the last estimation command), not the results the last regression displayed. If you run a logit command with the or option Stata will display odds ratios:
logit foreign mpg, or
Logistic regression                             Number of obs     =         74
                                                LR chi2(1)        =      11.49
                                                Prob > chi2       =     0.0007
Log likelihood = -39.28864                      Pseudo R2         =     0.1276
------------------------------------------------------------------------------
     foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   1.173232   .0616975     3.04   0.002      1.05833    1.300608
       _cons |   .0125396   .0151891    -3.62   0.000     .0011674    .1346911
------------------------------------------------------------------------------
However, e(b) still contains the coefficients, and by default that is what esttab will display. It also labels the test statistics as t statistics rather than z statistics like the logit output does:
esttab
----------------------------
(1)
foreign
----------------------------
foreign
mpg 0.160**
(3.04)
_cons -4.379***
(-3.62)
----------------------------
N 74
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
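If you want to confirm for yourself what esttab is reading, you can inspect the e() vector directly. The following is a minimal sketch using only official Stata commands; it assumes the auto data set and the logit model from above.

```stata
* Sketch: look at what is stored in e() after a logit run with the or option
sysuse auto, clear
logit foreign mpg, or

* List the scalars, macros, and matrices currently held in e()
ereturn list

* e(b) holds the coefficients on the log-odds scale, not the odds ratios
matrix list e(b)

* Exponentiating the stored coefficient by hand recovers the odds ratio
display exp(_b[mpg])
```

The last command should reproduce the odds ratio for mpg that logit displayed, which is the same transformation the eform option applies for you.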
If you want odds ratios in your table, give esttab the eform (exponentiated form) option. If you want the table to say "z statistics in parentheses" rather than t, use the z option (note that the z option does not change the numbers in any way):
esttab, eform z
----------------------------
(1)
foreign
----------------------------
foreign
mpg 1.173**
(3.04)
----------------------------
N 74
----------------------------
Exponentiated coefficients; z statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
Specifying the eform option prompts esttab to drop the constant term from the table, because it doesn't make much sense to talk about the odds ratio of the constant. However, you can override this behavior by specifying the constant option.
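As a short sketch of that override, the command below adds constant to the earlier eform table. It assumes the same auto data set and logit model used above; the exponentiated constant then appears as an extra row.

```stata
* Sketch: keep the (exponentiated) constant in an eform table
sysuse auto, clear
logit foreign mpg

* constant tells esttab not to drop _cons when eform is specified
esttab, eform z constant
```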
Saving the Table in the Format of Your Paper
To save a table as an RTF (Rich Text Format) file, add using filename.rtf to the command, right before the comma for options. Also add the replace option so it can overwrite previous versions of the file.
esttab using logit.rtf, replace eform z
Rich Text Format includes formatting information as well as the text itself, and can be opened directly by Word and other word processors.
The process of saving the table as a LaTeX file is identical: just replace .rtf with .tex. There are some special options that apply to LaTeX, such as fragment to create a table fragment that can be added to an existing table. HTML (.html) is another useful format option, and there are many others.
You can save the table as a comma-separated values (CSV) file that can easily be read into Excel by setting the file extension to .csv. However, consider carefully whether what you contemplate doing in Excel can't be done better (and especially more reproducibly) within Stata.
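To illustrate, the sketch below writes the same logit table in three of the formats just mentioned. The file names are placeholders of our own choosing; only the extension matters to esttab.

```stata
* Sketch: save one table in several formats by changing the file extension
sysuse auto, clear
logit foreign mpg

* LaTeX, HTML, and CSV versions of the same table (file names are arbitrary)
esttab using logit.tex, replace eform z
esttab using logit.html, replace eform z
esttab using logit.csv, replace eform z
```

The files are written to Stata's current working directory, which you can check with pwd.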
Tables with Multiple Models
To create a table containing the estimates from multiple models, the first step is to run each model and store their estimates for future use. You can store the estimates either with the official Stata command estimates store, usually abbreviated est sto, or with the variant eststo included in the estout package. The eststo variant adds a few features, but we won't use any of them in this article so it doesn't matter which command you use. The basic syntax is identical: the command, then the name you want to assign to that set of estimates. Use this to build a set of nested models:
reg mpg foreign
est sto m1
reg mpg foreign weight
est sto m2
reg mpg foreign weight displacement gear_ratio
est sto m3
To have esttab create a table based on a single set of stored estimates, simply specify the name of the estimates you want it to use:
esttab m1
But you are not limited to one set:
esttab m1 m2 m3
------------------------------------------------------------
                      (1)             (2)             (3)
                      mpg             mpg             mpg
------------------------------------------------------------
foreign             4.946***       -1.650          -2.246
                   (3.63)         (-1.53)         (-1.81)
weight                           -0.00659***     -0.00675***
                                 (-10.34)         (-5.80)
displacement                                      0.00825
                                                   (0.72)
gear_ratio                                          2.058
                                                   (1.17)
_cons               19.83***        41.68***        34.52***
                  (26.70)         (19.25)          (5.17)
------------------------------------------------------------
N                      74              74              74
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
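For comparison, here is a sketch of the same workflow using eststo from the estout package instead of est sto. It relies on eststo's prefix syntax (the name, a colon, then the estimation command) and on eststo clear, both of which are described in help eststo; the model names are the same ones used above.

```stata
* Sketch: store the nested models with eststo rather than est sto
sysuse auto, clear

* Drop any estimates stored earlier in the session
eststo clear

* eststo used as a prefix runs the model and stores it under the given name
eststo m1: reg mpg foreign
eststo m2: reg mpg foreign weight
eststo m3: reg mpg foreign weight displacement gear_ratio

esttab m1 m2 m3
```

The resulting table should match the one above; which command you use is a matter of taste.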
Summary (Model-Level) Statistics
The N (number of observations) for each model is shown by default, but you can add other model-level statistics. Options include R-squared (r2), AIC (aic), and BIC (bic). Any other scalar in the e() vector can also be added using the scalar() option. For example, you could add the model's F statistic, stored as e(F), with the option scalar(F). You cannot control the order in which they are listed, but you can move N to the end with obslast. You can remove N entirely with noobs.
esttab m1 m2 m3, se aic obslast scalar(F) bic r2
------------------------------------------------------------
                      (1)             (2)             (3)
                      mpg             mpg             mpg
------------------------------------------------------------
foreign             4.946***       -1.650          -2.246
                  (1.362)         (1.076)         (1.240)
weight                           -0.00659***     -0.00675***
                               (0.000637)       (0.00116)
displacement                                      0.00825
                                                 (0.0114)
gear_ratio                                          2.058
                                                  (1.755)
_cons               19.83***        41.68***        34.52***
                  (0.743)         (2.166)         (6.675)
------------------------------------------------------------
R-sq                0.155           0.663           0.669
AIC                 460.3           394.4           396.9
BIC                 465.0           401.3           408.4
F                   13.18           69.75           34.94
N                      74              74              74
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
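The same pattern works for other model-level quantities. The sketch below assumes two additions not used above: the ar2 option (adjusted R-squared, listed in help esttab) and e(rmse), the root mean squared error that regress stores.

```stata
* Sketch: other model-level statistics
sysuse auto, clear
reg mpg foreign
est sto m1
reg mpg foreign weight
est sto m2

* ar2 adds adjusted R-squared; scalar(rmse) adds e(rmse); noobs drops N
esttab m1 m2, se ar2 scalar(rmse) noobs
```

Run ereturn list after a model to see which scalars are available for scalar().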
Cell (Variable-Level) Statistics
In addition to t statistics, z statistics, and standard errors, esttab can put p-values and confidence intervals in the parentheses with the p and ci options. You can have no secondary quantity in parentheses at all with the not (no t) option.
You can replace the main numbers as well. The beta option replaces them with standardized beta coefficients. The main() option lets you replace them with any other quantity from the e() vector.
If you prefer to have the statistic in parentheses on the same row as the coefficient, use the wide option.
esttab m1 m2 m3, wide ci noobs
---------------------------------------------------------------------------------------------------------------------------------
                      (1)                                    (2)                                    (3)
                      mpg                                    mpg                                    mpg
---------------------------------------------------------------------------------------------------------------------------------
foreign             4.946***          [2.230,7.661]       -1.650            [-3.796,0.495]       -2.246            [-4.719,0.227]
weight                                                  -0.00659***    [-0.00786,-0.00532]     -0.00675***    [-0.00907,-0.00443]
displacement                                                                                    0.00825          [-0.0145,0.0310]
gear_ratio                                                                                        2.058            [-1.444,5.559]
_cons               19.83***          [18.35,21.31]        41.68***          [37.36,46.00]        34.52***          [21.21,47.84]
---------------------------------------------------------------------------------------------------------------------------------
95% confidence intervals in brackets
* p<0.05, ** p<0.01, *** p<0.001
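The other cell-level options described in this section follow the same pattern. This sketch runs three of them on two of the stored models so you can compare the output; it assumes nothing beyond the options named above.

```stata
* Sketch: alternative secondary and main quantities
sysuse auto, clear
reg mpg foreign
est sto m1
reg mpg foreign weight
est sto m2

* p-values in the parentheses instead of t statistics
esttab m1 m2, p

* Nothing in parentheses at all
esttab m1 m2, not

* Standardized beta coefficients as the main numbers
esttab m1 m2, beta
```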
Titles, Notes, and Labels
You can give the table an overall title with the title() option. Type the desired title in the parentheses.
If you want to remove the note at the bottom that explains the numbers in parentheses and the meaning of the stars, use the nonotes option. If you want to add notes, use the addnotes() option with the desired notes in the parentheses. If you want multiple lines of notes, put each line in quotes.
By default each model in a table is labeled with a number and a title. If you don't want the number to appear, use the nonumber option. The model title defaults to the name of the model's dependent variable, but you can change model titles with mtitle(). Each title goes in quotes inside the parentheses, and the order must match the order in which the stored estimates are listed in the main command.
The label option tells esttab to use the variable labels rather than the variable names. That means you can control exactly how a variable is listed by changing its label—just make sure the label provides an adequate description of the variable but is not too long. The labels below illustrate some of the potential problems.
esttab m1 m2 m3, label nonumber title("Models of MPG") ///
    mtitle("Model 1" "Model 2" "Model 3")
Models of MPG
--------------------------------------------------------------------
                          Model 1         Model 2         Model 3
--------------------------------------------------------------------
Car type                    4.946***       -1.650          -2.246
                          (1.362)         (1.076)         (1.240)
Weight (lbs.)                            -0.00659***     -0.00675***
                                       (0.000637)       (0.00116)
Displacement .. in.)                                      0.00825
                                                         (0.0114)
Gear Ratio                                                  2.058
                                                          (1.755)
Constant                    19.83***        41.68***        34.52***
                          (0.743)         (2.166)         (6.675)
--------------------------------------------------------------------
Observations                   74              74              74
--------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
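The note options described at the start of this section can be sketched as follows. The note text is purely illustrative, and the /// line continuation assumes you are running the commands from a do file.

```stata
* Sketch: adding and removing notes
sysuse auto, clear
reg mpg foreign
est sto m1
reg mpg foreign weight
est sto m2

* Two extra note lines, each in its own set of quotes (note text is made up)
esttab m1 m2, se title("Models of MPG") ///
    addnotes("Data: auto data set shipped with Stata" "Dependent variable: mpg")

* Suppress the default notes about parentheses and stars
esttab m1 m2, se title("Models of MPG") nonotes
```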
If you don't want to change the actual variable labels, you can override them with the coeflabel() option. Put the variable name/label pairs you want to use inside the parentheses. Any variable for which you do not specify a label will be listed with its actual name.
esttab m1 m2 m3, coeflabel(foreign "Foreign Car" ///
    displacement "Displacement" gear_ratio "Gear Ratio" ///
    _cons "Constant")
------------------------------------------------------------
                      (1)             (2)             (3)
                      mpg             mpg             mpg
------------------------------------------------------------
Foreign Car         4.946***       -1.650          -2.246
                   (3.63)         (-1.53)         (-1.81)
weight                           -0.00659***     -0.00675***
                                 (-10.34)         (-5.80)
Displacement                                      0.00825
                                                   (0.72)
Gear Ratio                                          2.058
                                                   (1.17)
Constant            19.83***        41.68***        34.52***
                  (26.70)         (19.25)          (5.17)
------------------------------------------------------------
N                      74              74              74
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
Formats
In general you can change the format of a number by placing the desired format in parentheses following the option that prompts that number to be displayed. Use b() to format the betas and t() to format t statistics.
esttab m1 m2 m3, b(%9.2g) t(%9.2g) r2(%9.6g)
------------------------------------------------------------
                      (1)             (2)             (3)
                      mpg             mpg             mpg
------------------------------------------------------------
foreign               4.9***         -1.7            -2.2
                    (3.6)          (-1.5)          (-1.8)
weight                             -.0066***       -.0068***
                                    (-10)          (-5.8)
displacement                                        .0082
                                                    (.72)
gear_ratio                                            2.1
                                                    (1.2)
_cons                  20***           42***           35***
                     (27)            (19)           (5.2)
------------------------------------------------------------
N                      74              74              74
R-sq              .154762         .662703         .669463
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
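If all you want is a fixed number of decimal places, a full Stata format is not required. The sketch below assumes the shorthand described in help esttab, where a plain number inside the parentheses is read as the number of decimal places.

```stata
* Sketch: decimal-place shorthand instead of full Stata formats
sysuse auto, clear
reg mpg foreign
est sto m1
reg mpg foreign weight
est sto m2

* Three decimals for coefficients and standard errors, two for R-squared
esttab m1 m2, b(3) se(3) r2(2)
```

Note that small coefficients such as the one on weight lose precision quickly under a fixed number of decimals, so check the result before settling on a format.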
Stars and Significance
The star() option lets you control when stars are used. Inside the parentheses you'll put a list of characters paired with the numeric threshold beneath which they will be applied to a coefficient. The default is equivalent to:
star(* 0.05 ** 0.01 *** 0.001)
Note that star() pays attention to both the numbers and how you format them: if you don't include the leading zeros they will not appear in the table.
esttab m1 m2 m3, p star(+ 0.1 * 0.05 ** 0.01)
---------------------------------------------------------
                      (1)            (2)            (3)
                      mpg            mpg            mpg
---------------------------------------------------------
foreign             4.946**       -1.650         -2.246+
                  (0.001)        (0.130)        (0.074)
weight                          -0.00659**     -0.00675**
                                 (0.000)        (0.000)
displacement                                    0.00825
                                                (0.472)
gear_ratio                                        2.058
                                                (0.245)
_cons               19.83**        41.68**        34.52**
                  (0.000)        (0.000)        (0.000)
---------------------------------------------------------
N                      74             74             74
---------------------------------------------------------
p-values in parentheses
+ p<0.1, * p<0.05, ** p<0.01
Tables of Summary Statistics
The esttab command is designed to draw information from the e() vector, which is only used by estimation commands. However, estpost will take the results from the r() vector used by other commands and post them in the e() vector. This allows esttab to create tables based on those results, but you'll generally have to give more guidance about what that table should contain.
To store the results of a command in e(), put the estpost command before it:
estpost sum price foreign mpg
The resulting table is designed to tell you the official name of each quantity. You will use those names in subsequent esttab commands.
| e(count) e(sum_w) e(mean) e(Var) e(sd) e(min) e(max) e(sum)
-------------+----------------------------------------------------------------------------------------
price | 74 74 6165.257 8699526 2949.496 3291 15906 456229
foreign | 74 74 .2972973 .2117734 .4601885 0 1 22
mpg | 74 74 21.2973 33.47205 5.785503 12 41 1576
When working with regression results, esttab knows that e(b) is the primary quantity of interest and builds the table accordingly. With summary statistics, you need to tell esttab what the table should contain using the cell() option. This is technically an option for estout rather than esttab, but esttab will pass it along to estout while still doing some of the other work for you. However, if you want to read the full documentation for the cell() option you need to type help estout rather than help esttab.
If you want a table of just means, use cell(mean):
esttab, cell(mean)
-------------------------
                      (1)
                     mean
-------------------------
price            6165.257
foreign          .2972973
mpg               21.2973
-------------------------
N                      74
-------------------------
You can list multiple quantities:
esttab, cell(mean sd)
-------------------------
(1)
mean/sd
-------------------------
price 6165.257
2949.496
foreign .2972973
.4601885
mpg 21.2973
5.785503
-------------------------
N 74
-------------------------
If you want quantities to appear on a single row, you can group them with either quotes or parentheses. The following commands are equivalent:
esttab, cell("mean sd")
esttab, cell((mean sd))
--------------------------------------
(1)
mean sd
--------------------------------------
price 6165.257 2949.496
foreign .2972973 .4601885
mpg 21.2973 5.785503
--------------------------------------
N 74
--------------------------------------
Note how in this case quotes do not indicate strings!
Model numbers and model titles make little sense for this table (especially since the title is empty at this point), so consider removing them with nonumber and nomtitle:
esttab, cell((mean sd)) nonumber nomtitle
--------------------------------------
mean sd
--------------------------------------
price 6165.257 2949.496
foreign .2972973 .4601885
mpg 21.2973 5.785503
--------------------------------------
N 74
--------------------------------------
We've discussed putting formats in parentheses after a quantity to control the numeric format of that quantity, but there are many other options. A useful addition to this table is par for parentheses:
esttab, cell((mean sd(par))) nonumber nomtitle
--------------------------------------
mean sd
--------------------------------------
price 6165.257 (2949.496)
foreign .2972973 (.4601885)
mpg 21.2973 (5.785503)
--------------------------------------
N 74
--------------------------------------
The column heading labels also leave something to be desired. You can override them with a label() option associated with each quantity in cell(). This is different from the general label option, which tells esttab to replace the variable names at the beginning of each row with the variable labels. You are welcome to use both (or use coeflabel() to set the row labels yourself):
esttab, cell((mean(label(Mean)) sd(par label(Standard Deviation)))) ///
    label nonumber nomtitle
----------------------------------------------
Mean Standard D~n
----------------------------------------------
Price 6165.257 (2949.496)
Car type .2972973 (.4601885)
Mileage (mpg) 21.2973 (5.785503)
----------------------------------------------
Observations 74
----------------------------------------------
The problem now is that "Standard Deviation" had to be truncated because its column is not wide enough. You can set the width of the columns with the modelwidth() option (recall that when dealing with regression results each column is a model). If you put a single number in the parentheses the width in characters of all the columns will be set to that number. If you give a list of numbers, they will be applied to the columns in order:
esttab, modelwidth(10 20) ///
    cell((mean(label(Mean)) sd(par label(Standard Deviation)))) ///
    label nomtitle nonumber
----------------------------------------------------
Mean Standard Deviation
----------------------------------------------------
Price 6165.257 (2949.496)
Car type .2972973 (.4601885)
Mileage (mpg) 21.2973 (5.785503)
----------------------------------------------------
Observations 74
----------------------------------------------------
Admittedly this will never be publication-quality when rendered as plain text. But consider this RTF version, created by:
esttab using means.rtf, modelwidth(10 20) cell((mean(label(Mean)) sd(par label(Standard Deviation)))) label nomtitle nonumber replace
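The raw means and standard deviations above carry more digits than most journals want. As a sketch, the fmt() suboption (used for frequency tables later in this article) can be combined with the options already shown; the two-decimal format and the short SD label are our own choices, not requirements.

```stata
* Sketch: a summary table with rounded numbers and no N row
sysuse auto, clear
estpost sum price foreign mpg

* fmt() sets the numeric format of each quantity; noobs drops the N row
esttab, cell((mean(fmt(%9.2f) label(Mean)) sd(fmt(%9.2f) par label(SD)))) ///
    label nonumber nomtitle noobs
```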
Frequency Tables
Creating frequency tables also relies on using estpost to put the results in the e() vector:
estpost tab rep78 foreign
foreign |
rep78 | e(b) e(pct) e(colpct) e(rowpct)
-------------+--------------------------------------------
Domestic |
1 | 2 2.898551 4.166667 100
2 | 8 11.5942 16.66667 100
3 | 27 39.13043 56.25 90
4 | 9 13.04348 18.75 50
5 | 2 2.898551 4.166667 18.18182
Total | 48 69.56522 100 69.56522
-------------+--------------------------------------------
Foreign |
1 | 0 0 0 0
2 | 0 0 0 0
3 | 3 4.347826 14.28571 10
4 | 9 13.04348 42.85714 50
5 | 9 13.04348 42.85714 81.81818
Total | 21 30.43478 100 30.43478
-------------+--------------------------------------------
Total |
1 | 2 2.898551 2.898551 100
2 | 8 11.5942 11.5942 100
3 | 30 43.47826 43.47826 100
4 | 18 26.08696 26.08696 100
5 | 11 15.94203 15.94203 100
Total | 69 100 100 100
These are the same numbers you'd get from tab alone, just organized differently. Note that the frequencies themselves are called e(b), but we'll still use cell() because otherwise esttab will treat them like regression coefficients:
esttab, cell(b)
-------------------------
(1)
b
-------------------------
Domestic
1 2
2 8
3 27
4 9
5 2
Total 48
-------------------------
Foreign
1 0
2 0
3 3
4 9
5 9
Total 21
-------------------------
Total
1 2
2 8
3 30
4 18
5 11
Total 69
-------------------------
N 69
-------------------------
The model number, empty model title, and column label (b) are all useless here, so remove the number and title and change the label with collabels(). You could also remove the column label entirely with collabels(none).
esttab, cell(b) nonumber nomtitle collabels(Frequency)
-------------------------
Frequency
-------------------------
Domestic
1 2
2 8
3 27
4 9
5 2
Total 48
-------------------------
Foreign
1 0
2 0
3 3
4 9
5 9
Total 21
-------------------------
Total
1 2
2 8
3 30
4 18
5 11
Total 69
-------------------------
N 69
-------------------------
The unstack option converts the three sections into columns:
esttab, cell(b) unstack nonumber nomtitle collabels(none)
---------------------------------------------------
Domestic Foreign Total
---------------------------------------------------
1 2 0 2
2 8 0 8
3 27 3 30
4 9 9 18
5 2 9 11
Total 48 21 69
---------------------------------------------------
N 69
---------------------------------------------------
To control the label for the row variable use eqlabels(), but esttab thinks of it as being the left-hand-side of an equation (remember esttab was built for models). Thus you have to use the lhs() suboption within eqlabels(). You can adjust the amount of space available to the label with varwidth():
esttab, cell(b) eqlabels(, lhs("Repair Record")) varwidth(15) unstack nonumber ///
    nomtitle collabels(none)
------------------------------------------------------
Repair Record Domestic Foreign Total
------------------------------------------------------
1 2 0 2
2 8 0 8
3 27 3 30
4 9 9 18
5 2 9 11
Total 48 21 69
------------------------------------------------------
N 69
------------------------------------------------------
You can add additional quantities to cell() and control their appearance and structure using all the tools we discussed in the section on summary statistics. Consider adding a note to explain what each number represents with the note() option:
esttab, cell(b rowpct(fmt(%5.1f) par)) note(Row Percentages in Parentheses) ///
    unstack nonumber nomtitle collabels(none) eqlabels(, lhs("Repair Record")) ///
    varwidth(15)
------------------------------------------------------
Repair Record Domestic Foreign Total
------------------------------------------------------
1 2 0 2
(100.0) (0.0) (100.0)
2 8 0 8
(100.0) (0.0) (100.0)
3 27 3 30
(90.0) (10.0) (100.0)
4 9 9 18
(50.0) (50.0) (100.0)
5 2 9 11
(18.2) (81.8) (100.0)
Total 48 21 69
(69.6) (30.4) (100.0)
------------------------------------------------------
N 69
------------------------------------------------------
Row Percentages in Parentheses
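The same command works with the other quantities estpost tab posted. As a sketch, swapping rowpct for colpct (the name shown in the estpost output above) gives column percentages instead; the note text is changed to match.

```stata
* Sketch: column percentages instead of row percentages
sysuse auto, clear
estpost tab rep78 foreign

esttab, cell(b colpct(fmt(%5.1f) par)) note(Column Percentages in Parentheses) ///
    unstack nonumber nomtitle collabels(none) eqlabels(, lhs("Repair Record")) ///
    varwidth(15)
```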
This is just a fraction of what esttab (let alone estout) can do. To learn more, we suggest reading the Stata Journal article that introduced it. For syntax details, type help esttab and/or help estout.