Saturday 21 March 2015

Panel Technique using STATA

*0. Data preparation stage:


set more off

* Download and open the dataset using Stata: LINK

* To start recording the commands and outputs:

log using Panel_log.log, replace


*1. Create unique identifiers for countries:
sort Country
egen idc=group(Country)

*2. Convert data into panel format

reshape long y, i(idc Variable) j(Year)
reshape wide y, i(idc Year) j(Variable) string

*3. Rename variables:

rename (yGDP yIF ySP) (GDP IF SP)

*4. Add labels:

label variable Year "Period"
label variable GDP "Gross Domestic Product"
label variable IF "Inflation Rate"
label variable SP "Stock Market"

*5. Reorder the variables:

order Country idc Year GDP IF SP

*6. Specify cross-sectional (x) and time-series (t) variables:

xtset idc Year

*7. Get description of the data:

describe

*8. Get descriptive statistics:

xtsum

*9. Get correlation matrix with 5% significance level:

pwcorr GDP IF SP, star(0.05)

*10. Defining dependent and independent variables:
global ylist GDP
global xlist IF SP

* Tip: If you defined a global macro incorrectly, delete it by running: macro drop xlist

*11. To check for heterogeneity across countries visually, run:

bysort idc: egen y_mean_x=mean($ylist)
twoway scatter $ylist idc, msymbol(circle_hollow) || connected y_mean_x idc, msymbol(diamond)
graph save graph1, replace

*12. To check for heterogeneity across years visually, run:
bysort Year: egen y_mean_t=mean($ylist)
twoway scatter $ylist Year, msymbol(circle_hollow) || connected y_mean_t Year, msymbol(diamond)
graph save graph2, replace

*13. Sort the panel data:
sort idc Year

* If you haven't installed the estout package yet, run: ssc install estout, replace
* If you are not sure, go to Help -> Stata Command -> type estout
* If it says 'Not Found', then you need to install it.


*1. Static Models: Fixed Effects and Random Effects 

*1.1. Running the Fixed Effects:
xtreg $ylist $xlist, fe

* Storing the estimates in memory:
eststo fixed

* Comparing FE with regression:
regress $ylist $xlist i.idc

*1.2. Running Random Effects:
xtreg $ylist $xlist, re

* Storing the estimates in memory:
eststo random
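
* Optional side-by-side comparison (a sketch added here, not part of the original post).
* esttab is installed together with estout; the column titles "FE" and "RE" are illustrative.
esttab fixed random, se mtitles("FE" "RE")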

* Running Hausman test to choose between FE and RE:
hausman fixed random, sigmamore


* If p-value > 5%, then it is safe to use RE

* If p-value < 5%, then you should use FE
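
* A small sketch of the rule above (an addition, not part of the original post).
* hausman leaves its p-value in r(p); run this right after the test, before another command overwrites r().
if missing(r(p)) {
    display "Hausman p-value is missing; inspect the test output before choosing"
}
else if r(p) < 0.05 {
    display "Hausman p-value = " %6.4f r(p) " -> use FE"
}
else {
    display "Hausman p-value = " %6.4f r(p) " -> RE is acceptable"
}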

* If FE is selected, then, to guard against heteroscedasticity, run FE with robust standard errors (for xtreg, vce(robust) clusters on the panel variable):
xtreg $ylist $xlist, fe vce(robust)


*2. Dynamic Models: Mean Group, Pooled Mean Group, and Dynamic Fixed Effects


* If you haven't installed the xtpmg package yet, run: ssc install xtpmg, replace
* If you are not sure, go to Help -> Stata Command -> type xtpmg
* If it says 'Not Found', then you need to install it.

* Running MG (average):
xtpmg d.GDP d.IF d.SP, lr(l.GDP IF SP) ec(ECT) replace mg

* Storing the MG estimates so that hausman can refer to them by name:
eststo mg

* Running MG (individual):
xtpmg d.GDP d.IF d.SP, lr(l.GDP IF SP) ec(ECT) replace full mg

* Running PMG (average):
xtpmg d.GDP d.IF d.SP, lr(l.GDP IF SP) ec(ECT) replace pmg

* Storing the PMG estimates:
eststo pmg

* Running PMG (individual):
xtpmg d.GDP d.IF d.SP, lr(l.GDP IF SP) ec(ECT) replace full pmg

* Running Hausman test to choose between MG and PMG:
hausman mg pmg, sigmamore


* If p-value > 5%, then use PMG

* If p-value < 5%, then use MG

* Running DFE:
xtpmg d.GDP d.IF d.SP, lr(l.GDP IF SP) ec(ECT) replace dfe

* Storing the DFE estimates (names are case-sensitive, so the name must match the hausman call):
eststo DFE

* Running Hausman test to choose between MG and DFE:
hausman mg DFE, sigmamore

* Note: For panel unit root tests (xtunitroot), you can use Stata Menu --> Statistics --> Longitudinal/Panel data --> Unit Root Tests
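* The same tests can be run from the command line. A sketch: lags(1) is an illustrative choice, and llc needs a strongly balanced panel.
xtunitroot llc GDP, lags(1)
xtunitroot ips GDP, lags(1)
xtunitroot fisher GDP, dfuller lags(1)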



* PANEL GMM

* GMM is suited to panels in which the number of periods is small relative to the number of cross-sectional units (T <= N). Otherwise, asymptotic imprecision and biases may arise.

* A rule of thumb for avoiding an excessive number of instruments is to keep the number of instruments less than or equal to the number of groups in the regression (Barajas et al., 2013).

* Possible solution: If you are analysing 7 countries (N) over a period of 25 years (T), you can average each 5-year block to reduce the number of periods to 5, hence T < N.
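
* A hedged sketch of the 5-year averaging described above (an addition, not part of the original post).
* period5 is a hypothetical variable name; preserve/restore keeps the annual data intact.
* If the sample length is not a multiple of 5, the last block averages fewer years.
preserve
summarize Year, meanonly
generate period5 = floor((Year - r(min))/5) + 1
collapse (mean) GDP IF SP, by(idc period5)
xtset idc period5
* ... run the GMM commands on the averaged panel here, before restoring ...
restore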

* Running Panel GMM (First Difference):
xtabond GDP IF SP, lags(1)
estat sargan
xtabond GDP IF SP, lags(1) vce(robust)
estat abond
xtabond GDP IF SP, lags(1) twostep vce(robust)
estat abond
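
* Checking the rule of thumb above (a sketch, not part of the original post).
* After xtabond, e(zrank) holds the rank of the instrument matrix (reported as the number of instruments) and e(N_g) the number of groups.
display "Instruments: " e(zrank) "   Groups: " e(N_g)
* If there are too many instruments, maxldep() caps the lags of the dependent variable used as instruments; 2 is an illustrative value.
xtabond GDP IF SP, lags(1) maxldep(2) twostep vce(robust)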

* Running Panel GMM (System): 
xtdpdsys GDP IF SP, lags(1)
estat sargan
xtdpdsys GDP IF SP, lags(1) vce(robust)
estat abond
xtdpdsys GDP IF SP, lags(1) twostep vce(robust)
estat abond

* To close the log, and clear the data, macros, and stored estimates:

log close
clear all
macro drop _all

eststo clear




Notes:
  1. Static Models are Fixed Effects, Random Effects.
  2. Dynamic Models: Mean Group, Pooled Mean Group, Dynamic Fixed Effects.
  • Fixed effects (FE) is used to control for omitted variables that differ between cases but are constant over time
    • It lets you use the changes in the variables over time to estimate the effects of the independent variables on your dependent variable. This is equivalent to generating dummy variables for each of your cases and including them in a standard linear regression to control for these fixed "case effects". 
    • It works best when you have relatively fewer cases (N) and more time periods (T), as each dummy variable removes one degree of freedom from your model.
  • Random Effects (RE) is used if you believe that some omitted variables are constant over time but vary between cases, while others are fixed between cases but vary over time; RE lets you account for both types. Stata's RE estimator is a weighted average of the fixed (within) and between effects.
  • Which one: Fixed Effects or Random Effects? The generally accepted way of choosing between FE and RE is to run a Hausman test.
    • Statistically, FE are always a reasonable thing to do with panel data because they always give consistent results, but they may not be the most efficient model to run. 
    • RE is the more efficient estimator and so typically gives smaller standard errors (and hence smaller p-values), so you should run random effects if it is statistically justifiable to do so. 
    • The Hausman test checks a more efficient model against a less efficient but consistent model to make sure that the more efficient model also gives consistent results. 
    • The Hausman test tests the Null Hypothesis that the coefficients estimated by the efficient RE estimator are the same as the ones estimated by the consistent FE estimator. 

If p-value >5%, then use RE

If p-value <5%, then use FE
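
The dummy-variable equivalence mentioned in the FE notes above can be checked on simulated data. The following is a minimal sketch, not part of the original script: the variable names (id, t, alpha, x, y), the seed, and the panel dimensions are all made up for illustration. It fits the same model with xtreg, fe and with regress plus unit dummies, then compares the slope on x.

```stata
* Simulate a small panel: 5 units observed over 10 periods (illustrative sizes)
clear
set seed 12345
set obs 5
generate id = _n
generate alpha = rnormal()          // unit-specific effect, constant over time
expand 10
bysort id: generate t = _n
generate x = rnormal() + alpha      // regressor correlated with the unit effect
generate y = 1 + 2*x + alpha + rnormal()
xtset id t

* Within (fixed effects) estimator
xtreg y x, fe
scalar b_fe = _b[x]

* Least-squares dummy-variable regression
regress y x i.id
scalar b_lsdv = _b[x]

* The two slopes should agree up to numerical precision
display "FE slope: " b_fe "   LSDV slope: " b_lsdv
```

Because x is built to be correlated with alpha, a pooled regression of y on x alone would be biased here, which is the situation FE is meant to handle.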



See a nice summary for Pooled OLS, Fixed effects and Random effects HERE


  • Why MG or PMG? 
    • If the number of time periods is large relative to the number of cross sections (T > N). For large T, Pesaran and Smith (1995) show that the traditional panel techniques (FE, instrumental variables, GMM estimators) can produce inconsistent, and potentially very misleading, estimates of the average values of the parameters in a dynamic panel data model unless the slope coefficients are in fact identical. 
    • If you are analyzing the long-run effects and the speed of adjustment to the long-run equilibrium.

  • Mean Group (MG):
    • The least restrictive procedure; it allows for heterogeneity of all the parameters (imposes no cross-country restriction);
    • It consists of estimating separate regressions for each country and computing averages of the country-specific coefficients, which will provide consistent estimates of the long-run coefficients.
    • The assumptions are quite strong: the group-specific parameters must be distributed independently of the regressors, and the regressors must be strictly exogenous.
    • Does not take account of the fact that certain parameters may be the same across groups.

  • Dynamic Fixed Effect (DFE):
    • Individual-specific effects (such as country, states, firms) can be controlled for;
    • Generally imposes homogeneity of all slope coefficients, allowing only the intercepts to vary across countries.

  • Pooled Mean Group (PMG):
    • Intermediate estimator (between DFE and MG);
    • Allows the intercepts, short-run coefficients and error variances to differ freely across groups, but constrains the long-run coefficients to be identical across groups;
    • Has the advantage of estimating both the long-run and the short-run dynamic relationships.
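
As a reading aid, the error-correction form behind the xtpmg calls in the script can be sketched as below. The symbols are notation chosen here, not taken from the original post: φ_i is the speed of adjustment, θ are the long-run coefficients, δ the short-run coefficients, and μ_i the country intercept.

```latex
\Delta GDP_{it} = \phi_i \left( GDP_{i,t-1} - \theta_{1i}\, IF_{it} - \theta_{2i}\, SP_{it} \right)
                + \delta_{1i}\, \Delta IF_{it} + \delta_{2i}\, \Delta SP_{it} + \mu_i + \varepsilon_{it}

% MG:  all of \phi_i, \theta_{ki}, \delta_{ki}, \mu_i are country-specific; the reported values are averages.
% PMG: \theta_{1i} = \theta_1 and \theta_{2i} = \theta_2 for all i; \phi_i, \delta_{ki}, \mu_i remain country-specific.
% DFE: \phi_i = \phi, \theta_{ki} = \theta_k, \delta_{ki} = \delta_k for all i; only \mu_i varies.
```

Under this notation, a negative and significant φ_i indicates adjustment back toward the long-run relationship.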

  • MG vs PMG:
    • The MG estimator provides consistent estimates of the mean of the long-run coefficients, though these will be inefficient if slope homogeneity holds.
    • Under long-run slope homogeneity, the PMG estimators are consistent and efficient.

  • Which one: MG vs PMG? A Hausman-type test is applied to the difference between the MG and the PMG estimates. Under the null hypothesis, the coefficients estimated by MG and PMG are not significantly different, and PMG is more efficient. If p-value > 0.05, we conclude that the PMG estimator, the efficient estimator under the null hypothesis, is preferred.

If p-value >5%, then use PMG

If p-value <5%, then use MG
  • Which one: MG vs DFE? A Hausman-type test is applied to the difference between the MG and the Dynamic Fixed Effects (DFE) estimates. If p-value > 0.05, we conclude that the DFE model is preferred over the MG model. 

If p-value >5%, then use DFE

If p-value <5%, then use MG