Monday, 16 March 2015

Tips for Stata

                                                               

Tip 1: Steps for preparing data for Stata (panel format):

If you have data in this format:

Company______Accounts______2013______2014____2015
Petronas         Total Assets         1500           1700         2000
Petronas            EBIT                  750             850          1000

An, you need to transpose the data to make it look like this:



Company________Year____Total Assets__________EBIT
Petronas                 2013               1500                          750
Petronas                 2014               1700                          850
Petronas                 2015               2000                        1000

Do the following:

1. In Excel, insert letter 'y' in front of each year, for example:

Company____Accounts________y2013____y2014____y2015
Petronas           Total Assets         1500           1700            2000
Petronas              EBIT                  750             850             1000

2. Import the data into Stata,

3. Create unique IDs for Accounts and Companies:

sort Accounts
egen ida=group(Accounts)

sort Company
egen idc=group(Company)


4. Transpose data into long format:

reshape long y, i(idc ida) j(year)


5. Delete a column with Accounts, but keep ida (ID for Accounts):

drop Accounts


6. Transpose the data into wide format:

reshape wide y, i(idc year) j(ida)


7. Now, you can rename Accounts IDs (ida) as 'TotalAssets' and 'EBIT':

rename 1 'TotalAssets"
rename 2 "EBIT"


8. To locate the column 'idc' just before 'Company' column:

move Company idc


****************

Practice:

*0. Import the data:
use https://dl.dropboxusercontent.com/u/18255955/PhD-INCEIF/Panel%20Data_Tutorial.dta, clear

*1. Create unique identifiers for countries:
sort Country
egen idc=group(Country)

*2. Convert data into panel format

reshape long y, i(idc Variable) j(Year)
reshape wide y, i(idc Year) j(Variable) string

*3. Rename variables:

rename (yGDP yIF ySP) (GDP IF SP)

*4. Add labels:

label variable Year "Period"
label variable GDP "Gross Domestic Product"
label variable IF "Inflation Rate"
label variable SP "Stock Market"



Here is the video with instructions:


______________________________________________
Tip 2: Filling gaps in time within panel data

______________________________________________

A. For example, your panel data has observations for years 2010 2013 2014 2015, but there are missing years (2011 and 2012). Use the following commands to fill the gaps:

xtset idc year
tsfill

B. If you want to make your panel data balanced (equal number of years for all panels), then use these commands:

xtset idc year
tsfill, full

See details here: LINK

______________________________________________
Tip 3: Filling in missing values within panel data
______________________________________________

The next commands fill the missing values in your data by averaging previous and next values, if those numbers are available. If not, then it takes the nearest available value.

1. Install nnipolate package:

ssc install nnipolate, replace

2. Define panel data:

xtset idc year

3. Fill in missing values:

by idc: nnipolate variable year, gen(new_variable)



______________________________________________
Tip 4: Calculation of Herfindahl-Hirschman Index - HHI
______________________________________________

The market concentration (HHI) is calculated by squaring the market share of each bank/firm competing in a market, and then summing the resulting numbers.

First, before we calculate HHI using Stata, install the package:

ssc install hhi, replace

Then, run the code. Example:

hhi Market_Value, by (Year Country)

Done!


______________________________________________
Tip 5: Using loop function for transformation of variables
______________________________________________

Let's assume that you have the following list of variables: ROA, ASQ, LIQ, CAP, EFF.
Type and run the following code in the Stata command area (each line separately):

1. To transform all your variables into log form:

foreach var of varlist ROA ASQ LIQ CAP EFF {
gen log_`var' = log( `var' )
}

2. To transform all your variables into log form with zero skewness (normal distribution):

foreach var of varlist ROA ASQ LIQ CAP EFF {
lnskew0 L`var' = `var'
}

3. To transform all your variables using Box-Cox procedure for zero skewness (normal distribution):

foreach var of varlist ROA ASQ LIQ CAP EFF {
bcskew0 BC`var' = `var'
}

NOTE: BE CAREFUL WITH THE TYPE OF APOSTROPHE IN THE SECOND LINE!

  • ` is different from '
  • The first one is located on the left of number 1 on your keyboard, another one is besides Enter.
______________________________________________
Tip 6: Correlations Table with stars as significance indicators
______________________________________________
  • Option 1: Using PWCORR:
pwcorr YOUR_LIST_OF_VARIABLES, star(0.05)

  • Option 2: Using ESTOUT package (to install: ssc install estout, replace ):
estpost correlate LIST_OF_YOUR_VARIABLES, matrix listwise
esttab, unstack not noobs compress


______________________________________________

______________________________________________