Research Tips: Tips for Stata

Tip 1: Steps for preparing data for Stata (panel format):

If you have data in this format:

Company______Accounts______2013______2014____2015
Petronas Total Assets 1500 1700 2000
Petronas EBIT 750 850 1000

An, you need to transpose the data to make it look like this:

Company________Year____Total Assets__________EBIT
Petronas 2013 1500 750
Petronas 2014 1700 850
Petronas 2015 2000 1000

Do the following:

1. In Excel, insert letter 'y' in front of each year, for example:

Company____Accounts________y2013____y2014____y2015
Petronas Total Assets 1500 1700 2000
Petronas EBIT 750 850 1000

2. Import the data into Stata,

3. Create unique IDs for Accounts and Companies:

sort Accounts
egen ida=group(Accounts)

sort Company
egen idc=group(Company)

4. Transpose data into long format:

reshape long y, i(idc ida) j(year)

5. Delete a column with Accounts, but keep ida (ID for Accounts):

drop Accounts

6. Transpose the data into wide format:

reshape wide y, i(idc year) j(ida)

7. Now, you can rename Accounts IDs (ida) as 'TotalAssets' and 'EBIT':

rename 1 'TotalAssets"
rename 2 "EBIT"

8. To locate the column 'idc' just before 'Company' column:

move Company idc

****************
Practice:

*0. Import the data:
use https://dl.dropboxusercontent.com/u/18255955/PhD-INCEIF/Panel%20Data_Tutorial.dta, clear

*1. Create unique identifiers for countries:
sort Country
egen idc=group(Country)

*2. Convert data into panel format
reshape long y, i(idc Variable) j(Year)
reshape wide y, i(idc Year) j(Variable) string

*3. Rename variables:
rename (yGDP yIF ySP) (GDP IF SP)

*4. Add labels:
label variable Year "Period"
label variable GDP "Gross Domestic Product"
label variable IF "Inflation Rate"
label variable SP "Stock Market"

Here is the video with instructions:

______________________________________________

Tip 2: Filling gaps in time within panel data

______________________________________________

A. For example, your panel data has observations for years 2010 2013 2014 2015, but there are missing years (2011 and 2012). Use the following commands to fill the gaps:

xtset idc year
tsfill

B. If you want to make your panel data balanced (equal number of years for all panels), then use these commands:

xtset idc year
tsfill, full

See details here: LINK

______________________________________________

Tip 3: Filling in missing values within panel data

______________________________________________

The next commands fill the missing values in your data by averaging previous and next values, if those numbers are available. If not, then it takes the nearest available value.

1. Install nnipolate package:

ssc install nnipolate, replace

2. Define panel data:

xtset idc year

3. Fill in missing values:

by idc: nnipolate variable year, gen(new_variable)

______________________________________________

Tip 4: Calculation of Herfindahl-Hirschman Index - HHI

______________________________________________

The market concentration (HHI) is calculated by squaring the market share of each bank/firm competing in a market, and then summing the resulting numbers.

First, before we calculate HHI using Stata, install the package:

ssc install hhi, replace

Then, run the code. Example:

hhi Market_Value, by (Year Country)

Done!

______________________________________________

Tip 5: Using loop function for transformation of variables

______________________________________________

Let's assume that you have the following list of variables: ROA, ASQ, LIQ, CAP, EFF.
Type and run the following code in the Stata command area (each line separately):

1. To transform all your variables into log form:

foreach var of varlist ROA ASQ LIQ CAP EFF {
gen log_`var' = log( `var' )
}

2. To transform all your variables into log form with zero skewness (normal distribution):

foreach var of varlist ROA ASQ LIQ CAP EFF {
lnskew0 L`var' = `var'
}

3. To transform all your variables using Box-Cox procedure for zero skewness (normal distribution):

foreach var of varlist ROA ASQ LIQ CAP EFF {
bcskew0 BC`var' = `var'
}

NOTE: BE CAREFUL WITH THE TYPE OF APOSTROPHE IN THE SECOND LINE!

` is different from '
The first one is located on the left of number 1 on your keyboard, another one is besides Enter.

______________________________________________

Tip 6: Correlations Table with stars as significance indicators

______________________________________________

Option 1: Using PWCORR:

pwcorr YOUR_LIST_OF_VARIABLES, star(0.05)

Option 2: Using ESTOUT package (to install: ssc install estout, replace ):

estpost correlate LIST_OF_YOUR_VARIABLES, matrix listwise
esttab, unstack not noobs compress

______________________________________________

Labels

Pages

Monday, 16 March 2015

Tips for Stata