*==============================================================*
* Ex2.do
*==============================================================*

**************************************************************
* This do-file includes a number of lines that say:
*
*     *STOP HERE AND THINK!!
*
* When Stata reaches the first of these, it will not
* recognise it as a valid command and will come to a juddering
* halt. At this point, there will be a few questions for you
* to think about. When you have answered them, delete the
* offending line and re-run the do-file. It will crash at the
* next point where you have some questions to think about.
**************************************************************

**************************************************************
* In this exercise:
*   Hausman test of the random-effects model
*   Examples of the linear probability model
*   Random effects probit, logit and conditional (fixed effects)
*   logit models for a binary dependent variable
*   Calculation of predicted probabilities
*   Instrumental variable estimation (using xtivreg, xtivreg2
*   and xthtaylor) for models involving endogenous explanatory
*   variables
**************************************************************

version 13
clear all
set more off
set matsize 800
clear mata
set maxvar 20000
capture log close
local working "P:\working"
log using "`working'\ex2.log", replace

**************************************************************
* Load the panel data and tell Stata it is panel data
**************************************************************
use "`working'\longperson_unbal_2.dta"
xtset id wave

**************************************************************
* The RE estimates are only consistent if the regressors are
* uncorrelated with the individual effects. Test this
* assumption using a Hausman test.
**************************************************************
xtreg lwage_hr age agesq cohort married female jbemp degree further ///
    tucov permanent nsw if keeper==1, fe
estimates store fixed      // store results under the name "fixed"

xtreg lwage_hr age agesq cohort married female jbemp degree further ///
    tucov permanent nsw if keeper==1, re
estimates store random     // store results under the name "random"

hausman fixed random

**************************************************************
* Now test the zero correlation using the Mundlak formulation.
* Recall that this involves augmenting the RE model with the
* individual means of the time-varying characteristics.
**************************************************************

* Create individual means for time-varying covariates
foreach x in age agesq married jbemp ///
    degree further tucov permanent nsw {
    capture drop m`x'
    egen m`x'=mean(`x'), by(id)    // individual means
}

**************************************************************
* Questions to think about:
*
* (1) Re-run the random effects regression adding the covariate
*     means as additional RHS variables. How do the coefficients
*     compare with the ones you produced earlier?
* (2) Test the RE assumption using the Mundlak approach, using
*     the test command to test the significance of the added
*     variables.
* (3) Based on the results of the two tests, what do you
*     conclude about the appropriateness of the RE model?
*     What do the differences between FE and RE estimates say
*     about possible omitted variable biases?
**************************************************************

*STOP HERE AND THINK!!

**************************************************************
* Continuing with the wage example, suppose we specifically
* want to examine the determinants of low pay. Create a dummy
* variable to indicate an hourly wage of less than $14/hour.
* Examine the proportion of low paid workers, how it has
* changed over time and low pay transitions.
**************************************************************
* Stata treats missing as larger than any number, so guard against
* coding a missing wage as "not low paid"
gen lopay = w_hr<14 if !missing(w_hr)
xttab lopay
xttrans lopay
bysort wave: su lopay

**************************************************************
* Estimate a linear probability model (LPM) to explain low
* pay, using the same specification as the wage equation.
* Estimate a RE model.
**************************************************************
xtreg lopay age agesq cohort married female further degree ///
    jbemp permanent nsw if keeper==1, re

**************************************************************
* Predict the probability of being low paid for each person.
* Type help xtreg postestimation for details of the predict
* command.
**************************************************************
predict reprob if keeper==1, xb
predict reprob1 if keeper==1, xbu    // including predicted individual effect
summ lopay reprob reprob1, de

**************************************************************
* Estimate an FE LPM and examine the predicted probabilities.
**************************************************************
xtreg lopay age agesq cohort married female further degree ///
    jbemp permanent nsw if keeper==1, fe
predict feprob if keeper==1, xb
predict feprob1 if keeper==1, xbu    // including predicted individual effect
summ lopay feprob feprob1, de

**************************************************************
* Questions to think about:
* (1) This is a very crude model. Do you think the
*     covariates are likely to be adequate to explain
*     movements in the prevalence and pattern of low pay
*     over time? How would you interpret the coefficients
*     in these linear probability models? [Note they can be
*     interpreted directly, unlike in the probit and logit
*     models below.]
* (2) Examine the predictions. What do you notice?
*     Compare the covariate means for cases where the
*     predictions are clearly implausible with those for
*     cases where the predictions are a priori plausible.
*     Why might there be a problem for policy analysis?
**************************************************************

*STOP HERE AND THINK!!

**************************************************************
* Estimate the model as a RE probit and examine the
* predicted probabilities (type help xtprobit postestimation
* for details of the predict command).
* What do you notice? How have the probabilities been
* calculated?
**************************************************************
xtprobit lopay age agesq cohort married female further degree ///
    jbemp permanent nsw if keeper==1, re
predict repprob if keeper==1, pu0
summ lopay repprob, de

**************************************************************
* Now estimate the model as a RE logit and examine the
* predicted probabilities (type help xtlogit postestimation
* for details of the predict command).
* What do you notice? How have the probabilities been
* calculated?
**************************************************************
xtlogit lopay age agesq cohort married female further degree ///
    jbemp permanent nsw if keeper==1, re
predict relprob if keeper==1, pu0
summ lopay relprob, de

**************************************************************
* Do the linear probability model, logit model and probit
* model produce similar predictions of Pr(y=1) for different
* types of individuals? Use scatter plots of predicted
* probabilities from the different (random effects) models.
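**************************************************************
* A possible numerical complement to the scatter plots (a sketch
* only, not part of the original exercise). It assumes reprob,
* repprob and relprob were created by the predict commands above.
* Correlations summarise how closely the three sets of predictions
* agree; the counts show how often the LPM leaves the unit interval.
correlate reprob repprob relprob if keeper==1
count if reprob<0 & keeper==1
count if reprob>1 & !missing(reprob) & keeper==1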
**************************************************************

* First plot logit and probit predicted probabilities for first 2000 obs
scatter relprob repprob in 1/2000, ///
    title(Logit and Probit predicted probabilities) ///
    xtitle(Predicted probabilities from the probit) ///
    ytitle(Predicted probabilities from the logit) ///
    scheme(s1color) legend(off)
graph save "`working'\Logit vs. Probit.gph", replace

* Then plot LPM and logit predicted probabilities
scatter reprob relprob in 1/2000, msize(small) ///
    title(LPM and Logit predicted probabilities) ///
    xtitle(Predicted probabilities from the logit) ///
    ytitle(Predicted probabilities from the LPM) scheme(s1color)
graph save "`working'\Logit vs. LPM.gph", replace

* Now combine the graphs and export them to a Windows
* MetaFile for inclusion in a Word document
graph combine "`working'\Logit vs. Probit.gph" ///
    "`working'\Logit vs. LPM.gph", cols(2)
graph export "`working'\Logit vs. Probit vs. LPM.wmf", ///
    as(wmf) replace

**************************************************************
* Questions to think about:
*
* (1) What do these plots tell you about the choice between
*     logit, probit and linear models?
* (2) Produce a plot of the logit vs. linear predicted
*     probabilities, restricting attention to cases where
*     the linear model predicts a probability below 0.1
**************************************************************

*STOP HERE AND THINK!!

**************************************************************
* Estimate the FE logit (recall there is no FE probit).
* Why does Stata report that it has dropped (a lot of!)
* observations? Why is there no constant reported?
**************************************************************
xtlogit lopay age agesq cohort married female further degree ///
    jbemp permanent nsw if keeper==1, fe

**************************************************************
* Predict the probabilities implied by the FE logit. What do
* you notice?
* How are the probabilities calculated?
**************************************************************
predict felprob, pu0
summ felprob, de

**************************************************************
* We wish to test the RE assumption that the individual
* effects are uncorrelated with the regressors. We compare
* the RE and FE logit models. Re-estimate both models and
* store the results. Carry out a Hausman test.
**************************************************************
xtlogit lopay age agesq cohort married further degree jbemp ///
    permanent nsw if keeper==1, fe
estimates store fixed

xtlogit lopay age agesq cohort married further degree jbemp ///
    permanent nsw if keeper==1, re
estimates store random

hausman fixed random

**************************************************************
* Questions to think about:
*
* (1) What's the interpretation of the Hausman test?
* (2) How do the FE coefficients differ from the RE model?
* (3) Which model do you prefer?
**************************************************************

*STOP HERE AND THINK!!

**************************************************************
* Predicting the probability for a person with selected
* characteristics.
* The predict command predicts the probabilities
* for each observation separately (given their
* characteristics).
* To make predictions for a specific type of person we need
* to use the model formulae directly and plug in the chosen
* values. Use command nlcom (which also gives standard
* errors). Note this is a rather clumsy manual approach:
* the more complicated margins command could be used instead.
*
* Calculate the probability of being low paid for a reference
* male aged 40 and born in 1967 (i.e. current yr is 2007),
* who is not married, has no further or higher education,
* non-unionised permanent job, has been on his job for 1 year
* and does not live in nsw.
* Calculate the probability for
* 3 values of individual effect, low (-1 sd), medium (0) and
* high (+1 sd). What impact do unobserved individual level
* factors have on the probability of being low paid?
**************************************************************

* For convenience, use the model just estimated (RE logit)
* "replay" previous results for reference
xtlogit

**************************************************************
* Write down xb for the ref person using stored coefficients,
* accessed using _b notation.
* Use a local "macro" to store the text for future use.
* (The values 4 and 16 below suggest that age is measured in
* decades, so that age 40 enters as 4 and agesq as 16.)
**************************************************************
local refxb _b[_cons] + _b[age]*4 + _b[agesq]*16 + ///
    _b[cohort]*1967 + _b[jbemp]*1 + _b[permanent]

* Use a macro to store the standard deviation of u(i)
local ui=e(sigma_u)    // estimate of sd from model
display e(sigma_u)

**************************************************************
* NOTE: macro subtlety
* A local statement without the = sign stores the
* following text directly in the macro.
* A local statement with the = sign evaluates the expression
* and stores the resulting number in the macro.
* You will see the difference below when the macro contents are
* displayed.
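**************************************************************
* A small illustration of the macro subtlety described above.
* The macro names demo_text and demo_num are made up for this
* sketch and are not used elsewhere in the do-file.
local demo_text 2 + 2      // no = sign: the text "2 + 2" is stored
local demo_num = 2 + 2     // = sign: the expression is evaluated first
display "`demo_text'"      // shows the stored text: 2 + 2
display "`demo_num'"       // shows the evaluated number: 4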
**************************************************************

* Calculate probabilities using the logit formula
* Low individual effect - use -1 st dev of u(i)
* Note single quotes used to access macro contents
nlcom prlo: exp(`refxb'-`ui') / (1+exp(`refxb'-`ui'))

* Medium individual effect, u(i)=0
nlcom prmed: exp(`refxb') / (1+exp(`refxb'))

* High individual effect - use +1 st dev of u(i)
nlcom prhi: exp(`refxb'+`ui') / (1+exp(`refxb'+`ui'))

* Calculate xb for a ref person who has a degree
local refxb1 _b[_cons] + _b[age]*4 + _b[agesq]*16 + ///
    _b[cohort]*1967 + _b[jbemp]*1 + _b[permanent]+_b[degree]
local ui=e(sigma_u)

* Calculate probabilities
nlcom prlo: exp(`refxb1'-`ui') / (1+exp(`refxb1'-`ui'))
nlcom prmed: exp(`refxb1') / (1+exp(`refxb1'))
nlcom prhi: exp(`refxb1'+`ui') / (1+exp(`refxb1'+`ui'))

**************************************************************
* Questions to think about:
*
* (1) Consider the impact of the individual effect (u_i) vs.
*     education (degree). How successful is the model in
*     "explaining" low pay?
* (2) Using the same method, try exploring the impact of
*     other covariates on the probability of low pay.
* (3) How would the impact of education have differed, if we
*     had chosen a baseline individual with different
*     characteristics?
**************************************************************

*STOP HERE AND THINK!!

**************************************************************
* We continue with the wage equation. We suspect that the
* error terms may be correlated with job tenure.
* Various sources of bias are possible:
* - people with higher earning ability tend to hold jobs
*   for longer(?). Implies positive correlation between u(i)
*   and jbemp. Note this is controlled for in the FE model.
* - people stay longer in high-paid jobs (high lwage_hr).
*   Implies positive correlation between high lwage_hr and
*   jbemp.
* - people move to good (high paid) jobs.
*   Implies negative correlation between high lwage_hr and jbemp.
* - Any others?
* To control for a possible correlation of jbemp with
* lwage_hr, we will use instrumental variables. We need
* variables which help explain jbemp, but are uncorrelated
* with wages (conditional on regressors). Suggested
* instruments are hours worked by spouse (spousehr) [zero if
* not employed or no spouse], preference to stay in
* current home (wantstay), and whether renting home (tenant).
* To allow for the effects to vary over men and women,
* interact the instruments with a gender dummy variable.
* Based on Ex3, the appropriate model is FE. Before estimating
* the model it is necessary to check that the instruments
* help explain jbemp (using FE (why?) equation).
**************************************************************

* Derive extra variables for the model
* Preference to stay in current home
sort id wave
tab losathl
replace losathl =. if losathl<0
drop if losathl==.
capture drop wantstay
recode losathl (6/10=1) (else=0), gen(wantstay)
tab wantstay

* Whether renting home
capture drop tenant
recode hstenur (2=1) (else=0), gen(tenant)
drop if hstenur==.
save "`working'\temp.dta", replace

**************************************************************
* Create a file of spouse information
**************************************************************
use "`working'\longperson_unbal.dta"
keep hhpxid wave hgage hgsex jbhruc edhigh1 esdtl
tab wave if hhpxid ==""
drop if hhpxid==""
rename hhpxid xwaveid
rename hgage spouseage
rename hgsex spousesex
rename jbhruc spousehr
rename edhigh1 spouseedhigh1
rename esdtl spouseesdtl
sort xwaveid wave
save "`working'\temp2.dta", replace
clear

**************************************************************
* Merge with original dataset to add spouse characteristics
* as additional variables
**************************************************************
use "`working'\temp.dta"
sort xwaveid wave
merge 1:1 xwaveid wave using "`working'\temp2.dta"
tab _merge
drop if _merge==2

* If no spouse assign value zero
replace spouseage=0 if _merge==1
replace spousesex=0 if _merge==1
replace spouseesdtl=0 if _merge==1
replace spousehr=0 if _merge==1
replace spouseedhigh1=0 if _merge==1
drop _merge

* Set spousehr=0 if not employed
replace spousehr=0 if spouseesdtl!=1 & spouseesdtl!=2 & spouseesdtl!=7
replace spousehr=. if spousehr<0
drop if spousehr==.    // drop the non-responding records from the analysis

* Possible interactions with gender
gen femten=female*tenant
gen femstay=female*wantstay
gen femsphr=female*spousehr

**************************************************************
* Estimate a FE reduced form model for job tenure, including
* the instrumental variables as covariates.
* This is the "1st stage" regression of 2SLS/IV.
**************************************************************
xtreg jbemp age agesq cohort married female degree further ///
    permanent nsw tenant femten wantstay femstay spousehr femsphr ///
    if keeper==1, fe
test tenant femten wantstay femstay spousehr femsphr
test femten femstay femsphr

* Note that xtivreg will show this first stage regression as an option:
xtivreg lwage_hr age agesq cohort married female degree further ///
    permanent nsw ///
    (jbemp=tenant femten wantstay femstay spousehr femsphr) ///
    if keeper==1, first fe

**************************************************************
* Questions to think about:
* (1) Good instruments must satisfy a VALIDITY condition -
*     i.e. they must have no direct causal impact on the
*     dependent variable. Is this plausible for our chosen
*     IVs: tenant femten wantstay femstay spousehr femsphr?
* (2) Good instruments must also satisfy the RELEVANCE
*     condition - they must be strongly correlated with the
*     endogenous covariate (jbemp), after allowing for the
*     other exogenous covariates. Is this so here?
* (3) Note that using lots of weak instruments doesn't help.
*     We can probably strengthen the 1st-stage regression by
*     dropping the interaction variables femten, femstay and
*     femsphr from the instrument list. Re-run the estimates.
*     Does the first stage regression look stronger now?
* (4) The validity of the spousehr instrument is especially
*     questionable. What happens when you drop it from the
*     instrument list?
**************************************************************

*STOP HERE AND THINK!!

**************************************************************
* Now test whether job tenure is, in fact, exogenous in the
* FE model. We can use a Hausman test for this, assuming our
* instruments are valid.
* Recall we need an estimator that is
* consistent under both H0 (tenure is exogenous) and H1
* (tenure is endogenous); and we need a second estimator
* that is efficient under H0, but inconsistent under H1.
**************************************************************

* Consistent under H0 and H1
estimates store ivfe

* Efficient under H0; inconsistent under H1
xtreg lwage_hr age agesq cohort married female degree further ///
    permanent nsw jbemp if keeper==1, fe
estimates store fe

hausman ivfe fe

**************************************************************
* What do you conclude? What is your best estimate of the
* tenure effect (with confidence interval)?
**************************************************************

**************************************************************
* Now we are going to illustrate the Hausman-Taylor method.
* Its main attraction is to allow some time-invariant
* characteristics to be correlated with u(i). To identify
* their coefs, we require at least as many time-varying
* characteristics which are uncorrelated with u(i). We are
* going to modify the wage equation to include the highest
* education someone ever attained, and try to estimate the
* returns to education based on this variation across
* individuals. Estimate returns using RE model for comparison.
**************************************************************
egen everdeg=max(degree), by(id)     // ever got degree
egen everfur=max(further), by(id)    // ever got further edu
replace everfur=0 if everdeg         // replace with highest
assert everfur+everdeg<2 if everdeg != .    // check only got the highest

xtreg lwage_hr age agesq cohort married female everdeg everfur ///
    permanent nsw jbemp if keeper==1, re

**************************************************************
* Assume that age is uncorrelated with u(i), but that all
* other time-varying characteristics are correlated with
* u(i).
* Does this model satisfy the identification condition?
* Check how strongly age and age squared are correlated
* with education.
**************************************************************
correlate age agesq further degree

**************************************************************
* Do these correlations suggest that age and age squared are
* good instruments?
* Estimate HT model
**************************************************************
xthtaylor lwage_hr age agesq cohort married female everdeg everfur ///
    permanent nsw jbemp if keeper==1, ///
    endog(married everdeg everfur jbemp permanent nsw)

**************************************************************
* What do you conclude?
* Other time-varying characteristics can be included in the
* instrument set, but only if we are satisfied they are not
* correlated with the individual effect. Otherwise estimates
* are biased. For example, assume that living in NSW is
* not correlated with u(i) (is this plausible?):
**************************************************************
correlate age agesq nsw further degree
xthtaylor lwage_hr age agesq cohort married female ///
    everdeg everfur jbemp permanent nsw if keeper==1, ///
    endog(married everdeg everfur jbemp permanent)

**************************************************************
* A difficulty with HT is finding good instruments from
* within the model which are also uncorrelated with u(i).
* You might like to consider other instruments.
**************************************************************

**************************************************************
* An alternative strategy would be to use external instruments
* and a conventional RE model. But all regressors that are
* not instrumented must be uncorrelated with u(i).
* Parental background measures are sometimes used to
* instrument educational attainment (assumed correlated with
* children's education but not with their wage).
* Examine the dummy variables for father's one-digit
* occupation. What do you notice?
**************************************************************

* Derive the dummy variables for father's occupation
replace fmfo62 =. if fmfo62<0
recode fmfo62 (10/19=1) (else=0), gen(pamanager)
recode fmfo62 (20/29=1) (else=0), gen(paprof)
recode fmfo62 (30/39=1) (else=0), gen(patechtrade)
recode fmfo62 (40/49=1) (else=0), gen(pacomserv)
recode fmfo62 (50/59=1) (else=0), gen(paclerical)
recode fmfo62 (60/69=1) (else=0), gen(pasales)
recode fmfo62 (70/79=1) (else=0), gen(pamachop)
recode fmfo62 (80/89=1) (else=0), gen(palabour)

**************************************************************
* To deal with missing values, create a dummy variable to
* indicate father's occupation missing.
**************************************************************
gen pamiss=1 if pamanager==0 & paprof==0 & patechtrade==0 & pacomserv==0 ///
    & paclerical==0 & pasales==0 & pamachop==0 ///
    & palabour==0
replace pamiss=0 if pamiss==.

xtsum pamanager paprof patechtrade ///
    pacomserv paclerical pasales pamachop palabour

* Estimate by RE IV:
xtivreg lwage_hr age agesq cohort married female jbemp permanent ///
    (everdeg everfur=pamanager paprof patechtrade ///
    pacomserv paclerical pasales pamachop ///
    palabour pamiss) if keeper==1, re
est store ivre

**************************************************************
* Questions to think about:
* (1) Re-run xtivreg using the "first" option. Look at the
*     first stage regressions. Are these instruments better
*     than the internal ones used by HT?
* (2) Assuming the instruments are OK, is education
*     exogenous here? Use a Hausman test to compare the IV
*     and RE regressions. What is your conclusion?
**************************************************************

*STOP HERE AND THINK!!
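
* A minimal sketch for question (2), offered as one possible approach
* rather than the intended answer. It assumes the instruments are valid
* and that the RE IV results are still stored as "ivre" (see above).
* The name re_exog is made up for this sketch.
xtreg lwage_hr age agesq cohort married female jbemp permanent ///
    everdeg everfur if keeper==1, re
estimates store re_exog
* ivre is consistent under H0 and H1; re_exog is efficient under H0 only
hausman ivre re_exog
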
**************************************************************
* Save the new data set for day 3
**************************************************************
save "`working'\longperson_unbal_3.dta", replace
set more on
log close
exit

**************************************************************
* FURTHER MATERIAL - A MORE SOPHISTICATED IV COMMAND
*
* The "canned" Stata routine for panel IV is xtivreg. We
* are now going to install a more advanced command which
* provides more diagnostic statistics (but is for FE models
* only).
**************************************************************
ssc install ranktest, replace
ssc install ivreg2, replace
ssc install xtivreg2, replace
ssc describe ivreg2

**************************************************************
* Questions to think about again:
* (1) Good instruments must satisfy a VALIDITY condition -
*     i.e. they must have no direct causal impact on the
*     dependent variable. Is this plausible for our chosen
*     IVs: tenant femten wantstay femstay spousehr femsphr?
*     What does the Sargan test say about this?
* (2) Good instruments must also satisfy the RELEVANCE
*     condition - they must be strongly correlated with the
*     endogenous covariate (jbemp), after allowing for the
*     other exogenous covariates. Is this so here?
* (3) Note that using lots of weak instruments doesn't help.
*     We can probably strengthen the 1st-stage regression by
*     dropping the interaction variables femten, femstay and
*     femsphr from the instrument list. Re-run the estimates.
*     Does the first stage regression look stronger now?
*     What is the outcome of the Sargan test for instrument
*     validity?
* (4) The validity of the spousehr instrument is especially
*     questionable.
*     What happens when you drop it from the
*     instrument list?
**************************************************************
xtivreg2 lwage_hr age agesq cohort married female degree ///
    further permanent nsw ///
    (jbemp=tenant femten wantstay femstay spousehr femsphr) ///
    if keeper==1, fe

xtivreg2 lwage_hr age agesq cohort married female degree ///
    further permanent nsw ///
    (jbemp= tenant wantstay spousehr) if keeper==1, fe first

xtivreg2 lwage_hr age agesq cohort married female degree ///
    further permanent nsw ///
    (jbemp= tenant wantstay) if keeper==1, fe first
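
* A hedged sketch for reading the diagnostics programmatically after the
* last xtivreg2 run above. xtivreg2 is a wrapper for ivreg2, so the
* result names e(sargan), e(sarganp) and e(widstat) used below are the
* ivreg2 ones; treat them as assumptions and confirm them against the
* output of ereturn list before relying on them.
ereturn list
display "Sargan statistic (overidentification) = " e(sargan)
display "Sargan p-value                        = " e(sarganp)
display "Weak identification F statistic       = " e(widstat)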