Introduction to Stata
Lecture 3: Panel Data
Hayley Fisher
1 March 2010

Key reference: Cameron and Trivedi (2009), chapter 8.

1 Data used in this lecture

This lecture uses data from the first four waves of the British Household Panel Survey (BHPS). I will make the relevant source files temporarily available on my website but cannot host them there permanently. You can get the full set of files from the ESDS (http://www.esds.ac.uk/findingData/bhps.asp). If you want to learn more about the BHPS and how to use it in Stata, I recommend the BHPS introductory courses provided by the UK Longitudinal Studies Centre (ULSC) at the University of Essex; details and course materials are available at http://www.iser.essex.ac.uk/survey/bhps/courses. These notes have been loosely based on parts of their course.

2 The British Household Panel Survey

The BHPS began in 1991 and has interviewed its initial sample, and additional household members, every year since then. Initially 5,500 households were selected, with additional samples of Scotland, Wales and Northern Ireland added since. Currently 17 waves of the survey are available. For a full description of the survey see Taylor, Brice, Buck and Prentice-Lane (2009).

Longitudinal datasets such as the BHPS are rarely provided in a format that is straightforward to read into Stata and start working with. The BHPS is available in a number of formats including Stata, but as a series of files containing different variables, split by year and by different parts of the survey to give manageable file sizes. I am using the individual response and household response files from the first four waves of the survey. A substantial part of this lecture will be devoted to putting together a panel from these datasets.

2.1 Assembling a cross section using individual and household data

Start your do-file to assemble your dataset by defining a macro for the folder in which the original data files are stored.
This makes it easy to alter the folder referenced if necessary in future. Here I use a global macro:

    global dir BHPS

To recall a global macro we prefix it with a $ sign, so here $dir.

We could simply read in the entire first file in question (aindresp), but this is a large dataset with many variables. Instead, we can load in just specific variables. We need to look at the codebook accompanying the dataset to choose the variables (alternatively, look at the online codebooks at http://www.iser.essex.ac.uk/survey/bhps/documentation/volume-b-codebooks). We read in the specific variables by typing:

    use ahid apno asex aage pid amastat ahgspn aqfachi afiyr afiyrl using $dir/aindresp

Note that all variables except pid have the prefix a. This is a convention in the BHPS data files: all files and variables associated with wave 1 have the prefix a, for wave 2 it is b, and so on. Let's describe the data to see what has been loaded here.

    . describe

    Contains data from BHPS/aindresp.dta
      obs:        10,264
     vars:            10
     size:       348,976 (99.9% of memory free)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    ahid            long   %12.0g                 household identification number
    apno            byte   %8.0g                  person number
    asex            byte   %8.0g       asex       sex
    pid             long   %12.0g                 cross-wave person identifier
    amastat         byte   %8.0g       amastat    marital status
    ahgspn          byte   %8.0g       ahgspn     pno of spouse/partner
    aage            byte   %8.0g       aage       age at date of interview
    aqfachi         byte   %8.0g       aqfachi    highest academic qualification
    afiyrl          double %10.0g      afiyrl     annual labour income (1.9.90-1.9.91)
    afiyr           double %10.0g      afiyr      annual income (1.9.90-1.9.91)
    -------------------------------------------------------------------------------
    Sorted by:

Three variables here are vital for the construction of our panel dataset. ahid is a household identification number which we will use to match data from the household file, and apno is a person identification number within a given household. This can be used in combination with, for example, ahgspn (the person number of the spouse or partner) to match couples together.
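As a brief aside on matching couples, here is a minimal sketch of how apno and ahgspn might be combined to attach each person's spouse's age. It builds a small hypothetical household file rather than reading the BHPS, so the values are invented, the new variable sp_aage is an illustrative name, and treating ahgspn = 0 as "no spouse in the household" is an assumption (check the codebook for how ahgspn is actually coded). It assumes Stata 11 or later for the merge m:1 syntax and should be run as a single do-file so the temporary file survives.

```stata
* Hypothetical toy data (not real BHPS records): household 1001 contains a
* couple (persons 1 and 2) and a child; household 1002 is a single person.
* Assumption: ahgspn = 0 means "no spouse/partner in the household".
clear
input long ahid byte apno byte ahgspn int aage
1001 1 2 45
1001 2 1 43
1001 3 0 12
1002 1 0 70
end

* Build a lookup copy with one row per person, renaming the person number
* to ahgspn so it can be matched to whoever names that person as spouse.
preserve
keep ahid apno aage
rename apno ahgspn
rename aage sp_aage
tempfile spouse
save `spouse'
restore

* Attach the spouse's age by household id and spouse's person number.
* People with no spouse find no match and keep a missing sp_aage.
merge m:1 ahid ahgspn using `spouse', keep(master match) nogenerate
sort ahid apno
list ahid apno ahgspn aage sp_aage
```

The key design choice is renaming apno to ahgspn in the lookup copy: on both sides of the merge the key then means "household and person number of the spouse", so a single merge links partners without any loops.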
pid is a cross-wave person identifier; it has no a prefix since it matches the same variable in all waves, and it is what connects people over time. We also have data on individuals' sex, age, academic qualifications, labour income and total income.

We are going to merge in data from the household file, so we need to sort the individual data by the household identification number and save it.

    . sort ahid
    . save aind, replace

Then we load data from the household response file ahhresp.dta:

    use ahid atenure ahhsize ankids afihhyr using $dir/ahhresp

    . describe

    Contains data from BHPS/ahhresp.dta
      obs:         5,511
     vars:             5
     size:       104,709 (99.9% of memory free)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    ahid            long   %12.0g                 household identifi
