Weights

Cross–Sectional Weights

Wave 1

In wave 1, we essentially had a complex cross–sectional survey. The initial (or design) weights are derived from the probability of selecting the households into the sample. These household weights are initially adjusted according to information collected about all selected households (both responding and non–responding) and further adjusted so that weighted household estimates from the HILDA Survey match several known household–level benchmarks.

The person–level weights are based on the household–level weights, with adjustments made based on information collected about all the people listed in the responding households. These weights are also adjusted to ensure that the weighted person estimates match several known person–level benchmarks.

More information about the weighting procedure can be found in Watson and Fry (2002). See the section below for a description of the benchmarks as these have been modified after Release 1.
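The wave 1 steps described above can be sketched in a few lines. This is a minimal illustration only, not the procedure in Watson and Fry (2002): it assumes a single benchmark dimension, and the function names, the region codes and every number below are hypothetical.

```python
from collections import defaultdict


def design_weight(selection_probability):
    """Initial (design) weight: the inverse of the selection probability."""
    return 1.0 / selection_probability


def benchmark_adjust(weights, groups, benchmarks):
    """Scale weights so that each group's weighted total equals its benchmark."""
    totals = defaultdict(float)
    for weight, group in zip(weights, groups):
        totals[group] += weight
    return [weight * benchmarks[group] / totals[group]
            for weight, group in zip(weights, groups)]


# Hypothetical responding households (all numbers invented for illustration).
selection_probs = [0.001, 0.001, 0.002, 0.002]
regions = ["A", "A", "B", "B"]
household_benchmarks = {"A": 2500.0, "B": 900.0}

design_weights = [design_weight(p) for p in selection_probs]
final_weights = benchmark_adjust(design_weights, regions, household_benchmarks)
```

After the adjustment, the weights in each hypothetical region sum to that region's benchmark, which is the property the benchmarking step is meant to guarantee.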

 

Wave 2 Onwards

From wave 2 onwards, the ‘selection’ of the sample depends on the wave 1 responding sample and on household and individual attrition after wave 1. The cross-sectional weights for wave 2 onwards opportunistically include temporary sample members (i.e., those people who are part of the sample only because they currently live with a continuing sample member). The underlying probability of selection for these households is amended to account for the various pathways from wave 1 into the relevant wave household. Following this, non-response adjustments are made, which require within-sample modelling of non-response probabilities, and the weights are benchmarked to known population estimates at both the household and person level.

The weighting process for wave 2 onwards is detailed in Watson (2004b); see endnote 29. See the section below for a description of the benchmarks, as these have been modified after Release 2.
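A non-response adjustment of the kind mentioned above might look like the following sketch. It is not the model in Watson (2004b): as a simplifying assumption, the response probability is estimated as the weighted response rate within a weighting class rather than from a fitted model, and the class labels and weights are invented.

```python
from collections import defaultdict


def nonresponse_adjust(weights, classes, responded):
    """Inflate respondents' weights by the inverse response rate of their class.

    Non-respondents receive a weight of zero; respondents in each class
    carry the full selected weight of that class.
    """
    selected = defaultdict(float)
    responding = defaultdict(float)
    for weight, cls, flag in zip(weights, classes, responded):
        selected[cls] += weight
        if flag:
            responding[cls] += weight
    return [weight * selected[cls] / responding[cls] if flag else 0.0
            for weight, cls, flag in zip(weights, classes, responded)]


# Hypothetical households: one renter household did not respond.
weights = [100.0, 100.0, 100.0, 100.0]
classes = ["renter", "renter", "owner", "owner"]
responded = [True, False, True, True]

adjusted = nonresponse_adjust(weights, classes, responded)
```

The total weight is preserved: the responding renter household now also represents the non-responding one.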

 

Longitudinal Weights

By comparison, the construction of the longitudinal weights is more straightforward and only includes an adjustment for attrition and benchmarking back to the initial wave characteristics. The longitudinal weights are described in Watson (2004b), but see the following section for a description of the benchmarks used.

We have provided longitudinal weights for the balanced panel of responding persons or enumerated persons from every wave to every other wave and for the balanced panel of any combination of a pair of waves. These weights adjust for attrition from the initial wave and are benchmarked back to the key characteristics of the initial wave. For instance, if you were interested in a panel of respondents from waves 2 through 6, the weight provided for this panel would adjust for attrition from the balanced panel from wave 2 to 6 and would ensure key characteristics of the wave 2 population are matched.
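The difference between the two kinds of balanced panel can be made concrete with a small sketch. The response-history dictionary and both function names are hypothetical; the HILDA files do not store response status in this form.

```python
def in_balanced_panel(response_by_wave, first, last):
    """True if the person responded in every wave from first to last inclusive."""
    return all(response_by_wave.get(wave, False)
               for wave in range(first, last + 1))


def in_paired_panel(response_by_wave, first, last):
    """True if the person responded in both end waves, whatever happened between."""
    return (response_by_wave.get(first, False)
            and response_by_wave.get(last, False))


# Hypothetical person who missed the wave 4 interview.
history = {2: True, 3: True, 4: False, 5: True, 6: True}

continuous = in_balanced_panel(history, 2, 6)
paired = in_paired_panel(history, 2, 6)
```

This person would be excluded from the wave 2 to 6 balanced panel but would still belong to the panel defined by the pair of waves 2 and 6.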

 

Benchmarks

The benchmarks used in the weighting process are listed in Table 4.24 (see endnote 30). The changes made to the benchmarking process originally documented in Watson (2004b) include:

Note also that the benchmarks exclude people living in non-private dwellings, so people who move into these dwellings after wave 1 are given zero cross-sectional weights.

 

Table 4.24: Benchmarks used in weighting

Cross–sectional weights
  Household weights:
  • Number of adults by number of children
  • State by part of State
  • Determined jointly with enumerated person weights
  Enumerated person weights:
  • Sex by broad age
  • State by part of State
  • Labour force status
  • Marital status
  • Determined jointly with household weights
  Responding person weights:
  • Sex by broad age
  • State by part of State
  • State by labour force status
  • Marital status
  • Household composition (number of adults and children)

Longitudinal weights
  Household weights: Not applicable
  Enumerated person weights:
  • Sex by broad age
  • State by part of State
  • Labour force status
  • Marital status
  • Household composition (number of adults and children)
  Responding person weights:
  • Sex by broad age
  • State by part of State
  • State by labour force status
  • Marital status
  • Household composition (number of adults and children)
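Matching several benchmark dimensions at once requires some form of calibration. The text here does not say which method is used, so the sketch below swaps in raking (iterative proportional fitting), a common technique that adjusts the weights to each margin in turn until all are matched. The categories, starting weights and benchmark totals are invented.

```python
def rake(weights, margins, targets, iterations=50):
    """Rake weights so that weighted totals match each benchmark margin.

    margins: one list of category labels per benchmark dimension.
    targets: one dict per dimension mapping category -> benchmark total.
    """
    raked = list(weights)
    for _ in range(iterations):
        for labels, target in zip(margins, targets):
            totals = {}
            for weight, label in zip(raked, labels):
                totals[label] = totals.get(label, 0.0) + weight
            raked = [weight * target[label] / totals[label]
                     for weight, label in zip(raked, labels)]
    return raked


# Four hypothetical persons and two invented benchmark margins.
sex = ["F", "F", "M", "M"]
state = ["NSW", "VIC", "NSW", "VIC"]
sex_targets = {"F": 60.0, "M": 40.0}
state_targets = {"NSW": 50.0, "VIC": 50.0}

raked = rake([1.0, 2.0, 3.0, 4.0], [sex, state], [sex_targets, state_targets])
```

A fixed iteration count keeps the sketch short; a production routine would instead stop when the margins are matched to a chosen tolerance.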

    Replicate Weights

    Replicate weights have been provided for users to calculate standard errors that take into account the complex sample design of the HILDA Survey. These weights can be used by the SAS GREGWT macro or the Stata ‘svy jackknife’ commands (more detail is provided below in the section on calculating standard errors), or you can write your own routine to use these weights. Weights for 45 replicate groups are provided.
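If you write your own routine, its core could resemble the sketch below. It assumes the usual delete-a-group jackknife formula, in which the variance is (G - 1) / G times the sum of squared differences between each replicate estimate and the full-sample estimate; the text here does not state the formula, so check Hayes (2008) before relying on it. The toy data use three replicate groups instead of 45.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jackknife_se(values, main_weights, replicate_weights):
    """Standard error of a weighted mean from jackknife replicate weights.

    Assumed formula: var = (G - 1) / G * sum_g (theta_g - theta)^2,
    where G is the number of replicate weight sets.
    """
    theta = weighted_mean(values, main_weights)
    groups = len(replicate_weights)
    squared_diffs = sum((weighted_mean(values, rw) - theta) ** 2
                        for rw in replicate_weights)
    return math.sqrt((groups - 1) / groups * squared_diffs)


# Toy data: each replicate drops one unit and reweights the other two.
values = [1.0, 2.0, 3.0]
main = [1.0, 1.0, 1.0]
replicates = [[0.0, 1.5, 1.5], [1.5, 0.0, 1.5], [1.5, 1.5, 0.0]]

se = jackknife_se(values, main, replicates)
```

With real data, `main_weights` would be a weight such as _hhwtrp and `replicate_weights` the 45 columns _rwrp1 to _rwrp45 for the same wave.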

     

    Weights Provided on the Data Files

    Table 4.25 provides a list of the weights provided on the data files together with a description of those weights. The longitudinal weights provided on the enumerated and responding person files are the ones you are most likely to use, though other longitudinal weights are provided on the Longitudinal Weights File.

    Irrespective of the modifications made in how the weights are constructed, some changes are expected to the weights with each new release. There are three reasons for this. Firstly, corrections may be made to age and sex variables when these are confirmed with individuals in subsequent wave interviews. Secondly, the benchmarks are updated from time to time. Thirdly, duplicate or excluded people in the sample may be identified after the release (very occasionally).

     

    Table 4.25: Weights

    Household File
      _hhwth: The household weight is the cross–section population weight for all households responding in the relevant wave. Note the sum of these household weights for wave 1 is approximately 7.4 million.
      _hhwths: This is the cross–section household population weight rescaled to the sum of the sample size for the relevant wave (i.e. 7682 responding households in wave 1). Use this weight when the statistical package requires the weights to sum to the sample size.
      _hhwte01 to _hhwte16: The enumerated person weights are provided on both the household file and the enumerated person file. See description below.
      _rwh1 to _rwh45: Cross–section household population replicate weights.

    Enumerated Person File
      _hhwte: The enumerated person weight is the cross–section population weight for all people who are usual residents of the responding households in the relevant wave (this includes children, non–respondents and respondents). The sum of these enumerated person weights for wave 1 is 19.0 million.
      _hhwtes: This is the cross–section enumerated person population weight rescaled to the sum of the sample size for the relevant wave (i.e. for wave 1, 19,914 enumerated persons). Use this weight when the statistical package requires the weights to sum to the sample size.
      _lnwte: This longitudinal enumerated person weight is the longitudinal population weight for all people who were enumerated (i.e. in responding households) each wave from wave 1 to the wave where this variable resides. This weight applies to the following people in responding households: children, non-respondents, intermittent respondents, and full respondents.
        blnwte is for the balanced panel of enumerated persons from wave 1 to 2;
        clnwte is for the balanced panel from wave 1 to 3;
        dlnwte is for the balanced panel from wave 1 to 4, etc.
        These variables are also on the Longitudinal Weights File, but are named differently: wlea_b; wlea_c; wlea_d, etc.
      _rwe1 to _rwe45: Cross–section enumerated person population replicate weights.
      _rwlne1 to _rwlne45: Longitudinal enumerated person population replicate weights.

    Responding Person File
      _hhwtrp: The responding person weight is the cross–section population weight for all people who responded in the relevant wave (i.e. they provided a personal interview). The sum of these responding person weights for wave 1 is 15.0 million.
      _hhwtrps: This is the cross–section responding person population weight rescaled to sum to the number of responding persons in the relevant wave (i.e. 13,969 in wave 1). Use this weight when the statistical package requires the sum of the weights to be the sample size.
      _lnwtrp: This longitudinal responding person weight is the longitudinal population weight for all people responding (i.e. provided an interview) each wave from wave 1 to the wave where this variable resides.
        blnwtrp is for the balanced panel of respondents from wave 1 to 2;
        clnwtrp is for the balanced panel from wave 1 to 3;
        dlnwtrp is for the balanced panel from wave 1 to 4, etc.
        These variables are also on the Longitudinal Weights File, but are named differently: wlra_b; wlra_c; wlra_d, etc.
      _rwrp1 to _rwrp45: Cross–section responding person population replicate weights.
      _rwlnr1 to _rwlnr45: Longitudinal responding person population replicate weights.

    Longitudinal Weights File
      wlet1_tn: Longitudinal enumerated person weight for the balanced panel of all people who were enumerated (i.e. part of a responding household) each wave from wave t1 to tn. Wave letters are used in place of t1 and tn. For example, wlec_f is the longitudinal enumerated person weight for the balanced panel from wave 3 to 6.
      wlet1tn: Longitudinal enumerated person weight for the balanced panel of all people who were enumerated (i.e. part of a responding household) in wave t1 and tn. Wave letters are used in place of t1 and tn. The paired longitudinal weights do not restrict individuals in any way based on their response status in waves between t1 and tn. For example, wlecf is the longitudinal enumerated person weight for the balanced panel of enumerated people in wave 3 and 6 (they may or may not have been enumerated in other waves).
      wlrt1_tn: Longitudinal responding person weight for the balanced panel of all people who were interviewed each wave from wave t1 to tn. Wave letters are used in place of t1 and tn. For example, wlrc_f is the longitudinal responding person weight for the balanced panel of respondents from wave 3 to 6.
      wlrt1tn: Longitudinal responding person weight for the balanced panel of all people who were interviewed in wave t1 and tn. Wave letters are used in place of t1 and tn. The paired longitudinal weights do not restrict individuals in any way based on their response status in waves between t1 and tn. For example, wlrcf is the longitudinal responding person weight for the balanced panel of respondents in wave 3 and 6 (they may or may not have been responding in other waves).

    Longitudinal Replicate Weights File (see note 1)
      wlet1_tn1 to wlet1_tn45: Longitudinal enumerated person replicate weights for the balanced panel from t1 to tn.
      wlet1tn1 to wlet1tn45: Longitudinal enumerated person replicate weights for the balanced panel for t1 and tn.
      wlrt1_tn1 to wlrt1_tn45: Longitudinal responding person replicate weights for the balanced panel from t1 to tn.
      wlrt1tn1 to wlrt1tn45: Longitudinal responding person replicate weights for the balanced panel for t1 and tn.

    1 The Longitudinal Replicate Weights File is available on request. Please email us.
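The naming pattern for the Longitudinal Weights File can be captured in a small helper. The function names are hypothetical and the helper only builds the variable name as a string; it assumes wave 1 maps to the letter a, wave 2 to b, and so on, as in the examples in Table 4.25.

```python
import string


def wave_letter(wave):
    """Map a wave number to its letter: 1 -> 'a', 2 -> 'b', and so on."""
    return string.ascii_lowercase[wave - 1]


def longitudinal_weight_name(kind, first, last, paired=False):
    """Build a Longitudinal Weights File variable name.

    kind: 'e' for enumerated persons, 'r' for responding persons.
    paired: False for every wave from first to last (underscore form),
            True for the two end waves only (no underscore).
    """
    if kind not in ("e", "r"):
        raise ValueError("kind must be 'e' or 'r'")
    if not 1 <= first < last <= 26:
        raise ValueError("waves must satisfy 1 <= first < last <= 26")
    separator = "" if paired else "_"
    return f"wl{kind}{wave_letter(first)}{separator}{wave_letter(last)}"


continuous_name = longitudinal_weight_name("e", 3, 6)
paired_name = longitudinal_weight_name("r", 3, 6, paired=True)
```

Here `continuous_name` is the enumerated person weight for every wave from 3 to 6, and `paired_name` is the responding person weight for waves 3 and 6 only.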

     

    Advice on Using Weights

    Which Weight to Use

    For some users, the array of weights on the dataset may seem confusing. This section provides examples of when it would be appropriate to use the different types of weights.

    If you want to make inferences about the Australian population from frequencies or cross–tabulations of the HILDA sample then you will need to use weights. If you are only using information collected during the wave 4 interviews (either at the household level or person level) then you would use the wave 4 cross–section weights. Similarly, if you are only using wave 3 information, then you would use the wave 3 cross–section weights, and so on. If you want to infer how people have changed across the five years between waves 1 and 6, then you would use the longitudinal weights for the balanced panel from waves 1 to 6.
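As a minimal sketch of the wave 4 case, the following computes a weighted population total and proportion from responding person records. The records and the `employed` flag are invented, and the variable name dhhwtrp assumes that the wave 4 prefix is the letter d, following the wave-letter convention used for the longitudinal weights.

```python
def weighted_total(weights, indicator):
    """Estimated number of people in the population with the characteristic."""
    return sum(w for w, flag in zip(weights, indicator) if flag)


def weighted_proportion(weights, indicator):
    """Estimated share of the population with the characteristic."""
    return weighted_total(weights, indicator) / sum(weights)


# Hypothetical wave 4 respondents; dhhwtrp is the assumed weight name.
respondents = [
    {"dhhwtrp": 1200.0, "employed": True},
    {"dhhwtrp": 800.0, "employed": False},
    {"dhhwtrp": 1000.0, "employed": True},
    {"dhhwtrp": 1000.0, "employed": False},
]

weights = [r["dhhwtrp"] for r in respondents]
employed = [r["employed"] for r in respondents]

population_employed = weighted_total(weights, employed)
share_employed = weighted_proportion(weights, employed)
unweighted_share = sum(employed) / len(employed)
```

The weighted and unweighted shares differ in this toy data, which is the reason unweighted sample frequencies should not be read as population estimates.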

    The following five examples show how the various weights may be used to answer questions about the population:

    When constructing regression models, the researcher needs to be aware of the sample design and non–response issues underlying the data and will need to take account of this in some way.

     

    Calculating Standard Errors

    The HILDA Survey has a complex survey design that needs to be taken into account when calculating standard errors. It is:

    Some options available for the calculation of appropriate standard errors and confidence intervals include:

    A User Guide for calculating standard errors in HILDA is provided as part of our technical paper series; see Hayes (2008). Example code is provided in SAS, SPSS and Stata.

    To assist you in the calculation of appropriate standard errors, the wave 1 area (cluster) and proxy stratification variables have been included on the master file. These are listed in Table 4.26 and need to be specified when standard errors are calculated with the Taylor Series approximation method, as suggested above. Any new entrants to the household are assigned the same sample design information as the permanent sample member they join. As of Release 6, the proxy stratification variable (ahhstrat) has replaced major statistical region (ahhmsr) on the master file as the variable to be used in the Taylor Series approximation method. The new stratification variable is essentially a collapsed area unit variable that approximates the combined effect of the systematic selection and stratification of the sample better than the major statistical region variable alone.

     

    Table 4.26: Sample design variables
    Variable    Description               Design element
    ahhraid     DV: randomised area id    Cluster
    ahhstrat    DV: Wave 1 Strata         Proxy stratification
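To show how the two design variables enter the calculation, the sketch below estimates the standard error of a weighted total under a stratified, clustered design. It is a simplified illustration rather than the method in Hayes (2008): it assumes clusters are sampled with replacement within strata, and it covers only a total, which is already linear (a mean or ratio would first need to be linearised). All data are invented; with real data, `strata` would hold ahhstrat and `clusters` ahhraid.

```python
import math
from collections import defaultdict


def design_based_se_of_total(values, weights, strata, clusters):
    """Standard error of a weighted total using strata and clusters.

    Sums the between-cluster variance of weighted cluster totals
    within each stratum: n_h / (n_h - 1) * sum_i (z_hi - zbar_h)^2.
    """
    cluster_totals = defaultdict(lambda: defaultdict(float))
    for y, w, stratum, cluster in zip(values, weights, strata, clusters):
        cluster_totals[stratum][cluster] += w * y

    variance = 0.0
    for totals in cluster_totals.values():
        n_clusters = len(totals)
        if n_clusters < 2:
            raise ValueError("each stratum needs at least two clusters")
        mean_total = sum(totals.values()) / n_clusters
        variance += (n_clusters / (n_clusters - 1)
                     * sum((t - mean_total) ** 2 for t in totals.values()))
    return math.sqrt(variance)


# Hypothetical data: two strata with two clusters each.
values = [1.0, 1.0, 0.0, 1.0]
weights = [10.0, 10.0, 10.0, 10.0]
strata = [1, 1, 2, 2]
clusters = ["a", "b", "c", "d"]

se_total = design_based_se_of_total(values, weights, strata, clusters)
```

Only the second stratum contributes to the variance here, because the two clusters in the first stratum have identical weighted totals.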

     

    Also, a few users may be interested in the sample design weight in wave 1 before any benchmark or non-response adjustments have been made. This is available on the household file as ahhwtdsn.


    Endnotes:

    29 While this paper is written in relation to the wave 2 weighting, the process in later waves follows the same methodology.
    30 We thank the Demography Section and the Labour Force Estimates team from the Australian Bureau of Statistics for the provision of the benchmarks used in the weighting process.
    31 For example, the number of people living in a household with two people can be derived by two methods. Firstly, this can be calculated from the household file by estimating the number of two person households and multiplying by two. Secondly, it can be estimated from the enumerated file by summing the weights of people living in two person households.
    32 An occupation benchmark was included from Release 4 to 6, but this was later removed following concerns about the occupation coding as outlined by Watson and Summerfield (2009).
    33 Due to updates to the household propensities used by the ABS to create the household benchmarks, the total number of households based on the 2006 Census is quite different from that based on the 2001 Census. For example, the number of households in Australia in September 2001 based on the 2001 Census was 7.43 million, whereas the corresponding number based on the 2006 Census was 7.32 million. In order to minimise the impact on our estimates caused by changes to the benchmarks, an incremental combination of the two sets of household benchmarks was taken.
    34 This stemmed from a change in the benchmarks available from the ABS to align with the remoteness area classification rather than a ‘sparsely settled’ definition.