Panel data - 2 - simple example of pooled cross section across years


I will try to explain panel data in a millennial-friendly way. Believe me, statistics and mathematics combined into econometrics is not as scary as it sounds. Before we prepare your data in panel format, it is worth reading the relevant chapter from Wooldridge's book. Don't be scared!


Many surveys of individuals, families, and firms are repeated at regular intervals, often each year. An example is the Current Population Survey (or CPS), which randomly samples households each year. If a random sample is drawn at each time period, pooling the resulting random samples gives us an independently pooled cross-section.

One reason for using independently pooled cross-sections is to increase the sample size. By pooling random samples drawn from the same population, but at different points in time, we can get more precise estimators and test statistics with more power. Pooling is helpful only insofar as the relationship between the dependent variable and at least some of the independent variables remains constant over time.

Using pooled cross-sections raises only minor statistical complications. Typically, to reflect that the population may have different distributions in different time periods, we allow the intercept to differ across periods, usually years. This is easily accomplished by including dummy variables for all but one year, where the earliest year in the sample is usually chosen as the base year. It is also possible that the error variance changes over time, something we discuss later.
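To make the "dummy variables for all but one year" idea concrete, here is a minimal Stata sketch on simulated data. Everything in it (the variables y, x, and year, the three survey waves, and the coefficient values) is made up for illustration and is not taken from the post's data set.

```stata
* Illustrative sketch on simulated data; all names and numbers are hypothetical
clear
set seed 12345
set obs 600
* three survey waves (1972, 1974, 1976), 200 independent observations each
generate year = 1972 + 2*mod(_n, 3)
generate x = rnormal()
* the intercept shifts by wave; the slope on x is the same in every wave
generate y = 1 + 0.5*x - 0.3*(year == 1974) - 0.6*(year == 1976) + rnormal()
* i.year adds a dummy for every year except the base (the lowest value, 1972)
regress y x i.year
```

The coefficients reported for 1974 and 1976 are intercept shifts relative to the base year 1972, which is how the year dummies in the fertility example are read.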

Sometimes, the pattern of coefficients on the year dummy variables is itself of interest.

Example of a simple pooled cross-section over time that resembles panel data

First, Stata offers a great resource when you want to practice the code. You can find the source here

See also: let's say we want to use some of the data from Wooldridge's book, in chapter 10.

The fertility topic is from Wooldridge, or from this site, in chapter 13.

Try this
Simple panel data regression with one subject

For example, a demographer may be interested in the following question: After controlling for education, have fertility patterns among women over age 35 changed between 1972 and 1984? The following example illustrates how this question is answered using multiple regression analysis with year dummy variables.

The data set in the link below, which is similar to Sander (1992), comes from the National Opinion Research Center's General Social Survey for the even years from 1972 to 1984, inclusive. We use these data to estimate a model explaining the total number of kids born to a woman (kids).

reg kids educ age agesq black east northcen west farm othrural town smcity y74 y76 y78 y80 y82 y84
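If you do not have the data file at hand, one possible way to get it is sketched below. This is an assumption on my part rather than the source the post links to: it uses the community-contributed bcuse command from SSC, which downloads Wooldridge's data sets by name, and it assumes this example's data set is the one called fertil1.

```stata
* Assumption: bcuse (a community-contributed command on SSC) is used to fetch
* the Wooldridge data set fertil1; the post's own data link may differ
ssc install bcuse
bcuse fertil1, clear

* Pooled OLS with 1972 as the base year (there is no y72 dummy in the list)
regress kids educ age agesq black east northcen west farm othrural town smcity y74 y76 y78 y80 y82 y84
```

Here reg in the post is simply the abbreviated form of regress.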

One question of interest is:

After controlling for other observable factors, what has happened to fertility rates over time? The factors we control for are years of education, age, race, region of the country where the woman lived at age 16, and living environment at age 16.

The base year is 1972. The coefficients on the year dummy variables show a sharp drop in fertility in the early 1980s. For example, the coefficient on y82 implies that, holding education, age, and other factors fixed, a woman had on average .52 fewer children, or about one-half a child, in 1982 than in 1972. This is a substantial drop: holding educ, age, and the other factors fixed, 100 women in 1982 are predicted to have about 52 fewer children than 100 comparable women in 1972.

Since we control for education, this drop is separate from the decline in fertility due to the increase in average education levels. (The average years of education are 12.2 for 1972 and 13.3 for 1984.) The coefficients on y82 and y84 represent drops in fertility for reasons not captured in the explanatory variables. Given that the 1982 and 1984 year dummies are individually quite significant, it is not surprising that as a group, the year dummies are jointly very significant: the R-squared for the regression without the year dummies is .1019, and this leads to F(6, 1111) = 5.87 and a p-value of essentially zero (0.0000 to four decimal places).
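The joint F statistic can be reproduced by hand from the two R-squareds. The sketch below uses the standard R-squared form of the F statistic; the symbols are notation introduced here, not the post's: ur and r mark the unrestricted and restricted models, q = 6 is the number of year dummies being tested, and n - k - 1 = 1111 is the residual degrees of freedom of the unrestricted model. The unrestricted R-squared of 0.1295 is taken from the regression output.

```latex
F = \frac{(R^2_{ur} - R^2_{r})/q}{(1 - R^2_{ur})/(n - k - 1)}
  = \frac{(0.1295 - 0.1019)/6}{(1 - 0.1295)/1111}
  \approx 5.87
```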

  • Women with more education have fewer children, and the estimate is very statistically significant.
  • Other things being equal, 100 women with a college education will have about 51 fewer children on average than 100 women with only a high school education: .128(4) = .512.
  • Age has a diminishing effect on fertility. (The turning point in the quadratic is at about age 46, by which time most women have finished having children.)
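Both numbers follow directly from the estimated coefficients. As a small worked sketch (the notation is mine; the coefficient values are the rounded ones from the regression output), four extra years of education change predicted kids by four times the coefficient on educ, and the quadratic in age peaks where its derivative is zero:

```latex
\begin{aligned}
\Delta\widehat{kids} &= 4 \times \hat\beta_{educ} = 4 \times (-0.128) \approx -0.51 \\
age^{*} &= \frac{\hat\beta_{age}}{2\,\lvert\hat\beta_{agesq}\rvert}
         = \frac{0.5321}{2 \times 0.005804} \approx 45.8
\end{aligned}
```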

The estimated model assumes that the effect of each explanatory variable, particularly education, has remained constant over time. This may or may not be true; Wooldridge explores this issue in Computer Exercise C1. Finally, there may be heteroskedasticity in the error term underlying the estimated equation.

This can be dealt with using the usual methods for cross-sectional data, with one interesting difference: now, the error variance may change over time even if it does not change with the values of educ, age, black, and so on.

The heteroskedasticity-robust standard errors and test statistics are nevertheless valid. The Breusch-Pagan test would be obtained by regressing the squared OLS residuals on all of the independent variables in the table below, including the year dummies. (For the special case of the White statistic, the fitted values of kids and the squared fitted values are used as the independent variables, as always.) A weighted least-squares procedure should account for variances that possibly change over time. In the procedure discussed in Section 8.4 of Wooldridge, year dummies would be included in equation (8.32).
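The tests described above can be sketched in Stata roughly as follows. This is a sketch under assumptions, not the post's own code: it assumes the data are loaded with the community-contributed bcuse command as fertil1, and the names uhat, uhat2, kidshat, and kidshat2 are made up for this example. Run it as one do-file so the local macro stays defined.

```stata
* Sketch only: assumes bcuse is installed (ssc install bcuse) and that the
* data set is fertil1; uhat, uhat2, kidshat, kidshat2 are illustrative names
bcuse fertil1, clear
local xvars educ age agesq black east northcen west farm othrural town smcity y74 y76 y78 y80 y82 y84
regress kids `xvars'

* Breusch-Pagan by hand: squared OLS residuals on all regressors,
* year dummies included; the overall F of this regression is the test
predict uhat, residuals
predict kidshat, xb
generate uhat2 = uhat^2
regress uhat2 `xvars'

* Special case of the White test: fitted values and their squares
generate kidshat2 = kidshat^2
regress uhat2 kidshat kidshat2

* Heteroskedasticity-robust standard errors, then a robust joint test
* of the six year dummies
regress kids `xvars', vce(robust)
test y74 y76 y78 y80 y82 y84
```

Because the year dummies are among the regressors in the auxiliary regression, a significant result there can reflect an error variance that changes across years as well as one that changes with educ, age, and the other controls.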

      Source |       SS       df       MS              Number of obs =    1129
-------------+------------------------------           F( 17,  1111) =    9.72
       Model |  399.610888    17  23.5065228           Prob > F      =  0.0000
    Residual |  2685.89841  1111  2.41755033           R-squared     =  0.1295
-------------+------------------------------           Adj R-squared =  0.1162
       Total |   3085.5093  1128  2.73538059           Root MSE      =  1.5548

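------------------------------------------------------------------------------
        kids |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------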
        educ |  -.1284268   .0183486    -7.00   0.000    -.1644286    -.092425
         age |   .5321346   .1383863     3.85   0.000     .2606065    .8036626
       agesq |   -.005804   .0015643    -3.71   0.000    -.0088733   -.0027347
       black |   1.075658   .1735356     6.20   0.000     .7351631    1.416152
        east |    .217324   .1327878     1.64   0.102    -.0432192    .4778672
    northcen |    .363114   .1208969     3.00   0.003      .125902    .6003261
        west |   .1976032   .1669134     1.18   0.237    -.1298978    .5251041
        farm |  -.0525575     .14719    -0.36   0.721    -.3413592    .2362443
    othrural |  -.1628537    .175442    -0.93   0.353    -.5070887    .1813814
        town |   .0843532    .124531     0.68   0.498    -.1599893    .3286957
      smcity |   .2118791    .160296     1.32   0.187    -.1026379    .5263961
         y74 |   .2681825    .172716     1.55   0.121    -.0707039    .6070689
         y76 |  -.0973795   .1790456    -0.54   0.587     -.448685    .2539261
         y78 |  -.0686665   .1816837    -0.38   0.706    -.4251483    .2878154
         y80 |  -.0713053   .1827707    -0.39   0.697      -.42992    .2873093
         y82 |  -.5224842   .1724361    -3.03   0.003    -.8608214    -.184147
         y84 |  -.5451661   .1745162    -3.12   0.002    -.8875846   -.2027477
       _cons |  -7.742457   3.051767    -2.54   0.011    -13.73033   -1.754579
------------------------------------------------------------------------------

test y74 y76 y78 y80 y82 y84

 ( 1)  y74 = 0.0
 ( 2)  y76 = 0.0
 ( 3)  y78 = 0.0
 ( 4)  y80 = 0.0
 ( 5)  y82 = 0.0
 ( 6)  y84 = 0.0

       F(  6,  1111) =    5.87
            Prob > F =    0.0000


Thanks for following along!
