Wednesday, December 23, 2009

Unhappy Research

If you've been surfing the web lately, chances are you've come across some version of this news story about a research study that shows that New York is the unhappiest state in the country while Louisiana is the happiest. A finding that is, prima facie, ridiculous.

Before you start moving your family from Manhattan to New Orleans, it's worth considering what's wrong with the story - which strikes me as the perfect combination of dubious research wedded to journalistic misinterpretation.

As I understand it, Oswald and Wu basically construct a subjective measure of happiness by state by taking survey results on people's stated level of satisfaction and running a regression predicting these satisfaction levels as a function of a range of individual level attributes (such as income, education, employment category, etc.) plus dummies for each state (except Alabama - the omitted category). And therein lies the misinterpretation: the subjective coefficients they report are not telling us how happy people in each state are; they are telling us what the net effect of the state is after all other individual level factors are controlled for. In other words, the negative coefficient of New York means that a person with exactly the same income, education, employment, etc. would be less satisfied in New York than in Alabama.

Now, this would make sense if individual attributes that contributed to happiness were uncorrelated with state of residence, but this is clearly not the case. If states differ substantially in the average levels of happiness-causing attributes (i.e. if people in New York are likely to have higher levels of education, higher income, etc.) then the coefficients for the state dummies by themselves are not meaningful; in particular, we are likely to see a negative bias in the coefficients of states with high levels of positive attributes. What's more, this bias is going to be considerably amplified if the dependent variable of happiness / satisfaction is right-censored, that is to say if the measure of satisfaction used does not adequately capture differences in satisfaction levels at the higher end of the range (which, btw, is the case with the data used in the study - on a 1 to 4 scale the average score is 3.4).
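For anyone who wants to poke at the censoring part of this argument, here is a minimal sketch in Python. Everything in it is invented for illustration and none of it comes from the study's data: the eight incomes, the 'true' relationship (satisfaction = 1 + 0.75 x income, identical in both states, so the true state effect is zero), and the helper names `ols` and `fit_state_dummy`. It fits income plus a New York dummy, once with the score capped at 4 and once without the cap.

```python
def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def fit_state_dummy(cap):
    """Regress the (possibly capped) score on income and a New York dummy."""
    incomes_la = [1.0, 2.0, 3.0, 4.0]   # made-up Louisiana incomes
    incomes_ny = [3.0, 4.0, 5.0, 6.0]   # made-up, higher New York incomes
    rows, y = [], []
    for is_ny, incomes in ((0.0, incomes_la), (1.0, incomes_ny)):
        for income in incomes:
            latent = 1.0 + 0.75 * income      # same formula in both states
            rows.append([1.0, income, is_ny])  # intercept, income, NY dummy
            y.append(min(latent, cap))         # the survey scale tops out at cap
    _intercept, b_income, b_ny = ols(rows, y)
    return b_income, b_ny


b_income_capped, b_ny_capped = fit_state_dummy(cap=4.0)
b_income_free, b_ny_free = fit_state_dummy(cap=float("inf"))
print("capped:  ", b_income_capped, b_ny_capped)
print("uncapped:", b_income_free, b_ny_free)
```

With these made-up numbers the uncapped fit recovers an income slope of 0.75 and a New York dummy of zero. The capped fit gives a slope of about 0.49 and a New York dummy of about -0.04, even though the mean capped score is higher in New York (about 3.81) than in Louisiana (about 2.88). The size of the effect means nothing here; the point is only the sign.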

To see this in (exceedingly) simple terms, imagine that we have only two people from two states - Louisiana (L) and New York (N); that we have only one other explanatory variable - Income (I); and that the satisfaction score for both people, on a 1 to 4 scale, is 4, i.e. they both claim to be 'Very Satisfied'. The regression would then try to solve

4=B1.Il + Bl


4=B1.In + Bn

where B1 is the coefficient for Income, Bl and Bn are the satisfaction coefficients for the states, and Il and In are the income levels of the person in Louisiana and the person in New York. Now, imagine that the person in New York has twice the income of the person in Louisiana. We then have

4=B1.Il + Bl = B1.In + Bn = B1.2Il + Bn

Now, if B1.Il + Bl = B1.2Il + Bn, then subtracting gives Bl - Bn = B1.Il. Assuming B1>0 (more income means greater happiness) and a positive income Il, this would mean that Bl>Bn, i.e. the satisfaction coefficient of Louisiana is greater than the satisfaction coefficient of New York. Notice that this doesn't really mean anything about living in New York, it's simply an artifact of the fact that satisfaction measures top out at 4 and that New York has twice the income levels of Louisiana.
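The same arithmetic can be checked in a few lines of Python. Two equations cannot pin down three unknowns, so this sketch simply assumes a value for the income coefficient and backs out the two state coefficients that make both people score 4. The function name `state_coefficients` and the numbers plugged in (B1 = 0.5, Il = 2, In = 4) are mine, chosen purely for illustration.

```python
def state_coefficients(b_income, income_la, income_ny, score=4.0):
    """Solve score = b_income * income + b_state for each state's coefficient."""
    b_la = score - b_income * income_la
    b_ny = score - b_income * income_ny
    return b_la, b_ny


# Hypothetical inputs: the New York income is twice the Louisiana income.
b_la, b_ny = state_coefficients(b_income=0.5, income_la=2.0, income_ny=4.0)
gap = b_la - b_ny  # equals b_income * income_la when NY income is double
print(b_la, b_ny, gap)
```

Any positive value for the income coefficient gives the same ordering: the state with the higher income ends up with the lower state coefficient, because both scores are pinned at 4.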

On the whole then, it's unclear that the coefficients of the state dummies actually mean anything. But even in the best case, all they mean is that moving from New York to Louisiana will increase your satisfaction, provided you can find the identical job and continue to make the same amount of money. Good luck with that.

Finally, let's think for a moment about the researchers' claim that their study shows a surprisingly strong correlation between subjective and objective measures of satisfaction. Again, let's think about what the subjective state coefficient really is. It's the average difference between the satisfaction of a person with a certain level of income (uncorrected for cost of living), education, etc. living in the focal state (New York) vs. a person with the same level of income, education, etc. living in Alabama. Now what might cause a person making the same dollar amount to be less satisfied in New York than in Alabama? Obviously, cost of living. And what is a major component of the 'objective' measure the study uses to rank states? Why, it's cost of living. Is it really surprising then that the two measures turn out to be highly correlated? I don't think so.

What would be interesting, of course, would be to see a version of the study that a) controlled for the location choices of individuals through some kind of simultaneous equation model and b) included income levels adjusted for cost of living in the regression equation to predict satisfaction levels. Then we might actually learn something.

Ironically, this is one instance where a naive application of the satisfaction scores - a simple table of the mean satisfaction scores by state - may actually be more accurate and representative than the subjective coefficients calculated by the authors. I'm not sure how the mean satisfaction score for New York compares to the mean satisfaction score for Louisiana, but I'd be amazed if New York scored lower than Louisiana, let alone if New York were the lowest of all states. Now that would be surprising.
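For what it's worth, the naive table is trivial to produce if you have the raw responses. A minimal sketch in Python, assuming the data comes as (state, score) pairs on the 1 to 4 scale; the function name `mean_score_by_state`, the placeholder states and the scores are all invented and say nothing about what the real survey shows:

```python
from collections import defaultdict


def mean_score_by_state(responses):
    """Average the raw satisfaction scores within each state."""
    scores = defaultdict(list)
    for state, score in responses:
        scores[state].append(score)
    return {state: sum(vals) / len(vals) for state, vals in scores.items()}


# Made-up responses for two placeholder states.
responses = [
    ("State A", 4), ("State A", 3), ("State A", 4), ("State A", 3),
    ("State B", 3), ("State B", 3), ("State B", 4), ("State B", 2),
]
means = mean_score_by_state(responses)
# Rank states from highest to lowest mean score.
ranking = sorted(means.items(), key=lambda item: item[1], reverse=True)
for state, mean in ranking:
    print(state, mean)
```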


Ana said...

I will need to read the actual study in order to comment properly on the results, but from its presentation and abstract it seems to me that it entails that cost of living is not the only factor to happiness. For example: 9 out of the top 10 states are in the South – is it a mere coincidence or a result of the correlation between exposure to sun and levels of depression?
The only issue I have: they consider it an objective measurement of happiness, like in 100% objective, but if happiness is conditioned by subjective factors such as our ability to deal with stress ….?

Falstaff said...

Ana: It's true that other factors go into the objective measure, climate being one of them. But in a sense, they all have the same problem as cost of living. The whole point of compensating differentials is that people are paid more to live in places with higher cost of living, less pleasant climate, etc. If you take the average error for a location from a regression on uncorrected income (which is all the subjective coefficient is) and correlate it with the factors that cause incomes to be higher in that location, of course you're going to find a high correlation. All that proves is that the objective measures are valid, in that they do really explain income differences.