Predicting Depression

I created a simple statistical model (on a sample of people in the U.S.) to help predict how depressed someone is, based on 91 variables about them. I was attempting to predict the severity of their depression as measured by their PHQ9 score, a simple subjective scale that averages scores on 9 common symptoms of depression. For instance, it asks how often in the past two weeks you have experienced feeling “down, depressed, or hopeless” and how often you have experienced feeling “tired or having little energy.”

The results of the predictive model surprised me! 

Before scrolling down, take a moment now to try to guess the top 5 variables that you think would be useful in predicting the severity of a person’s depression! I tried to include a very broad range of demographic and personality variables in the model – testing 91 in total.

Important note: these variables are predictive, not necessarily causal. In each case, we aren’t certain whether a higher level of that variable leads to worse depression, whether more severe depression causes a higher level of that variable, or whether some outside factor happens to cause both depression and that variable. Also note that the variables shown below are those that were most predictive of depression when controlling for all the other 90 variables. That means that a one standard deviation increase in each of these variables was associated with a greater increase in PHQ9 scores than the same increase in any of the other variables (when training a model with all the variables at once). All of these variables are also associated with depression in the same direction on their own (when not controlling for other variables), with one exception that is mentioned below.

Variables That Most Strongly Predicted Depression

(1) Introversion (related to the Big 5 personality trait of Extroversion), as measured by the statements: “I see myself as extraverted, enthusiastic” and “I see myself as reserved, quiet”

(2) Feeling Poorly Rested after sleeping their ideal number of hours, as measured by the question: “Typically, how rested do you feel upon waking when you’ve just slept your ideal number of hours?”

(3) Under-sleeping, that is, how many fewer hours they sleep per night than their ideal, calculated by subtracting actual average hours of sleep from reported ideal hours.

(4) Poor Treatment From Caregivers in childhood, as measured by the question: “Overall, how well were you treated by the people who raised you (while you were growing up)?”

(5) Spirituality, as measured by the question: “How spiritual do you consider yourself to be?” Interestingly enough, the correlation of this variable with the PHQ9 depression score was VERY weak on its own (r=0.02); it became strong only when controlling for the other variables. This is a weird exception, as all the other variables mentioned here have strong associations with depression, whether measured on their own or while controlling for the other variables.

(6) Low Levels of Conscientiousness, as measured by these two statements: “I see myself as disorganized, careless” and “I see myself as dependable, self-disciplined.”

NOTE: I did not include the Big 5 trait “emotional stability” a.k.a. “neuroticism” in the regression, since it contains aspects of depression already in its questions.
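
To see how a variable like spirituality can have almost no correlation with depression on its own yet carry real weight once other variables are controlled for, here is a minimal sketch using the textbook formula for standardized coefficients with two predictors. Only the r=0.02 figure comes from the results above; the second variable and the other two correlations are made up for illustration, and the sketch uses ordinary least squares rather than the Lasso model used here.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized regression coefficients for two predictors,
    computed from the three pairwise correlations (two-predictor OLS)."""
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2

# r_y1 = 0.02 is the spirituality-depression correlation reported above.
# r_y2 and r_12 are MADE UP: a hypothetical second variable that is
# negatively related to depression and positively related to spirituality.
beta_spirituality, beta_other = standardized_betas(r_y1=0.02, r_y2=-0.40, r_12=0.50)
print(round(beta_spirituality, 3), round(beta_other, 3))
```

With these invented numbers, the first variable gets a sizable positive coefficient even though its raw correlation is near zero, because it is positively correlated with a second variable that is itself negatively associated with the outcome.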

So this suggests that depression is linked to introversion, poor sleep quality, under-sleeping, poor treatment from caregivers growing up, spirituality, and low conscientiousness. Remember, though, that in each case it could be that the variable causes depression, is caused by depression, or is caused by something that also causes depression.

The predictive model I used was Lasso regression with cross-validation, since I expected many of the variables to have almost no relationship to depression (i.e., I expected the results to be sparse). The training set had 696 points, and the test set had 174 points. The training set error was 54.7% of variance remaining, and the test set error (i.e., out-of-sample error on new data) was 57.5% of variance remaining. Put another way, the model explained about 45.3% of the variance on the training set and 42.5% on the test set, so the results appear to be statistically robust (i.e., there appears to be very little overfitting).
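
As a rough illustration of the approach described above (not the actual code or data behind these results), the following sketch fits a Lasso model by coordinate descent on synthetic data, picks the penalty by 3-fold cross-validation, and reports the fraction of variance remaining on an 80/20 train/test split. The data, variable counts, penalty grid, and all function names are assumptions made for the example.

```python
import random

def variance_remaining(y_true, y_pred):
    """Fraction of variance left unexplained: SS_residual / SS_total (1 - R^2)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return ss_res / ss_tot

def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso(X, y, penalty, sweeps=50):
    """Coordinate descent for (1/2n)*||y - b - Xw||^2 + penalty*||w||_1.
    Assumes features are on comparable scales (standardize real data first)."""
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]  # centered features
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    w = [0.0] * p
    residual = [v - y_mean for v in y]  # residual for w = 0
    for _ in range(sweeps):
        for j in range(p):
            # Covariance of feature j with the residual, adding back j's own effect
            rho = sum(Xc[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_w - w[j]
            if delta:
                for i in range(n):
                    residual[i] -= delta * Xc[i][j]
                w[j] = new_w
    intercept = y_mean - sum(w[j] * col_mean[j] for j in range(p))
    return intercept, w

def predict(model, X):
    intercept, w = model
    return [intercept + sum(wj * xj for wj, xj in zip(w, row)) for row in X]

def cv_error(X, y, penalty, folds=3):
    """Average held-out variance remaining across interleaved folds."""
    n = len(X)
    total = 0.0
    for k in range(folds):
        held = set(range(k, n, folds))
        X_fit = [X[i] for i in range(n) if i not in held]
        y_fit = [y[i] for i in range(n) if i not in held]
        X_val = [X[i] for i in sorted(held)]
        y_val = [y[i] for i in sorted(held)]
        model = fit_lasso(X_fit, y_fit, penalty)
        total += variance_remaining(y_val, predict(model, X_val))
    return total / folds

# Synthetic stand-in data: 8 made-up variables, only the first two matter.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(150)]
y = [1.0 * row[0] - 0.8 * row[1] + random.gauss(0, 1.2) for row in X]
X_train, y_train = X[:120], y[:120]   # 80% for training
X_test, y_test = X[120:], y[120:]     # 20% held out

penalties = [0.01, 0.05, 0.1, 0.3]    # hypothetical grid
best_penalty = min(penalties, key=lambda lam: cv_error(X_train, y_train, lam))
model = fit_lasso(X_train, y_train, best_penalty)
train_left = variance_remaining(y_train, predict(model, X_train))
test_left = variance_remaining(y_test, predict(model, X_test))
print(f"penalty={best_penalty}: train {train_left:.1%} remaining, test {test_left:.1%} remaining")
print("variables with nonzero coefficients:", [j for j, wj in enumerate(model[1]) if wj != 0.0])
```

In practice a library implementation such as scikit-learn's LassoCV would usually be a better choice than hand-rolled coordinate descent; the version here just keeps the example dependency-free.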


  
