Demystifying p-values

December 31, 2022

There is a tremendous amount of confusion around what p-values actually are, despite their widespread use in science. Here is my attempt to explain the concept of p-values concisely and clearly (including why they are useful and what often goes wrong with them). What's a p-value? If you run a study, then (all else equal, aside from rare edge cases) the lower the p-value, the lower the chance that your results are due to random chance or luck. More precisely: a p-value is the probab...
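To make the idea concrete, here is a minimal sketch of a p-value under the standard textbook definition: the probability of seeing a result at least as extreme as the one observed if the null hypothesis were true. The coin-flip scenario, the function name `binomial_p_value`, and the numbers (60 heads in 100 flips) are illustrative assumptions, not taken from the post.

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value: chance of at least `heads` heads in `flips` flips
    if the coin really lands heads with probability `p_null` (the null hypothesis)."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Hypothetical study: 60 heads out of 100 flips of a coin assumed fair under the null.
p = binomial_p_value(heads=60, flips=100)
print(f"one-sided p-value: {p:.4f}")
```

A smaller p-value means the observed count would be rarer if the coin were fair. The sketch is one-sided on purpose; a two-sided test would also count results that are equally extreme in the other direction.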

Importance Hacking: a major (yet rarely-discussed) problem in science

December 19, 2022

I first published this post on the Clearer Thinking blog on December 19, 2022, and first cross-posted it to this site on January 21, 2023. You have probably heard the phrase "replication crisis." It refers to the grim fact that, in a number of fields of science, when researchers attempt to replicate previously published studies, they fairly often don't get the same results. The magnitude of the problem depends on the field, but in psychology, it seems that something like 40% of studies i...

How can we look at the same dataset and come to wildly different conclusions?

November 30, 2022

Recently, a study came out where 73 research teams independently analyzed the same data, all trying to test the same hypothesis. Seventy-one of the teams came up with numerical results across a total of 1,253 models. Across these 1,253 different ways of looking at the data, about 58% showed no effect, 17% showed a positive effect, and 25% showed a negative effect. But that's not even the oddest part. The oddest part is that despite a heroic attempt to do so, the study authors failed to...

It can be shockingly hard just to understand three variables

April 19, 2021

In science (and when developing hypotheses more generally), it is very common to come across situations where a variable of interest (let’s call this the dependent variable, “Y”) is strongly correlated with at least two other variables (let’s call them “A” and “B”). Here are some examples: If you’re a psychology researcher investigating possible causes of depression (Y), you may have trouble disentangling the effects of poor sleep quality (A) and anxiety (B), both of which tend to be corre...
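One way this kind of entanglement can arise is sketched below, under a toy model assumed purely for illustration (it is not from the post): A influences both B and Y, while B has no effect on Y at all. B still correlates clearly with Y, and only controlling for A (here via the standard first-order partial correlation formula) reveals that B adds nothing. The helper names `pearson` and `partial_corr` are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
a = [rng.gauss(0, 1) for _ in range(n)]
b = [ai + rng.gauss(0, 1) for ai in a]  # B depends on A
y = [ai + rng.gauss(0, 1) for ai in a]  # Y depends on A only, never on B

r_ay, r_by, r_ab = pearson(a, y), pearson(b, y), pearson(a, b)
r_by_given_a = partial_corr(r_by, r_ab, r_ay)

print(f"corr(A, Y)     = {r_ay:.2f}")
print(f"corr(B, Y)     = {r_by:.2f}")
print(f"corr(B, Y | A) = {r_by_given_a:.2f}")
```

In real data the causal structure is unknown, so a near-zero partial correlation like this one is suggestive rather than conclusive; the same raw correlations are compatible with several different stories.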

Disputes Over How to Use Statistics in the Real World

January 21, 2018

There is a surprising lack of consensus on how to do statistics, especially as it applies to science. Since statistics is the tool that underpins the scientific enterprise, you'd think we would have figured it out by now. You'd be wrong. The mathematical proofs are, of course, very rarely disputed. The use of mathematics is much more often disputed. Why do these disputes arise? I've observed five different types. Disputes in Applications of Statistics to Science: (1) Disputes over philosophy: Exa...

Rules That Add Up to 100

December 19, 2017

100/0 rule - you should be 100% certain 0% of the time.
80/20 rule - 80% of output is caused by 20% of effort (not literally true, but true in spirit most of the time).
77/23 rule - the poorest 77% of the world’s population accounts for approximately 23% of the world’s income (GDP figures 2010) [1]
75/25 rule - not more than 25 percent of the total unlicensed seamen on board a documented vessel shall be aliens lawfully admitted to the United States for permanent residence [2]
70/30 ...

Testing Too Many Hypotheses

October 10, 2011

For each dataset, there is a limit to what we can use that dataset to test. Using the standard p-value based methods of science, the more hypotheses we check against the data, the more likely it will be that some of these checks give inaccurate conclusions. And this presents a big problem for the way science is practiced. Let's take an example to illustrate the principle. Suppose that you have information about 1000 people selected at random from the U.S. adult population. Your dataset includ...
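The claim that more checks make a wrong conclusion more likely can be put in numbers with a short sketch. It assumes, for illustration only, that every hypothesis tested is actually false (no real effects), that the tests are independent, and that each uses the conventional 0.05 significance threshold; the function name `chance_of_false_positive` is hypothetical.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent tests comes out
    'significant' at threshold `alpha` when no real effect exists in any of them."""
    return 1 - (1 - alpha) ** num_tests


for m in (1, 5, 20, 100):
    print(f"{m:>3} hypotheses tested: {chance_of_false_positive(m):.1%} chance of a false positive")
```

Real tests run on the same dataset are rarely independent, so the exact figures will differ, but the direction holds: each additional hypothesis checked raises the chance that at least one apparent finding is spurious.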