By: Yiwen Zhu


We’ve all heard the motto: “correlation does not imply causation”. In fact, this expression is so commonly used by scientists that it even appears on bumper stickers and in internet memes.

But does the nature of epidemiological research prevent us from ever interpreting our results causally? Epidemiologists and other researchers often avoid using causal language (e.g., cause, effect, benefit) in their papers. This is because the observational studies epidemiologists perform often lack one key ingredient: random assignment of exposure. Without randomization, it’s difficult to rule out that the exposed and unexposed groups differ in measured (and unmeasured!) characteristics that are also associated with the outcome of interest.

In the May 2018 issue of the American Journal of Public Health, researchers from different disciplines commented on the current state of burying causality in “scientific euphemisms” and discussed how we should use causal language responsibly.

Inspired by the commentaries in this issue, our research lab discussed whether and how we make causal inferences.  Here are our four main take-aways:    

1.     When we set out to ask a causal question, it doesn’t make sense to stop at the associations.

Rather than stopping at associational conclusions, let’s own the causal question and do better at formulating it carefully. Even in an observational study, what we are truly interested in is the causal relationship. Acknowledging that causal relationships are harder to establish in observational studies, we should define the exposure and outcome unambiguously and get the temporality right, so that the exposure precedes the outcome. As Hernán (2018) noted, emulating a hypothetical randomized controlled trial helps us design our studies and collect data more thoughtfully.

2.     Causal inference tools are not merely mathematical tricks; they are one of the best ways to understand relationships between variables and formalize your conceptual model.

Modified figure from Murray A. Mittleman and Elizabeth Mostofsky, Department of Epidemiology, Harvard School of Public Health, Boston, MA

You may be thinking, “Causal inference sounds wonderful, but the terminology and the mathematical tests and models are confusing.” You’re not alone. But some tools are arguably friendlier than others. For example, Directed Acyclic Graphs (DAGs), like the one shown here, are an intuitive way to systematically represent causal relationships.

By drawing directed arrows between the variables, you can visualize the pathway of interest directly and examine whether there may be confounding bias.

Compared to DAGs, the mathematical notation of causal inference can seem daunting at first. However, it allows us to talk about counterfactuals (i.e., “what would have happened had the group been exposed?”) and to define statistical models precisely. So learn to love both.
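To make the confounding idea in the DAG concrete, here is a quick simulation (not from the original post; the variable names and numbers are hypothetical). A confounder C raises both the chance of exposure X and the risk of outcome Y, while X has no true effect on Y. The crude comparison of exposed and unexposed groups then shows a spurious association, and stratifying on C recovers the (null) causal effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Confounder C (e.g., a hypothetical baseline risk factor)
c = rng.binomial(1, 0.5, n)
# Exposure X is more likely when C = 1 (no randomization)
x = rng.binomial(1, 0.2 + 0.5 * c)
# Outcome Y depends on C but NOT on X: the true effect of X is zero
y = rng.binomial(1, 0.1 + 0.3 * c)

# Crude (unadjusted) association: risk difference by exposure group
naive = y[x == 1].mean() - y[x == 0].mean()

# Adjusted estimate: risk difference within each stratum of C,
# averaged over the distribution of C
strata = [y[(x == 1) & (c == v)].mean() - y[(x == 0) & (c == v)].mean()
          for v in (0, 1)]
adjusted = np.average(strata, weights=[np.mean(c == 0), np.mean(c == 1)])

print(f"crude risk difference:    {naive:.3f}")    # biased away from 0
print(f"adjusted risk difference: {adjusted:.3f}")  # close to the true 0
```

The DAG tells you *which* variables to adjust for; the arithmetic above is just the simplest way of doing that adjustment (stratification). With an unmeasured confounder, no amount of modeling on the observed data would remove the bias.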



3.     Causal inference methods are only as good as your understanding of the concepts of interest. 

While causal inference methods are helpful, it’s tempting to become over-confident about the statistical techniques and believe you can throw data into a cauldron of structural models and valid causal claims will magically appear. As Jones and Schooling discussed, the theory behind our hypotheses and conclusions is key; you need a thorough understanding of the underlying mechanisms before setting out to study a causal relationship. After all, statistical analysis does not establish causation by itself; only by bringing subject matter knowledge together with our methods can we carefully connect the causal and statistical questions.

4.     Communicate your science, enthusiastically and cautiously.

Even when associational studies get featured in the media, they tend to be repackaged into causal tales. This is not surprising: many readers find causal tales much more meaningful and actionable. An epidemiological study that explicitly states its causal claims, and the assumptions behind them, will arguably be less susceptible to these distortions and misinterpretations.

To prevent our science from being turned into “clickbait” headlines, we need to take a more proactive stance and communicate our findings to the public with accessible language. For example, when reporting the results of a study with wide confidence intervals, mention that your findings point to a significant effect, though the exact magnitude of that effect is uncertain.

We also need to discuss the implications of our work, while highlighting the assumptions and limitations.  For example, “This study tells us about the potential effect of financial stress on depressive symptoms in adolescence, but it doesn’t say that poverty leads to depression.”

Academic journals can play a big role in effective science communication, too. Some journals now require authors to include a few main takeaways of the study, stated without undefined academic jargon and in a form ready to be consumed by the general public. We look forward to seeing all journals help make science accessible to the broader community.

So, in conclusion, let’s not be afraid to talk about causality.

Causal inference requires having a strong theory behind the hypotheses we test, checking the assumptions carefully, and choosing the appropriate statistical models. By talking about causality openly and honestly, we are holding our research to a higher standard and making it more relevant to the public. Even when we don’t feel comfortable drawing a causal conclusion, the process will bring us one step closer to knowing the answer.