Design and analysis of observational studies

Theme Co-ordinators: Neil Pearce, Simon Cousens

Appropriate study design forms the core of the body of epidemiological methods that has developed over the last 150 years, and particularly in the last 50 years since the advent of ‘modern’ epidemiological methods for investigating the causes of non-communicable disease. This theme is not intended to facilitate the invention of new study designs (we recognize that this occurs only rarely). Rather, the focus is on “applied study design and analysis”.

There has been an evolving theoretical and practical understanding of the fundamental links between the different epidemiological study designs, so that they are now seen as different manifestations of a single underlying paradigm, rather than being fundamentally different approaches. However, there is still continuing debate about the relative merits of cohort and case-control studies, and whether they are fundamentally different designs or can be viewed within an integrated theoretical framework. A related issue is the sampling of controls for case-control studies. In particular, if controls are sampled using density sampling (rather than the alternative approaches of cumulative sampling or case-cohort sampling), then the resulting case-control study will estimate the (cohort) incidence rate ratio without the need for any rare disease assumption. Other topics of methodological interest for case-control studies include selection bias, and matching and its effects on precision.
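The point about control sampling can be illustrated with a minimal sketch. It assumes a closed cohort with constant incidence rates and no competing risks, and it works with expected counts rather than random samples. The function name and all numerical values are hypothetical and chosen only for illustration. The sketch computes the exposure odds ratio that a case-control study would be expected to yield when controls are drawn from the non-cases at the end of follow-up (cumulative sampling), from the cohort at baseline (case-cohort sampling), or in proportion to person-time at risk (density sampling).

```python
import math


def expected_case_control_odds_ratios(n_exposed, n_unexposed,
                                      rate_exposed, rate_unexposed, years):
    """Expected exposure odds ratio under three control-sampling schemes.

    Assumes a closed cohort, constant incidence rates and no competing
    risks (simplifying assumptions made for this illustration only).
    """
    def summarise(n, rate):
        risk = 1.0 - math.exp(-rate * years)   # cumulative incidence
        cases = n * risk                       # expected number of cases
        person_years = cases / rate            # expected person-time at risk
        return cases, n - cases, person_years

    cases1, noncases1, py1 = summarise(n_exposed, rate_exposed)
    cases0, noncases0, py0 = summarise(n_unexposed, rate_unexposed)

    case_exposure_odds = cases1 / cases0
    return {
        # Controls from non-cases at end of follow-up: estimates the odds ratio.
        "cumulative": case_exposure_odds / (noncases1 / noncases0),
        # Controls from the whole cohort at baseline: estimates the risk ratio.
        "case_cohort": case_exposure_odds / (n_exposed / n_unexposed),
        # Controls in proportion to person-time: estimates the rate ratio.
        "density": case_exposure_odds / (py1 / py0),
    }


# Hypothetical common disease: rates of 0.2 and 0.1 per year over 5 years.
estimates = expected_case_control_odds_ratios(10_000, 10_000, 0.2, 0.1, 5.0)
for scheme, value in estimates.items():
    print(f"{scheme}: {value:.2f}")
```

With these hypothetical inputs the true rate ratio is 2. Density sampling recovers it although the disease is far from rare, whereas cumulative sampling gives a larger value (about 2.65, the odds ratio) and case-cohort sampling a smaller one (about 1.61, the risk ratio).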

The issues are more complex with regard to cross-sectional studies, where considerably more assumptions are required to draw valid conclusions. This complexity is exacerbated because standard epidemiological textbooks use a variety of study design classifications. Some define cross-sectional studies as studies in which both exposure and disease are measured at one point in time, whereas others identify the measurement of the prevalence (rather than incidence) of disease as the key feature of cross-sectional studies, which may in fact include historical exposure measurements (e.g. based on a work history and job-exposure matrix). The confusion is amplified because some textbooks describe cross-sectional studies as being akin to case-control studies, whereas others regard them as a separate species (involving prevalence rather than incidence) which can involve studying either the full source population (prevalence studies) or the cases and a sample of the non-cases in the source population (prevalence case-control studies).
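One of the additional assumptions can be made concrete with a small sketch. It relies on the standard steady-state relation (not stated in the text above) that, in a stable population, the prevalence odds equal the incidence rate multiplied by the mean disease duration. The function names and the numerical values are hypothetical.

```python
def prevalence_odds(incidence_rate, mean_duration):
    """Steady-state prevalence odds, P / (1 - P) = incidence rate x mean duration.

    Assumes a stable population in which incidence and duration do not
    change over time (an assumption made for this illustration).
    """
    return incidence_rate * mean_duration


def prevalence_odds_ratio(rate_exposed, rate_unexposed,
                          duration_exposed, duration_unexposed):
    """Prevalence odds ratio comparing exposed with unexposed."""
    return (prevalence_odds(rate_exposed, duration_exposed)
            / prevalence_odds(rate_unexposed, duration_unexposed))


# Hypothetical rates per person-year (rate ratio 2) and durations in years.
por_equal_duration = prevalence_odds_ratio(0.02, 0.01, 5.0, 5.0)
por_shorter_duration = prevalence_odds_ratio(0.02, 0.01, 2.5, 5.0)
print(f"equal duration: {por_equal_duration:.2f}")
print(f"shorter duration in exposed: {por_shorter_duration:.2f}")
```

Under these assumptions the prevalence odds ratio matches the incidence rate ratio only when exposure does not affect disease duration. In the second call the exposed have half the duration of the unexposed, and a twofold difference in incidence disappears from the prevalence comparison.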

All of these issues and debates are occurring within the context of a randomized controlled trial (RCT) paradigm, in which the aim of an observational study is, ideally, to obtain the same findings as would have been obtained with an RCT of a specific exposure and a specific disease outcome. This paradigm works well for many “lifestyle” risk factors for disease (e.g. smoking and lung cancer), but does not work so well for some of the more complex public health problems of the 21st century (e.g. health effects of climate change, socioeconomic factors and health, estimating the global burden of disease, evaluation of large-scale programmes and initiatives aimed at improving health in low- and middle-income countries).

In some instances these problems can be addressed with adaptations of existing study designs; in other cases, they require the development of new approaches. Thus more complex causal models (e.g. complexity theory), which do not neatly fit an RCT model, are beginning to be employed in epidemiology, and these are motivating a reconsideration of which study designs belong to the basic epidemiological toolkit. It is important that we take a problem-based approach and develop appropriate study designs to tackle the major public health problems of the 21st century, rather than taking a methods-based approach and regarding epidemiology as being limited to a small number of study designs (cohort, case-control, cross-sectional), based on the RCT paradigm. In this context it is important to recognize that the appropriateness of any research methodology depends on the phenomenon under study: its magnitude, the setting, the current state of theory and knowledge, the availability of valid measurement tools, and the proposed uses of the information to be gathered.  It is therefore inappropriate to have a rigid hierarchy of study designs; rather what is required is to develop and use “appropriate technology” to address the major public health issues of the 21st century. In particular, ecologic studies, despite all their limitations, will continue to play an important role in the cycle of epidemiological hypothesis development and testing.

These developments are closely linked to the current debate in causal inference, in particular with regard to the use of Directed Acyclic Graphs (DAGs) to identify sources of bias, and the implementation of analytical methods such as structural equation and marginal structural modelling within a likelihood or Bayesian framework, as well as other, semi-parametric, modelling approaches.

Design of observational studies at LSHTM


Most epidemiologists and many statisticians at the School are engaged in the design of observational studies. In the interest of space, we include here only the names of those with a particular interest in methodological issues relating to the topics discussed above.

Simon Cousens
Neil Pearce
Bianca De Stavola

Suggested introductory reading

Rothman, K.J., Greenland, S., Lash, T.L. Modern epidemiology. Philadelphia: Lippincott Williams & Wilkins, 2008.

Pearce N. A short introduction to epidemiology. Wellington: CPHR, 2005.

Further references

Rodrigues L, Kirkwood BR. Case-control designs in the study of common diseases: updates on the demise of the rare disease assumption and the choice of sampling scheme for controls. Int J Epidemiol 1990; 19: 205-213.

Pearce NE. What does the odds ratio estimate in a case-control study? Int J Epidemiol 1993; 22: 1189-1192.