Missing data

Theme Co-ordinators: Mike Kenward, James Carpenter, Jonathan Bartlett, Karla Diaz-Ordaz

The problem of missing data is almost ubiquitous in medical research, in both observational studies and randomized trials. Until the advent of sufficiently powerful computers, much of the research in this area was focused on the problem of how to handle, in a practicable way, the lack of balance caused by incompleteness. A example of such a development was the key idea of the EM algorithm (Dempster et al 1976). As routine computation became less of a problem, attention moved to the much more subtle issue of the consequences of missing data on the validity of subsequent analyses. The seminal work was Rubin (1976), from which all subsequent work in this area has developed to a greater or lesser degree.

Although the underlying missing data concepts are the same for observational and randomized studies, the emphases differ somewhat in practice in the two areas. However, both are the subject of development within the Centre. From 2002, supported by several grants from the Economic and Social Research Council, an entire programme has been developed around the handling of missing data in observational studies. This includes the development of multiple imputation in a multilevel setting (e.g. Goldstein et al 2009, Carpenter et al 2010), a series of short courses, and the establishment of a leading website devoted to the topic:


which contains background material, answers to frequently asked questions, course notes, software, details of upcoming courses and events, a bibliography, and a discussion forum.

A central problem in the clinical trial setting is the appropriate handling of dropout and withdrawal in longitudinal studies. This has been the subject of great debate among academics, trialists and regulators for the last 10-15 years. Members of the centre have had long involvement in this (e.g. Diggle and Kenward 1994, Carpenter et al 2002). A textbook was published by Wiley on the broad subject of missing data in clinical studies (Molenberghs and Kenward 2007). More recently the UK NHS National Co-ordinating Centre for Research on Methodology commissioned a monograph on the subject which was published in 2008 (Carpenter and Kenward 2008). Members of the Centre are also actively involved in current regulatory developments. Two important documents have recently appeared. In the US an FDA commissioned National Research Council Panel on Handling Missing Data in Clinical Trials, chaired by Professor Rod Little, produced in 2010 a report, ‘The Prevention and Treatment of Missing Data in Clinical Trials.’ James Carpenter was one of several experts invited to give a presentation to this panel. Implementation of the guidelines in this report is to be discussed at the 5th Annual FDA/DIA Statistics Forum in April 2011, where Mike Kenward is giving the one day pre-meeting tutorial on missing data methodology. In Europe, again in 2010, the CHMP released their ‘Guideline on Missing Data in Confirmatory Clinical Trials’. James Carpenter, Mike Kenward and James Roger were members of the PSI working party that provided a response to the draft of this document (Burzykowski T et al. 2009).

At the School there continues a broad research programme in both the observational study and randomized trials settings, and there is an active continuing programme of workshops. Missing data is an issue for many of the studies run and analysed within the School and there is much cross-fertilization across different research areas. There are also strong methodological links with other themes, especially causal inference, indeed one recent piece of work explicitly connects the two areas (Daniel et al. 2011).


Those most directly involved in missing data research are

Jonathan Bartlett, James Carpenter, Mike Kenward, James Roger (honorary), and two research students: Mel Smuk and George Vamvakis.

Many others have an interest in, and have contributed to, the area, including Rhian Daniel, Bianca de Stavola, George Ploubidis, and Stijn Vansteelandt (honorary).


Burzykowski T et al. (2009). Missing data: Discussion points from the PSI missing data expert group. Pharmaceutical Statistics. DOI: 10.1002/pst.391

Carpenter JR, Goldstein H and Kenward MG (2010). REALCOM-IMPUTE software for multilevel multiple imputation with mixed response types. Journal of Statistical Software, to appear.

Carpenter JR and Kenward MG (2008). Missing data in clinical trials – a practical guide. National Health Service Coordinating Centre for Research Methodology: Birmingham. Downloadable from http://www.haps.bham.ac.uk/publichealth/methodology/docs/invitations/Final_Report_RM04_JH17_mk.pdf.

Carpenter J, Pocock S and Lamm C (2002). Coping with missing values in clinical trials: a model based approach applied to asthma trials Statistics in Medicine, 21, 1043-1066.

Daniel RM, Kenward MG, Cousens S, de Stavola B (2009) Using directed acyclic graphs to guide analysis in missing data problems. Statistical Methods in Medical Research, to appear.

Dempster AP Laird NM and Rubin DB (2007). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1-38.

Diggle PJ and Kenward MG (1994). Informative dropout in longitudinal data analysis (with discussion). Applied Statistics, 43, 49-94.

Goldstein H, Carpenter JR, Kenward MG and Levin K (2009). Multilevel models with multivariate mixed response types. Statistical Modelling, 9, 173-197.

Molenberghs G and Kenward MG (2007). Missing Data in Clinical Studies. Chichester: Wiley.

Rubin DB (1976). Inference and missing data. Biometrika, 63, 581-592.