# Time series regression analysis

Theme Co-ordinators: Antonio Gasparrini, Ben Armstrong

Please see here for slides and audio recordings of previous seminars relating to this theme.

This page is split into the following sections:

- Time series analysis for biomedical data
- Applications of time series regression
- Methodological issues
- Contributions of LSHTM researchers
- LSHTM researchers involved in developing or using time series regression methodology
- Publications by LSHTM researchers
- Key references on methods

## 1. Time series analysis for biomedical data

A time series may be defined as a sequence of measurements taken at (usually equally-spaced) ordered points in time.

Statistical methods applied to time series data were originally developed mainly in econometrics, and then used in many other fields, such as ecology, physics and engineering. In these original applications the focus was on *prediction*, and the aim was to produce an accurate forecast of future measurements given an observed series. The standard statistical approaches adopted for this purpose usually rely on *auto-regressive integrated moving average* (ARIMA) and related models.

Time series designs are increasingly being exploited with biomedical data, owing to the availability of routinely collected series of administrative or medical data, such as mortality or morbidity counts, environmental measures, and changes in socio-economic or demographic indices. Within this research area, time series methods have been subject to intense methodological development over the last 20 years. In contrast with the original interest in prediction, the main aim of time series analysis in biomedical applications is commonly to assess the association between an outcome and either a predictor series or an intervention: here the focus is instead on *estimation*, and the models reduce to the more traditional regression framework, although possibly in non-standard versions.

Two main features characterize time series data from a statistical viewpoint: the *correlation* displayed by observations and their *temporal sequence*. Statistical models need to cope with the former, in order to provide accurate inferences, and may exploit the latter, with the intention to strengthen the evidence on the causal nature or clarify details of the association under study.

## 2. Applications of time series regression

### 2.1 Time series regression of short-term associations

A topic of intense methodological research and application of time series analysis is the study of short-term health associations. In particular, time series methods have been widely applied in environmental epidemiology over recent decades to investigate the acute health effects of air pollution and, more recently, of outdoor temperature and other weather parameters. This approach exploits well-known *decomposition* techniques for time series data, which filter out long-term and seasonal trends so that the analysis can focus on short-term dependencies between time-varying environmental factors and health outcomes. The method controls by design for time-fixed factors and for other confounders that change slowly in time.

Time series studies of short-term associations compare the outcome and exposure series, as in the example below illustrating the daily variation in mortality counts and outdoor temperature over a 14-year period in New York. The main methodological issues in this approach are the selection of smoothing methods for the decomposition of the series, the presence and estimation of delayed effects, and the potential confounding by other time-varying factors.
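One common way to write the kind of model described above is as a Poisson regression of daily counts. The notation here is an illustrative assumption rather than a formula taken from this page: $Y_t$ is the outcome count on day $t$, $x_t$ the exposure, $s(t)$ a smooth function of time (for example a spline) that absorbs long-term and seasonal trends, and $z_{jt}$ other measured time-varying confounders.

$$
\log E(Y_t) = \alpha + \beta x_t + s(t) + \sum_{j} \gamma_j z_{jt}
$$

Because $s(t)$ captures the slow variation in the series, $\beta$ is informed only by the short-term co-variation of exposure and outcome.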

### 2.2 Interrupted time series for evaluating interventions or events

The importance of robust evaluation of public health interventions is increasingly recognised, yet public health interventions are often complex and evaluation methods used in clinical medicine (such as randomised controlled trials) are not always feasible. Other ‘quasi-experimental’ designs are therefore needed in order to explore the effect of an intervention on health outcomes, one of the strongest of which is the interrupted time series (ITS) design. ITS requires a series of observations taken repeatedly over time before and after an intervention. The underlying trend in the outcome is established and can be used to estimate the counterfactual, that is, what would have happened if the intervention had not taken place. The impact of the intervention is then assessed by examining any change in the post-intervention period given the trend in the pre-intervention period. The intervention may lead to a change in level, a change in slope or both. This framework is illustrated in the figure below.
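The level-and-slope framework described above can be sketched as a segmented regression. The example below is a minimal illustration under strong assumptions: it uses ordinary least squares on a synthetic, noise-free series with made-up numbers, and the function names are hypothetical. A real analysis of counts would typically use a Poisson model and address seasonality and autocorrelation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_segmented_regression(y, start):
    """Least-squares fit of y_t = b0 + b1*t + b2*post_t + b3*(t - start)*post_t,
    where post_t is 1 from the intervention time `start` onwards and 0 before."""
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= start else 0.0
        rows.append([1.0, float(t), post, (t - start) * post])
    k = 4
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic monthly series (hypothetical values): baseline level 10, trend 0.5
# per month, then a level change of 3 and a slope change of 0.2 from month 24.
start = 24
series = [10.0 + 0.5 * t + (3.0 + 0.2 * (t - start) if t >= start else 0.0)
          for t in range(48)]

b0, b1, b2, b3 = fit_segmented_regression(series, start)

# Counterfactual at the last month: the pre-intervention trend projected forward.
last = len(series) - 1
counterfactual_end = b0 + b1 * last
effect_at_end = series[last] - counterfactual_end
```

Here `b2` estimates the change in level and `b3` the change in slope, while `counterfactual_end` is the value the pre-intervention trend would have reached without the intervention.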

Interrupted time series can be used to explore the impact of public health interventions or unplanned events. In the example illustrated in the figure below, an ITS design is adopted to assess the impact of the financial crisis in Spain on suicides. The main methodological issues surrounding the ITS design concern the assumptions and methods used to model trends and to control for potential time-varying confounders of the before-after comparison.

Trend in monthly suicide rates for all of Spain before and since the financial crisis (Lopez Bernal et al. 2013)

## 3. Methodological issues

The regression analysis of biomedical time series data poses several methodological problems, which have prompted intense research in recent years. The main research directions are summarized below. References are provided in the related sections.

__Model selection__: time series models are usually built with a pre-defined set of potential confounders. However, criteria are needed to select other model parameters, such as the degree of control for seasonal and long-term trends, or the adequacy of assumptions on the shape of the exposure-response relationship for predictors with potentially non-linear effects. Some investigators have compared the performance of selection methods based on information criteria (Akaike, Bayesian or related), minimization of the partial autocorrelation of residuals, (generalized) cross-validation and others. Further research is needed to produce robust and general selection criteria.

__Smoothing methods__: the specification of non-linear exposure-response relationships for predictors in the regression model is essential both to determine the association with the exposure of interest and to control for potential confounders. Smoothing techniques based on both parametric and non-parametric methods have been proposed in time series analysis. The former usually rely on regression splines within generalized linear models (GLM), while the latter are specified through smoothing or penalized splines within generalized additive models (GAM).

__Distributed lag (non-linear) models__: commonly the effect of an exposure is not limited to the day on which it occurs, but persists for further days or weeks. This introduces the additional problem of modelling the lag structure of the exposure-response relationship. This issue was initially addressed by *distributed lag models*, which allow the linear effect of a single exposure event to be distributed over a specific period of time. More recently, this methodology has been generalized to non-linear exposure-response relationships through *distributed lag non-linear models*, a modelling framework which can flexibly describe simultaneously non-linear and delayed associations.
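As a rough illustration of the linear case, the sketch below builds the matrix of lagged exposures and computes the exposure term of a distributed lag model for given lag coefficients. The function names, the exposure series and the coefficients are all hypothetical, and no estimation is performed; in practice the coefficients are estimated, often under smoothness constraints across lags.

```python
def lag_matrix(exposure, max_lag):
    """Return rows [x_t, x_{t-1}, ..., x_{t-max_lag}] for each t >= max_lag."""
    return [[exposure[t - lag] for lag in range(max_lag + 1)]
            for t in range(max_lag, len(exposure))]


def lagged_contribution(exposure, lag_coefs):
    """Exposure term of a distributed lag model: sum over lags l of
    lag_coefs[l] * x_{t-l}, for every day with a complete exposure history."""
    max_lag = len(lag_coefs) - 1
    return [sum(c * x for c, x in zip(lag_coefs, row))
            for row in lag_matrix(exposure, max_lag)]


# Hypothetical example: a single exposure event on day 3 (days are 0-indexed)
# and an effect spread over lags 0, 1 and 2.
exposure = [0, 0, 0, 1, 0, 0, 0, 0]
lag_coefs = [0.5, 0.3, 0.1]

# Contributions for days 2..7 (the first two days lack a full lag history).
contribution = lagged_contribution(exposure, lag_coefs)

# Overall effect of one unit of exposure, cumulated over all lags.
overall_effect = sum(lag_coefs)
```

The single event on day 3 contributes to the outcome on days 3, 4 and 5 in proportion to the lag coefficients, and the total of those contributions equals `overall_effect`.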

__Harvesting effect__ (mortality displacement): this phenomenon arises when applying an ecological time series analysis to grouped data, for example mortality counts. The conceptual framework is based on the assumption that the exposure affects mainly a pool of frail individuals, whose events are only brought forward by a brief period of time by the effect of exposure. For non-recurrent outcomes, the depletion of the pool following a high exposure event results in some reduction of cases a few days later, thereby reducing the overall long-term impact (see figure below). Specific models are needed to account for this reduction in the overall effect and thereby produce accurate estimates.
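One simple way to see the consequence of displacement is to cumulate lag-specific effects over a long enough lag period. The numbers below are invented for illustration, and the "displaced fraction" is only an informal summary assumed here, not a quantity defined on this page.

```python
from itertools import accumulate

# Hypothetical lag-specific effects (excess deaths per exposure event) at
# lags 0..6: an initial excess followed by a deficit as the frail pool is depleted.
lag_effects = [4.0, 2.0, 0.5, -1.0, -1.5, -1.0, -0.5]

# Cumulative effect up to each lag.
cumulative = list(accumulate(lag_effects))

# Excess counted over the early lags only (positive effects).
immediate_effect = sum(e for e in lag_effects if e > 0)

# Net effect once the later deficit is taken into account.
net_effect = cumulative[-1]

# Informal summary: share of the initial excess offset by the later deficit.
displaced_fraction = 1 - net_effect / immediate_effect
```

An analysis restricted to the first few lags would report the larger `immediate_effect`, whereas the net effect over the full lag period is smaller.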

__Two-stage analysis__: the usual approach to time series studies on environmental factors involves the analysis of series from multiple cities or regions. The complexity of the regression models prevents the specification of a highly parameterized hierarchical structure in a single multilevel model. The analysis is instead carried out in two stages, with a common city-specific model fitted first and a meta-analysis then used to pool the results. The specification of complex exposure-response relationships in the first stage requires the development of non-standard meta-analytic techniques, such as *meta-smoothing* and *multivariate meta-analysis*.
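The second stage can be illustrated with the simplest possible case: pooling a single coefficient per city. The sketch below uses standard inverse-variance weighting and the DerSimonian-Laird moment estimator of between-city variance. It is a univariate simplification, not the multivariate meta-analysis mentioned above, and the city estimates are hypothetical.

```python
import math


def pool_fixed_effect(estimates, variances):
    """Inverse-variance weighted average and its variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    return pooled, 1.0 / sum(weights)


def pool_random_effects(estimates, variances):
    """DerSimonian-Laird random-effects pooling.
    Returns (pooled estimate, its variance, between-city variance tau2)."""
    weights = [1.0 / v for v in variances]
    fixed, _ = pool_fixed_effect(estimates, variances)
    # Cochran's Q statistic for heterogeneity.
    q = sum(w * (b - fixed) ** 2 for w, b in zip(weights, estimates))
    k = len(estimates)
    scale = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / scale)
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * b for w, b in zip(re_weights, estimates)) / sum(re_weights)
    return pooled, 1.0 / sum(re_weights), tau2


# Hypothetical first-stage results: log relative risk per unit of exposure
# and its variance, from three city-specific models.
city_log_rr = [0.00, 0.02, 0.04]
city_var = [0.0001, 0.0001, 0.0001]

fixed_est, fixed_var = pool_fixed_effect(city_log_rr, city_var)
re_est, re_var, tau2 = pool_random_effects(city_log_rr, city_var)

# Approximate 95% confidence interval for the random-effects estimate.
half_width = 1.96 * math.sqrt(re_var)
re_ci = (re_est - half_width, re_est + half_width)
```

With heterogeneous city estimates, the random-effects variance exceeds the fixed-effect variance because it includes the between-city component `tau2`.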

__Time-varying confounders__: whilst interrupted time series designs are rarely affected by typical confounders, such as differences in socio-economic status or age composition, which usually change only slowly over time, they may be affected by time-varying confounders. This is particularly an issue if the confounders are unmeasured and change over the same period as the intervention, for example other concurrent events or policies. Design adaptations may be introduced to address this limitation, such as the introduction of a control series, multiple baseline designs (where the intervention is introduced in different locations at different times) and multiple phases (where the intervention is first introduced and then removed to test whether the effect is reversed).

## 4. Contributions of LSHTM researchers

### 4.1 Methodological innovations:

Statisticians at the LSHTM have made contributions to time series regression methodology either in explicitly methodological papers or in innovations published in reports of substantive epidemiological studies. A tutorial paper by several LSHTM researchers provides an overview of methods (Bhaskaran et al. 2013). Another paper summarizes several issues as potential candidates for methodological work, focusing in particular on temperature-health associations (Gasparrini and Armstrong 2010).

*Distributed lag non-linear models.* Several published methodological articles have proposed a new, more flexible way to model lagged relationships through the framework of distributed lag non-linear models (Armstrong 2006; Gasparrini et al. 2010), implemented in the R package dlnm (Gasparrini 2011) – see figure. Later work presents methods for estimating the number of deaths attributable to an exposure from distributed lag non-linear models (Gasparrini and Leone 2014; Gasparrini et al. 2015).

*Two-stage analyses.* Other methodological efforts have explored ways to pool estimates of non-linear exposure-response relationships, and to explore heterogeneity in them, in two-stage analyses. The methods are based on multivariate meta-analytical techniques applied to estimates of multi-parameter associations from first-stage models, and are implemented in the R package mvmeta (Gasparrini et al. 2011, 2012a) – see figure.

*Modifiers of exposure-response associations.* The modification of associations by area characteristics has been investigated through meta-regression using the two-stage approach discussed above (Tobías et al. 2014). Modification by individual characteristics (age, SES) varying within areas has been explored using interaction terms in simple models (Hajat et al. 2007). We have also proposed a version of the “case-only” approach, designed originally for studying gene-environment interactions, for use in a time series context, to study how effects of time-varying risk factors (e.g. weather) might be modified by time-fixed factors, such as age or socio-economic status (Armstrong 2003).

*Heat waves.* Two other papers have explored models to estimate the extent to which the excess deaths associated with heat waves can be explained by a continuous association between temperature and mortality, or whether an additional “wave effect” due to sustained heat is necessary (Gasparrini and Armstrong 2011; Hajat et al. 2006). Other work has developed ways to compare how the performance of heat-health warning systems depends on heat wave definitions (Gasparrini et al. 2010).

*Mortality displacement (harvesting).* Approaches have also been developed and applied to identify the extent of short-term “harvesting” (see above) (Hajat et al. 2005; Rehill et al. 2015).

*Interrupted time series (ITS).* A tutorial paper introduces the ARIMA and segmented regression approaches to ITS (Lagarde 2011). Other applied research has introduced methodological innovations. One paper evaluated the influence of alternative modelling assumptions on the estimate of the association between the introduction of state-wide smoking bans and the incidence of acute myocardial infarction (Gasparrini et al. 2009). Other papers have pioneered the use of conditional Poisson models for multiple ITS designs (Grundy et al. 2009) and methods for controlled ITS (Milojevic et al. 2012).

*Other recent and ongoing methodological work.* A recent paper introduced the use of conditional Poisson models for the case-crossover and related formulations of time series (Armstrong et al. 2014). Methodological work continues, focused in particular on extending ways of characterising variation in distributed lag non-linear models across cities or sub-populations, and on adapting time series regression methods for infectious diseases (Imai et al. 2014).

### 4.2 Applied research:

The substantive research using time series regression methods carried out at the LSHTM, or in which LSHTM researchers have collaborated, has mainly concerned the associations between daily occurrences of health outcomes (such as deaths) and time-varying environmental factors. The earliest examples (Gouveia and Fletcher 2000) concerned associations of daily air pollution with mortality, and this interest continues (Milojevic et al. 2014; Pattenden et al. 2010). However, most of the focus has been on associations of weather and season with health, which are of particular interest in the context of impending global warming. The most common health outcome has been mortality (Armstrong et al. 2010; Gasparrini et al. 2012b; Hajat et al. 2002; McMichael et al. 2008), but others include hospital admissions (Pudpong and Hajat 2011), GP visits (Hajat and Haines 2002), viral disease (Lopman et al. 2009), food-borne disease (Kovats et al. 2004; Tam et al. 2006), diarrhoea (Hashizume et al. 2007, 2008a; Hashizume et al. 2010), pregnancy outcome (Lee et al. 2008; Wolf and Armstrong 2012), myocardial infarctions (Bhaskaran et al. 2010; Bhaskaran et al. 2011; Bhaskaran et al. 2012) and defibrillator activation (McGuinn et al. 2012).

Several studies have focused in particular on which groups are vulnerable to the acute effects identified in time series regression, in particular those of weather (Hajat et al. 2007; Hajat and Kosatky 2010; Hashizume et al. 2008b; Wilkinson et al. 2004), but also those of limited daylight on injuries (Steinbach et al. 2014). Others have predicted, from time series regressions, the impact of climate change on deaths due to acute effects of heat and cold (Hajat et al. 2014; Vardoulakis et al. 2014).

Time series regression methods have also been used to study the association of circulating RSV and influenza with hospital admission (Mangtani et al. 2006), and how much vaccination reduces the association of circulating influenza with mortality (Armstrong et al. 2004).

Studies applying interrupted time series methods include those exploring the association of the introduction of state-wide smoking bans with cardiovascular morbidity (Barone-Adesi et al. 2011), of the financial crisis with suicide rates in Spain (Lopez Bernal et al. 2013), of 20 mph speed limits with road injuries (Grundy et al. 2009), and of floods with mortality (Milojevic et al. 2011; Milojevic et al. 2012).

For other, and in particular more recent, relevant papers, see the personal web pages of the staff members, accessible from the list below.

## 5. LSHTM researchers involved in developing or using time series regression methodology

Antonio Gasparrini; Ben Armstrong; Clarence Tam; Jamie Lopez Bernal; Katherine Arbuthnott; Krishnan Bhaskaran; Mike Kenward; Mylene Lagarde; Paul Wilkinson; Punam Mangtani; Rebecca Steinbach; Sam Pattenden; Sari Kovats; Shakoor Hajat; Zaid Chalabi

## 6. Publications by LSHTM researchers

Armstrong B. 2006. Models for the relationship between ambient temperature and daily mortality. Epidemiology 17:624-631.

Armstrong B, Chalabi Z, Fenn B, Hajat S, Kovats RS, Milojevic A, et al. 2010. The association of mortality with high temperatures in a temperate climate: England and Wales. J Epidemiol Community Health [Epub ahead of print].

Armstrong BG. 2003. Fixed factors that modify the effects of time-varying factors: Applying the case-only approach. Epidemiology 14:467-472.

Armstrong BG, Mangtani P, Fletcher A, Kovats S, McMichael A, Pattenden S, et al. 2004. Effect of influenza vaccination on excess deaths occurring during periods of high circulation of influenza: Cohort study in elderly people. BMJ 329:660.

Armstrong BG, Gasparrini A, Tobias A. 2014. Conditional Poisson models: A flexible alternative to conditional logistic case cross-over analysis. BMC Medical Research Methodology 14:122.

Barone-Adesi F, Gasparrini A, Vizzini L, Merletti F, Richiardi L. 2011. Effects of Italian smoking regulation on rates of hospital admission for acute coronary events: A country-wide study. PLoS One 6:e17419.

*Bhaskaran K, Hajat S, Haines A, Herrett E, Wilkinson P, Smeeth L. 2010. Short term effects of temperature on risk of myocardial infarction in England and Wales: Time series regression analysis of the Myocardial Ischaemia National Audit Project (MINAP) registry. British Medical Journal 341:c3823.

*Bhaskaran K, Hajat S, Armstrong B, Haines A, Herrett E, Wilkinson P, et al. 2011. The effects of hourly differences in air pollution on the risk of myocardial infarction: Case crossover analysis of the MINAP database. BMJ 343:d5531.

*Bhaskaran K, Armstrong B, Hajat S, Haines A, Wilkinson P, Smeeth L. 2012. Heat and risk of myocardial infarction: Hourly level case-crossover analysis of MINAP database. BMJ 345:e8050.

*Bhaskaran K, Gasparrini A, Hajat S, Smeeth L, Armstrong B. 2013. Time series regression studies in environmental epidemiology. International Journal of Epidemiology 42:1187-1195.

Gasparrini A, Gorini G, Barchielli A. 2009. On the relationship between smoking bans and incidence of acute myocardial infarction. European Journal of Epidemiology 24:597-602.

*Gasparrini A, Armstrong B. 2010. Time series analysis on the health effects of temperature: Advancements and limitations. Environmental Research.

*Gasparrini A, Armstrong B, Kenward M. 2010. Distributed lag non-linear models. Statistics in Medicine 29(21): 2224-34.

*Gasparrini A. 2011. Distributed lag linear and non-linear models in R: The package dlnm. Journal of Statistical Software 43:1-20.

*Gasparrini A, Armstrong B. 2011. The impact of heat waves on mortality. Epidemiology 22:68.

*Gasparrini A, Armstrong B, Kenward MG. 2011. Multivariate meta-analysis: A method to summarize non-linear associations. Statistics in Medicine 30:2504-2506.

*Gasparrini A, Armstrong B, Kenward MG. 2012a. Multivariate meta-analysis for non-linear and other multi-parameter associations. Statistics in Medicine 31:3821-3839.

*Gasparrini A, Armstrong B, Kovats S, Wilkinson P. 2012b. The effect of high temperatures on cause-specific mortality in England and Wales. Occupational and Environmental Medicine 69:56-61.

Gasparrini A, Leone M. 2014. Attributable risk from distributed lag models. BMC Medical Research Methodology 14:55.

Gasparrini A, Guo Y, Hashizume M, Lavigne E, Zanobetti A, Schwartz J, et al. 2015. Mortality risk attributable to high and low ambient temperature: A multicountry observational study. The Lancet. In Press

*Gouveia N, Fletcher T. 2000. Time series analysis of air pollution and mortality: Effects by cause, age and socioeconomic status. Journal of Epidemiology and Community Health 54:750.

Grundy C, Steinbach R, Edwards P, Green J, Armstrong B, Wilkinson P. 2009. Effect of 20 mph traffic speed zones on road injuries in London, 1986-2006: Controlled interrupted time series analysis. BMJ 339:b4469.

Hajat S, Haines A. 2002. Associations of cold temperatures with GP consultations for respiratory and cardiovascular disease amongst the elderly in London. Int J Epidemiol 31:825-830.

Hajat S, Kovats RS, Atkinson RW, Haines A. 2002. Impact of hot temperatures on death in London: A time series approach. J Epidemiol Community Health 56:367-372.

Hajat S, Armstrong BG, Gouveia N, Wilkinson P. 2005. Mortality displacement of heat-related deaths: A comparison of Delhi, Sao Paulo, and London. Epidemiology 16:613-620.

Hajat S, Armstrong B, Baccini M, Biggeri A, Bisanti L, Russo A, et al. 2006. Impact of high temperatures on mortality: Is there an added heat wave effect? Epidemiology 17:632-638.

Hajat S, Kovats RS, Lachowycz K. 2007. Heat-related and cold-related deaths in England and Wales: Who is at risk? Occup Environ Med 64:93-100.

Hajat S, Kosatky T. 2010. Heat-related mortality: A review and exploration of heterogeneity. Journal of Epidemiology and Community Health 64:753-760.

Hajat S, Vardoulakis S, Heaviside C, Eggen B. 2014. Climate change effects on human health: Projections of temperature-related mortality for the UK during the 2020s, 2050s and 2080s. Journal of Epidemiology and Community Health 68:641-648.

*Hashizume M, Armstrong B, Hajat S, Wagatsuma Y, Faruque AS, Hayashi T, et al. 2007. Association between climate variability and hospital visits for non-cholera diarrhoea in Bangladesh: Effects and vulnerable groups. Int J Epidemiol 36:1030-1037.

*Hashizume M, Armstrong B, Hajat S, Wagatsuma Y, Faruque AS, Hayashi T, et al. 2008a. The effect of rainfall on the incidence of cholera in Bangladesh. Epidemiology 19:103-110.

*Hashizume M, Wagatsuma Y, Faruque AS, Hayashi T, Hunter PR, Armstrong B, et al. 2008b. Factors determining vulnerability to diarrhoea during and after severe floods in Bangladesh. J Water Health 6:323-332.

*Hashizume M, Faruque ASG, Wagatsuma Y, Hayashi T, Armstrong B. 2010. Cholera in Bangladesh: Climatic components of seasonal variation. Epidemiology 21:706-710.

Imai C, Armstrong B, Chalabi Z, Hashizume M, Mangtani P. 2014. Application of traditional time-series regression models for study of environmental determinants of infectious diseases. In: ISEE.

Kovats RS, Edwards SJ, Hajat S, Armstrong BG, Ebi KL, Menne B. 2004. The effect of temperature on food poisoning: A time-series analysis of salmonellosis in ten European countries. Epidemiol Infect 132:443-453.

Lagarde M. 2011. How to do (or not to do)… assessing the impact of a policy change with routine longitudinal data. Health Policy and Planning:czr004.

*Lee SJ, Hajat S, Steer PJ, Filippi V. 2008. A time-series analysis of any short-term effects of meteorological and air pollution factors on preterm births in London, UK. Environ Res 106:185-194.

*Lopez Bernal JA, Gasparrini A, Artundo CM, McKee M. 2013. The effect of the late 2000s financial crisis on suicides in Spain: An interrupted time-series analysis. European Journal of Public Health 23:732-736.

Lopman B, Armstrong B, Atchison C, Gray JJ. 2009. Host, weather and virological factors drive norovirus epidemiology: Time-series analysis of laboratory surveillance data in England and Wales. PLoS One 4:e6671.

Mangtani P, Hajat S, Kovats S, Wilkinson P, Armstrong B. 2006. The association of respiratory syncytial virus infection and influenza with emergency admissions for respiratory disease in London: An analysis of routine surveillance data. Clin Infect Dis 42:640-646.

*McGuinn L, Hajat S, Wilkinson P, Armstrong B, Anderson HR, Monk V, et al. 2012. Ambient temperature and activation of implantable cardioverter defibrillators. Int J Biometeorol.

McMichael AJ, Wilkinson P, Kovats RS, Pattenden S, Hajat S, Armstrong B, et al. 2008. International study of temperature, heat and urban mortality: The ‘ISOTHURM’ project. Int J Epidemiol.

Milojevic A, Armstrong B, Kovats S, Butler B, Hayes E, Leonardi G, et al. 2011. Long-term effects of flooding on mortality in England and Wales, 1994-2005: Controlled interrupted time-series analysis. Environ Health 10:11.

Milojevic A, Armstrong B, Hashizume M, McAllister K, Faruque A, Yunus M, et al. 2012. Health effects of flooding in rural Bangladesh. Epidemiology 23:107-115.

Milojevic A, Wilkinson P, Armstrong B, Bhaskaran K, Smeeth L, Hajat S. 2014. Short-term effects of air pollution on a range of cardiovascular events in England and Wales: Case-crossover analysis of the MINAP database, hospital admissions and mortality. Heart:heartjnl-2013-304963.

Pattenden S, Armstrong B, Milojevic A, Barratt B, Chalabi Z, Doherty R, et al. 2010. Ozone, heat and mortality in fifteen British conurbations. Occup Environ Med.

*Pudpong N, Hajat S. 2011. High temperature effects on out-patient visits and hospital admissions in Chiang Mai, Thailand. Science of the Total Environment 409:5260-5267.

*Rehill N, Armstrong B, Wilkinson P. 2015. Clarifying life lost due to cold and heat: A new approach. BMJ Open In Press.

*Steinbach R, Edwards P, Green J, Armstrong B. 2014. The contribution of light levels to ethnic differences in child pedestrian injury risk: A case-only analysis. Journal of Transport & Health 1:33-39.

*Tam CC, Rodrigues LC, O’Brien SJ, Hajat S. 2006. Temperature dependence of reported Campylobacter infection in England, 1989-1999. Epidemiol Infect 134:119-125.

Tobías A, Armstrong B, Gasparrini A, Diaz J. 2014. Effects of high summer temperatures on mortality in 50 Spanish cities. Environmental Health 13:48.

Vardoulakis S, Dear K, Hajat S, Heaviside C, Eggen B. 2014. Comparative assessment of the effects of climate change on heat- and cold-related mortality in the United Kingdom and Australia. Environmental Health Perspectives 122:1285-1292.

Wilkinson P, Pattenden S, Armstrong B, Fletcher A, Kovats RS, Mangtani P, et al. 2004. Vulnerability to winter mortality in elderly people in Britain: Population based study. BMJ 329:647.

*Wolf J, Armstrong B. 2012. The association of season and temperature with adverse pregnancy outcome in two German states, a time-series analysis. PLoS One 7:e40228.

* Research undertaken while the first author was a student at the LSHTM.

Last updated 1 April 2015. For more up-to-date publications, refer to researchers’ personal web pages.

## 7. Key references on methods

### General:

Bhaskaran K, Gasparrini A, Hajat S, Smeeth L, Armstrong B. Time series regression studies in environmental epidemiology. *International Journal of Epidemiology*. 2013;42(4):1187-1195

Peng, R. D. and F. Dominici (2008). *Statistical Methods for Environmental Epidemiology with R – A Case Study in Air Pollution and Health*. New York, Springer.

Zeger, S. L., R. Irizarry and R. D. Peng (2006). On time series analysis of public health and biomedical data. *Annual Review of Public Health* 27: 57-79.

Armstrong, B. (2006). Models for the relationship between ambient temperature and daily mortality. *Epidemiology* 17(6): 624-31.

Dominici, F. (2004). Time-series analysis of air pollution and mortality: a statistical review. *Research report – Health Effects Institute* 123: 3-27; discussion 9-33.

Dominici, F., A. McDermott and T. J. Hastie (2004). Improved semiparametric time series models of air pollution and mortality. *Journal of the American Statistical Association* 99(468): 938-49.

Touloumi, G., R. Atkinson, A. Le Tertre, et al. (2004). Analysis of health outcome time series data in epidemiological studies. *EnvironMetrics* 15(2): 101-17.

### On model selection:

Dominici, F., C. Wang, C. Crainiceanu, et al. (2008). Model selection and health effect estimation in environmental epidemiology. *Epidemiology* 19(4): 558-60.

Crainiceanu, C. M., F. Dominici and G. Parmigiani (2008). Adjustment uncertainty in effect estimation. *Biometrika* 95(3): 635.

Baccini, M., A. Biggeri, C. Lagazio, et al. (2007). Parametric and semi-parametric approaches in the analysis of short-term effects of air pollution on health. *Computational Statistics and Data Analysis* 51(9): 4324-36.

He, S., S. Mazumdar and V. C. Arena (2006). A comparative study of the use of GAM and GLM in air pollution research. *EnvironMetrics* 17(1): 81-93.

Peng, R. D., F. Dominici and T. A. Louis (2006). Model choice in time series studies of air pollution and mortality. *Journal of the Royal Statistical Society: Series A* 169(2): 179-203.

### On smoothing methods:

Marra, G. and R. Radice (2010). Penalised regression splines: theory and application to medical research. *Statistical Methods in Medical Research* 19(2): 107-25.

Schimek, M. G. (2009). Semiparametric penalized generalized additive models for environmental research and epidemiology. *EnvironMetrics* 20(6): 699-717.

Wood, S. N. (2006). *Generalized Additive Models: an Introduction with R*, Chapman & Hall/CRC.

Dominici, F., M. J. Daniels, S. L. Zeger, et al. (2002a). Air pollution and mortality: estimating regional and national dose-response relationships. *Journal of the American Statistical Association* 97: 100-11.

Dominici, F., A. McDermott, S. L. Zeger, et al. (2002b). On the use of generalized additive models in time-series studies of air pollution and health. *American Journal of Epidemiology* 156(3): 193-203.

### On harvesting effect:

Rabl, A. (2005). Air pollution mortality: harvesting and loss of life expectancy. *Journal of Toxicology and Environmental Health: Part A* 68(13-14): 1175-80.

Schwartz, J. (2001). Is there harvesting in the association of airborne particles with daily deaths and hospital admissions? *Epidemiology* 12(1): 55-61.

Schwartz, J. (2000b). Harvesting and long term exposure effects in the relation between air pollution and mortality. *American Journal of Epidemiology* 151(5): 440-8.

### On distributed lag (non-linear) models:

Gasparrini A. Modeling exposure-lag-response associations with distributed lag non-linear models. *Statistics in Medicine*. 2014;33(5):881-899

Gasparrini, A. (2011). Distributed Lag Linear and Non-Linear Models in R: The Package dlnm. *J Stat Softw* 43(8): 1-20.

Gasparrini, A., B. Armstrong and M. G. Kenward (2010). Distributed lag non-linear models. *Statistics in Medicine* 29(21): 2224-34.

Muggeo, V. M. (2008). Modeling temperature effects on mortality: multiple segmented relationships with common break points. *Biostatistics* 9(4): 613-20.

Schwartz, J. (2000a). The distributed lag between air pollution and daily deaths. *Epidemiology* 11(3): 320-6.

### On meta-analytic techniques:

Gasparrini, A., B. Armstrong, et al. (2012). Multivariate meta-analysis for non-linear and other multi-parameter associations. *Statistics in Medicine*. 31:3821-3839.

Dominici, F., J. M. Samet and S. L. Zeger (2000). Combining evidence on air pollution and daily mortality from the 20 largest US cities: a hierarchical modelling strategy. *Journal of the Royal Statistical Society: Series A* 163(3): 263-302.

Schwartz, J. and A. Zanobetti (2000). Using meta-smoothing to estimate dose-response trends across multiple studies, with application to air pollution and daily death. *Epidemiology* 11(6): 666-72.

### On interrupted time series:

Wagner, A. K., S. B. Soumerai, F. Zhang, et al. (2002). Segmented regression analysis of interrupted time series studies in medication use research. *Journal of Clinical Pharmacy and Therapeutics* 27(4): 299-309.

Shadish, W. R., T. D. Cook, et al. (2002). *Experimental and quasi-experimental designs for generalized causal inference*.
