
Using climate to predict disease outbreaks: a review

Conceptual framework for developing climate-based EWS for infectious disease


Attempts to initiate EWS development within a specific country should be preceded by a decision-making process which identifies the principal disease(s) of interest. This will depend on the burden of various infectious diseases in the region and on levels of national and international funding available for disease-specific activities.

On the basis of an extensive literature review, the following framework for constructing climate-based infectious disease EWS is proposed (Figure 1). The framework comprises four preliminary phases, the EWS itself, and the response and assessment phases.

1. Preliminary phases

1.1 Evaluating epidemic potential

An EWS for an infectious disease should be developed only if the disease is epidemic-prone. Before assessing the epidemic potential of a disease, the word epidemic should be defined (Last 2001):

The occurrence in a community or region of cases of an illness, specific health-related behaviour, or other health-related events clearly in excess of normal expectancy. The community or region and the period in which the cases occur are specified precisely. The number of cases indicating the presence of an epidemic varies according to the agent, size, and type of population exposed; previous experience or lack of exposure to the disease; and time and place of occurrence.

‘Outbreak’ is also commonly used, and is defined by Last (2001) as “an epidemic limited to localized increase in the incidence of a disease, e.g. in a village, town or closed institution.”

If it is assumed that outbreaks and epidemics differ only in the scale of their effects rather than their aetiology, the concept of climate-based EWS will be applicable equally to both.

Generally, a disease that exhibits large inter-annual variability can be considered epidemic-prone.

The transmission of many infectious diseases varies markedly by season. For example, the majority of influenza outbreaks in the northern hemisphere occur in mid to late winter (WHO 2000) while, even in relatively stable transmission areas, peak malaria transmission generally follows periods of heavy rain (Macdonald 1957). Where disease is present in an area, fluctuations in its incidence are considered epidemics only if the number of cases exceeds a certain threshold. A commonly used definition of an outbreak is a situation where reported disease cases exceed a threshold of 1.96 multiplied by the standard deviation of the mean for at least two weeks (Snacken et al. 1992). For influenza, the duration of an epidemic has also been defined as the number of weeks when virus has been isolated from at least 10% of samples (Snacken et al. 1992). In all cases, an epidemic is best defined by examining continuous long-term datasets, so setting up surveillance centres is an important preliminary requirement.
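The threshold rule described above can be sketched in a few lines. This is a minimal illustration, not the procedure of Snacken et al.: it assumes weekly counts, reads the threshold as the baseline mean plus 1.96 sample standard deviations, and uses a hypothetical function name and made-up counts.

```python
from statistics import mean, stdev

def flag_outbreak_weeks(baseline, current, z=1.96, min_run=2):
    """Flag weeks in `current` that sit in a run of at least `min_run`
    consecutive weeks above mean(baseline) + z * sd(baseline).
    The mean-plus-z-SD reading of the threshold is an assumption."""
    threshold = mean(baseline) + z * stdev(baseline)
    above = [count > threshold for count in current]
    flagged = []
    run_start = None
    # A trailing False acts as a sentinel that closes any run still open.
    for i, is_above in enumerate(above + [False]):
        if is_above and run_start is None:
            run_start = i
        elif not is_above and run_start is not None:
            if i - run_start >= min_run:
                flagged.extend(range(run_start, i))
            run_start = None
    return threshold, flagged

# Hypothetical weekly case counts, for illustration only.
baseline_counts = [10, 12, 11, 9, 13, 10, 12, 11]
current_counts = [11, 20, 12, 19, 22, 25, 10]
threshold, outbreak_weeks = flag_outbreak_weeks(baseline_counts, current_counts)
print(round(threshold, 2), outbreak_weeks)
```

In this example the single high week (index 1) is ignored because it does not persist, while the three consecutive high weeks (indices 3 to 5) are flagged.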

1.2 Identifying the geographical location of epidemic areas

Even if an infectious disease is widespread throughout a country or entire region, geographically the risk of epidemics is not equal at all locations and will reflect, inter alia, the distribution and behaviour of disease vectors and hosts. Geographical variation in risk of epidemics is widely acknowledged, but epidemic-prone areas are seldom defined formally. This is due partly to the difficulties in defining epidemics, and partly to the lack of long-term surveillance data and the changing epidemiology of diseases over time. For example, malaria transmission in many lowland areas of Africa is often characterized as holoendemic, with year-round transmission, while neighbouring regions at higher altitude are considered to be epidemic-prone. In these areas, environmental conditions (presumably temperature) are on average less favourable, and transmission occurs in the form of epidemics only on occasions when changes in environmental conditions and/or population immunity create permissive conditions. However, the difficulties in characterization are shown by a recent study by Hay et al. (2002a), which found no evidence of greater instability in transmission at three study sites with altitudes over 1600 m than in low-altitude areas.

When testing research hypotheses it is important to apply consistent definitions in order to identify epidemic areas. Conversely, to improve public health this may be less important than consideration of whether the pattern of transmission in a particular area is sufficiently different to require a qualitatively distinct type of operational response.

1.3 Identifying climatic and non-climatic disease risk factors

Also known as risk assessment or modelling, this phase provides a vital input to EWS development. A large number of studies have been undertaken to identify environmental risk factors, including climate (see section 5). There are two main approaches: statistical and biological modelling. Statistical models are used to identify the direct statistical correlations between predictor (e.g. climate) variables and the outcome of interest (e.g. disease incidence). Biological models contain complete representations of climate's effects on the population dynamics of pathogens and vectors. The majority of past studies have used statistical modelling of locality-specific historical disease measures and/or vector distributions. Biological models potentially offer greater insights into the mechanisms driving variation in disease incidence but require more extensive understanding of climatic effects on all aspects of pathogen and vector dynamics. They have therefore been applied on very few occasions (e.g. Randolph and Rogers 1997).

Whichever modelling approach is used, it is important to take account of non-climatic factors. These include indicators of the vulnerability of populations to disease outbreaks such as (in the case of malaria) low immunity, high prevalence of HIV, malnutrition, drug and insecticide resistance (WHO 2001). Failure to take account of such influences can lead to variation in disease incidence being incorrectly attributed to climate effects, to poor predictive accuracy, or to both.

1.4 Quantifying the link between climate variability and disease outbreaks; constructing predictive models

The relationship between disease incidence and the climate factors identified in section 1.3 can be quantified in a statistical or biological model that may subsequently form the basis of future predictions of disease outbreaks. Before this can be initiated, it is necessary to ensure that both disease and explanatory data are available at appropriate spatial and temporal resolutions and for a sufficient time-frame.

Climate data for use in EWS are available in two forms: direct, ground-based measurements and surrogate measures derived by remote sensing. Usually ground-based data are measured at standard synoptic weather stations. They have the advantage of being accurate, direct measurements of meteorological conditions – but these data will be representative only of a small area in the vicinity of the station itself. If the area of interest does not contain meteorological stations, the use of ground-based data depends on appropriate extrapolation methods being applied to the data.

The use of satellite remote sensing data obviates the need for interpolation, as measurements are taken repeatedly for all locations. Raw remote sensing data can be transformed to provide a number of indices that constitute proxies for standard meteorological variables (Hay et al. 1996; Hay and Lennon 1999). Data from the Advanced Very High Resolution Radiometer (AVHRR) sensor on board National Oceanic and Atmospheric Administration (NOAA) satellites, for example, can be used to provide daily data at up to 1.1 km spatial resolution for land surface temperature, as well as an assessment of vegetation status (greenness) through the normalized difference vegetation index (NDVI). The AVHRR data archive goes back as far as 1981. Meteosat, a geostationary satellite operated by EUMETSAT, provides information on cloud-top temperatures that has been used to construct a proxy variable for rainfall (cold cloud duration or CCD). For Africa, NOAA’s Climate Prediction Center (CPC) produces 10-day estimates of rainfall based on CCD and these, together with NDVI, are disseminated free of charge through the Africa Data Dissemination Service. Software for extracting and analysing these data for specific localities (WinDisp) is also available as freeware. CCD data go back to 1988, although CPC rainfall estimates are available only from 1995.
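To make the two proxy variables concrete, the sketch below computes NDVI from red and near-infrared reflectance using the standard formula (NIR - red) / (NIR + red), and a cold cloud duration as the time spent below a cloud-top temperature threshold. The function names, the 235 K threshold, the half-hour slot length and the sample values are assumptions chosen for illustration; operational products use their own calibrated settings.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index from red and near-infrared
    reflectance: (NIR - red) / (NIR + red)."""
    total = nir + red
    if total == 0:
        return None  # undefined when both reflectances are zero
    return (nir - red) / total

def cold_cloud_duration(cloud_top_temps_k, threshold_k, hours_per_slot):
    """Hours during which the cloud-top temperature was below the threshold.
    The threshold and slot length must be supplied by the caller."""
    cold_slots = sum(1 for temp in cloud_top_temps_k if temp < threshold_k)
    return cold_slots * hours_per_slot

# Hypothetical pixel values, for illustration only.
greenness = ndvi(red=0.1, nir=0.5)
ccd_hours = cold_cloud_duration(
    [250, 230, 228, 240, 225, 260],  # cloud-top temperatures in kelvin
    threshold_k=235,                 # assumed threshold, not an official value
    hours_per_slot=0.5,              # assumed half-hourly images
)
print(greenness, ccd_hours)
```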

The analysis involved in quantifying climate-disease links can be separated into four main steps:

  • Fitting trend lines and sine-cosine waves (or similar) to remove long-term trends and seasonal variation from outcome and predictor variables.
  • Testing for correlations between climate variability and variability in the outcome variable.
  • Using the derived equations to make predictions for subsequent time points not included in the original model.
  • Measuring levels of agreement between predictors and outcomes.
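The first two steps above can be sketched as follows. This is one possible reading under stated assumptions: monthly data, a single linear trend plus one annual sine-cosine pair fitted by ordinary least squares, and a one-month lag between climate and disease. All function names and the synthetic series are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def remove_trend_and_season(series, period=12):
    """Step 1: fit intercept + linear trend + sine + cosine by least
    squares and return the residuals (the de-trended, de-seasonalized series)."""
    rows = []
    for t in range(len(series)):
        angle = 2 * math.pi * t / period
        rows.append([1.0, float(t), math.sin(angle), math.cos(angle)])
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, series)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(rows, series)]

def pearson(x, y):
    """Step 2: Pearson correlation between two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Synthetic monthly data: cases respond to last month's rainfall anomaly.
months = range(48)
anomaly = [((7 * t * t + 3 * t) % 11) - 5 for t in months]
rain = [50 + 0.5 * t + 20 * math.sin(2 * math.pi * t / 12) + anomaly[t]
        for t in months]
cases = [100 + 1.0 * t + 30 * math.sin(2 * math.pi * (t - 1) / 12)
         + (2 * anomaly[t - 1] if t > 0 else 0) for t in months]

rain_res = remove_trend_and_season(rain)
case_res = remove_trend_and_season(cases)
lagged_r = pearson(rain_res[:-1], case_res[1:])  # rainfall leads cases by one month
print(round(lagged_r, 3))
```

Correlating the residuals rather than the raw series avoids crediting climate with agreement that arises only because both series share a trend or a seasonal cycle.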

Quantifying the relationship between climate parameters and the occurrence of infectious diseases and/or their vectors in order to predict geographical and temporal patterns of disease has been attempted numerous times (see sections 2 and 5). Although these predictions allow us to map disease and vector ranges, the majority are not EWS, either because they aim to make spatial rather than temporal predictions (i.e. predict disease rates in locations that have not previously been surveyed), or because they are used to explore possible effects of long-term changes in climate over decades, rather than for the next few weeks or months.

For EWS the specific analytical methods used, and associated accuracy measures, depend on the specific purpose. For example, one major aim of EWS is to predict the likelihood of an epidemic (i.e. whether a pre-defined threshold of incidence will be exceeded). For this purpose it is appropriate to use techniques for predicting a binary outcome, such as logistic regression or discriminant analysis, with climatic and non-climatic data as the predictor variables and the occurrence or non-occurrence of an epidemic as the outcome. Various measurements can be used to represent different aspects of predictive accuracy. These include the overall proportion of correct predictions, the sensitivity (proportion of epidemics correctly predicted), specificity (proportion of non-epidemics correctly predicted), positive predictive value (proportion of predictions of an epidemic that were correct), negative predictive value (proportion of predictions of non-epidemics that were correct), and the kappa statistic, a measure of increased predictive accuracy above that expected by chance alone (Brooker et al. 2002a).
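The accuracy measures listed above all derive from the same two-by-two table of predicted against observed epidemics. The sketch below computes them for binary sequences; the function name and the example data are hypothetical, and kappa is computed as Cohen's kappa, which is assumed here to be the statistic intended.

```python
def accuracy_measures(observed, predicted):
    """Accuracy measures for binary epidemic predictions.
    `observed` and `predicted` are equal-length sequences of 0/1 or booleans."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # epidemics correctly predicted
    tn = sum(1 for o, p in pairs if not o and not p)  # non-epidemics correctly predicted
    fp = sum(1 for o, p in pairs if not o and p)      # false alarms
    fn = sum(1 for o, p in pairs if o and not p)      # missed epidemics
    n = len(pairs)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    p_observed = ratio(tp + tn, n)
    # Agreement expected by chance, from the row and column totals.
    p_chance = ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = None
    if n and p_chance != 1:
        kappa = (p_observed - p_chance) / (1 - p_chance)

    return {
        "accuracy": p_observed,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "kappa": kappa,
    }

# Hypothetical record of ten seasons: 1 = epidemic, 0 = no epidemic.
observed = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
measures = accuracy_measures(observed, predicted)
print(measures)
```

Here overall accuracy is 0.8, but kappa is only about 0.52 because epidemics are rare and a model that always predicted "no epidemic" would already be right 70% of the time.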

Another major aim of EWS is to predict not only the occurrence, but also the size of an epidemic. In this case, it is appropriate to use regression techniques with a continuous outcome, such as traditional linear and non-linear regression, or more complex regression techniques such as ARIMA (autoregressive integrated moving average) models that incorporate trends and temporal autocorrelation into a single model. In this case, predictive accuracy can be represented by comparing the magnitude of the observed and predicted epidemic, using the root mean square error, or as correlation coefficients between observed and predicted case numbers (Abeku et al. 2002).
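For continuous predictions, the two accuracy measures mentioned above can be computed as in the following sketch. The function names and case numbers are hypothetical, and the correlation shown is the Pearson coefficient, which is assumed to be the one intended.

```python
import math

def root_mean_square_error(observed, predicted):
    """Square root of the mean squared difference between observed and predicted."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def correlation(observed, predicted):
    """Pearson correlation between observed and predicted case numbers."""
    mo = sum(observed) / len(observed)
    mp = sum(predicted) / len(predicted)
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    soo = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sop / math.sqrt(soo * spp)

# Hypothetical monthly case numbers, for illustration only.
observed_cases = [10, 20, 30, 40]
predicted_cases = [12, 18, 33, 37]
error = root_mean_square_error(observed_cases, predicted_cases)
r = correlation(observed_cases, predicted_cases)
print(round(error, 2), round(r, 3))
```

The two measures answer different questions: the root mean square error is in units of cases and penalizes errors in magnitude, whereas the correlation only reflects whether predicted and observed numbers rise and fall together.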

In either case, model accuracy should be assessed against independent data (i.e. not included in the original model building process) to give an accurate replication of an attempt to predict a future epidemic. Using the same data to both build and test a model will tend to exaggerate predictive accuracy.
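A minimal sketch of such an independent test is shown below, assuming a simple linear model of cases on the preceding month's rainfall and a chronological split in which the most recent observations are held back. The data, the split size and the function names are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))

# Hypothetical data in chronological order.
rainfall = [80, 120, 60, 150, 100, 90, 140, 70]  # preceding month's rainfall (mm)
cases = [35, 52, 30, 70, 44, 41, 58, 37]         # reported cases

n_test = 3  # hold back the most recent observations for testing
train_x, test_x = rainfall[:-n_test], rainfall[-n_test:]
train_y, test_y = cases[:-n_test], cases[-n_test:]

# Build the model on the training period only.
slope, intercept = fit_line(train_x, train_y)

in_sample_error = rmse(train_y, [intercept + slope * x for x in train_x])
out_of_sample_error = rmse(test_y, [intercept + slope * x for x in test_x])
print(round(in_sample_error, 2), round(out_of_sample_error, 2))
```

Splitting by time rather than at random mirrors operational use, where the model is always asked about periods later than those it was fitted to.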

2. Early warning systems

An EWS encompasses not only predictions of disease in time and space but also active disease surveillance and a pre-determined set of responses. The distinction between prediction and early warning must be clearly defined: early warning is prediction but not all prediction is early warning. In the context of this report, early warnings are considered to come from both model predictions and disease surveillance (i.e. early detection), and include consideration of operational conditions and responses.

2.1 Disease surveillance

Disease surveillance provides a means of monitoring disease incidence over time and, depending on the nature of the system, may be an appropriate instrument for detecting unusual patterns among incidence data. Strictly speaking, disease surveillance does not constitute early warning, even where surveillance is carried out within a specially designed network of sentinel sites. Surveillance provides a means of detecting rather than predicting the onset of an epidemic (there is therefore no lead-time as such). However, a properly designed system should bring forward significantly the point of intervention, thereby increasing the chances of intervention assisting disease control. As a means of validating disease predictions produced by climate-based models, surveillance data constitute an integral part of any fully-fledged EWS. In most cases, the existence of accurate, validated predictive models depends on the availability of historical surveillance data.

An important first step in EWS development at national level is to assess current approaches to disease surveillance and the quality, quantity and completeness of associated disease data. In many cases – and especially for notifiable diseases in well resourced health systems – existing disease data may be suitable for model development and the system itself quite appropriate for epidemic early detection. In other situations existing systems may need extensive modification, either in the way in which disease data are collected (e.g. diagnostics), or the manner in which data from individual health facilities are collected, aggregated and communicated to higher levels in the health system. Standard health management information system (HMIS) data, for example, commonly aggregate data from individual facilities to the extent that localized disease outbreaks may be obscured. Many standard surveillance approaches also may lack sufficient temporal resolution for epidemic detection, especially where data are reported monthly.

Where appropriate disease surveillance systems are in place, tracking disease incidence with reference to expected normal levels of incidence can indicate the onset of an epidemic and (where surveillance data include information on the locality of cases) provide information about its geographical extent. However, aberrations in surveillance data indicating abnormal levels of disease transmission should be investigated before implementation of large-scale interventions aimed at epidemic control. Such aberrations may constitute artefacts within the surveillance system (e.g. due to changes in diagnostic practices, shifts in the levels of usage of individual health facilities by the general public etc.) and may not reflect changes in levels of disease transmission. It should also be borne in mind that there is no single, standard approach available for detecting aberrations (i.e. outbreaks) on the basis of surveillance data. A number of detection algorithms have been proposed (for example, Hay et al. 2003) and the sensitivity and specificity of each will vary depending on the nature of the temporal distribution of cases associated with each disease type. Similarly, a number of issues concerning how best to construct a ‘reference’ disease baseline have yet to be resolved fully. For example, what is the minimum number of years of data required to develop a reliable baseline? Should the baseline lengthen with each year of new data, or should older data be discarded? Should data from known epidemic years be omitted from the baseline calculation? These and many other issues await full clarification.
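The open questions about baseline construction can be made explicit as parameters. The sketch below is not a recommended method: it assumes monthly counts stored per year and simply exposes the two choices raised above, a sliding window of recent years and the exclusion of known epidemic years. The function name, parameters and data are hypothetical.

```python
from statistics import mean

def monthly_baseline(history, max_years=None, exclude_years=()):
    """Mean count for each calendar month across the selected years.

    history:       dict mapping year -> list of 12 monthly counts
    max_years:     keep only the most recent N remaining years (None = keep all)
    exclude_years: years (e.g. known epidemic years) to leave out
    """
    years = sorted(year for year in history if year not in exclude_years)
    if max_years is not None:
        if max_years < 1:
            raise ValueError("max_years must be at least 1")
        years = years[-max_years:]
    if not years:
        raise ValueError("no years left to build a baseline")
    return [mean(history[year][month] for year in years) for month in range(12)]

# Hypothetical data: 2002 is treated as a known epidemic year.
history = {
    2000: [10] * 12,
    2001: [20] * 12,
    2002: [90] * 12,
    2003: [30] * 12,
}
growing = monthly_baseline(history)                          # all years
no_epidemic = monthly_baseline(history, exclude_years={2002})
sliding = monthly_baseline(history, max_years=2, exclude_years={2002})
print(growing[0], no_epidemic[0], sliding[0])
```

With these made-up numbers the January baseline is 37.5, 20 or 25 depending on the choice, which illustrates how strongly the detection threshold can depend on decisions that remain unresolved.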

2.2 Monitoring disease risk factors

As described in section 1.4, a range of weather monitoring datasets is available from earth observation satellites. These (and basic software for display and extraction of data) are free of charge, but funds may need to be secured for GIS software capable of more advanced geographical processing and analysis. It is also important to assess vulnerability indicators such as herd immunity, HIV prevalence, malnutrition and drug resistance at this stage. As discussed below, these are difficult to monitor accurately, requiring much manpower and well-organized surveillance systems.

There are several vector-related risk factors for vector-borne diseases. These include local vector species composition and the human blood index (i.e. tendency to bite humans). It has been suggested that vector densities may be sufficient to forecast changes in malaria transmission (Lindblade et al. 2000) where surpassing an ‘epidemic threshold’ could indicate a potential epidemic. Alternatively, measures of malaria transmission intensity such as the entomological inoculation rate (EIR – the product of the infection rate in vectors and the biting rate on humans) have been used to assess variation in malaria transmission risk in Africa (Snow et al. 1999, Hay et al. 2000b) and theoretically could be monitored as indicators of potential epidemics. Unfortunately, in most cases, monitoring both EIR and vector densities is too expensive to be feasible (Thomson and Connor 2001). In addition, the quantitative relationships between these variables and the probability and intensity of epidemics remain at the research stage. To our knowledge, there are no published examples where such a system has been put into operation.
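As a worked illustration of the EIR definition given above, the sketch below multiplies the proportion of infective vectors by the human biting rate and compares the result with an alert level. The function names, the input values and the alert level are hypothetical; as noted above, no validated epidemic threshold for EIR is available.

```python
def entomological_inoculation_rate(vector_infection_rate, bites_per_person_per_night):
    """EIR as the product of the infection rate in vectors (a proportion
    between 0 and 1) and the biting rate on humans."""
    if not 0 <= vector_infection_rate <= 1:
        raise ValueError("vector_infection_rate must be a proportion between 0 and 1")
    if bites_per_person_per_night < 0:
        raise ValueError("biting rate cannot be negative")
    return vector_infection_rate * bites_per_person_per_night

def exceeds_alert_level(eir, alert_level):
    """True when the monitored EIR is above a locally chosen alert level."""
    return eir > alert_level

# Hypothetical field measurements, for illustration only.
nightly_eir = entomological_inoculation_rate(0.02, 15)  # infective bites per person per night
annual_eir = nightly_eir * 365                          # assumes a constant rate all year
alert = exceeds_alert_level(nightly_eir, alert_level=0.25)  # assumed alert level
print(round(nightly_eir, 3), round(annual_eir, 1), alert)
```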

2.3 Model forecasts

Model forecasts can be based on relationships between disease and predictor variables to predict risk in both surveyed and unsurveyed areas. Inputs for such predictions can come from either direct monitoring of known risk factors (e.g. using rainfall measurements in one month to predict the probability of an epidemic of mosquito-borne disease in the next few months) or forecasting based on predictions of these risk factors (i.e. seasonal climate forecasts). The choice will depend on the relative importance of accuracy (usually maximized by using direct observations of risk factors) and lead-time (maximized by predictions of risk factors).

Likely predictor climatic variables include temperature, rainfall and the El Niño Southern Oscillation (ENSO), all of which are available. Future climate-based predictions of disease variability require projections of climate events. It is possible to predict weather relatively accurately up to a week ahead using complex atmospheric models (Palmer and Anderson 1994). In some regions and under some existing climate conditions, predictions of climatic conditions up to several months ahead can be made (from similar models). In particular there has been considerable interest in predicting the interannual variations of the atmosphere-ocean system, such as the onset, development and breakdown of ENSO. ENSO is a periodic appearance of warm and cool sea surface water in the central and eastern Pacific Ocean (Wang et al. 1999). ENSO events are associated with increased probability of drought in some areas and excess rainfall in others, along with temperature increases in many regions. In the tropics, variability in the ocean-atmosphere system associated with ENSO can be predicted with a lead-time of several seasons (Palmer and Anderson 1994). In Asian and South American regions, there is evidence that ENSO events have an intensifying effect on seasonal malaria transmission, including epidemics (Kovats et al. 2003).

Seasonal forecasts of some of these climate variables are available for specific regions of the world. Forecast lead-times vary for different climate parameters, from one to four months for rainfall in Africa to a year or more for the strength of an ENSO event. Although these forecasts allow relatively long potential lead-times, which can be particularly useful for gathering resources necessary for control measures, forecasting climate introduces an additional source of uncertainty into the epidemic prediction. In addition, climate forecasts are not available at high spatial resolutions, so the epidemic warning will be at a relatively coarse geographical scale.

The EWS options presented above demonstrate a trade-off between warning time and specificity. In each case, the precision of predictions depends on how disease and climate indicators are selected: are they long-term projections or short-term active observations? The important question of whether predictions should be relatively general one-year forecasts or more precise predictions for the following week depends mostly on the public health requirements. It has been suggested that epidemic forecasting is most useful to health services when case numbers are predicted two to six months ahead, allowing tactical decision-making (Myers et al. 2000). When longer-term strategic disease control is the objective (e.g. the Onchocerciasis Control Programme in West Africa), longer-term forecasts may be more pertinent.

The hierarchical system proposed for malaria EWS in Africa (Cox et al. 1999) takes account of all the different ranges of forecasts which can be developed to suit the various needs of the health sector:

  • Long range predictions based on seasonal climate forecasts. The resulting epidemic risk assessments will cover wide areas and have lead-times greater than six months.
  • Short range predictions based on active monitoring of risk factors (e.g. temperature and rainfall). Geographical resolution is much more specific and lead-times can be measured in weeks rather than months.
  • Early detection of epidemics using disease monitoring. There is no lead-time per se, but this approach provides specific information on timing and location of an epidemic.

3. Response phase

Appropriate forms of epidemic response will be geographically and disease specific and may consist of either chemotherapeutic or vector control measures, or a combination of both. Ultimately, responsibility for arranging relief or other measures necessary to contain an epidemic lies with national governments or non-governmental bodies. Response to an epidemic warning ideally should follow a preparedness plan that has been developed through an integrated multisectoral approach (FEWS 2000). Because the majority of infectious disease outbreaks occur in developing countries, where funds are usually of crucial importance, an effective response may require the extensive involvement of international organizations.

4. Assessment/evaluation phase

After the onset of an epidemic (preferably during the response phase), the EWS should be evaluated technically in consultation with end-users. Questions that need to be addressed include:

  • How easy is the system to use?
  • Are the predictions accurate enough to contribute usefully to disease planning? (see below).
  • Is the system cost-effective and could resources have been used more effectively?

Despite many attempts to develop EWS for infectious diseases (and other areas), to our knowledge there are no practical guidelines for assessing the accuracy of an EWS. When an EWS is developed, end-users and researchers should agree on the required level of accuracy, although this may be difficult due to lack of communication and consultation between the different personnel involved in the various stages.

There are two separate principal aims of an EWS:

  • Identify whether an epidemic will occur within a specific population, according to a pre-defined threshold of cases.
  • Predict the number of cases within a period of time.

The relative importance of the two aims will depend on the control decisions to be taken and the degree of interannual variation in disease. For example, for diseases which are absent from the human population for long periods followed by explosive epidemics, early detection and/or predictions of the probability of an epidemic may be more important than predictions of epidemic size. Assessments should be performed as ‘value-of-information’ assessments; i.e. it must be determined whether collection and analysis of climate data adds sufficient predictive power, or if allocating the funds to collection of other information has a greater effect on predictive power. In terms of assessment, Woodruff et al. (2002) recommend that an EWS for arboviruses should predict an epidemic with at least 90% accuracy (assuming that an epidemic is defined as the number of cases exceeding the mean plus one standard deviation), while Abeku et al. (2002) proposed an assessment based on the forecast error (the log of the difference between observed and expected cases). It is the recommendation of this report that solid guidelines on determining and assessing the precision of EWS predictions should be established.

Figure 1. Framework for developing climate-driven early warning systems for infectious diseases [pdf 19kb]
