Bulletin of the World Health Organization

Exposing misclassified HIV/AIDS deaths in South Africa

Jeanette Kurian Birnbaum a, Christopher JL Murray a & Rafael Lozano a

a. Institute for Health Metrics and Evaluation, University of Washington, 2301 5th Avenue (Suite 600), Seattle, WA, 98121, United States of America.

Correspondence to Rafael Lozano (e-mail: rlozano@u.washington.edu).

(Submitted: 06 July 2010 – Revised version received: 18 January 2011 – Accepted: 27 January 2011 – Published online: 17 February 2011.)

Bulletin of the World Health Organization 2011;89:278-285. doi: 10.2471/BLT.11.086280


In 2007, South Africa’s epidemic of human immunodeficiency virus (HIV) infection and acquired immunodeficiency syndrome (AIDS), one of the world’s largest, accounted for 17% of the global burden of HIV/AIDS.1,2 As in other African countries, the epidemic is monitored primarily through antenatal clinic data and data from population surveys. These data are modelled to yield incidence, prevalence and mortality estimates for HIV/AIDS.3–5 Yet unlike most other countries with a high burden of HIV/AIDS, South Africa has a national vital registration system that tracks deaths from these causes, although admittedly its coverage is incomplete and its death certification and coding are of questionable quality. For these reasons, the system is not very useful for generating HIV/AIDS statistics. While coverage has steadily improved – it was estimated at 85% in 1996 and 89% in 2000 for adults6 – data quality is still lacking. Death certificate audits have revealed errors in as many as 45% of all records, a situation that hampers cause of death analysis.7–10 Moreover, misclassification of HIV/AIDS deaths occurs for reasons beyond these general quality issues. According to the guidelines given in the International Classification of Diseases and Related Health Problems, tenth revision (ICD-10), HIV/AIDS is the underlying cause of death when an HIV-positive individual dies from a co-morbid condition resulting from the HIV infection (codes B20–B24).11 In South Africa, issuers of death certificates seldom know or have access to an individual’s HIV status, and rural community leaders often omit it when they fill out abbreviated certificates. In addition, many people are unwilling to be tested for HIV for fear of stigma or of losing health insurance benefits. These factors, together with concerns regarding the confidentiality of death certificates, result in an underreporting of deaths from HIV/AIDS.8,12–14

Despite these issues, South Africa’s vital registration system remains a key source of data; it comprises the largest continuous data set for causes of death in southern Africa. Analytic techniques for adjusting for known biases are needed to find a middle ground between uncritically using the raw data and discarding them altogether. Groenewald et al. proposed adjusting a 2000–2001 data sample for misclassification of deaths from HIV/AIDS by comparing the age-specific death rates for selected “indicator” causes to 1996 rates and attributing deaths to HIV/AIDS when both a noticeable increase in mortality and an age pattern characteristic of HIV/AIDS were present.12 This method relied on the assumptions that misclassification was negligible in 1996 and that subsequent death rates for indicator causes remained constant at 1996 levels. No further national corrections of vital registration data have appeared in the literature, and the latest report on mortality and causes of death issued by Statistics South Africa did not include corrections for misclassification.15 We propose an alternative empirical method of quantifying misclassified deaths from HIV/AIDS in South Africa’s death registry and provide updated counts of the deaths attributable to HIV/AIDS in 1996–2006.



Death registration data based on ICD-10 codes were obtained from the mortality database of the World Health Organization (WHO) and aggregated to correspond to 48 mutually exclusive causes of death as listed in Naghavi et al.16 (Appendix A, available at: http://www.healthmetricsandevaluation.org/sites/default/files/publication_summary/2011/HIV_IHME_webappendix_0211.pdf). This list is composed of the 47 causes of death of public health importance that comprise most ICD-10 codes and contains a 48th category labelled “garbage codes”, which includes ill-defined and unspecified causes and modes of death that should never be given as underlying causes (e.g. “heart failure”).16–19 The 48 causes are specific enough to provide meaningful distinctions but are also broad enough to not be influenced by minor certification or coding errors.

We obtained population data from the medium-fertility variant of the United Nations World Population Prospects 2008 estimates and aggregated both mortality and population data into 5-year age groups. Countries missing either mortality or population data were excluded from the analysis. We obtained South African data for 1996–2006 from the same sources. To correct for under-registration of deaths in South Africa, we scaled up the input mortality data to match the age- and sex-specific total mortality estimates from the most recent demographic model issued by the Actuarial Society of South Africa (ASSA) (Appendix A).3 Analyses were conducted in Stata 11.0 and graphics were made in R using the ggplot2 package.
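The under-registration correction can be illustrated with a minimal sketch. It assumes that the scaling is proportional, so that within one age and sex group every cause is multiplied by the same factor and the cause fractions are preserved; the text does not spell out this detail. The function name and the counts are hypothetical and are not taken from the study.

```python
def scale_to_envelope(registered, envelope_total):
    """Scale registered deaths by cause in ONE age-sex group so that
    they sum to an external all-cause estimate (the "envelope").

    Assumption: proportional scaling, i.e. cause fractions are unchanged.
    """
    registered_total = sum(registered.values())
    if registered_total <= 0:
        raise ValueError("no registered deaths to scale")
    factor = envelope_total / registered_total
    return {cause: deaths * factor for cause, deaths in registered.items()}


# Hypothetical counts for one age-sex group in one year (illustration only).
registered = {"tuberculosis": 400.0, "hiv_aids": 100.0, "garbage": 500.0}
scaled = scale_to_envelope(registered, envelope_total=1250.0)
```

In this illustration the registered total of 1000 deaths is scaled by a factor of 1.25, so each cause keeps its share of the total.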


We pooled vital registration and population data across all years and all countries (except South Africa) in the ICD-10 database to obtain aggregate distributions of deaths by cause, age and sex, and of population by age and sex. By including all countries with available data, we intended to maximize data quality and obtain a good representation of countries resembling South Africa in terms of epidemiological stage, given that many less-developed countries have poor data.20
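As a rough illustration of the pooling step, the sketch below sums deaths and population over all country-years before dividing, which weights each country-year by its population. The record layout, function name and numbers are assumptions made for the example only.

```python
from collections import defaultdict


def pooled_rates(death_records, population_records):
    """Pool country-year records into one set of death rates.

    death_records: iterable of (cause, age_group, sex, deaths)
    population_records: iterable of (age_group, sex, population)
    Both lists are assumed to cover the same country-years.
    """
    deaths = defaultdict(float)
    population = defaultdict(float)
    for cause, age, sex, n in death_records:
        deaths[(cause, age, sex)] += n
    for age, sex, n in population_records:
        population[(age, sex)] += n
    return {
        (cause, age, sex): d / population[(age, sex)]
        for (cause, age, sex), d in deaths.items()
        if population[(age, sex)] > 0
    }


# Two hypothetical country-years for one cause, age group and sex.
death_records = [
    ("tuberculosis", "30-34", "F", 10.0),
    ("tuberculosis", "30-34", "F", 30.0),
]
population_records = [
    ("30-34", "F", 100000.0),
    ("30-34", "F", 300000.0),
]
rates = pooled_rates(death_records, population_records)
```

Here 40 pooled deaths over a pooled population of 400 000 give a rate of 10 per 100 000.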

Using the data in the preceding paragraph, we computed one set of global cause-specific death rates by age and sex. We computed analogous rates by cause, age and sex for each year of South African data for 1996–2006. For the global and South African rates separately, we then derived a reference death rate (DR) for each cause of death and sex by averaging the death rates in age groups 65–69, 70–74 and 75–79 years. We then converted the age-specific rates for each group of causes and sex into relative death rates (RDR) by comparing them against the group reference rates as in the following formula:

$$\mathrm{RDR}_{a,s,c} = \frac{\mathrm{DR}_{a,s,c}}{\mathrm{DR}_{\mathrm{ref},s,c}}$$

where a is age, s is sex, c is cause and DR_ref,s,c is the reference death rate for that cause and sex. The average of multiple age groups was used to construct the reference to avoid defining one age group as an anchor that could not itself be analysed for misclassification of HIV/AIDS deaths. We chose older age groups because we expected fewer HIV/AIDS deaths in them but made exceptions for perinatal and maternal causes, for which the averages of age groups 0 and 1–4 years and of age groups 25–29, 30–34 and 35–39 years were used, respectively.
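The reference and relative death rates described above can be sketched for a single cause and sex as follows. The default reference ages are the three stated in the text (65–69, 70–74 and 75–79 years); the function name, the age-group labels and the rates are hypothetical.

```python
REFERENCE_AGES = ("65-69", "70-74", "75-79")


def relative_death_rates(rates_by_age, reference_ages=REFERENCE_AGES):
    """Return (reference rate, relative death rate by age) for one
    cause and sex. The reference is the unweighted mean of the death
    rates in the reference age groups."""
    reference = sum(rates_by_age[a] for a in reference_ages) / len(reference_ages)
    if reference <= 0:
        raise ValueError("reference death rate must be positive")
    rdr = {age: rate / reference for age, rate in rates_by_age.items()}
    return reference, rdr


# Hypothetical death rates (deaths per person-year) for one cause and sex.
rates_by_age = {"30-34": 0.002, "65-69": 0.003, "70-74": 0.004, "75-79": 0.005}
reference, rdr = relative_death_rates(rates_by_age)
```

For perinatal or maternal causes, the same function could be called with the alternative reference age groups given in the text passed as `reference_ages`.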

Relative death rates were graphed over age by cause and sex and visually scanned for any marked differences between the global and South African trends in the age group most likely to uniquely indicate HIV/AIDS mortality, namely people aged 20–45 years. Assuming a mostly biologically driven pattern of relative death rates, we considered higher South African relative death rates in ages 20–45 years to be indicative of HIV/AIDS deaths that had been misattributed to non-HIV-related causes. We further evaluated causes showing minor differences between their relative rates according to their biological relatedness to HIV infection. We applied these criteria to determine a set of “source” causes from which deaths would be taken and allocated to HIV/AIDS.

To quantify HIV/AIDS deaths among “source” causes, we applied the age- and sex-specific global relative death rates for each source cause to the reference death rates for every year of South African data. This yielded corrected death rates. When the corrected South African death rate was lower than the original rate in age groups below 70 years, we reassigned the excess deaths to HIV/AIDS. Due to small numbers, death rates among people older than 70 years had too much random variation to allow for plausible attribution of excess deaths to HIV/AIDS in those age groups.
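A minimal sketch of this reallocation for one source cause, sex and year is given below. It reads "age groups below 70 years" as age groups whose starting age is below 70, which is an interpretation rather than a detail confirmed by the text. Age groups are keyed by starting age, and all names and numbers are hypothetical.

```python
def reallocate_excess(sa_rates, sa_population, global_rdr, sa_reference,
                      age_cutoff=70):
    """Apply global relative death rates to the South African reference
    rate and return (corrected rates, excess deaths by age).

    Excess deaths are those to be reassigned to HIV/AIDS. Age groups
    starting at or above age_cutoff are left unchanged.
    """
    corrected = dict(sa_rates)
    excess_deaths = {}
    for age_start, original in sa_rates.items():
        if age_start >= age_cutoff:
            continue
        expected = global_rdr[age_start] * sa_reference
        if expected < original:
            corrected[age_start] = expected
            excess_deaths[age_start] = (original - expected) * sa_population[age_start]
    return corrected, excess_deaths


# Hypothetical inputs for one source cause, sex and year.
sa_rates = {30: 0.006, 65: 0.010, 70: 0.020}
sa_population = {30: 100000.0, 65: 50000.0, 70: 30000.0}
global_rdr = {30: 0.2, 65: 0.9, 70: 1.1}
corrected, excess = reallocate_excess(sa_rates, sa_population, global_rdr,
                                      sa_reference=0.010)
```

In this illustration about 400 deaths at ages 30–34 and about 50 at ages 65–69 would be moved to HIV/AIDS, while the rate in the 70–74 group is left as registered.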

We performed sensitivity analyses to examine the effect of using age groups 70–74 years and 75–79 years to derive alternative reference death rates, and to assess the effect of using only data from less-developed countries (as defined by the United Nations21,22) to derive the global standard. To allow for country heterogeneity in the coding patterns for garbage causes of death, we also explored the use of the 1996 South African data as the standard only for the garbage code cause. We tested these analyses and the main analysis using three alternative age- and sex-specific total mortality estimates to correct for under-registration in South Africa: those of the Institute for Health Metrics and Evaluation, those of the United Nations World Population Prospects (WPP),21 and this last source combined with estimates for under-5 mortality from the United Nations Children’s Fund (see Appendix A for details).23


The graphs in Fig. 1 show the relative rates of death from select diseases globally and for South Africa (graphs for all communicable and non-communicable causes can be found in Appendix A, Fig. A1). Of the 47 non-HIV-related causes, 14 were identified as “source” causes (ICD-10 codes shown in Appendix A, Table A1): tuberculosis; sexually transmitted diseases excluding HIV infection; intestinal infectious diseases; selected vaccine-preventable diseases; parasitic and vector-borne diseases; meningitis and encephalitis; respiratory infections; other infectious diseases; maternal conditions; nutritional deficiencies; endocrine, nutritional, blood, and immune disorders; non-communicable respiratory diseases; other digestive diseases; and “garbage” codes. Rates of death from these causes were relatively higher in South Africa than in the global standard for ages 20–45 years, and most rates also displayed a clear time trend paralleling the rise of the HIV infection epidemic (Fig. 1, tuberculosis and other infectious diseases). Rates of death from communicable and non-communicable causes not identified as sources generally appeared consistent with the global pattern (Fig. 1, ischaemic heart disease) or deviated from the global pattern, but not in young to middle-aged adults (Appendix A, Fig. A1, cirrhosis). Two borderline causes – genitourinary diseases and other neoplasms – were not selected because their slightly higher relative rates in South Africa became negligible when inspected using the alternative total mortality estimates in the sensitivity analyses. In addition, an exploratory analysis including these two causes as sources yielded estimates within 1% of the point estimates and well within the uncertainty bounds in Table 1. Injuries showed highly variable patterns due to the greater influence of environmental and social factors on the relative rates.

Fig. 1. Death rates in South Africa relative to global rates of death from selected causes, 1996–2006

The effect of reallocating deaths from source causes to HIV/AIDS is displayed in Fig. 2 (Appendix A, Fig. A2 shows corrected trends in source causes). Mortality from HIV infection rose over time in most age groups and most sharply among people aged 30–44 years, with some levelling off by 2006. Most deaths from HIV/AIDS and most misclassification of such deaths occurred in females aged 15–44 years and in males aged 30–59 years, although peak rates appeared to shift to slightly older ages over time (Appendix A, Fig. A3). In those aged 5 years or older, garbage codes, tuberculosis, respiratory infections and respiratory diseases (only in people aged 60 or older) contributed the most to death misclassification. Contributing less were other infectious diseases; endocrine, nutritional, blood and immune disorders; and intestinal infectious diseases. The remaining sources had negligible contributions. In those aged 0–4 years, most deaths were misclassified as having been caused by respiratory infections; to a lesser extent, by other infectious diseases; nutritional deficiencies; endocrine, nutritional, blood and immune disorders; tuberculosis; and least of all by the remaining sources (Appendix A, Fig. A4).

Fig. 2. Original and corrected rates of death from human immunodeficiency virus infection and acquired immunodeficiency syndrome (HIV/AIDS) in South Africa over time, by age, 1996–2006

Table 1 shows the effect of correction on total mortality from HIV/AIDS. The 2–3% of deaths registered to HIV/AIDS in South Africa before 2006 reflect only around 10% of all HIV/AIDS mortality, which rose from around 19% of all-cause mortality in 1996 to 48% in 2006. However, the uncertainty ranges from the various sensitivity analyses are wide. Two of the methodological variants – using South African 1996 mortality as the standard for the garbage codes and using only developing countries to derive the global standard – account for as much uncertainty in the aggregate yearly estimates as do the three alternative total mortality estimates used to correct for under-registration. The impact of varying the reference age group is minor by comparison. The shape of the time trend within age groups (uncertainty shown in Appendix A, Table A2) is most strongly affected by the choice of total mortality estimate (data not shown). In addition, the developing country standard proved similar to the global standard for most causes, which supports the hypothesis that relative rates are largely biologically driven (data not shown).


These results confirm the substantial misclassification of HIV/AIDS deaths in South Africa’s vital registration system reported in the literature. While audits of death records in selected areas in 2003–2004 have shown that 53–73% of HIV/AIDS deaths do not explicitly record HIV/AIDS as the underlying cause of death,7,8 this study suggests that during 1996–2006 as many as 94% of all HIV/AIDS deaths in the country were being misclassified, especially among young to middle-aged females and among males in the middle-aged and older groups.

Many of the source causes identified and the age and sex patterns noted for the corrected data are in line with previous findings. Tuberculosis and other respiratory infections, intestinal infectious diseases, parasitic diseases, meningitis, other infectious conditions, digestive disorders and ill-defined ailments were found to be common sources of misclassifications in an audit,7 and Groenewald et al. additionally identified nutritional deficiencies and non-communicable respiratory diseases as misclassification sources when they examined mortality rates between 1996 and 2001.12 Nephritis, cancer and cardiovascular disease were also identified by these studies as potential misclassification sources7,12 but were not found to be source causes in this analysis, perhaps because they contributed relatively little. Higher mortality rates in females than males at younger ages have also been documented previously6,24–28 and reflect the partnering pattern in South Africa, where older men often partner with younger women.1,29 Both the slowing of increases in mortality, also documented subnationally,24 and the shifting of peak death rates to older ages may be the result of efforts at preventing and treating HIV/AIDS.24,30–32

The corrected data provide estimates of mortality from HIV/AIDS in South Africa that can complement modelled estimates. For example, the Joint United Nations Programme on HIV/AIDS estimates total deaths from HIV/AIDS in 2006 at 350 000 (range: 300 000–420 000),33 while the ASSA estimates deaths from HIV/AIDS at about 345 000 in 2006.3 Our estimates overlap with these and suggest tighter uncertainty bounds than the United Nations estimates for some years. Empirical estimates may thus help triangulate where true mortality lies within modelled estimates. In addition, because modelling public health prevention and treatment programme effects can be challenging, empirical estimates may be useful in gauging the evolution in the epidemic response – a dynamic area that will be misrepresented if mortality from HIV/AIDS is overestimated.

This study has some limitations. Global data quality may vary depending on the country,20 yet we have assumed that relatively accurate relative death rates by cause can be computed by pooling across countries. In addition, if more than negligible miscoding exists in the South African data across the broad cause list considered in this study, the results will reflect that miscoding. To adjust for under-registration, we tested different total mortality data that are themselves estimates with their own limitations.

We also assumed that relative rates are an appropriate standard of comparison. This only holds true if a biologically driven smooth age pattern of relative death rates exists. This is supported by the consistency of the relative rates for many causes between the global and South African data, as well as between the global data and developing country data, but some of the source causes are subject to important social and environmental influences that create heterogeneity in the relative age pattern. Garbage codes and tuberculosis offer clear examples. The use of codes for ill-defined conditions as underlying causes in death certificates often reflects local patterns of medical training16 and, as mentioned earlier, coding quality is poor in South African death registration data.7–10,12 The rise of co-infection with tuberculosis and HIV and the high rates of transmission of tuberculosis seen in mining communities have affected tuberculosis epidemiology in South Africa and may have altered the relative age pattern of deaths from tuberculosis.1,34–36 In addition, smooth relative death rates require large data sets. Even at the national level in South Africa, relative rates were sensitive to standards and reference age groups, and rates for people older than 70 years were too small to analyse. Within the reference ages, the identification of misclassified deaths is weaker by design.

The wide uncertainty ranges are a final consideration. The method is clearly sensitive to methodological choices and to correction for under-registration of the input data. However, since the ranges stem from a large number of sensitivity analyses, they may capture the outer limits of possible estimates from this method. These limits are wide but actually offer some precision in cases where modelled estimates have greater uncertainty.

Given the biases present in South Africa’s vital registration data with regard to deaths from HIV/AIDS, this study presents a useful empirical method for improving data quality and estimating HIV/AIDS mortality that is based on biological patterns of death and on the epidemiology of HIV/AIDS. The method improves upon existing ones owing to its robustness to changes in absolute death rates over time and to its use of an external standard for comparison. The results for mortality from HIV/AIDS complement modelled estimates and the corrected vital registration data set represents the cause patterns of mortality in South Africa better than the original. This approach to correcting data sets using the relative age pattern for deaths is easily transferrable to other settings with moderate to large epidemics of HIV infection where death registration may not accurately reflect HIV/AIDS mortality. Methodological choices can be modified to better suit national or subnational analyses by tailoring the cause list and using local knowledge of death certification patterns.

High-quality health statistics are instrumental in health planning, decision-making, programme evaluation and monitoring progress.37 Adjusting for biases in existing data is only an interim solution until countries improve the quality of their death certification data. As has been stated in the literature, rigorously training physicians to properly prepare death certificates is necessary to prevent errors and educate the medical community about the importance of quality registration.10,38,39 Complementary efforts to address confidentiality and other factors leading to the omission of HIV-related conditions from death certificates are also key for improving the accuracy of South Africa’s vital statistics. In the meantime, adjusted data provide the best available empirical estimates of patterns in the underlying cause of death.


The authors thank Dennis Feehan, Katie Leach-Kemon, Susanna Makela, Mohsen Naghavi, Janaki O’Brien and Haidong Wang for their valuable collaboration during this study.


This research was supported by funding from the Bill & Melinda Gates Foundation (http://www.gatesfoundation.org) and the state of Washington, USA. The funders had no role in study design, data collection and analysis, interpretation of data, decision to publish or preparation of the manuscript. The corresponding author had full access to all data analysed and had final responsibility for the decision to submit this original research paper for publication.

Competing interests:

None declared.