Bulletin of the World Health Organization

Electronic medical record systems, data quality and loss to follow-up: survey of antiretroviral therapy programmes in resource-limited settings

Mathieu Forster a, Christopher Bailey b, Martin WG Brinkhof a, Claire Graber a, Andrew Boulle c, Mark Spohr b, Eric Balestre d, Margaret May e, Olivia Keiser a, Andreas Jahn f, Matthias Egger a & for the ART-LINC collaboration of the International Epidemiological Databases to Evaluate AIDS

a. Institute of Social and Preventive Medicine, University of Bern, Finkenhubelweg 11, 3012 Bern, Switzerland.
b. Department of Knowledge Management and Sharing, World Health Organization, Geneva, Switzerland.
c. School of Public Health and Family Medicine, University of Cape Town, Cape Town, South Africa.
d. Institut de Santé Publique, d’Epidémiologie et de Développement, Université Victor Segalen, Bordeaux, France.
e. Department of Social Medicine, University of Bristol, Bristol, England.
f. Lighthouse Clinic, Kamuzu Central Hospital, Lilongwe, Malawi.

Correspondence to Matthias Egger (egger@ispm.unibe.ch).

(Submitted: 24 November 2007 – Revised version received: 01 May 2008 – Accepted: 07 May 2008 – Published online: 04 November 2008.)

Bulletin of the World Health Organization 2008;86:939-947. doi: 10.2471/BLT.07.049908


Access to antiretroviral therapy (ART) has improved in lower-income countries over the past 4 years as a result of an exceptional commitment by the international community and donor agencies. WHO estimates that about 3 million people were receiving ART in low- and middle-income countries at the end of 2007, a figure representing a 7.5-fold increase over the previous 4 years.1 The number of patients starting ART has increased exponentially since 2003 and must continue to do so if the goal of universal access to ART is to be achieved.2

In the absence of curative treatments, lifelong follow-up of patients on ART is required to monitor adherence, treatment response and adverse effects. A growing amount of increasingly complex information needs to be reviewed at each visit, and new data must be added to the record. An important aspect is retention in care: a recent analysis of treatment programmes showed that losses to follow-up have become more common with the scale-up of ART.3 Programmes find it increasingly difficult to follow the growing population of patients and to trace those who do not return to the clinic. Electronic Medical Record (EMR) systems can improve health care by increasing adherence to therapeutic guidelines and protocols, informing clinical decisions and decreasing medication errors.4,5 EMR systems allow early identification of patients who miss appointments, thereby facilitating their timely tracing, and provide a platform for operational research.6 Little is known about the role of EMR systems in the context of the scale-up of ART in resource-limited settings. A recent review identified the need for studies on the best ways of using information systems to support the expansion of HIV care in such settings.7 The objective of this study was to describe the electronic medical databases used in ART programmes in lower-income countries and to assess the measures such programmes employ to maintain and improve data quality and reduce the loss of patients to follow-up.


Workshop and subsequent survey

In June 2006, representatives of 21 ART programmes from 15 countries (Benin, Brazil, Burundi, Côte d’Ivoire, the Gambia, India, Kenya, Malawi, Mali, Nigeria, South Africa, Thailand, Uganda, Zambia and Zimbabwe) attended a workshop on the use of EMR systems in ART programmes in resource-limited settings. Ten of the 21 programmes participated in the Antiretroviral Treatment in Lower Income Countries (ART-LINC) collaboration, a network of treatment sites of the International Epidemiological Databases to Evaluate AIDS (IeDEA, http://www.iedea-hiv.org).8–10 The workshop was jointly organized by IeDEA and the Knowledge Communities and Strategies (KCS) unit at WHO and hosted by the Centers for Disease Control and Prevention (CDC, United States Department of Health and Human Services) offices in Entebbe, Uganda.

Based on the workshop, an online questionnaire covering the EMR systems in place, human and electronic resources, reporting systems, data storage, quality control measures and the tracing of patients lost to follow-up was written in English, translated into French and revised after pilot testing in Bern and Bordeaux. The questionnaire is available from biblio@ispm.unibe.ch. ART programmes that participated in the Entebbe workshop were invited to complete the questionnaire. All sites (n = 21) agreed to participate in the survey. WHO’s web-based Data Collector system11 was used. The questionnaire was uploaded on 20 December 2006, and all sites had responded by 12 February 2007.

Data quality in ART-LINC

Questions from the Entebbe survey were used to create indicators of data quality. First, for each programme, a computation was made of the number of hours dedicated to data entry per week divided by the number of ART patients enrolled. This was done separately for data entry clerks and medical staff. Second, whether or not staff had received training in data management and data quality control was determined. Third, a patient tracing indicator was calculated by adding up the measures implemented by the sites, including the availability of staff dedicated to tracing patients, the use of community-based organizations to trace patients, and the consultation of death registries to determine vital status.
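The two site-level indicators described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the `Site` record and its field names are hypothetical, and the hours are scaled per 100 patients to match the way the results are reported.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # Hypothetical field names; the survey questionnaire is not reproduced here.
    patients_on_art: int
    clerk_hours_per_week: float
    medical_hours_per_week: float
    has_tracing_staff: bool
    uses_community_orgs: bool
    consults_death_registry: bool


def hours_per_100_patients(hours_per_week, patients_on_art):
    """Weekly data-entry hours divided by patients enrolled, scaled to 100 patients."""
    if patients_on_art <= 0:
        raise ValueError("patients_on_art must be positive")
    return 100.0 * hours_per_week / patients_on_art


def tracing_indicator(site):
    """Count of tracing measures in place at the site (0 to 3)."""
    return sum([
        site.has_tracing_staff,
        site.uses_community_orgs,
        site.consults_death_registry,
    ])


# Made-up example site, computed separately for clerks and medical staff.
example = Site(1000, 36.0, 15.0, True, False, True)
clerk_rate = hours_per_100_patients(example.clerk_hours_per_week, example.patients_on_art)
medical_rate = hours_per_100_patients(example.medical_hours_per_week, example.patients_on_art)
```

Keeping the clerk and medical-staff rates separate mirrors the text, which analyses the two groups independently.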

ART-LINC sites that were not represented at the Entebbe workshop (n = 7) did not complete the questionnaire but were asked to provide information on these indicators. This allowed us to analyse their influence on the quality of the ART-LINC data. We also used an index of the burden of poverty [the proportion of the country’s population living on less than 1 US dollar (US$) a day]12 to investigate the possible influence of background material deprivation on data quality.

The quality of ART-LINC data was assessed by defining a set of six key variables and calculating the proportion of missing data for each. The six variables were: age, sex, CDC or WHO clinical stage at baseline, baseline CD4+ lymphocyte (CD4) count, follow-up CD4 count and year of ART initiation. An index was computed by determining, for each site, the median of the percentages of missing data across all six variables. Sites were then ranked according to this global missing data index. Principal components analysis showed that all variables loaded heavily on a single component that accounted for 60.1% of the total variance. This suggests that most of the problem of missing data for the six variables relates to a single phenomenon. A reliability analysis revealed appropriate internal consistency, with a Cronbach’s α of 0.83. These analyses supported the use of a combined index rather than a separate analysis for each variable. A second indicator was defined for each site by computing the proportion of patients lost to follow-up at 1 year after starting ART. A patient was considered lost to follow-up if his or her last visit was recorded during the first year after starting ART and the patient had at least 1 year of additional potential follow-up until the closing date of the database. The closing date was defined for each cohort as the date of the most recent follow-up visit recorded in the database.
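As an illustration of the two outcome measures, the sketch below computes a missing data index for one site and applies the loss to follow-up definition to one patient. It rests on several assumptions that are not taken from the study: the variable names are hypothetical, a missing value is represented as `None`, a year is approximated as 365 days, and "additional potential follow-up" is read as the time between the last recorded visit and the closing date.

```python
from datetime import date, timedelta
from statistics import median

# Hypothetical names for the six key variables.
KEY_VARIABLES = (
    "age", "sex", "clinical_stage",
    "baseline_cd4", "followup_cd4", "art_start_year",
)
ONE_YEAR = timedelta(days=365)  # approximation of one year (assumption)


def missing_data_index(records):
    """Median, over the six key variables, of the percentage of records missing each."""
    if not records:
        raise ValueError("at least one patient record is required")
    percentages = []
    for variable in KEY_VARIABLES:
        n_missing = sum(1 for record in records if record.get(variable) is None)
        percentages.append(100.0 * n_missing / len(records))
    return median(percentages)


def is_lost_to_follow_up(art_start, last_visit, closing_date):
    """Last visit within the first year on ART, with at least one further
    year of potential follow-up before the database closing date."""
    last_visit_in_first_year = (last_visit - art_start) <= ONE_YEAR
    enough_potential_follow_up = (closing_date - last_visit) >= ONE_YEAR
    return last_visit_in_first_year and enough_potential_follow_up


# Made-up records for a four-patient site.
complete = {"age": 34, "sex": "F", "clinical_stage": 3,
            "baseline_cd4": 120, "followup_cd4": 260, "art_start_year": 2005}
site_records = [
    complete,
    dict(complete, baseline_cd4=None, followup_cd4=None),
    dict(complete, followup_cd4=None, clinical_stage=None),
    complete,
]
site_index = missing_data_index(site_records)
```

Using the median rather than the mean keeps a single badly recorded variable from dominating the index, which is consistent with the ranking purpose described in the text.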

Statistical analysis

Associations between site characteristics and data quality were investigated using univariate and multivariate maximum likelihood logit models. The number of patients with missing data for key variables and the number lost to follow-up in each programme site was divided by the total number of patients enrolled in the site. Models for clustered data were then used to evaluate predictors. Robust standard errors to account for within-site correlation were used (blogit procedure in Stata version 9.2, StataCorp LP, College Station, TX, United States of America). These models assess the likelihood of a patient within a site having missing data or being lost to follow-up. Variables associated with outcomes in univariate models at P < 0.25 were entered into the multivariate model. Results were expressed as odds ratios (ORs) with 95% confidence intervals (CIs).


Programme site characteristics

The 21 sites surveyed provided ART to a total of 50 060 patients. The median number of patients on ART per site was 1000 (interquartile range, IQR: 320 to 2398). Table 1 shows selected site characteristics. All programmes except one were in urban facilities; 11 (52%) were in public facilities; 9 (43%) were run by NGOs and one was a private, for-profit clinic. Eighteen sites (86%) received funding from a donor agency (mostly the Global Fund to Fight AIDS, Tuberculosis and Malaria, the President’s Emergency Plan for AIDS Relief and the Bill & Melinda Gates Foundation); 14 (67%) obtained funding from two sources and 5 (24%) from three. One site reported funding from seven different sources.

Overall, 18 (86%) sites routinely used an electronic database. The median number of weekly hours spent on the database per 100 patients on ART was 3.6 (IQR: 1.6–5.1) for clerks and 1.5 (IQR: 0.3–5.7) for medical personnel (physicians or nurses). Four of the 18 sites (22%) exclusively employed clerks for data entry and two (11%) used medical staff only. Thirteen of the 21 sites (62%) had personnel trained in data management or data quality control. Sixteen of the 18 sites (89%) captured patient data by means of written charts during consultations. Three (14%) entered the data electronically during consultations.

Database characteristics and data quality measures

Among the 18 sites that had an electronic database, 11 (61%) used the same software for data collection and data management, six (33%) used two different packages and one site used as many as five because of various research and reporting requirements. Ten sites (56%) used generic software such as Microsoft Access (Microsoft Corporation, Redmond, WA, USA) or FileMaker Pro (FileMaker, Inc., Santa Clara, CA, USA); only 5 sites (28%) used systems developed for this purpose, such as FUCHIA (Follow-Up of Clinical HIV Infection and AIDS, from Doctors without Borders and EpiCentre) or ESOPE (from Ensemble pour une solidarité thérapeutique hospitalière en réseau, ESTHER). A relational database with files for demographic data, clinical events, drugs and follow-up examinations was in place in 12 (67%) of the 18 sites using electronic databases. Only three such sites (17%) used solutions based on a Structured Query Language (SQL)13,14 server, which allows management of very large numbers of patients. Nine sites (50%) reported that the database could be linked to other data, including laboratory (8 sites) and pharmacy (7 sites) databases, the nutritional support unit (2 sites) or the socioeconomic support unit (3 sites). Standardized export formats were available at 6 sites (33%); five used Extensible Markup Language (XML) and one used the Health Level Seven (HL7) standard.

Table 2 summarizes the main purpose of each database. Fourteen sites (78%) stated that both patient management and reporting requirements were important reasons for having the database. Also shown are the measures in place to ensure the quality of the data for selected key variables: CD4 counts, drugs and important dates (date of birth and dates of laboratory measurements, follow-up visits and death). Box 1 defines commonly used measures to improve and ensure data quality, including bounds checking, check digits, fixed taxonomies, numerical alerts and Write Once, Read Many (WORM) computer data storage systems. For the variable CD4 cell counts, at least one of the measures listed in Box 1 was in place at 12 (67%) sites. Four (22%) sites reported two or more measures. Check digits and bounds checking were more frequently used (n = 6) than WORM systems (n = 4) and numerical alerts (n = 3). Fixed taxonomies of drugs were in use at 9 (50%) sites, and 3 (17%) sites additionally reported a WORM strategy. Regarding the four key dates, 7 sites (39%) reported WORM systems and 5 (28%) reported bounds checking for all four dates. Three (17%) reported both WORM systems and bounds checking for all four dates. Use of controlled medical vocabularies was reported by 3 sites; all used the International statistical classification of diseases and related health problems, 10th revision (ICD-10). Twelve sites (67%) performed a daily backup, 4 sites (22%) did a weekly or monthly backup and 2 sites (11%) had no backup strategy in place.

Box 1. Measures to improve the quality of the data collected in clinical databases

Bounds checking: Automated checking of whether or not a number lies within a pre-defined numeric range of possible or likely values.

Check digit: Additional number added to a unique identifier to check for errors when entering identification numbers. The check digit is calculated from the other digits in the identification number and is designed so that it will not match if any of the other digits is incorrect.

Fixed taxonomy: Predefined names assigned to a variable that prevent free text-related problems, including spelling mistakes and inconsistent terminology. Examples include the ATC code for drugs.

Numerical alert: System alerting the operator when a number is not expected (i.e. value out of range, or of a critical nature) which will prompt the operator to verify the number or take other appropriate actions.

WORM (Write Once, Read Many): Any type of data storage to which data can be written to only a single time, but can be read from any number of times. This prevents the user from accidentally or intentionally altering or erasing the data.
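The measures in Box 1 are simple to implement at the point of data entry. The sketch below shows one possible form of each, under stated assumptions: the CD4 range and alert threshold, the short drug list and the function names are illustrative and not taken from any surveyed system; the Luhn algorithm is used as one common check-digit scheme, since the survey does not say which scheme the sites used; and the write-once store is only a software approximation of WORM storage.

```python
# Illustrative values only; none of these are reported by the surveyed sites.
CD4_BOUNDS = (0, 3000)      # plausible range in cells per microlitre (assumed)
CD4_ALERT_BELOW = 200       # in range but clinically critical (assumed threshold)
DRUG_TAXONOMY = frozenset({"AZT", "3TC", "D4T", "NVP", "EFV"})  # hypothetical list


def check_cd4(value):
    """Bounds checking plus numerical alert: 'reject', 'alert' or 'ok'."""
    low, high = CD4_BOUNDS
    if not low <= value <= high:
        return "reject"   # outside the range of possible values
    if value < CD4_ALERT_BELOW:
        return "alert"    # possible, but the operator should verify or act
    return "ok"


def luhn_check_digit(payload):
    """Check digit computed from the other digits (Luhn algorithm)."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def is_valid_patient_id(identifier):
    """True if the last digit matches the check digit of the preceding digits."""
    if len(identifier) < 2 or not identifier.isdigit():
        return False
    return luhn_check_digit(identifier[:-1]) == int(identifier[-1])


def normalise_drug(name):
    """Fixed taxonomy: accept only predefined drug codes, never free text."""
    code = name.strip().upper()
    if code not in DRUG_TAXONOMY:
        raise ValueError(f"unknown drug code: {name!r}")
    return code


class WriteOnceStore:
    """Software approximation of WORM: a key can be written once, read many times."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        if key in self._data:
            raise PermissionError(f"{key!r} has already been written")
        self._data[key] = value

    def read(self, key):
        return self._data[key]
```

Rejecting or flagging a value at entry time is the point of these measures: as the Discussion notes, correcting errors after the fact is time-consuming and rarely fully successful.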

Tracing of patients and missing data

Fifteen (71%) of the 21 sites indicated that they traced patients lost to follow-up. This included outreach teams at 11 (52%) sites, collaboration with community-based organizations at 5 sites (24%) and checking death registry data at 7 (33%) sites. The majority of the registries consulted were local death registries, such as hospital registries. Fourteen sites (67%) indicated they recorded when patients moved to another clinic and transferred their records on these occasions. Among the 18 sites with electronic databases, 5 (28%) had automatic alerts for missed visits.
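An automatic alert for missed visits can be as simple as a daily query over scheduled appointments. The following sketch is illustrative only: the data layout (a mapping from patient identifier to next scheduled visit), the function name and the 7-day grace period are assumptions, not features of the systems surveyed.

```python
from datetime import date, timedelta


def missed_visit_alerts(next_appointments, today, grace_days=7):
    """Return (patient_id, due_date) pairs overdue by more than grace_days,
    longest overdue first, so that tracing can start with them."""
    cutoff = today - timedelta(days=grace_days)
    overdue = [
        (patient_id, due)
        for patient_id, due in next_appointments.items()
        if due < cutoff
    ]
    return sorted(overdue, key=lambda item: item[1])


# Made-up schedule for three patients.
schedule = {
    "A": date(2007, 1, 2),
    "B": date(2007, 1, 20),
    "C": date(2006, 12, 15),
}
alerts = missed_visit_alerts(schedule, today=date(2007, 1, 22))
```

The grace period is a design choice: too short and outreach staff chase patients who are merely a few days late, too long and tracing starts after the patient has become harder to find.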

Analyses of missing data were based on 41 936 patients from 19 sites and analyses of loss to follow-up on 36 149 patients from 18 sites participating in the ART-LINC collaboration. Table 3 shows the proportion of missing values for the key variables that we used to create the missing data index, as well as the proportion of patients lost to follow-up. There was considerable variation across sites, as indicated by wide inter-quartile ranges. Missing data were more frequent for variables relating to laboratory measures than for demographic or clinical information. The median missing data index was 10.9% and the median proportion of patients lost to follow-up at 1 year was 8.5%.

As shown in Fig. 1 and Table 4, training of staff and clerk-hours spent per week per 100 patients on ART were associated with a decreased likelihood of there being missing data. The figure shows that the variance decreases as the amount of time spent on the database increases. About 10 hours per week per 100 patients on ART were required to lower the proportion of missing data for key variables to below 10%. Interestingly, the four programmes with lower clerk-hours spent on data and lower levels of missing data tended to have a strong research component. The amount of time spent by medical staff was only weakly associated with the missing data index (Table 4). The proportion of the population living on less than US$ 1 per day was also positively associated with missing data. Loss to follow-up was negatively associated with the number of active tracing strategies in place. The effect of the individual measures was similar: in the univariate logit models, the OR for loss to follow-up was 0.58 (95% CI: 0.25–1.34) for the presence of an outreach team; 0.54 (95% CI: 0.24–1.18) for collaboration with community-based organizations; and 0.41 (95% CI: 0.18–0.94) for collaboration with death registries. Finally, there was a positive correlation between the proportion of patients lost to follow-up and the proportion of data missing for key variables: the Spearman rank correlation coefficient (ρ) was 0.51 (P = 0.031).

Fig. 1. Missing data index (median of percentage of data missing in six key variables) and hours spent by data clerks on the database each week


This survey indicates that EMR systems could play an important role in the scale-up of ART in lower-income countries. However, it also shows that the quality of the data collected and the retention of patients in treatment programmes are often unsatisfactory, mainly because of staff that is insufficient or inadequately trained to manage data and trace patients lost to follow-up.

Antiretroviral database anarchy?

Most databases relied on software intended for personal or small business use, rather than on more sophisticated systems required for programmes with an exponential growth of patient numbers. Some sites used several software packages (as many as five in one clinic) to meet the requirements of different funding agencies, a practice leading to inefficiency and poor data quality. Our findings support the call for the development of affordable and sustainable solutions for medical record-keeping in ART scale-up sites.15 International agencies could prevent database anarchy in ART programmes by jointly developing systems able to support clinical care, monitoring and evaluation, and operational research. Suitable data exchange standards that allow for the transfer of patients from one system to another and facilitate the aggregation of data at the country level are also important. Open source databases, such as the Open Medical Record System (OpenMRS),16 can easily be adapted to local requirements and are increasingly used in the context of the scale-up of ART.17–20 OpenMRS is also an example of successful South-South and South-North collaborations.21

The price of quality

We examined factors influencing the completeness of the data in the ART-LINC database. The training of staff was strongly associated with more complete data, but many programmes had no trained staff on site. The time spent on the database by data clerks was also positively associated with more complete data. About 10 hours per week per 100 patients on ART appear to be required for the proportion of missing data for key variables to drop below 10%. The use of medical staff was found to be less effective for improving data quality and is generally an inefficient use of resources. A few simple database features, including bounds checking, check digits, and numerical alert and WORM systems, can improve data quality. However, these measures have not been widely implemented. Similarly, standardized coding using fixed taxonomies or controlled medical vocabularies is not commonly used. Many sites are thus struggling with identifying and correcting data errors and inconsistencies after they occur, which is time-consuming and rarely fully successful. Data security is also of concern, with some sites not performing backups regularly.

Minimizing losses to follow-up

The quality of both the health care provided and the data collected to evaluate it depends on a complete follow-up of patients and their retention in care. Loss to follow-up at 1 year was substantial, in line with the results of other studies.22,23 Most sites reported using some method for tracing patients, but many relied on only one approach, such as outreach teams or collaborations with community-based organizations. The concurrent use of several strategies appears to be most effective in reducing losses to follow-up. Fewer missing data correlated with fewer losses to follow-up, suggesting that better databases might contribute to retaining patients in programmes. Alternatively, limited resources may lead to both poor follow-up and poor data quality. Indeed, it was more difficult to achieve good data quality and to retain patients in more deprived settings. Previous analyses of the ART-LINC data showed that fees for services are associated with higher losses to follow-up.3

Strengths and weaknesses

Fraser et al. recently discussed five HIV treatment programmes using EMR systems to reduce loss to follow-up in Haiti, Kenya, Malawi and Zambia.7 Three of the systems reviewed were also included in our survey. In Haiti, Zanmi Lasante, the flagship programme of Partners In Health, implemented a web-based medical record system in several HIV treatment sites on the Central Plateau.24 Automated e-mail alerts are used to promote timely initiation of treatment.7 The system is web-based, with the central server based in the United States and sites connected via satellite links. Data can be entered offline when the connection to the network is lost. The system has advantages over local servers in terms of stability and security but raises issues of data ownership. In Zambia, automated reports generated by the EMR systems supported the tracing of patients lost to follow-up by community health-care workers.22

Our survey was conducted in 21 different clinics and cohorts providing ART to over 50 000 persons living on three continents. We could directly examine to what extent EMR systems and other programme characteristics correlated with data quality and rates of loss to follow-up. However, participating sites were heterogeneous and represented a convenience sample. The generalizability of our results is therefore uncertain, but clearly there are important problems in data collection, data management and patient follow-up.

Implications for clinical care and operational research

The collection of incomplete and inaccurate data hampers the provision of high-quality care, the monitoring of patients over time, and their retention in treatment programmes. Failure to retain a high proportion of patients in care negates much of the potential benefit of ART programmes, since most patients start ART with advanced disease and are likely to die within weeks or months if therapy is discontinued. Although the scale-up of ART in these settings has been a formidable achievement and many deaths have been prevented, these issues need to be addressed to ensure and document the long-term success of ART in these settings.

Missing data are common in operational research. A popular approach is to restrict analyses to individuals with complete data. Although such “complete-case” analyses are often unbiased, they may be biased if patients with missing data are not typical of the whole sample. They are also inefficient because of the reduced sample size.25 In the context of ART programme research, the inclusion of all patients is important; loss to follow-up is likely to result in underestimated mortality.3 Although multiple imputation of missing values is increasingly used to overcome the problem of missing data,26 the assumptions made in such analyses are difficult to verify.27


Our study suggests that promoting appropriate and sustainable databases and systems to trace patients should be a priority in the context of scaling up ART. Patients could benefit both directly and indirectly from improved data quality, since accurate clinical data are a prerequisite for high standards of care and monitoring, which in turn supports patient retention in the programme. These issues may not have received sufficient attention from the governmental and nongovernmental organizations driving the scale-up of ART in resource-limited settings.


We are grateful to WHO and CDC for supporting the Entebbe workshop and to all those who attended: Akum Aveika (Gambia); Abdoulaye Kalle (Mali); Alain Azondekon (Benin); Aluonzi Bosco (Uganda); Santhanam Anand (India); Andrew Mugisha (Uganda); Catherine Orrell (South Africa); Claudio Faulhaber (Brazil); Cyrille Franck Soppi (Côte d’Ivoire); Tim Meade (Zambia); Dembele Issiaka (Mali); Jean Marie Ntibigarura (Burundi); Jules Bashi Bagendabanga (Benin); Labake Akintunde (Nigeria); Margaret Pascoe (Zimbabwe); John Lelei (Kenya); Daouda Minta Kassoum (Mali); Monica Katyal (USA); Nicholas Musinguzi (Uganda); Tanakorn Apornpong (Thailand) and Karina Visser (South Africa). We are also grateful to Mauro Schechter, Eduardo Sprinz and François Dabis for their helpful comments on an earlier draft of this paper.

Funding: The ART-LINC collaboration of the International Epidemiological Databases to Evaluate AIDS (IeDEA) is funded by the United States National Institutes of Health (Office of AIDS Research and National Institute of Allergy and Infectious Diseases) and the French Agence nationale de recherches sur le sida et les hépatites virales.

Competing interests: None declared.