Comparative Genomics of Vibrio cholerae from Haiti, Asia, and Africa

A strain from Haiti shares genetic ancestry with those from Asia and Africa.

The current (seventh) cholera pandemic was caused by serogroup O1 El Tor biotypes of Vibrio cholerae. This biotype first emerged on the Indonesian island of Sulawesi in 1961 and subsequently spread throughout Asia and Africa, where endemic and epidemic disease persists today (1,2). Seventh cholera pandemic biotypes were introduced into Peru in 1991 and subsequently spread across South and Central America, but these biotypes never reached the island of Hispaniola. Recent endemic and epidemic cases in Asia and Africa are increasingly attributed to genetically atypical El Tor variants that share characteristics of classical and El Tor strains (1,3,4).
After the 2010 earthquake in Haiti, an outbreak of cholera emerged that resulted in >385,000 infections and 5,800 deaths as of July 7, 2011 (5). The outbreak strain quickly spread to the neighboring Dominican Republic and globally as travelers returned home from affected regions (6,7). Concurrent cholera cases in the United States, linked by travel to cholera-endemic regions in Asia and Africa, were identified by national surveillance activities of PulseNet USA (Centers for Disease Control and Prevention [CDC], Atlanta, GA, USA). Serotyping, biotyping, and pulsed-field gel electrophoresis (PFGE) fingerprinting investigations suggested that the travel-associated cases could be genetically related to the Haiti outbreak strain (8). Because of the historical absence of cholera in Haiti before the 2010 earthquake, speculation abounds that the outbreak strain was imported into Haiti. Although clonality of the Haiti outbreak strain has been inferred by phenotypic characterization and genotypic subtyping, thereby supporting a single foreign source hypothesis (6,8), definitive evidence (e.g., from whole-genome sequencing) for the genetic ancestry of the Haitian strain is lacking.
Preliminary comparative analysis of whole-genome sequences from two 2010 Haiti outbreak isolates with genomes from historical cholera cases resulted in speculation that the outbreak originated in southern Asia (9). However, that study lacked recent, globally distributed cholera case isolates and, in particular, lacked genomes from Africa, to which cholera is endemic. We selected contemporary V. cholerae isolates from clinical infections, attributed to geographically distinct locations and sharing PFGE fingerprints with Haiti outbreak strains, from the PulseNet USA database for comparative whole-genome analysis. Although detailed epidemiologic investigations are essential for unequivocally attributing geographic origin(s) and means of cholera introduction into Haiti, genome sequences of these 23 contemporary isolates showed details related to genetic content and diversity that were otherwise missed with lower-resolution PFGE subtyping, thereby providing useful genetic ancestry information for interpreting the outbreak in Haiti.

Patients and Isolates
V. cholerae isolates and travel histories from cholera case-patients in the United States were referred to CDC. A strain from an outbreak in Cameroon in 2010, isolated from a specimen received at CDC, and an isolate from South Africa likely linked to an outbreak in Zimbabwe in 2009 were also included in this study (10). Isolates C6706 and 3569-08 were acquired during the outbreak in Latin America in 1991 and from the US Gulf Coast in 2008, respectively. All strains were characterized as V. cholerae O1 on the basis of standard biochemical, cholera toxin, and serologic testing performed as described (11,12). PFGE was performed according to the PulseNet standardized protocol with restriction enzymes SfiI and NotI; PFGE patterns were designated by using BioNumerics version 5.10 (Applied Maths Inc., Sint-Martins-Latem, Belgium) and compared by unweighted pair group method with arithmetic mean analysis (Dice coefficient, 1.5% tolerance and optimization). Strain designations and other information are shown in Table 1.
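The band-based pattern comparison mentioned above can be illustrated with a minimal sketch of a Dice coefficient computed with a position tolerance. This is not the BioNumerics implementation: the greedy band matching, the representation of band positions as percentages of lane length, and the function name `dice_similarity` are all assumptions made for illustration, and the optimization step is not modeled.

```python
def dice_similarity(bands_a, bands_b, tolerance=1.5):
    """Dice coefficient between two band patterns.

    bands_a, bands_b: band positions, assumed here to be expressed as
    percentages of lane length (hypothetical representation).
    tolerance: maximum position difference for two bands to count as shared.
    """
    remaining = sorted(bands_b)
    matched = 0
    for pos in sorted(bands_a):
        # Greedy matching (an assumption): pair each band with the closest
        # still-unmatched band of the other pattern within the tolerance.
        candidates = [b for b in remaining if abs(b - pos) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda b: abs(b - pos))
            remaining.remove(best)
            matched += 1
    total = len(bands_a) + len(bands_b)
    # Dice = 2 * shared bands / total bands in both patterns.
    return 2.0 * matched / total if total else 1.0
```

Pairwise similarities computed this way would then feed a clustering step such as the unweighted pair group method with arithmetic mean, which is not shown here.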

Whole-Genome Data Acquisition, Assembly, and Annotation
Single-end pyrosequencing reads (GS FLX-Titanium; Roche Diagnostics, Indianapolis, IN, USA) and single-end 36-bp or 76-bp Illumina reads (GAIIe sequencer; Illumina, San Diego, CA, USA) were acquired and yielded >99% genome coverage and 32× and 240× average coverage depths, respectively (Table 2). Pyrosequencing reads were first assembled de novo by using Newbler version 2.5.3 (Roche Diagnostics). To correct potential base-calling errors attributed to homopolymers, Illumina GAIIe reads (average 14 million reads/genome) were mapped to the Newbler contigs by using CLC Genomics Workbench version 4.5 (www.clcbio.com/index.php?id=1042) and yielded an average combined coverage depth of 270× per genome.
Both chromosomes of Haiti outbreak isolate 2010EL-1786 were sequenced to full closure by using PCR and Sanger sequence-based bridging of contigs and a fosmid library of templates. Optical mapping also supported the contig ordering derived for 2010EL-1786. For all remaining isolates, Illumina-supplemented, homopolymer-corrected, Newbler-assembled contigs were prepared as pseudogenomes by first linking contigs with a linker sequence containing stop codons in all 6 translation reading frames. These high-coverage pseudogenomes were used for downstream analyses. Identification of coding sequences was achieved by using Glimmer3 (14). Genome annotation was achieved by using an automated, in-house, modified version of GenDB version 2.2 (15) and manual curation for regions of interest.
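The pseudogenome construction described above, in which contigs are joined by a linker carrying stop codons in all 6 reading frames, can be sketched as follows. The linker sequence used here is a hypothetical example chosen because it is palindromic; the actual linker used in the study is not stated. The function names are likewise illustrative.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Hypothetical linker (assumption, not taken from the study). The core
# TTAATTAATTAA is its own reverse complement, so both strands read the same.
LINKER = "NNNNNTTAATTAATTAANNNNN"


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def frames_with_stop(seq):
    """Return the frame offsets (0, 1, 2) in which seq contains a stop codon."""
    return {i % 3 for i in range(len(seq) - 2) if seq[i:i + 3] in STOPS}


def stops_in_all_six_frames(linker):
    """True if the linker has a stop codon in all 3 frames on both strands."""
    forward = frames_with_stop(linker)
    reverse = frames_with_stop(reverse_complement(linker))
    return forward == {0, 1, 2} and reverse == {0, 1, 2}


def build_pseudogenome(contigs, linker=LINKER):
    """Join contigs with the linker after checking that it terminates all frames."""
    if not stops_in_all_six_frames(linker):
        raise ValueError("linker does not terminate all six reading frames")
    return linker.join(contigs)
```

Because the linker terminates translation in every frame, a gene predictor run on the pseudogenome cannot call a coding sequence that spans a contig junction.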

Whole-Genome Alignment and Core Genome Phylogeny
Whole-genome alignments of all study isolates and 5 available reference V. cholerae genomes (Table 1) were performed by using Progressive Mauve (16) and visualized by using PhyML 3.0 (17). To determine vertical inheritance patterns, study genomes were analyzed with historical V. cholerae genomes (isolates M66-2, MJ-1236, CIRS101, and N16961) by using phylogenetic analysis of high-quality single-nucleotide polymorphisms (hqSNPs) contained in core genes. Coding region predictions were analyzed by using parallelized BLASTn (http://blast.ncbi.nlm.gov/Blast.cgi) to identify highly similar orthologs in all strains. Highly similar orthologs were defined as those containing a high-scoring segment pair >400 bp and identity >97%. Each orthologous loci set was multiply aligned by using ClustalW (18). Multiple alignments were manually inspected to remove erroneously aligned regions; indel-associated SNPs and loci containing >30 SNPs were also excluded. Each SNP column from each multiple nucleotide alignment was analyzed for hqSNPs, defined as those containing no gaps or ambiguous base calls and having an adjusted quality score >90 (of a maximum score of 93). A total of 4,376 hqSNPs were identified from 632 orthologous loci and extracted from the alignments to prepare a compressed pseudoalignment composed of hqSNPs (online Technical Appendix 1, www.cdc.gov/eid-static/spreadsheets/11-0794-Techapp1.xls). This pseudoalignment was used to build a maximum-likelihood phylogenetic tree by using PhyML 3.0 (17). Branch confidences were estimated by using the approximate likelihood-ratio test (19).
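The hqSNP column filter described above can be sketched as follows. This is an illustrative reading of the stated criteria, not the pipeline used in the study: the data layout (one aligned sequence and one list of per-column quality scores per strain), the function names, and the choice to count all gap-free variable columns toward the >30 SNP exclusion are assumptions. Manual inspection and removal of indel-associated SNPs are not modeled.

```python
def extract_hqsnps(alignment, qualities, min_quality=90, max_snps_per_locus=30):
    """Return hqSNP columns from one multiple alignment of an orthologous locus.

    alignment: dict mapping strain name -> aligned sequence (equal lengths).
    qualities: dict mapping strain name -> per-column quality scores
               (hypothetical layout; same length as the sequences).
    Returns a list of (column_index, {strain: base}), or [] if the locus
    carries more variable columns than max_snps_per_locus.
    """
    strains = sorted(alignment)
    length = len(alignment[strains[0]])
    variable_columns = 0
    hqsnps = []
    for col in range(length):
        bases = [alignment[s][col].upper() for s in strains]
        if any(b not in "ACGT" for b in bases):
            continue  # column contains a gap or an ambiguous base call
        if len(set(bases)) == 1:
            continue  # invariant column, not a SNP
        variable_columns += 1
        # Keep the SNP only if every strain exceeds the quality threshold.
        if all(qualities[s][col] > min_quality for s in strains):
            hqsnps.append((col, dict(zip(strains, bases))))
    if variable_columns > max_snps_per_locus:
        return []  # locus excluded as too divergent
    return hqsnps


def build_pseudoalignment(loci_snps, strains):
    """Concatenate hqSNP bases across loci into one sequence per strain."""
    return {
        s: "".join(column[s] for snps in loci_snps for _, column in snps)
        for s in strains
    }
```

The resulting per-strain strings form the compressed pseudoalignment that a maximum-likelihood program would take as input.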

BLAST Atlases
A circular BLAST atlas was generated for each chromosome by using Haiti isolate 2010EL-1786 as mapping reference. Glimmer3 was used to predict coding sequences contained on pseudogenomes for the remaining isolates sequenced in this study and for 4 available genomes (14). Reference isolate 2010EL-1786 was mapped against the resulting translated coding sequences by using BLASTx with a percentage identity cutoff value of 70% and an expected cutoff value of 1 × 10⁻¹⁰ for high-scoring segment pairs >100 aa. The results were visualized by using GView (20). Sequence accession numbers are shown in Table 1.
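The cutoffs stated above can be applied to BLAST tabular output with a short filter such as the sketch below. It assumes the default 12-column tabular layout produced by BLAST+ (`-outfmt 6`), which the study does not specify, and it treats the identity and expected-value cutoffs as inclusive; both are assumptions, as is the function name.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tabular_text, min_identity=70.0, max_evalue=1e-10, min_length=100):
    """Keep high-scoring segment pairs that satisfy the atlas cutoffs.

    For BLASTx output, the length column is the alignment length in amino acids.
    """
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    hits = []
    for row in reader:
        if (float(row["pident"]) >= min_identity
                and float(row["evalue"]) <= max_evalue
                and int(row["length"]) > min_length):
            hits.append(row)
    return hits
```

The retained hits give, for each comparison genome, the reference regions with a detectable homolog, which is the presence or absence signal that each ring of the atlas displays.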

SfiI and NotI PFGE Patterns of Recent Global Cholera Isolates
Nine V. cholerae isolates directly associated with the outbreak on Hispaniola were examined, 7 of which had  (21). Although all sequenced clinical isolates were serogroup O1, Inaba and Ogawa serotypes were observed among PFGE pattern-matched isolates (Table 1). All strains were biotype El Tor and all produced cholera toxin.

Phylogenetics of Strains
Haiti outbreak isolates and 12 global PFGE pattern-matched V. cholerae isolates belong to phylogroup 1 of the seventh pandemic clade. The phylogenetic tree based on whole-genome sequencing showed clustering of the 9 Hispaniola isolates (8 from Haiti and 1 related isolate from the Dominican Republic) with 12 other PFGE pattern-matched isolates. All 21 isolates were in 1 cluster relative to non-PFGE-pattern-matched outliers (Figure 1). When compared with historical reference genomes, the closest ancestors for Haiti genome sequences (2010-2011; derived herein) were isolates CIRS101 from Dhaka, Bangladesh (2002) and MJ-1236 from Matlab, Bangladesh (1994). These data confirm the genetic relatedness also inferred by PFGE subtyping and further support inclusion of the Haiti outbreak isolates in phylogroup 1 of the seventh pandemic clade (Figure 1). The whole-genome sequencing dataset showed that additional underlying genetic diversity, not observed by PFGE subtyping, was present across PFGE pattern-matched isolates (including the 9 isolated from Hispaniola).

Common Mobile Elements and Genes of Haiti Outbreak Strain and PFGE Pattern-matched Isolates
V. cholerae macrodiversity is commonly attributed to presence or absence of mobile genetic elements (22). The contiguous genome derived for Haiti isolate 2010EL-1786 was used as the outbreak type strain and harbored 2 circular chromosomes of 3.03 Mbp (chromosome I) and 1.05 Mbp (chromosome II), which encoded 2,920 and 1,051 predicted coding sequences, respectively. Pairwise comparisons of all coding sequences from each study genome with all coding sequences from reference isolate 2010EL-1786 (all vs. all comparison) showed congruent gene content and low overall diversity on larger chromosome I (Figure 2). One noteworthy exception was the absence of Vibrio pathogenicity island 1 in the 2005 isolate 3582-05 from Pakistan. This island contains essential cholera virulence factors, including the tcp gene cluster, which encodes the toxin-coregulated pilus (Figure 2; online Technical Appendix 2 Figure 1, panel A, wwwnc.cdc.gov/EID/pdfs/11-0794-Techapp2.pdf). Smaller chromosome II was more variable in content and more divergent across study strains. These findings were largely attributable to the hypervariable superintegron region, an ≈120-kb gene capture system predominantly encoding hypothetical proteins (Figure 2; online Technical Appendix 2 Figure 1, panel B) (13). Gene polymorphisms observed in the 9 sequenced isolates from Hispaniola also localized primarily within the superintegron region.
Despite these observed differences, no major deletions in the superintegron were observed among PFGE pattern-matched isolates (Figure 2; online Technical Appendix 2 Figure 1). Thus, phylogeny derived from V. cholerae whole-genome sequencing (Figure 1) showed genetic diversity within PFGE pattern-matched isolates. However, binary (present or absent) gene content assessment failed to pinpoint extensive contiguous diversity outside the superintegron region.

Comparison of Haiti Outbreak Genomes
Across the 18 described hypervariable V. cholerae mobile genetic element sequences (representing >300 kb of the total genome), no macroscopic differences were observed among the 9 Hispaniola isolate sequences (Figure 2; online Technical Appendix 2 Figure 1), and as stated, only 2 hqSNPs were identified in the core genome. Pairwise alignment of the complete genome of study reference 2010EL-1786 with available genome data for 2 sequenced Haiti 2010 outbreak isolates, designated H1 and H2 (9), showed only 3 polymorphisms across the entire genome. However, because the available H1 and H2 consensus sequences contain ambiguous base calls, these nucleotides were excluded from our comparative analyses. Nonetheless, these data confirm the clonal nature of the Haiti outbreak strain.

Structural and Allelic Profiles of Isolates Carrying a Hybrid Cholera Toxin Prophage
Structure and allelic profiles of the CTXφ prophage have been used for V. cholerae lineage analysis (23). Chromosome I of Haiti isolate 2010EL-1786 harbors 1 hybrid CTXφ characterized by a 1-nt variant of the classical ctxB allele (ctxB-7) and El Tor rstR flanked by a toxin-linked cryptic element and El Tor-type RS1 element with an intact rstC locus (Figure 3). The SNP at ctxB codon 19 results in replacement of the classical cholera toxin B histidine residue with asparagine, and this ctxB-7 allele was observed among all Hispaniola isolates (Table 1). Five of the 12 PFGE pattern-matched isolates from other locations (2008-2010) also shared this variant ctxB allele. The remaining 7 PFGE pattern-matched isolates encoded classical ctxB alleles.

Discussion
Public health investigators use PFGE, the current standard technique for subtyping most bacterial enteric pathogens, to link patients infected with a particular pathogen to a specific infection source(s) by fingerprint matching to pathogens isolated from environmental samples. Whole-genome sequencing has recently emerged as an enhanced laboratory tool for high-resolution analysis of microbial diversity and has been successfully used to investigate bacterial disease outbreaks (24-26). Because whole-genome sequencing can provide pathogen genetic fingerprints at single-nucleotide resolution, it should revolutionize the diagnosis, surveillance, and control of microbial diseases.
For molecular epidemiologic investigations using whole-genome sequencing, an expansive number of isolates from an outbreak would ideally be selected to ensure broad coverage for possible genotype variants within that population that might otherwise be masked with lower-resolution typing methods. In addition, outlier isolates from different locations that are indistinguishable or related by several diverse subtyping methods should also be subjected to whole-genome sequencing to contextualize the diversity seen within the outbreak population and to find other clonal relationships. In this study, a temporal and geographic distribution of outbreak isolates was selected to confirm clonality of the outbreak strain and to gain insight into the microevolution of V. cholerae during an outbreak. Additionally, minor PFGE and nonhemolytic variants observed among outbreak isolates were also sequenced to confirm their clonal relationships with isolates exhibiting the main outbreak pattern and phenotype. The PulseNet USA database substantially contributed to this work by identifying genetically related (using PFGE typing) and epidemiologically relevant isolates for whole-genome sequencing analyses. Notably, one 2008 isolate from a traveler from the United States to Nepal was identified and included in this study, although we acknowledge that the evolutionary relationship of the Haiti strain to strain(s) circulating in Nepal during 2010 may not be ideally represented by this 2008 isolate. Microbial evolution will have occurred during 2008-2010, and global travel may have introduced additional strains into Nepal in the interim, such that the 2008 isolate from Nepal may differ substantially from a strain circulating in Nepal in 2010, the suggested progenitor of the outbreak strain. Unfortunately, 2010 isolates from Nepal were not available for analysis.
Also identified in the PulseNet USA database was 1 PFGE pattern-matched isolate from western Africa. The close genetic relationship of this isolate from Cameroon to the Haiti strain suggests that a potential link between western Africa and the Haiti outbreak cannot be ignored. Further studies on additional isolates from western Africa are required to confirm or refute this possibility. Similarity of whole-genome sequences for Haiti isolates, PFGE pattern-matched isolates, and other seventh pandemic strains confirmed the clonal nature of the 2010-2011 cholera outbreak strain and the close genetic relationships for the studied strains initially suggested by PFGE subtyping (Figure 1). Previous V. cholerae studies have reported that seventh pandemic strains are clonal, sharing near identical gene content on a highly related genome backbone but containing variable mobile genetic elements or gene cassettes (27). Despite dynamic horizontal gene transfer (22), we identified only a few nucleotide differences among mobile sequences of the 9 sequenced 2010-2011 outbreak-related Hispaniola isolates and the 12 recent PFGE pattern-matched clinical isolates (Figure 2).
Extensive recombination in V. cholerae genomes may confound evolutionary relationship analyses as strains and lineages undergo reassortment (1). However, base substitutions acquired horizontally as recombination segments generally occur with localized density (28). Although we cannot guarantee that recombinant segments were absent from the core genome phylogeny (online Technical Appendix 2 Figure 2), the even spatial and genome-wide distribution of core genome hqSNPs suggests that they were vertically inherited. We have derived a useful phylogenetic approximation of isolate relatedness on the basis of hqSNPs, which supports shared ancestry for the Haiti outbreak isolates and 12 recent clinical isolates sharing PFGE patterns (online Technical Appendix 2 Figure 2). Sequenced isolates from India and Cameroon (2009-2010) were shown to be the closest genetic relatives among the non-Hispaniola isolates (isolated in 1991-2010; this study) and 4 other available reference V. cholerae genomes (isolated in 1937-2002). The ctxB allele variant (ctxB-7) of the Haiti strain (and its genetic relatives) was first observed among isolates from a cholera outbreak in Orissa, India, in 2007 (29), but the ctxB-7 allele has since also been observed in isolates from southern Asia and more recently from western Africa (8,30).
The genetic makeup of the Haiti outbreak strain will likely have substantial public health implications for Haiti and other susceptible locations. Our reasoning is that the atypical O1 El Tor V. cholerae strains (CIRS101 and CIRS101-like variants) have already emerged as the predominant clone causing cholera in Asia and Africa and have displaced prototypical O1 El Tor strains (3,4,29). Unfortunately, atypical O1 El Tor V. cholerae strains appear to have retained the relative environmental fitness of their prototypical O1 El Tor ancestors while acquiring enhanced virulence traits, such as classical or hybrid CTX prophage and SXT-ICE (4). Thus, with higher relative fitness and virulent and antimicrobial drug-resistant phenotypes, the Haiti outbreak strain harbors infectivity and ecologic persistence advantages over other seventh pandemic strains. Consequently, the Haiti outbreak strain (or its genetic ancestor) may easily replace current El Tor V. cholerae strains circulating in the Western Hemisphere to become endemic (like other atypical El Tor strains) and will likely cause future outbreaks. Such dire predictions warrant enhanced epidemiologic surveillance and renewed priorities aimed at cholera prevention.
Absence of cholera in Haiti over the past century; the clonal nature of the outbreak strain; and a massive influx of international travelers, aid workers, and supplies after the 2010 earthquake suggest an outside infection source for the 2010-2011 outbreak. Our core genome phylogeny (online Technical Appendix 2 Figure 2) suggests that the Haiti outbreak strain most likely derived from an ancestor related to isolates from within or near the Indian subcontinent. However, concurrent identification of a 2010 isolate from Cameroon as a close genetic relative of the Haiti outbreak strain illustrates that whole-genome sequencing on such a relatively small number (n = 27) of V. cholerae isolates is insufficient to exclude other plausible ancestral geographic locations.
Our study results are consistent with recent findings of Chin et al. (9), who concluded that two 2010 Haiti outbreak isolates shared ancestry with variant O1 El Tor strains isolated in Bangladesh in 2002 and 2008 and a more distant relationship with an isolate from an outbreak in Latin America in 1991. The vertical inheritance pattern of hqSNPs in our study provides unequivocal genetic evidence for introduction of the outbreak strain into Haiti from an external source as opposed to local aquatic emergence. However, the specific geographic source and mode of entry of the outbreak strain into Haiti cannot be proven by microbiological investigations. Only large-scale epidemiologic studies and microbiological data can provide conclusive evidence of how cholera was introduced into Haiti. This whole-genome sequencing study provides expanded evidence that variant O1 El Tor V. cholerae appeared in Haiti by importation and has generated a whole-genome sequencing dataset for future study.