High Concordance between the Whole Genome Sequencing Data Analysis Methods for Vibrio cholerae Surveillance in Finland and Norway

Special Article: Vibriosis

Austin J Vet Sci & Anim Husb. 2023; 10(2): 1118.

Nyholm O1*, Tønnessen R2,4, Antony-Samy JK2, Halkilahti J1, Amato E3 and Salmenlinna S1

1Department of Health Security, Finnish Institute for Health and Welfare (THL), Finland

2Department of Infection Control and Vaccines, The Norwegian Institute of Public Health, Norway

3Department of Infection Control and Preparedness, The Norwegian Institute of Public Health, Norway

4European Public Health Microbiology Training Program (EUPHEM), European Centre for Disease Prevention and Control (ECDC), Sweden

*Corresponding author: Nyholm O, Department of Health Security, Finnish Institute for Health and Welfare (THL), P.O. Box 30, 00271 Helsinki, Finland

Received: January 23, 2023; Accepted: February 20, 2023; Published: February 27, 2023

Abstract

Vibrio cholerae infections, both vibriosis and cholera, are rare in Northern Europe. However, the coastal areas suitable for V. cholerae transmission have increased in recent years. Accessible and validated molecular diagnostic methods are needed to monitor such infections. Here, we describe the comparison and validation of Whole Genome Sequencing (WGS) data analysis methods for V. cholerae characterization at the national public health institutes of Finland and Norway. The results showed a concordance of 96.7% between the methods used in the two countries.

Keywords: Vibrio cholerae; Cholera; Vibriosis; Whole genome sequencing; Bioinformatics

Introduction

Cholera causes around 100,000 deaths globally each year [1]. In Northern Europe, classical cholera caused by toxin-producing Vibrio cholerae serogroups O1 and O139 is rare and almost exclusively travel-related. The occurrence of vibriosis caused by non-toxigenic, non-O1/non-O139 V. cholerae varies and is more frequent during warm summers [2-4]. Due to climate change, the coastal areas suitable for V. cholerae transmission increased substantially across countries between 2003 and 2019 [5]. Molecular diagnostic methods able to detect and characterize V. cholerae isolates are therefore needed both for laboratory preparedness and for the surveillance of vibriosis and cholera.

We compared the Whole Genome Sequencing (WGS) data analysis methods used for V. cholerae surveillance at the Finnish Institute for Health and Welfare (THL) and the Norwegian Institute of Public Health (NIPH). In both countries, the methods consisted of species confirmation and detection of V. cholerae virulence genes, but they were implemented with different bioinformatics software. Both included screening for six markers of toxigenic V. cholerae [6]: toxR for identification of the V. cholerae species, ctxA for cholera toxin, wbeO1 and wbfO139 for serogroups O1 and O139, and the tcpA variants for the classical and El Tor biotypes. The aim of the study was to validate the WGS data analysis methods at THL, Finland, by comparison with the pipeline previously established at NIPH, Norway.
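To make the marker logic concrete, the short sketch below turns the set of markers detected in one isolate into a readable profile. It is an illustration written for this text, not part of either institute's pipeline. The helper name `interpret`, the marker labels `tcpA_Classical` and `tcpA_El_Tor`, and the rule that absence of both serogroup genes is reported as non-O1/non-O139 are assumptions.

```python
# Marker genes screened by both pipelines and the trait each one indicates.
MARKERS = {
    "toxR": "V. cholerae species",
    "ctxA": "cholera toxin",
    "wbeO1": "serogroup O1",
    "wbfO139": "serogroup O139",
    "tcpA_Classical": "classical biotype",
    "tcpA_El_Tor": "El Tor biotype",
}

SEROGROUP_GENES = (("wbeO1", "O1"), ("wbfO139", "O139"))
BIOTYPE_GENES = (("tcpA_Classical", "classical"), ("tcpA_El_Tor", "El Tor"))


def interpret(present: set[str]) -> dict[str, str]:
    """Summarise which of the six markers were detected in one isolate (hypothetical helper)."""
    unknown = set(present) - set(MARKERS)
    if unknown:
        raise ValueError(f"unknown marker(s): {sorted(unknown)}")
    serogroups = [label for gene, label in SEROGROUP_GENES if gene in present]
    biotypes = [label for gene, label in BIOTYPE_GENES if gene in present]
    return {
        "species": "V. cholerae" if "toxR" in present else "not confirmed",
        "cholera_toxin": "ctxA detected" if "ctxA" in present else "ctxA not detected",
        # Assumption: absence of both serogroup genes is reported as non-O1/non-O139.
        "serogroup": "/".join(serogroups) or "non-O1/non-O139",
        "biotype": "/".join(biotypes) or "no tcpA variant detected",
    }
```

For example, `interpret({"toxR", "ctxA", "wbeO1", "tcpA_El_Tor"})` yields a profile of V. cholerae, ctxA detected, serogroup O1, El Tor biotype. Joining multiple hits with "/" rather than picking one keeps unexpected combinations visible for manual review.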

Materials and Methods

We included Vibrio spp. sequences from 392 isolates in the comparison of the WGS data analysis methods (Table 1). These comprised publicly available sequences from the NCBI database, clinical and environmental isolates from Finland, and three reference isolates obtained from the culture collection of the University of Gothenburg. The Finnish isolates and the reference isolates were sequenced at THL. Briefly, DNA was extracted using the MagAttract kit (Qiagen), libraries were prepared using the Nextera XT kit (Illumina), and sequencing was performed on the Illumina MiSeq platform with a 300-cycle kit (Illumina).

To assess the performance of THL's WGS data analysis methods, the same sequences (fastq files) were also analyzed at NIPH using its existing bioinformatics pipeline [7]. The bioinformatics software used to analyze the WGS data at THL and NIPH has been published previously and is described in Figure 1. THL's pipeline uses Kraken2 [8] for species confirmation and contamination checking, and ReMatch [9] to screen for the six V. cholerae marker genes: toxR, the cholera toxin gene ctxA, the serogroup genes wbeO1 (O1) and wbfO139 (O139), and the tcpA variants for the classical and El Tor biotypes. The NCBI accession numbers and GC contents of the target genes were: toxR, KF498634.1 (GC 45.2%); ctxA, AF463401.1 (GC 38.4%); wbeO1, KC152957.1 (GC 39.6%); wbfO139, AB012956.1 (GC 42.0%); tcpA_Classical, M33514.1 (GC 40.3%); and tcpA_El_Tor, KP187623.1 (GC 43.4%). NIPH's pipeline likewise uses Kraken2 for species confirmation and contamination checking, while ARIBA [10], supplemented with a Python module, maps the reads and screens for the six V. cholerae marker genes. The ARIBA databases were custom-built from the FASTA sequences of the six marker genes, and ARIBA uses Bowtie2 to map the reads against them.
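The Kraken2 step shared by both pipelines amounts to reading a per-sample classification report and checking that V. cholerae dominates it. The following is a minimal Python sketch, assuming the standard tab-separated Kraken2 report layout (percentage, clade fragment count, direct fragment count, rank code, NCBI taxid, indented name). The cut-offs `min_top=70.0` and `max_other=5.0` are hypothetical values chosen for illustration, not the thresholds used at THL or NIPH.

```python
from typing import Iterable, NamedTuple


class SpeciesHit(NamedTuple):
    name: str
    percent: float  # % of all fragments in the clade rooted at this species


def species_hits(report_lines: Iterable[str]) -> list[SpeciesHit]:
    """Collect species-level (rank code "S") rows from a Kraken2 report, largest first."""
    hits = []
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # Rank code and name are read from the end of the row so that reports
        # written with --report-minimizer-data (two extra columns) also parse.
        rank, name = fields[-3].strip(), fields[-1].strip()
        if rank == "S":
            hits.append(SpeciesHit(name, float(fields[0])))
    return sorted(hits, key=lambda hit: hit.percent, reverse=True)


def confirm_species(
    report_lines: Iterable[str],
    expected: str = "Vibrio cholerae",
    min_top: float = 70.0,   # hypothetical minimum share for the expected species
    max_other: float = 5.0,  # hypothetical contamination cut-off for any other species
) -> tuple[bool, str]:
    """Return (passed, message) for species confirmation and contamination check."""
    hits = species_hits(report_lines)
    if not hits:
        return False, "no species-level assignments"
    top, others = hits[0], hits[1:]
    if top.name != expected:
        return False, f"top species is {top.name} ({top.percent:.1f}%)"
    if top.percent < min_top:
        return False, f"{expected} only {top.percent:.1f}% of fragments"
    contaminants = [hit for hit in others if hit.percent > max_other]
    if contaminants:
        listed = ", ".join(f"{hit.name} ({hit.percent:.1f}%)" for hit in contaminants)
        return False, f"possible contamination: {listed}"
    return True, f"{expected} confirmed ({top.percent:.1f}%)"
```

Because an open file iterates over its lines, a report on disk can be passed directly, e.g. `with open(path) as fh: confirm_species(fh)`. Only rank "S" rows are considered, so subspecies or strain rows (such as "S1") are ignored in this sketch.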

ReMatch, which also uses Bowtie2, maps reads onto a set of reference sequences to determine whether each locus of interest is present or absent in a sample. The outcome is determined by evaluating sequencing depth, coverage, and sequence similarity relative to the reference sequence. The outputs of both pipelines were interpreted using a threshold of ≥80% coverage for each of the six markers. The results and their interpretation obtained with the methods at NIPH were then compared with those obtained at THL.
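The ≥80% coverage rule and the final comparison between institutes can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual logic of ReMatch or ARIBA: coverage is taken as breadth of coverage (the fraction of reference positions reached by at least `min_depth` reads, with `min_depth=1` an assumed default), sequence similarity is not modelled, and concordance is computed per isolate, counting an isolate as concordant only when both pipelines report the same set of markers. The unit of comparison behind the 96.7% figure is likewise an assumption here; it could equally be per marker call.

```python
from collections.abc import Mapping, Sequence

# Per-isolate marker calls: isolate ID -> set of markers judged present.
MarkerCalls = Mapping[str, frozenset[str]]


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of reference positions covered by at least min_depth reads."""
    if not depths:
        return 0.0
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


def marker_present(depths: Sequence[int], threshold: float = 0.80, min_depth: int = 1) -> bool:
    """Apply the >=80% coverage rule to one marker's per-position read depths."""
    return breadth_of_coverage(depths, min_depth) >= threshold


def percent_concordance(thl: MarkerCalls, niph: MarkerCalls) -> float:
    """Percentage of isolates analyzed by both pipelines that have identical marker profiles."""
    shared = thl.keys() & niph.keys()
    if not shared:
        raise ValueError("the two result sets have no isolates in common")
    agreeing = sum(1 for isolate in shared if thl[isolate] == niph[isolate])
    return 100.0 * agreeing / len(shared)
```

Per-position depths for a marker could come from any read-mapping output; the sketch only needs a list of integers, one per reference position. Requiring an exact match of the whole profile is the stricter of the two possible concordance definitions, since a single discordant marker makes the isolate discordant.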