Transcriptome-Scale Developing of Simple Sequence Repeat Markers in Coffee Arabica

Rapid Communication

Austin J Biotechnol Bioeng. 2019; 6(1): 1103.

Huang X1, Gbokie T2, Liu BH2, Wu WH1* and Yi KX1*

¹Environment and Plant Protection Institute, Chinese Academy of Tropical Agricultural Sciences, China

²College of Plant Protection, Nanjing Agricultural University, China

*Corresponding author: Wu WH and Yi KX, Environment and Plant Protection Institute, Chinese Academy of Tropical Agricultural Sciences, 4 Xueyuan Road, Haikou 571101, China

Received: April 22, 2019; Accepted: July 19, 2019; Published: July 26, 2019

Abstract

Coffee is an important beverage crop worldwide, and the main commercially cultivated species are Coffea arabica (arabica) and C. canephora (robusta). C. arabica is the dominant coffee species and has high potential for genetic improvement, but it normally starts fruiting only three years after planting, which significantly lengthens the breeding process. Molecular marker-assisted selection could therefore accelerate breeding. In the present study, we conducted a large-scale development of Simple Sequence Repeat (SSR) markers based on a high-quality 454-pyrosequencing database. A total of 1,032 SSR loci were identified in 929 unigenes (6.66% of 13,951). Mononucleotides (500, 48.45%), trinucleotides (411, 39.83%) and dinucleotides (98, 9.49%) were the main SSR types. The most abundant SSR motif was A/T (490, 47.57%), followed by AAG/CTT (126, 12.23%), ACG/CGT (88, 8.54%), ACT/AGT (70, 6.79%) and AG/CT (61, 5.92%). A total of 115 pairs of previously reported SSR primers were used for SSR validation, and two matched our results. Our work expands the number of available SSR loci and will benefit studies that apply these loci in C. arabica.

Keywords: Coffea arabica; Transcriptome; Simple sequence repeat markers

Abbreviations

AFLP: Amplified Fragment Length Polymorphism; SSR: Simple Sequence Repeat; SNPs: Single Nucleotide Polymorphisms; MISA: MIcroSAtellite; COG: Clusters of Orthologous Groups; EST: Expressed Sequence Tag

Introduction

Coffee is an important beverage crop worldwide, and the two main commercially produced species are Coffea arabica (arabica) and C. canephora (robusta). C. arabica, the dominant coffee species, has high potential for genetic improvement but normally starts fruiting about three years after planting, which significantly lengthens the breeding process [1]. Molecular marker-assisted selection could therefore accelerate breeding [2]. To date, molecular markers such as Amplified Fragment Length Polymorphism (AFLP) and Simple Sequence Repeat (SSR) have been successfully used in germplasm evaluation of coffee [3,4]. However, the number of markers employed in these studies remains small. In recent years, the rapid development of next-generation sequencing technology has made large-scale marker-based germplasm evaluation possible [5]. A recent study reported 1,444 Single Nucleotide Polymorphisms (SNPs) associated with caffeine content, identified using a draft genome sequence of C. arabica [1]. Although the genome data were not released with that study, transcriptome data from previous studies make large-scale SSR marker development possible [6,7]. In this study, we conducted a large-scale development of SSR markers based on a high-quality 454-pyrosequencing database [6]. This work expands the number of SSR loci, complementing the available information and supporting future research on the use of these loci in C. arabica.

Materials and Methods

Identification and characterization of SSRs

A de novo assembly of 13,951 unigenes of C. arabica CIFC H147/1 from a previous study [6] was screened for SSRs using the MIcroSAtellite (MISA) identification tool with default criteria [8]. The maximal number of bases interrupting two SSR motifs in a compound microsatellite was set to 20. All SSR-containing unigenes were searched against the Clusters of Orthologous Groups (COG) database by BLASTx and then classified into COG categories with a cutoff Expected value (E-value) of 1e-5 [9].
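The detection step can be illustrated with a minimal Python sketch. It is not MISA itself: it uses regular expressions to find perfect repeats and then groups neighbouring SSRs into compound microsatellites. The minimum repeat counts in `MIN_REPEATS` are illustrative assumptions and are not necessarily the thresholds applied in this study; only the 20-base interruption limit is taken from the text. The function names are hypothetical.

```python
import re

# Assumed minimum repeat counts per motif length (unit size -> min repeats).
# Illustrative values only; the thresholds used in the study may differ.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
MAX_GAP = 20  # max bases between two SSRs in a compound microsatellite (from the text)


def find_ssrs(seq):
    """Return sorted, non-overlapping (start, end, motif, repeats) tuples.

    Coordinates are 0-based and half-open. Only perfect repeats are reported.
    """
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "ATAT".
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // unit))
    hits.sort()
    kept = []
    for hit in hits:
        # Keep the leftmost hit when two hits overlap.
        if not kept or hit[0] >= kept[-1][1]:
            kept.append(hit)
    return kept


def group_compound(ssrs, max_gap=MAX_GAP):
    """Group consecutive SSRs separated by at most max_gap bases."""
    groups = []
    for ssr in ssrs:
        if groups and ssr[0] - groups[-1][-1][1] <= max_gap:
            groups[-1].append(ssr)
        else:
            groups.append([ssr])
    return groups


if __name__ == "__main__":
    example = "GGC" + "A" * 12 + "CGTACT" + "AAG" * 6 + "TTGC"
    for group in group_compound(find_ssrs(example)):
        print(group)
```

In this toy sequence the A run and the AAG run are six bases apart, so they fall into a single compound group under the 20-base limit.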

SSR validation with previous studies

Primers from previous studies were downloaded to validate the SSRs identified in the present study [5,10]. The primers were converted into FASTA format, with each primer treated as a single sequence, and were then compared against all SSR-containing unigenes using the BLASTn-short procedure [11]. Primer pairs in which both primers matched the same unigene with a sequence similarity over 95% were selected for SSR loci comparison. SSRs that were identical to those in our results were highlighted as supporting evidence for our work.
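The pair-matching rule can be sketched as follows, assuming the BLAST results were saved in the standard 12-column tabular format (query ID, subject ID, percent identity, and so on). The primer naming convention `<locus>_F` / `<locus>_R`, the function name and the example IDs are hypothetical and used only for illustration. The sketch covers only the selection of primer pairs; comparing the SSR located between the two primer sites is a separate step.

```python
from collections import defaultdict


def matched_primer_pairs(blast_lines, min_identity=95.0):
    """Map each locus to the unigenes hit by both its forward and reverse primer.

    blast_lines: iterable of tab-separated BLAST tabular lines
                 (qseqid, sseqid, pident, ...).
    Primer IDs are assumed to end in "_F" or "_R" (hypothetical convention).
    """
    hits = defaultdict(lambda: {"F": set(), "R": set()})
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid, pident = fields[0], fields[1], float(fields[2])
        locus, _, strand = qseqid.rpartition("_")
        # Keep only hits with identity strictly over the threshold.
        if strand not in ("F", "R") or pident <= min_identity:
            continue
        hits[locus][strand].add(sseqid)
    return {
        locus: h["F"] & h["R"]
        for locus, h in hits.items()
        if h["F"] & h["R"]
    }


if __name__ == "__main__":
    example = [
        "SSR01_F\tunigene_12\t100.00\t20\t0\t0\t1\t20\t105\t124\t1e-05\t40.1",
        "SSR01_R\tunigene_12\t95.50\t22\t1\t0\t1\t22\t310\t289\t2e-04\t36.2",
        "SSR02_F\tunigene_40\t100.00\t20\t0\t0\t1\t20\t15\t34\t1e-05\t40.1",
        "SSR02_R\tunigene_77\t100.00\t20\t0\t0\t1\t20\t90\t71\t1e-05\t40.1",
    ]
    print(matched_primer_pairs(example))
```

Here only the hypothetical locus SSR01 is retained, because both of its primers hit the same unigene above the identity threshold.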

Results

Identification of SSRs

A total of 13,951 unigenes from the previous study were screened for SSR loci. We found 1,032 SSR loci in 929 unigenes, a frequency of one SSR per 8.53 kb of sequence (Table 1). Among these unigenes, 89 had more than one SSR locus: 79, 6 and 4 unigenes contained 2, 3 and 4 SSR loci, respectively. For each length interval, we counted the unigenes used for SSR detection and those that contained SSR loci (Figure 1). The proportions of unigenes containing SSR loci were 3.3%, 8.0%, 13.6%, 17.1%, 21.4% and 20.2% across the six length intervals, respectively.
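The summary figures can be related to one another with a short calculation. The sketch below uses only the counts reported above; the total sequence length and the mean unigene length are values implied by the reported density of one SSR per 8.53 kb, not measurements taken from the study.

```python
# Reported counts
unigenes_total = 13951
unigenes_with_ssr = 929
ssr_loci = 1032
kb_per_ssr = 8.53

# Share of unigenes that carry at least one SSR
share_pct = 100 * unigenes_with_ssr / unigenes_total

# Average number of SSR loci per SSR-containing unigene
loci_per_unigene = ssr_loci / unigenes_with_ssr

# Derived (implied) values, assuming density = total length / number of SSRs
implied_total_kb = ssr_loci * kb_per_ssr
implied_mean_unigene_bp = 1000 * implied_total_kb / unigenes_total

print(f"SSR-containing unigenes: {share_pct:.2f}%")
print(f"Loci per SSR-containing unigene: {loci_per_unigene:.2f}")
print(f"Implied total sequence length: {implied_total_kb:.0f} kb")
print(f"Implied mean unigene length: {implied_mean_unigene_bp:.0f} bp")
```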