A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing

The parallel sequencing of targeted amplicons is a scalable application of next-generation sequencing (NGS) that can advantageously replace Sanger sequencing in certain DNA barcoding studies. It can be used to sequence different PCR products simultaneously, including co-amplified products. Here, we explore this approach by simultaneously sequencing five markers, including the DNA barcode and a diagnostic marker of Wolbachia, in 12 species of Halictidae that were previously DNA barcoded using Sanger sequencing.

Sequencing cost per marker and per specimen We provide guidelines for selecting NGS or Sanger sequencing depending on the goals of future studies. In insect systematics, these methods can be profitably used to (1) sequence multiple loci at relatively reduced costs, (2) improve single-gene phylogenies and (3) assess the presence of cytoplasmic endosymbiotic bacteria such as Wolbachia (Breeuwer and Werren; James et al.).

These bacteria are frequently detected in Halictidae and can affect the transmission of the mitochondrial genome (Smith et al.).

All these issues can affect gene trees in Hymenoptera (Magnacca and Brown; Cristiano et al.). Here, we implemented the parallel sequencing of targeted amplicons to (1) re-sequence the COI barcode fragment, (2) sequence three nuclear gene fragments and (3) sequence a fragment of the Wolbachia outer surface protein gene in 12 halictid species that were recently studied by DNA barcoding using Sanger sequencing (Pauly et al.). These species belong to Halictus (Seladonia) Robertson [or Seladonia, depending on its assignment as a subgenus (Michener) or genus (Pesenko)] and include five species belonging to the H.

While COI data strongly supported the delineation of these five species, they did not fully resolve the phylogenetic relationships of the group (Pauly et al.). The present small-scale NGS implementation explores to what extent NGS can effectively contribute to solving the aforementioned issues. We sampled 21 specimens (Table I) representing five of the six species of the Halictus smaragdulus complex and seven closely related Halictidae species showing the smallest interspecific p-distances at COI with respect to the complex (Pauly et al.).

One species of the complex H. Most specimens were collected after and were captured with a net, killed with ethyl acetate and stored in absolute ethanol. We targeted five gene fragments; four of them were used for phylogenetic tree reconstructions and included COI and three nuclear markers that were previously used for phylogenetic analysis in hymenopterans, viz.

Finally, a gene fragment of the Wolbachia outer surface protein (wsp) was used to assess the presence of Wolbachia. MiSeq data were demultiplexed and cleaned using Trimmomatic v. AlienTrimmer v. Paired-end reads were assembled with PEAR v. Reads obtained for wsp were used to identify Wolbachia haplotypes using the Wolbachia wsp typing module of the Wolbachia multilocus sequence typing (MLST) system (Baldo et al.). For these assemblies, we calculated the average rate of substitution per base.
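The text does not spell out how the average rate of substitution per base was computed. One plausible reading, offered here only as a minimal sketch, is the number of mismatches between each read and the majority-rule consensus of its assembly, divided by the number of bases compared. The function names and the toy reads below are hypothetical, and the reads are assumed to be already aligned and of equal length.

```python
from collections import Counter


def consensus(reads):
    """Majority-rule consensus of equal-length, pre-aligned reads (assumption)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def mean_substitution_rate(reads):
    """Mismatches against the consensus divided by the number of bases compared.

    Positions where either the read or the consensus is not A, C, G or T
    (gaps, ambiguity codes) are skipped; this is an assumption, not the
    authors' documented procedure.
    """
    reference = consensus(reads)
    mismatches = 0
    compared = 0
    for read in reads:
        for base, ref_base in zip(read, reference):
            if base in "ACGT" and ref_base in "ACGT":
                compared += 1
                mismatches += base != ref_base
    return mismatches / compared if compared else 0.0


# Hypothetical toy assembly: four reads, one substitution in the third read.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGTACGT"]
rate = mean_substitution_rate(reads)
print(f"{rate:.5f}")
```

A rate computed this way can be compared directly with published per-base error rates for the sequencing platform, which is how the value is used later in the discussion.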

Geneious v. Phylogenetic analyses were conducted on different datasets in order to compare the topologies and resolutions obtained with the different gene fragments: COI (21 specimens, bp), wnt1 (14, bp), w (21, bp), HOG (19, bp), the concatenation of the three nuclear fragments (17, bp) and the four fragments (17, bp). To assess the added value of including nuclear fragments in a COI phylogeny, we compared the topology of the COI dataset restricted to the specimens used in the concatenated datasets (17, bp) with that of the concatenated dataset.

Unique haplotypes were extracted using the R packages ape (Paradis et al.). When alternative haplotypes were observed for the same individual, phylogenetic analyses were repeated with the different haplotypes instead of using the consensus sequences. Neighbour-joining trees were constructed in MEGA 7. Maximum parsimony (MP) trees were searched with the R package phangorn (Schliep) using the parsimony ratchet heuristic (Nixon), with equally weighted characters, gaps treated as missing data and non-parametric bootstrap replicates.
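The unique-haplotype step mentioned above was done in R. The sketch below is not that code; it is a plain-Python illustration of the same idea, grouping specimens that share an identical aligned sequence. The function name, specimen labels and sequences are hypothetical.

```python
from collections import defaultdict


def unique_haplotypes(sequences):
    """Map each distinct sequence to the list of specimens carrying it.

    `sequences` is a dict of specimen ID -> aligned sequence. Comparison is
    case-insensitive; gaps and ambiguity codes are compared literally, which
    is a simplifying assumption.
    """
    groups = defaultdict(list)
    for specimen, sequence in sequences.items():
        groups[sequence.upper()].append(specimen)
    return dict(groups)


# Hypothetical toy alignment with two haplotypes among three specimens.
aligned = {
    "spec_A": "ACGTTGCA",
    "spec_B": "acgttgca",
    "spec_C": "ACGTTGTA",
}
haplotypes = unique_haplotypes(aligned)
for sequence, specimens in haplotypes.items():
    print(sequence, specimens)
```

Keeping the specimen lists alongside each haplotype makes it straightforward to rerun an analysis with an alternative haplotype substituted for an individual's consensus sequence.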

For Bayesian phylogeny inference (BI), the best partition scheme and best-fit substitution models were estimated using PartitionFinder v. BI analyses were performed with MrBayes v. Overall, , reads (paired and unpaired) were assigned to the targeted gene fragments (Figure 2). Read quality scores (Phred) ranged from 28 to 40 (mean values between 38 and 39, depending on the specimen). The average rate of substitution per base varied from 0. Numbers of reads per specimen obtained for each DNA fragment (Table I) ranged from zero for the two old museum specimens to 21, reads for w in AP The COI alignment comprised variable sites and showed interspecific p-distances ranging from 2.

The nuclear data wnt1, w and HOG comprised 36 variable sites and showed interspecific p-distances ranging from 0 to 2. Two variant characters were observed with relative frequencies of 0. Other variant characters found in wnt1 with a frequency of 0. The intra-individual p-distances among these haplotypes were 0. These values were within the range of interspecific distances measured here 0—2. No variant was observed for Wolbachia COI. The phylogenetic relationships within the H.
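For readers unfamiliar with the measure, the p-distance reported here is the proportion of aligned sites at which two sequences differ. A minimal sketch follows, assuming pairwise deletion of sites with gaps or ambiguity codes (the handling actually used in the study is not stated in the text); the function name and example sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected pairwise distance: differing sites / compared sites.

    Sites where either sequence is not A, C, G or T are skipped
    (pairwise deletion, an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# One difference over ten comparable sites.
d_full = p_distance("ACGTACGTAC", "ACGTACGTTC")
# Same pair with one gapped site: one difference over nine comparable sites.
d_gapped = p_distance("ACGT-CGTAC", "ACGTACGTTC")
print(d_full, d_gapped)
```

The same function applies to intra-individual comparisons between variant haplotypes and to interspecific comparisons, which is what allows the two ranges to be compared.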

Variant haplotypes affected neither the topology nor the support in the trees. Phylogenies obtained using COI only were slightly less resolved than those obtained using the four gene fragments (Figure 3). Those based solely on nuclear data (both separate and concatenated datasets) only supported a few nodes outside the species complex (Online Resource). The only nodes that were never resolved concerned the relationships among H.

Analyses exclusively based on nuclear data are presented as Online Resource. Wolbachia sequences of wsp were obtained in eight out of the 21 specimens, with 14 to reads per specimen (Table I). The eight wsp-positive specimens belonged to five species (Table I): H. All haplotypes queried in the Wolbachia MLST database provided a perfect match with Wolbachia sequences of supergroup A, a clade of Wolbachia strains commonly found in Hymenoptera (Casiraghi et al.).

Five different sequences of the hypervariable region 1 (HVR1) of wsp, coded as numbers 1, 11, 13, 51 and 53 in the Wolbachia MLST database, were observed. One or two different HVR1 sequences were detected per specimen. We observed mainly HVR1: 11 in H.

Types of hypervariable region 1 (HVR1) identified in the Wolbachia surface protein (wsp) gene fragment surveyed in this study. Values indicate the number of front-end reads matching an HVR1 type.

Parallel sequencing of PCR amplicons is most effective when limited sequence data are targeted per specimen (Mamanova et al.).

This is the case for DNA barcoding or multilocus phylogenetic analyses. Compared to Sanger sequencing, it can improve the sequencing sensitivity (fewer false negatives) and accuracy by enabling the simultaneous detection of co-amplified products such as homologues, paralogues and contaminants (Grover et al.). Below, we evaluate the added value of the protocol applied here compared to standard DNA barcoding using Sanger sequencing. The success rate of parallel amplicon sequencing is expected to depend strongly on the PCR amplification.

The low sequencing depths obtained here for older museum specimens were not considered reliable. A more uniform molarity of the PCR products and a selection of the Illumina reagent kit in accordance with the number of samples processed can further improve this cost-efficiency. The labour cost was higher (1 person month) than for Sanger data analysis 0. The average substitution rate per base calculated for each assembly was within the expected range of sequencing error rates reported for amplicon sequencing with the Illumina MiSeq platform (Schirmer et al.).

Variant haplotypes observed with relative frequencies of 0. The other variants observed with a frequency of 0.

Indeed, the uneven distribution of sequencing errors along sequencing reads can explain some more frequent sequencing errors (Schirmer et al.). Concerning COI, the reads obtained for three specimens both specimens of H. These variants are not cross-contaminants because they differ from the COI haplotypes sequenced in the other individuals.

They are more probably due to heteroplasmy. These variant haplotypes did not affect the phylogenetic trees because both species investigated here H. However, the intra-individual divergences observed here up to 2. Detecting such variants is therefore essential in DNA barcoding. Concerning the detection of numts, we did not observe stop codons or shifts in the reading frame but we cannot totally exclude that nuclear copies were amplified.
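The stop-codon check mentioned above is a common first screen for numts in protein-coding mitochondrial fragments such as COI. The sketch below is one simple way to run it, not the authors' procedure: it scans an ungapped sequence in a chosen reading frame for the stop codons of the invertebrate mitochondrial genetic code (NCBI translation table 5, where TAA and TAG are stops and TGA encodes tryptophan). The function names and test sequences are hypothetical.

```python
# Stop codons under NCBI translation table 5 (invertebrate mitochondrial).
STOP_CODONS = {"TAA", "TAG"}


def stop_codon_positions(sequence, frame=0):
    """Return 0-based start positions of in-frame stop codons.

    `sequence` is assumed to be ungapped DNA; `frame` is 0, 1 or 2.
    """
    sequence = sequence.upper()
    return [
        start
        for start in range(frame, len(sequence) - 2, 3)
        if sequence[start:start + 3] in STOP_CODONS
    ]


def open_frames(sequence):
    """Reading frames (0, 1, 2) that contain no stop codon."""
    return [frame for frame in range(3) if not stop_codon_positions(sequence, frame)]


clean = "ATGACATTATATTTTATTTTTGGA"   # hypothetical fragment, no in-frame stop
interrupted = "ATGTAAGGA"            # TAA at position 3 in frame 0
print(stop_codon_positions(clean), stop_codon_positions(interrupted))
```

As the text notes, passing this screen does not exclude numts: a recently transferred nuclear copy can retain an intact reading frame.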

In this regard, our approach does not offer more guarantees than Sanger sequencing, as it also relies on the PCR amplification of small DNA fragments and can be biased by different amplification efficiencies (Cruaud et al.). Sequencing the whole mitochondrial genome represents a better solution to detect numts (Nelson et al.).

Because the trees constructed exclusively from nuclear data were poorly resolved, they could not be used to check the species delineation obtained with COI. In contrast, some deeper nodes were only resolved in the analyses combining COI and the three nuclear gene fragments (Figure 3).

With this dataset, the two clades identified by morphology Pauly et al. The Halictidae comprises thousands of species that are often difficult to identify morphologically and whose taxonomy is regularly being refined using COI sequence data.

Although COI data provide good support for most morphologically described halictid species Schmidt et al. It is therefore useful to consider additional loci or genome skimming (Marcus), both for a better species delineation and for a better understanding of interspecific phylogenetic relationships (Danforth et al.). Obviously, the set of loci analysed here was not useful for species delineation, but it clarified the evolutionary history of the species studied. The detection of the wsp gene in more than one third of the specimens reveals a high prevalence of Wolbachia in the group under study.

Although Wolbachia infections were observed previously for the genus Gerth et al. In five of the eight infected individuals, two different HVR1 sequences were detected. This is also in agreement with previous studies revealing the co-occurrence of more than one Wolbachia sequence type in insects (Breeuwer et al.).

The parallel sequencing of targeted amplicons, as applied here, can advantageously replace standard (Sanger-based) DNA barcoding in two cases: when a multilocus dataset has to be assembled for a considerable number of specimens and when variant haplotypes are expected in the sampling. Indeed, our experiment was useful to construct a multilocus dataset consisting of DNA barcodes (COI) and three nuclear gene fragments with a cost-efficiency that is estimated to become interesting compared to Sanger sequencing when more than specimens are investigated.
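The break-even reasoning above can be illustrated with a simple cost model. This is a sketch under stated assumptions, not the cost calculation of the study: Sanger cost is taken to grow linearly with specimens, markers and reactions per marker, while the NGS cost is a fixed run cost plus a per-specimen library cost, with a single run assumed to hold all specimens. Every price below is a made-up placeholder, and the function and parameter names are hypothetical.

```python
import math


def sanger_cost(n_specimens, n_markers, cost_per_reaction, reactions_per_marker=2):
    """Linear cost: one forward and one reverse reaction per marker by default."""
    return n_specimens * n_markers * reactions_per_marker * cost_per_reaction


def ngs_cost(n_specimens, run_cost, library_cost_per_specimen):
    """Fixed run cost plus a per-specimen library preparation cost."""
    return run_cost + n_specimens * library_cost_per_specimen


def break_even_specimens(n_markers, cost_per_reaction, run_cost,
                         library_cost_per_specimen, reactions_per_marker=2):
    """Smallest number of specimens for which NGS is strictly cheaper.

    Returns None when the per-specimen NGS cost is not below the Sanger cost,
    in which case NGS never becomes cheaper under this model.
    """
    sanger_per_specimen = n_markers * reactions_per_marker * cost_per_reaction
    saving = sanger_per_specimen - library_cost_per_specimen
    if saving <= 0:
        return None
    return math.floor(run_cost / saving) + 1


# Placeholder prices in arbitrary currency units (not taken from the study).
threshold = break_even_specimens(
    n_markers=5, cost_per_reaction=3.0, run_cost=1000.0, library_cost_per_specimen=10.0
)
print(threshold)
```

Under this model the threshold falls as more markers are sequenced per specimen, which is consistent with the argument that the approach pays off for multilocus datasets.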

Our experiment also enabled the detection of variant COI haplotypes with intra-individual divergences in the range of interspecific distances in Halictidae and mixed sequence types of the intracellular bacteria Wolbachia. This relatively cheap application of NGS may therefore be useful in bee systematics, when these cases are encountered.

Abouheif, E. Science , — Baldo, L. Batovska, J. G3-Genes Genom. Bolger, A. Bioinformatics 30, —

Corylus L. Taxonomic and phylogenetic relationships of Corylus species have long been controversial for lack of effective molecular markers. In this study, the complete chloroplast (cp) genomes of six Corylus species were assembled and characterized using next-generation sequencing. We compared genome features, repeat sequences and sequence divergence, and reconstructed the phylogenetic relationships of the six Corylus species. The results indicated that Corylus cp genomes were typical of the standard double-stranded DNA molecule, ranging from , base pairs (bp) C. Each of the six cp genomes possessed unique genes arranged in the same order, including 80 protein-coding, 29 tRNA, and 4 rRNA genes.

Next-generation sequencing technology has increased the capacity to generate molecular data for plant biological research, including phylogenetics, and can potentially contribute to resolving complex phylogenetic problems. The evolutionary history of Medicago L. (Leguminosae: Trifolieae) remains unresolved due to incongruence between published phylogenies. Identification of the processes causing this genealogical incongruence is essential for the inference of a correct species phylogeny of the genus and requires that more molecular data, preferably from low-copy nuclear (LCN) genes, are obtained across different species. Here we report the development of 50 novel LCN markers in Medicago and assess the phylogenetic properties of each marker. We used the genomic resources available for Medicago truncatula Gaertn. This alternative proves to be a cost-effective approach to amplicon sequencing in phylogenetic studies at the genus or tribe level and allows for an increase in number and size of targeted loci.




