Long-read vs short-read sequencing in microbiome research

Microbiome research relies heavily on DNA sequencing to identify and characterize microbes within a particular environment. For a long time, this has primarily involved massively parallel sequencing of short reads (150-300 bases); however, recent developments in DNA sequencing have made it possible to sequence much longer fragments (thousands to tens of thousands of bases), with increased usability and the capacity to assemble entire genomes. The low cost and portability of some long-read sequencing platforms are allowing them to become point-of-care tools in clinical settings, in remote settings during infectious disease outbreaks, and even on the International Space Station. This blog will introduce long-read sequencing, its advantages and disadvantages, and its potential use in microbiome research.

Long-read sequencing to the rescue:

The most popular form of short-read sequencing, Illumina sequencing, works by fragmenting DNA into small segments, attaching them to the surface of a flow cell, amplifying them, and reading these segments as they are synthesized. The original sequence is then reconstructed from overlaps between these short reads or by comparing them to reference databases. Typically, these reads are 50-300 bases long, so millions of reads are required to cover a sufficient proportion of a genome, depending on the organism. Repetitive regions complicate the reconstruction: when a repeat is longer than the reads, a read from within it cannot be assigned to a single place in the genome, resulting in fragmented assemblies. In metagenomics, identical sequences can come from multiple related organisms, which makes genome assembly even harder. We are able to quantify genes but have a harder time providing genomic context for them. Long-read sequencing bypasses most of these problems.
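To make the "millions of reads" point concrete, here is a minimal sketch of the usual average-depth arithmetic (coverage = number of reads x read length / genome size). The 5 Mb genome size, the 100x target depth, and the function name `reads_needed` are illustrative assumptions, not values from any particular study, and the calculation ignores uneven coverage.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, target_coverage: float) -> int:
    """Smallest number of reads N such that N * read_length / genome_size >= target_coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Assumed, illustrative inputs: a 5 Mb bacterial genome sequenced to 100x average depth.
genome_size = 5_000_000
short_read_count = reads_needed(genome_size, 150, 100)      # 150-base short reads
long_read_count = reads_needed(genome_size, 10_000, 100)    # 10 kb long reads

print(f"150 b reads needed:  {short_read_count:,}")
print(f"10 kb reads needed: {long_read_count:,}")
```

Under these assumptions the short-read run needs a few million reads for a single genome, while the long-read run needs tens of thousands; a metagenome containing many genomes scales both numbers up accordingly.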

Advances in optics, microfluidics, and biochemistry have allowed for the development of long-read sequencing technologies. These methods usually require larger DNA fragments (1-30 kb) and manipulate single molecules of DNA. Two long-read sequencing technologies are currently in widespread use:

Oxford Nanopore Sequencing

Oxford Nanopore sequencing operates on the principle of threading DNA strands through nanopores and measuring the resulting changes in electrical current, allowing real-time detection of nucleotide sequences. This technology produces exceptionally long reads, reaching many thousands of bases, making it an appealing choice for metagenomic studies. In addition, some of its sequencers are small (~10 cm), portable, and low cost (a few thousand dollars).

PacBio Sequencing

Pacific Biosciences employs single-molecule, real-time (SMRT) sequencing, observing the incorporation of individual nucleotides into growing DNA strands. This results in long reads, often spanning tens of kilobases, providing a unique advantage in uncovering regions of microbial genomes that cannot be accurately read by short-read sequencing, namely repetitive regions.

Advantages and Disadvantages of Long-Read Sequencing:

Like all new technologies, long-read sequencing has advantages and disadvantages compared with classical short-read sequencing, and these factors should be considered when designing your microbiome study:

Genome assembly

Microbial genomes are highly variable and constantly changing, and they often contain repetitive regions that make genome assembly difficult. Short-read sequencing resolves these repetitive regions poorly, often leading to fragmented assemblies. Long-read sequencing, on the other hand, is a much more powerful tool for genome assembly because individual reads can span these repetitive regions, which can result in the assembly of whole genomes (in metagenomics, complete metagenome-assembled genomes).
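The effect of read length on repeats can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the "genome" is a random 400-base string containing two copies of a 50-base repeat, reads are error-free substrings, and a read is called ambiguous when its exact sequence occurs at more than one position. The names `ambiguous_fraction` and `random_dna` are made up for this example.

```python
import random


def ambiguous_fraction(genome: str, read_length: int) -> float:
    """Fraction of error-free reads of the given length whose exact
    sequence occurs at more than one position in the genome."""
    if not 0 < read_length <= len(genome):
        raise ValueError("read_length must be between 1 and the genome length")
    reads = [genome[i:i + read_length] for i in range(len(genome) - read_length + 1)]
    counts = {}
    for read in reads:
        counts[read] = counts.get(read, 0) + 1
    return sum(1 for read in reads if counts[read] > 1) / len(reads)


rng = random.Random(0)  # fixed seed so the toy genome is reproducible


def random_dna(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


# Toy genome: unique flank, repeat, unique flank, same repeat again, unique flank.
repeat = random_dna(50)
genome = random_dna(100) + repeat + random_dna(100) + repeat + random_dna(100)

short_frac = ambiguous_fraction(genome, 20)  # reads shorter than the repeat
long_frac = ambiguous_fraction(genome, 80)   # reads longer than the repeat

print(f"20-base reads with ambiguous placement: {short_frac:.1%}")
print(f"80-base reads with ambiguous placement: {long_frac:.1%}")
```

Reads that fall entirely inside the repeat match both copies, so some 20-base reads are ambiguous, whereas every 80-base read extends into unique flanking sequence and can be placed. Real genomes have much longer repeats (and real reads have errors), but the principle is the same.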

Taxonomic resolution

Due to the length of its reads, long-read sequencing is a more powerful tool for resolving microbial taxonomy at finer levels when used for amplicon sequencing. When applied to 16S amplicon sequencing, for example, long reads can cover the entire 16S gene rather than only part of it. This provides more information for distinguishing closely related microbial groups than short-read sequencing, allowing resolution at the species or even “strain” level within complex microbiomes.
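A toy comparison shows why covering the whole gene helps. The two sequences below are invented 30-base strings, not real 16S sequences, and the "short-read region" is an arbitrary slice chosen for illustration; the helper `mismatches` is likewise hypothetical.

```python
def mismatches(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Invented, pre-aligned toy "marker genes" for two closely related taxa.
taxon_a = "ACGTACGTAC" + "GGGTTTCCCA" + "TTGACCATGA"
taxon_b = "ACGTTCGTAC" + "GGGTTTCCCA" + "TTGACCGTGA"

# Assume a short-read amplicon covers only the middle third of the gene.
short_read_region = slice(10, 20)

partial = mismatches(taxon_a[short_read_region], taxon_b[short_read_region])
full = mismatches(taxon_a, taxon_b)

print(f"Differences seen in the short-read region: {partial}")
print(f"Differences seen across the full length:   {full}")
```

In this contrived case the two taxa are indistinguishable within the sequenced sub-region but differ at two positions elsewhere, which only a full-length read would reveal.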

Accuracy

Short-read sequencing typically provides very high base-calling accuracy (>99.9%), whereas long-read sequencing historically had lower accuracy and quality (~90%). Recent advances in long-read technologies, however, have improved base-level accuracy to >99% for both Oxford Nanopore and PacBio sequencing, suggesting that accuracy is no longer a major issue with long-read sequencing. PacBio provides slightly higher accuracy than Nanopore because it repeatedly sequences the same molecule in a circular library structure.
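Base-calling accuracy is commonly reported as a Phred quality score, Q = -10 * log10(error probability). The article does not use Q scores, so the short sketch below is offered only as a way to relate the percentages above to that convention and to the expected number of errors in a read; the function names and the example read lengths are assumptions chosen for illustration.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert per-base accuracy (e.g. 0.99) to a Phred quality score."""
    return -10 * math.log10(1 - accuracy)


def expected_errors(read_length: int, accuracy: float) -> float:
    """Expected number of miscalled bases in a read, assuming independent errors."""
    return read_length * (1 - accuracy)


for label, acc in [("historical long-read", 0.90), ("current long-read", 0.99), ("short-read", 0.999)]:
    print(f"{label}: accuracy {acc:.1%} is about Q{accuracy_to_phred(acc):.0f}")

# Illustrative read lengths: a 10 kb long read versus a 150-base short read.
print(f"Expected errors in a 10 kb read at 99%:    {expected_errors(10_000, 0.99):.0f}")
print(f"Expected errors in a 150 b read at 99.9%:  {expected_errors(150, 0.999):.2f}")
```

So 90%, 99%, and 99.9% accuracy correspond to roughly Q10, Q20, and Q30. A long read still carries more errors in absolute terms than a short read simply because it contains more bases, which is why consensus approaches remain useful.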

DNA quantity and quality

Long-read sequencing relies on higher quantities and longer fragments of DNA. Obtaining enough long, intact DNA to generate sufficient sequence coverage remains a critical requirement for successful microbial assembly, especially in environmental microbiome studies. This poses challenges for DNA extraction protocols, which must lyse cells while retaining DNA integrity. Protocols for extracting long, intact DNA must also represent all species within a microbiome sample accurately, which can be more difficult than with short-read protocols that tolerate more fragmented DNA. The isolation of suitable DNA for long-read sequencing can therefore be challenging, particularly in diverse environmental samples with fragmented DNA and low biomass.

Cost

Multiple factors contribute to sequencing cost, including the depth and coverage of sequencing required. Some long-read sequencers, such as the Oxford Nanopore MinION, can be much cheaper than short-read sequencers ($1,000 vs $100,000), although other long-read sequencers (e.g. PacBio) are as expensive as their short-read counterparts. However, the flow cells used for long-read sequencing are generally much more expensive to run than those used for short-read sequencing. The per-base cost of short-read sequencing is ultimately much lower if multiple samples are multiplexed on a single sequencing run. Multiplexing is possible on long-read sequencing devices, but currently not to the same extent.

Portability and Speed

One remarkable aspect of Nanopore long-read sequencers is their portability, enabling their use in diverse settings. Unlike typical, large, benchtop sequencing devices, these small long-read sequencing devices can be directly plugged into a laptop and have been employed in unconventional environments, from the International Space Station to remote field sites during disease outbreaks, showcasing their adaptability and potential for rapid response in monitoring microbial populations. Furthermore, the sequencing data for each sample can be generated and analysed in real-time providing results much more quickly than typical short-read sequencing technologies. However, overall, short-read sequencing technologies are more high-throughput due to their ability to multiplex many more samples on one sequencing run.



| Aspect | Long-Read Sequencing | Short-Read Sequencing |
| --- | --- | --- |
| Read Length | Thousands to tens of kilobases, capturing entire genomic regions | Typically around 35-600 bases, limiting the ability to resolve complex genomic structures |
| Genome Assembly | Facilitates the assembly of complete genomes | Often struggles to reconstruct entire genomes due to fragmented assemblies |
| Structural Variation | Excels at identifying large structural variations | Limited ability to detect complex structural variations, especially large ones |
| Taxonomic Resolution | Provides finer taxonomic resolution (sub-species/strain level) for amplicon sequencing versus short-read sequencing | Coarser resolution for amplicon sequencing, may struggle with closely related microbial groups |
| Base Accuracy | Less accurate base calling (~99%), although improving | High accuracy for individual bases (>99.9%), but challenges in accurately resolving repetitive regions |
| Cost-Effectiveness | Sequencers are as cheap as ~$1000 but generally higher operating costs | Generally more cost-effective per base, but can be expensive for large-scale projects |
| Input DNA Requirements | Medium to high | Low |

Applications in Microbiome Research:

Increasingly, long-read sequencing is being used in microbiome research to assemble genomes from complex microbial communities, amongst other uses.

Full-length amplicon sequencing

Amplicon sequencing, particularly 16S rRNA sequencing, is a critical tool for microbiome research. However, when conducted using short-read sequencing, it is limited to taxonomic resolution at the genus or species level. Long-read sequencing, on the other hand, can read the entire 16S gene with single-nucleotide resolution, meaning that it can be used to identify sub-species clades or “strains” within a community. This approach can be particularly useful in samples where the genetic material of interest is a small proportion of all DNA or in microbiome samples from environments that are less well characterized.

Metagenomics

Metagenomics provides an extra layer of information versus amplicon sequencing, by providing functional information on microbial gene content. However, short-read sequencing makes the process to identify and assemble these genes very labor-intensive and difficult due to the necessity to read and assemble millions of short reads. Long-read sequencing has great potential to make metagenomic analysis a simpler task and to provide greater coverage of metagenomes due to its greater ability to resolve repetitive regions of genomes. This may help to identify new genes of interest within microbiome samples that would not be identified using short-read sequencing.

Genome sequencing

Whole genome sequencing of isolates from microbial communities is critical for understanding factors such as antimicrobial resistance, pathogenicity, and mutations in microbial organisms. Long-read sequencing provides a faster and simpler method to sequence entire genomes than short-read sequencing, making it an appropriate tool to track infection outbreaks or strain transmission between individuals and environments.

It is evident that long-read sequencing can be used effectively in all areas of microbiome research. Increasingly, researchers are applying long-read sequencing to analysis of their microbiome samples in order to generate high quality sequencing data with full genome coverage from multiple organisms in complex samples.

Long-read sequencing in your microbiome study?

Long-read sequencing has the potential to add an extra dimension to your microbiome study by providing deeper genetic insight into your samples. However, as always, it is important to consider carefully what experimental approach is most useful for your particular research. The team at Microbiome Insights have recently developed the capability to add long-read sequencing to your microbiome study, so if you have any questions, get in touch today.

About Microbiome Insights

Microbiome Insights, Inc. is a global leader providing end-to-end microbiome sequencing and comprehensive bioinformatic analysis. The company is headquartered in Vancouver, Canada, where samples from around the world are processed in its College of American Pathologists (CAP)-accredited laboratory. Working with clients from pharma, biotech, nutrition, cosmetic and agriculture companies, as well as with world-leading academic and government research institutions, Microbiome Insights has supported over 925 microbiome studies, from basic research to commercial R&D and clinical trials. The company's team of expert bioinformaticians and data scientists delivers industry-leading insights, including biomarker discovery, machine-learning-based modelling and customized bioinformatics analysis.