However, short-read sequencing platforms fail to resolve key sequence information, including genes involved in adaptation to an aquatic environment. Here, we applied long-read sequencing to retrieve sequences missing from the genome of the duckweed species Spirodela polyrhiza. Both the evolution of its genetic network and its root morphology show that its roots function as sea anchors rather than in nutrient uptake. Moreover, disease-resistance gene clusters are constitutively active in Spirodela, whereas their counterparts in land plants are silenced by phased secondary small interfering RNAs (phasiRNAs).
Aquatic plants must adapt to environments distinct from those of land plants. A critical aspect of this adaptation is the dynamics of sequence repeats, which older sequencing platforms could not resolve because short reads yield incomplete and fragmented genome assemblies. We therefore used PacBio long-read sequencing of the Spirodela polyrhiza genome, achieving a 44-fold increase in contiguity, with a contig N50 of 831 kb (N50 is the length L such that contigs of length L or longer together cover half of the assembly), and filling 95.4% of the gaps left in the previous version. Reconstruction of repeat regions indicates that sequentially nested long terminal repeat (LTR) retrotransposition events occurred early in monocot evolution, and that the genome features both prokaryote-like gene-rich regions and eukaryotic repeat islands. The number of protein-coding gene models is reduced to 18,708, supported by 492,435 high-quality full-length PacBio complementary DNA (cDNA) sequences. Unlike in land plants, the primitive architecture of Spirodela's adventitious roots and the lack of lateral roots and root hairs are consistent with roots being dispensable for nutrient absorption. Disease-resistance genes encoding antimicrobial peptides and dirigent proteins are expanded by tandem duplications. Remarkably, these disease-resistance genes are not only amplified but also highly expressed, consistent with low levels of the 24-nucleotide (nt) small interfering RNAs (siRNAs) that silence the immune system of land plants; this high expression protects Spirodela against a wide spectrum of pathogens and pests. The long-read sequence information not only sheds light on plant evolution and adaptation to the environment but also facilitates applications in bioenergy and phytoremediation.
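N50 is often mistaken for a median contig length, but it weights contigs by the bases they contribute. The following is a minimal Python sketch of the standard N50 computation; the contig lengths used below are hypothetical toy values, not data from the Spirodela assembly.

```python
from statistics import median


def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig where cumulative length reaches half the total.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly: one long contig and several short fragments.
toy = [100, 1, 1, 1, 1]
print(n50(toy))     # the single 100-unit contig already covers over half the bases
print(median(toy))  # the median is dominated by the many short fragments
```

In the toy case the N50 is 100 while the median is 1, which is why N50 rather than the median is used to summarize assembly contiguity.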
Figure 1: Comparison of genome assemblies from short reads and long reads. From outside to inside, the circles represent karyotype (a), sequence gaps (b), GC content (c), full-length LTRs (d), gene density (e), and syntenic connections (f). The metrics are calculated in 1-Mb sliding windows. The right half circle represents the genome assembly from long reads (Sp7498V3), and the left half circle represents the genome assembly from short reads (Sp7498V2). In layer b, each blue vertical bar indicates one gap; there are 270 gaps in Sp7498V3 and 13,459 gaps in Sp7498V2. The inner lines denote synteny between the two genome versions. Chr, chromosome.
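The caption states that the metrics are calculated in 1-Mb sliding windows but does not give the step size or how gap bases are handled. The following is a minimal Python sketch of windowed GC content, assuming (not taken from the source) that the step defaults to the window size, giving non-overlapping windows, that ambiguous or gap bases such as N are excluded from the denominator, and that a trailing partial window is dropped. The function name `windowed_gc` is hypothetical.

```python
def windowed_gc(seq, window=1_000_000, step=None):
    """Return (start, gc_fraction) pairs for windows along a sequence.

    Assumptions: step defaults to window (non-overlapping windows);
    a smaller step gives overlapping sliding windows. Non-ACGT bases
    (e.g. N in assembly gaps) are excluded from the denominator, and a
    trailing window shorter than `window` is dropped unless the whole
    sequence is shorter than one window.
    """
    if step is None:
        step = window
    seq = seq.upper()
    results = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        # Windows made entirely of gap bases get NaN rather than a fake 0.
        results.append((start, gc / acgt if acgt else float("nan")))
    return results


# Toy example with a 4-bp window in place of 1 Mb.
print(windowed_gc("GGCCAATT", window=4))
print(windowed_gc("GGCCAATT", window=4, step=2))
```

With a step of 2, the toy sequence yields three overlapping windows (GC fractions 1.0, 0.5, 0.0) instead of two.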