The circum-basmati group of cultivated Asian rice (Oryza sativa) contains many iconic varieties and is widespread in the Indian subcontinent.

Despite its economic and cultural importance, a high-quality reference genome is currently lacking, and the group’s evolutionary history is not fully resolved. To address these gaps, we use long-read nanopore sequencing and assemble the genomes of two circum-basmati rice varieties.




We generate two high-quality, chromosome-level reference genomes that represent the 12 chromosomes of Oryza. The assemblies show a contig N50 of 6.32 Mb and 10.53 Mb for Basmati 334 and Dom Sufid, respectively. Using our highly contiguous assemblies, we characterize structural variations segregating across circum-basmati genomes. We discover repeat expansions not observed in japonica (the rice group most closely related to circum-basmati), as well as presence/absence variants amounting to over 20 Mb, one of which is a circum-basmati-specific deletion of a gene regulating awn length. We further detect strong evidence of admixture between the circum-basmati and circum-aus groups. This gene flow has its greatest effect on chromosome 10, causing both structural variation and single-nucleotide polymorphism to deviate from the genome-wide history. Lastly, population genomic analysis of 78 circum-basmati varieties shows three major geographically structured genetic groups: Bhutan/Nepal, India/Bangladesh/Myanmar, and Iran/Pakistan.
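For readers unfamiliar with the contig N50 statistic reported above, the following is a minimal sketch of how it is commonly computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly size. The function name `contig_n50` and the toy contig lengths are illustrative assumptions and are not taken from the Basmati 334 or Dom Sufid assemblies.

```python
from typing import Iterable


def contig_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (hypothetical helper).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)  # longest contig first
    if not ordered:
        raise ValueError("at least one contig length is required")

    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running to total to avoid fractional arithmetic.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


# Toy example with made-up contig lengths (arbitrary units).
toy_lengths = [10, 8, 6, 4, 2]
print(contig_n50(toy_lengths))
```

In the toy example the total length is 30; the two longest contigs (10 and 8) sum to 18, which is the first running total to reach half of 30, so the N50 is 8.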




The availability of high-quality reference genomes allows functional and evolutionary genomic analyses that provide genome-wide evidence for gene flow between circum-aus and circum-basmati, describe the nature of circum-basmati structural variation, and reveal the presence/absence variation in this important and iconic rice variety group.





Figure 2: Circum-basmati gene sequence evolution. a The deletion frequency of genes annotated from the Basmati 334 and Dom Sufid genomes. Frequency was estimated from sequencing data on a population of 78 circum-basmati varieties. b Groups of orthologous and paralogous genes (i.e., orthogroups) identified in the reference genomes of circum-aus N22, japonica Nipponbare (NPB), and indica R498, as well as the circum-basmati genome assemblies Basmati 334 (B334) and Dom Sufid (DS) of this study. c Visualization of the genomic region orthologous to the Nipponbare gene Os03g0418600 (Awn3-1) in the N22, Basmati 334, and Dom Sufid genomes. Regions orthologous to Awn3-1 are indicated with a dotted box.