Cassava breeders often use a single locus, CMD2, to introduce resistance to cassava mosaic disease (CMD) into susceptible cultivars. The CMD2 locus has been genetically mapped to a 10-Mbp region, but its organization, its genes, and their functions are unknown.
We report haplotype-resolved de novo assemblies and annotations of the genomes of the African cassava cultivar TME3 (TME: tropical Manihot esculenta), which is the origin of CMD2, and of the CMD-susceptible cultivar 60444. The assemblies provide phased haplotype information for over 80% of the genomes. Haplotype comparison identified novel features previously hidden in collapsed and fragmented cassava genomes, including thousands of allelic variants, inter-haplotype diversity in coding regions, and patterns of diversification through allele-specific expression. Reconstruction of the CMD2 locus revealed a highly complex region with nearly identical gene sets but limited microsynteny between the two cultivars.
The genome maps of the CMD2 locus in both 60444 and TME3, together with the newly annotated genes, will help identify the causal genetic basis of CMD2 resistance to geminiviruses. Our de novo cassava genome assemblies will also facilitate genetic mapping approaches that narrow the large CMD2 region to a few candidate genes, enabling better-informed strategies for developing robust geminivirus resistance in susceptible cassava cultivars.
CMD2 locus in the TME3 genome. a The upper panel shows CMD2-associated genetic SNP markers and their genetic distance relative to their physical position on scaffold_7 of TME3. Red dots indicate CMD2 SNP markers released by Rabbi and colleagues, and blue dots indicate the SNP markers released by Wolfe and colleagues [22, 42]. The lower panel shows the distribution of the main repetitive genomic features at the CMD2 locus. b The upper panel shows the alignment positions of AM560 v6.1 CDS in the region of Chr. 12 containing the CMD2 locus. Each black dot represents the alignment position of a CDS on the CMD2 scaffold (x-axis) and its chromosomal origin in the AM560 v6.1 cassava reference genome. Sequence breaks (gaps > 1 kb) are shown as pink bars. The lower panel shows the MSS for every annotated gene at the CMD2 locus in TME3. Green dots indicate genes that are found in the CMD2 region of 60444, and light blue dots indicate genes that are found in close proximity to the CMD2 locus in 60444. Orange dots indicate TME3 genes that show a syntenic relation to 60444 genes on other 60444 scaffolds, and red dots indicate genes with no syntenic relation. The dashed line represents the MSS average for the whole genome.
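The sequence breaks marked in panel b (gaps > 1 kb) could be located on a scaffold with a scan like the minimal sketch below. It assumes that gaps are encoded as runs of N in the scaffold sequence, which is a common assembly convention but is not stated in the caption. The function name `find_gaps` and the `min_len` parameter are hypothetical and are not taken from the authors' pipeline.

```python
import re
from typing import List, Tuple


def find_gaps(sequence: str, min_len: int = 1000) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of gap runs.

    Assumption: a gap is a run of 'N' (or 'n') characters. Only runs
    strictly longer than min_len bases are reported, mirroring the
    "gaps > 1 kb" threshold used in the figure.
    """
    gaps = []
    for match in re.finditer(r"[Nn]+", sequence):
        if match.end() - match.start() > min_len:
            gaps.append((match.start(), match.end()))
    return gaps


if __name__ == "__main__":
    # Toy scaffold: 40 bp of sequence, a 1,500 bp gap, 20 bp of sequence,
    # a 200 bp gap (below the threshold), and 2 trailing bases.
    toy = "ACGT" * 10 + "N" * 1500 + "ACGT" * 5 + "N" * 200 + "AC"
    for start, end in find_gaps(toy):
        print(f"gap at {start}-{end} ({end - start} bp)")
```

The returned intervals use the same half-open convention as BED files, so they could be drawn directly as bars along the scaffold axis.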