Sequencing, Analysis, and Characterization of the Ruditapes philippinarum Genome

In June 2014 we started a whole-genome sequencing project on the Manila clam, Ruditapes philippinarum. According to the Animal Genome Size Database, the C-value of the R. philippinarum genome is 1.97 pg, corresponding to approximately 1.93 Gb.
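The conversion from C-value to genome size is a single multiplication. The sketch below assumes the conversion factor commonly used for C-values (1 pg of DNA ≈ 978 Mb); the function name `c_value_to_gb` is hypothetical and only illustrates the arithmetic behind the 1.93 Gb estimate.

```python
# Assumed conversion factor: 1 pg of DNA corresponds to roughly 978 Mb.
MB_PER_PG = 978.0


def c_value_to_gb(c_value_pg: float) -> float:
    """Convert a haploid C-value (picograms) to an approximate genome size in Gb."""
    return c_value_pg * MB_PER_PG / 1000.0


# R. philippinarum C-value reported by the Animal Genome Size Database.
genome_gb = c_value_to_gb(1.97)
print(f"{genome_gb:.2f} Gb")  # prints 1.93 Gb
```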
We sequenced a single male individual to 40x coverage with Illumina HiSeq 2500 and 30x coverage with PacBio RS II. Illumina sequencing was performed at the UPC Genome Core at the University of Southern California (Los Angeles). Two libraries were sequenced with 2×250 bp paired-end reads: a short-insert library (450-500 bp) and a long-insert library (1,500-1,700 bp). PacBio sequencing was performed at the Genomics High Throughput Facility Shared Resource of the Cancer Center Support Grant (CA-62203) at the University of California, Irvine, on an RS II with P6-C4 chemistry. Libraries were size-selected with BluePippin (10-50 kb).
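To make the coverage figures concrete, the back-of-the-envelope sketch below uses the standard relation C = N × L / G (mean depth equals number of reads times read length divided by genome size). It assumes G ≈ 1.93 Gb (the C-value estimate) and the 2×250 bp read layout; the resulting read counts are illustrative targets, not the actual yields of the runs, and the helper names are hypothetical.

```python
def bases_for_coverage(coverage: float, genome_size_bp: float) -> float:
    """Total sequenced bases needed for a target mean depth (C = N * L / G, solved for N * L)."""
    return coverage * genome_size_bp


def read_pairs_for_coverage(coverage: float, genome_size_bp: float,
                            read_len_bp: int, reads_per_fragment: int = 2) -> float:
    """Number of read pairs (fragments) needed, given reads of read_len_bp per mate."""
    bases = bases_for_coverage(coverage, genome_size_bp)
    return bases / (read_len_bp * reads_per_fragment)


GENOME_BP = 1.93e9  # assumption: genome size estimated from the C-value

# Illumina: 40x with 2x250 bp paired-end reads.
illumina_pairs = read_pairs_for_coverage(40, GENOME_BP, 250)
# PacBio: 30x expressed as total sequenced bases (read length varies, so only bases are computed).
pacbio_gb = bases_for_coverage(30, GENOME_BP) / 1e9

print(f"Illumina: {illumina_pairs / 1e6:.1f} million read pairs")
print(f"PacBio: {pacbio_gb:.1f} Gb of sequence")
```

Under these assumptions, 40x corresponds to roughly 154 million 2×250 bp pairs and 30x to roughly 58 Gb of long-read sequence; heterozygosity and repeat content do not change this arithmetic, only how useful that depth is for assembly.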
The assembly was challenging. We encountered several major problems: very high heterozygosity (as reported for many bivalve genomes, but see Renaut et al. 2018), a large number of tandem repeats, and many repeats longer than PacBio reads (sometimes >50 kb). The best de novo assembly was obtained with Canu (32,399 contigs, total length 1.94 Gb, contig N50 = 76,831 bp); according to BUSCO (Simão et al. 2015), it contains 86% complete, 5% fragmented, and 9% missing metazoan orthologs. The barplot below shows the BUSCO assessment of assembly quality (v3, genome mode, metazoan orthologous gene set) for the best mollusc genomes available (see also Ghiselli et al. 2017). We are currently working on gene prediction and annotation, and we plan to publish and release the first draft within the next few months.
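Contig N50, quoted above for the Canu assembly, is the length L such that contigs of length L or longer together contain at least half of the assembled bases. The sketch below is a generic implementation of that definition, not the pipeline used for this project; `contig_lengths` is a hypothetical minimal FASTA reader, and any file path passed to it (for example a Canu contigs FASTA) is an assumption.

```python
from typing import List


def contig_lengths(fasta_path: str) -> List[int]:
    """Return the sequence length of every record in a (possibly multi-line) FASTA file."""
    lengths: List[int] = []
    current = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0  # start counting bases for a new record
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs >= L cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding on total / 2
            return length
    raise AssertionError("unreachable")


# Toy contig lengths (hypothetical): total 280 bp, half is 140 bp, reached at the 80 bp contig.
print(n50([100, 80, 50, 30, 20]))  # prints 80
```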

This will be just the beginning of a new research project that will include:

  • characterization of the genome (gene repertoire and structure, repetitive sequences, transposable elements);
  • comparative analyses (identification and manual curation of a bivalve/mollusc orthologous gene set, gene family expansion/contraction, species-specific genes, protein domain evolution, molecular evolution, phylogenomics);
  • establishment of a dedicated website, with a genome browser and access to all the genomic resources for the species.

The next step will be to develop a second, improved version of the genome, with the goal of producing a chromosome-level assembly.

COLLABORATIONS: this project is carried out in collaboration with the Theodosius Dobzhansky Center for Genome Bioinformatics (St. Petersburg State University, Russia), the Nuzhdin Lab (University of Southern California, Los Angeles, USA), SeqOnce Biosciences (Pasadena, USA), and the Breton Lab (Université de Montréal, Canada).