Codon usage in labyrinthulomycete coding sequences
We analyzed the recently available whole genome sequences from two thraustochytrids (Aurantiochytrium limacinum ATCC MYA-1381, Schizochytrium aggregatum ATCC 28209) and one aplanochytrid (Aplanochytrium PBS07) (Table 1).
Table 1. Labyrinthulomycete genome sequencing and annotation results (Joint Genome Institute).
Species | Size (Mb)/ coverage | # scaffolds | N50/ L50 | # gaps | # proteins predicted | Introns per gene/ Median length | %EST mapped |
Aplanochytrium | 35.77/ 108x | 207 | 19/ 0.72 | 316 | 11892 | 2.75/ 71 | 97.7 |
Schizochytrium | 40.85/ 72x | 283 | 15/ 0.64 | 1953 | 10612 | 1.5/ 345 | 97 |
Aurantiochytrium | 60.93/ 31x | 101 | 10/ 2.46 | 937 | 14859 | 1.5/ 494 | 96 |
We calculated the genome-wide relative synonymous codon usage (RSCU; Fig. 1A), codon frequencies (Fig. 1B) and GC content (Table 2) for predicted coding sequences from each of the three species.
We compared these to other stramenopiles: the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana, and the oomycete Phytophthora sojae, as well as to the ascomycete fungus Saccharomyces cerevisiae. We found:
- The stramenopiles have codon usage patterns distinctly different from those of the yeast Saccharomyces.
- The thraustochytrid Schizochytrium has codon usage more similar to the oomycete Phytophthora than to the other labyrinthulomycetes (Aplanochytrium and the thraustochytrid Aurantiochytrium).
- In general, Schizochytrium underuses codons ending in A or T and overuses codons ending in G or C, with the divergence large enough to be flagged as an outlier for some threonine, alanine, leucine, and arginine codons.
- Although never flagged as an outlier, Phytophthora shows a codon usage pattern similar to that of Schizochytrium.
- The differences in codon usage are broadly consistent with the higher GC content of coding sequences (CDS) in Schizochytrium (~63%) and Phytophthora (~58%) than in the other genomes (~40 to 51%; Table 2).
However, these comparisons also show that GC content is not the only factor associated with variations in codon usage patterns.
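For readers unfamiliar with the metric, RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below illustrates that definition only; it is not the analysis code deposited at protocols.io. It assumes the standard genetic code and in-frame CDS strings, and the function names `count_codons` and `rscu` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codons(sequences):
    """Count in-frame codons across an iterable of CDS strings.

    Assumes each sequence starts in frame; codons containing
    ambiguous bases (e.g. N) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TO_AA:
                counts[codon] += 1
    return counts


def rscu(counts):
    """Return {codon: RSCU} for every amino acid observed at least once."""
    families = {}
    for codon, aa in CODON_TO_AA.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined
        expected = total / len(codons)  # count under equal synonymous usage
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values


if __name__ == "__main__":
    toy_cds = ["ATGGCTGCTGCCGCATAA"]  # toy sequence, not real data
    for codon, value in sorted(rscu(count_codons(toy_cds)).items()):
        print(codon, round(value, 2))
```

An RSCU above 1 indicates a codon used more often than expected under equal synonymous usage, and a value below 1 indicates underuse. In this sketch, single-codon amino acids (methionine, tryptophan) always score 1, and the three stop codons are treated as one synonymous family; a real analysis may prefer to exclude them.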
Table 2. GC content of coding sequences (fraction of G+C bases).
Species | GC content |
Aplanochytrium kerguelense | 0.424 |
Aurantiochytrium limacinum | 0.489 |
Phaeodactylum tricornutum | 0.509 |
Phytophthora sojae | 0.584 |
Saccharomyces cerevisiae | 0.396 |
Schizochytrium aggregatum | 0.628 |
Thalassiosira pseudonana | 0.479 |
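As an illustration of how values like those in Table 2 can be obtained, the Python sketch below computes the overall GC fraction of a set of coding sequences, along with the GC fraction at third codon positions (GC3), which relates to the preference for codons ending in G or C noted above. It is a minimal sketch rather than the deposited analysis code; the function names `gc_fraction` and `gc3_fraction` are hypothetical, and in-frame sequences are assumed.

```python
def gc_fraction(sequences):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    gc = total = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        total += sum(seq.count(base) for base in "ACGT")
    return gc / total if total else float("nan")


def gc3_fraction(sequences):
    """GC fraction at third codon positions, assuming in-frame CDS."""
    # seq[2::3] selects indices 2, 5, 8, ... which are the third
    # position of every complete codon in an in-frame sequence.
    return gc_fraction(seq[2::3] for seq in sequences)


if __name__ == "__main__":
    toy_cds = ["ATGGCTGCCGCATAA"]  # toy sequence, not real data
    print("GC :", round(gc_fraction(toy_cds), 3))
    print("GC3:", round(gc3_fraction(toy_cds), 3))
```

The functions return fractions, matching the scale of Table 2; multiplying by 100 gives the percentages quoted in the text (for example, 0.628 corresponds to about 63%).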
The analysis code for the results presented above is available at protocols.io. The codon tables are available at Academic Commons.
These data were generated by Collier, Rest, et al. as part of a grant from the Gordon and Betty Moore Foundation.