Copy Number-Based Seeding Approaches to Efficient Orthology and Synteny Mapping in Genome Comparisons
Date Issued
2008
Date
2008
Author(s)
Chang, Yu-Jung
Abstract
Motivation: Orthology/synteny mapping, that is, finding orthologous regions among genomes and organizing these evolutionary counterparts into a coherent global picture, is fundamental to comparative genomics. As the number of completely sequenced genomes grows, and with it the number of comparisons of massive nucleotide sequences, the need for orthology/synteny mapping methods that combine high sensitivity and specificity with high efficiency becomes even more compelling.
Results: First, we have developed the UniMarker (UM) method for synteny mapping of large, closely related genomes, such as those of human and mouse. In this method, the occurrence spectra of genome-wide unique 16-mer sequences present in both the human and mouse genomes are used to directly detect orthologous genomic segments. Because it is free of sequence alignment, the UM method is very fast: high-quality human-mouse synteny maps based on DNA comparisons can be completed in a few hours on a single desktop computer. Second, we propose a new type of DNA sequence seed for orthology mapping of genomes that are not closely related. We call our seeds α-pairs, where α is an integer equal to or greater than the number of times any qualifying seed can be found in the compared genomes. These copy number-based seeds are thus distinct from the well-known length-based seeds, such as fixed-length k-mer seeds or maximal exact match (MEM) seeds, which have a length no less than k. We present a linear-time algorithm, based on enhanced suffix arrays, that efficiently retrieves α-pairs in two given genomic sequences. We compared α-pairs with length-based seeds for their ability to detect the orthologues annotated by Ensembl and COG, for several vertebrate genomes/chromosomes and for prokaryote genomes separated by long evolutionary distances. The results suggest that orthology seeding by copy number can achieve higher sensitivity and better efficiency than orthology seeding by length.
Moreover, we extend the α-pair method to generate discontiguous wobble seeds of maximal length under copy number constraints. ROC curve comparisons for human chr.15 vs. mouse chr.7, chicken chr.10, and the pufferfish genome showed that the discontiguous wobble α-pairs performed significantly better than the spaced k-mer seeding methods tested.
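To make the idea of copy number-based seeding concrete, the following is a minimal Python sketch and not the algorithm of the thesis. It makes several simplifying assumptions: seeds have a fixed length k (the thesis's α-pairs are not length-based), k-mers are counted with a hash map rather than an enhanced suffix array, the copy number limit is applied to each sequence separately, and only the forward strand is considered. The function names `kmer_positions` and `copy_number_seeds` are hypothetical. With alpha = 1 the filter keeps only k-mers that are unique in each sequence and present in both, which is similar in spirit to the unique 16-mers used by the UM method.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each k-mer in seq to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def copy_number_seeds(genome_a, genome_b, k, alpha):
    """Return (kmer, pos_a, pos_b) seed matches for k-mers that occur
    at least once and at most alpha times in each sequence.

    Assumption: the copy number limit is checked per sequence; this is
    an illustrative simplification, not the thesis's definition.
    """
    pos_a = kmer_positions(genome_a, k)
    pos_b = kmer_positions(genome_b, k)
    seeds = []
    for kmer, hits_a in pos_a.items():
        hits_b = pos_b.get(kmer)
        if hits_b is None:
            continue  # k-mer absent from the second sequence
        if len(hits_a) <= alpha and len(hits_b) <= alpha:
            # Every pairing of occurrences is a candidate seed match.
            seeds.extend((kmer, i, j) for i in hits_a for j in hits_b)
    return sorted(seeds, key=lambda s: (s[1], s[2]))


if __name__ == "__main__":
    a = "ACGTACGTTTGA"
    b = "GGACGTTTGACC"
    for seed in copy_number_seeds(a, b, k=4, alpha=1):
        print(seed)
```

In this toy input, the 4-mer ACGT occurs twice in the first sequence, so it is rejected at alpha = 1 and accepted at alpha = 2. Raising alpha admits more repetitive seeds, which is the trade-off between sensitivity and specificity that a copy number threshold controls.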
Subjects
comparative genomics
synteny mapping
orthology mapping
sequence alignment
seeding
suffix array
Type
thesis
File(s)
Name
ntu-97-D90922014-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):40c2aca90116c5df7cfd4c7271a5f1b7
