Welcome to my Bioinformatics Research Page

The sequencing of the human genome was lauded as one of the most significant achievements in the history of science. A sometimes forgotten fact is that none of it would have been possible without modern computers, at least not in such a short time. Specialized algorithms were developed to piece together millions of fragments of genetic material, and this convergence of computing and molecular biology gave birth to two new fields: Computational Biology and Bioinformatics.

Both of these interdisciplinary fields draw on disciplines such as applied mathematics, informatics, statistics, computer science, artificial intelligence, chemistry, and biochemistry to solve biological problems, usually at the molecular level. As defined by the National Institutes of Health (NIH), Bioinformatics focuses more on information processing, while Computational Biology focuses on the underlying hypotheses and the modeling of biological systems.

My research involves both information analysis and hypothesis testing, so I use the two terms interchangeably in describing my projects. A list of current projects can be found below.

Research Projects

Phylogenetic Analysis of Viral Quasispecies (HIV, HCV): The main project in my lab is the study of HIV evolution within a host, using an approach that brings together elements of phylogenetics, algorithms, virology, and statistics. Patterns of viral evolution are inferred from serially-sampled sequence data, i.e., sequences from strains isolated at consecutive time points from a single patient or host. Traditional phylogenetic methods assume a tree-like evolutionary model; many RNA viruses, however, can exchange genetic material with one another through a process called recombination. A genealogy involving recombination is better described by a network, which may reveal distinctive patterns of viral evolution and help explain the emergence of disease-associated mutants and drug-resistant strains, with implications for patient prognosis and treatment strategies. (Book to appear soon: ISBN 978-3-8364-3458-4)
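To make the tree-versus-network distinction concrete, here is a minimal C sketch of how a node in a serially-sampled genealogy might be stored. The type and field names are hypothetical, and the sketch assumes each recombinant has exactly two parents and a single breakpoint; real recombinants can be more complex.

```c
#include <stddef.h>

/* Hypothetical node type for a serial genealogy. A tree node has one
   parent; a recombinant node in a network has two, each contributing a
   different segment of the aligned sequence. */
typedef struct Node {
    const char *label;      /* isolate name (illustrative) */
    int sample_time;        /* index of the sampling time point */
    struct Node *parent[2]; /* parent[1] is NULL unless the node is recombinant */
    int breakpoint;         /* alignment column where parent[1] takes over; -1 if none */
} Node;

/* Returns 1 if the node has two parents, i.e. the genealogy is a network here. */
int is_recombinant(const Node *n) {
    return n->parent[0] != NULL && n->parent[1] != NULL;
}

/* Which parent contributed alignment column `pos` of this node's sequence?
   Assumes a single breakpoint: columns before it come from parent[0]. */
const Node *source_parent(const Node *n, int pos) {
    if (!is_recombinant(n) || pos < n->breakpoint)
        return n->parent[0];
    return n->parent[1];
}
```

A tree is simply the special case in which no node has a second parent.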

Alternative Splicing: At least 70% of human genes express multiple mRNAs through alternative splicing of exons or exon segments. The splicing machinery (the spliceosome) recognizes cis-acting elements during the splicing process. These cis-acting elements distinguish exons from introns, direct the spliceosome to the correct nucleotides for exon joining and intron removal, and serve as binding sites for auxiliary factors that regulate alternative splicing. Together they make up what is now recognized as a ‘splicing code’, which appears to be particularly dense within and around exons. Our research focuses on the de novo discovery of splicing regulatory sequences and alternative exons.
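As a rough illustration of one generic strategy for de novo discovery of regulatory sequences (not necessarily the one used in our project), candidate hexamers can be ranked by how over-represented they are in exonic versus intronic sequence. The C sketch below counts all 4,096 hexamers with a 2-bit encoding and reports a smoothed frequency ratio; the function names and the choice of k = 6 are illustrative assumptions.

```c
#include <string.h>

#define K 6
#define NKMERS 4096 /* 4^6 possible hexamers */

/* Map a nucleotide to 0..3; returns -1 for anything else (e.g. N or a gap). */
static int base_index(char c) {
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default: return -1;
    }
}

/* Add the count of every hexamer in seq to counts[] (rolling 2-bit code).
   Windows containing a non-ACGT character are skipped. */
void count_hexamers(const char *seq, unsigned long counts[NKMERS]) {
    size_t len = strlen(seq);
    unsigned code = 0;
    int valid = 0; /* consecutive valid bases ending at position i */
    for (size_t i = 0; i < len; i++) {
        int b = base_index(seq[i]);
        if (b < 0) { valid = 0; code = 0; continue; }
        code = ((code << 2) | (unsigned)b) & (NKMERS - 1);
        if (++valid >= K)
            counts[code]++;
    }
}

/* Encode a hexamer string as its index in counts[]; returns -1 if invalid. */
int hexamer_code(const char *kmer) {
    if (strlen(kmer) != K) return -1;
    int code = 0;
    for (int i = 0; i < K; i++) {
        int b = base_index(kmer[i]);
        if (b < 0) return -1;
        code = (code << 2) | b;
    }
    return code;
}

static unsigned long total(const unsigned long c[NKMERS]) {
    unsigned long t = 0;
    for (int i = 0; i < NKMERS; i++) t += c[i];
    return t;
}

/* Ratio of a hexamer's frequency in exonic vs intronic counts. Add-one
   smoothing keeps unseen hexamers from causing division by zero. */
double enrichment(const unsigned long exon[NKMERS],
                  const unsigned long intron[NKMERS], int code) {
    double fe = (exon[code] + 1.0) / (double)(total(exon) + NKMERS);
    double fi = (intron[code] + 1.0) / (double)(total(intron) + NKMERS);
    return fe / fi;
}
```

Hexamers with a ratio well above 1 would be candidate exonic elements; in practice such candidates need further filtering and experimental validation.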

Base Compositional Bias: One potential pitfall in phylogenetic estimation from biological sequence data is compositional bias. Third codon positions have more extreme base compositional biases and account for the majority of variable sites, resulting in a more rapid loss of the historical signal of relatedness recorded in individual nucleotides. We are developing a non-parametric test for multiple alignments that determines whether the clustering of taxa in a tree is due to similar levels of base compositional bias rather than the expected genealogical relationships.
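One simple ingredient of such an analysis is the GC content at third codon positions (GC3) of each taxon. The C sketch below computes GC3 and, as an illustration only, runs a generic label-permutation test asking whether a clade's mean GC3 differs from that of the remaining taxa more than randomly chosen groups of the same size would. This is not the test we are developing; the function names, the use of rand(), and the add-one p-value correction are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Fraction of G or C among third codon positions of an in-frame coding
   sequence; gaps and ambiguous bases are ignored. Returns -1 if none. */
double gc3(const char *seq) {
    size_t len = strlen(seq);
    int gc = 0, counted = 0;
    for (size_t i = 2; i < len; i += 3) {
        char c = seq[i];
        if (c == 'G' || c == 'C' || c == 'g' || c == 'c') { gc++; counted++; }
        else if (c == 'A' || c == 'T' || c == 'a' || c == 't') counted++;
    }
    return counted ? (double)gc / counted : -1.0;
}

/* Absolute difference in mean value between taxa inside the clade
   (in_clade[i] == 1) and all other taxa. */
static double mean_diff(const double *v, const int *in_clade, int n) {
    double s_in = 0.0, s_out = 0.0;
    int n_in = 0, n_out = 0;
    for (int i = 0; i < n; i++) {
        if (in_clade[i]) { s_in += v[i]; n_in++; }
        else { s_out += v[i]; n_out++; }
    }
    if (n_in == 0 || n_out == 0) return 0.0;
    double d = s_in / n_in - s_out / n_out;
    return d < 0 ? -d : d;
}

/* Permutation p-value: how often does a random relabeling of taxa (same
   clade size) give a GC3 difference at least as large as the observed one? */
double gc3_permutation_p(const double *v, const int *in_clade, int n,
                         int reps, unsigned seed) {
    int *perm = malloc((size_t)n * sizeof *perm);
    if (!perm) return -1.0;
    memcpy(perm, in_clade, (size_t)n * sizeof *perm);
    double observed = mean_diff(v, in_clade, n);
    int extreme = 0;
    srand(seed);
    for (int r = 0; r < reps; r++) {
        /* Fisher-Yates shuffle of the clade labels */
        for (int i = n - 1; i > 0; i--) {
            int j = rand() % (i + 1);
            int t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
        if (mean_diff(v, perm, n) >= observed) extreme++;
    }
    free(perm);
    return (extreme + 1.0) / (reps + 1.0);
}
```

A small p-value would indicate that the clade is also set apart by compositional bias, which is the situation in which its grouping deserves a closer look.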

Software:

Sliding MinPD: A program that combines distance-based phylogenetic methods with automated recombination detection, based on the best-known sliding-window approaches, to reconstruct serial evolutionary trees or networks. Sliding MinPD can be applied to sequences from recombining, fast-evolving viruses such as HIV-1 sampled serially from the same host. The resulting network facilitates the study of viral evolutionary relationships, evolutionary patterns, and the splitting and merging of lineages, and helps determine how these correlate with the patient's disease status. C source code available.
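The general sliding-window idea behind this kind of recombination detection can be sketched in a few lines of C: slide a window along the alignment and, within each window, find which candidate sequence (for example, a strain from an earlier time point) is closest to the query by p-distance. This is a simplified illustration, not the Sliding MinPD source code; the function names, the use of uncorrected p-distance, and the gap handling are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Proportion of differing sites (p-distance) between two aligned sequences
   over columns [start, start + width), skipping gap ('-') columns.
   Returns -1 if there is no comparable site. */
double p_distance(const char *a, const char *b, size_t start, size_t width) {
    size_t diff = 0, sites = 0;
    for (size_t i = start; i < start + width && a[i] && b[i]; i++) {
        if (a[i] == '-' || b[i] == '-') continue;
        sites++;
        if (a[i] != b[i]) diff++;
    }
    return sites ? (double)diff / sites : -1.0;
}

/* For each window, store in best[] the index of the candidate closest to the
   query (-1 if none is comparable). Returns the number of windows written.
   Assumes all sequences are aligned to the same length as the query. */
size_t closest_per_window(const char *query, const char **cands, size_t ncand,
                          size_t width, size_t step, int *best,
                          size_t max_windows) {
    if (width == 0 || step == 0) return 0;
    size_t len = strlen(query), w = 0;
    for (size_t start = 0; start + width <= len && w < max_windows;
         start += step, w++) {
        double best_d = 2.0; /* larger than any p-distance */
        best[w] = -1;
        for (size_t c = 0; c < ncand; c++) {
            double d = p_distance(query, cands[c], start, width);
            if (d >= 0.0 && d < best_d) { best_d = d; best[w] = (int)c; }
        }
    }
    return w;
}
```

If the closest candidate switches between windows, the switch point is a candidate recombination breakpoint; a full method would also have to judge whether such a switch is statistically supported.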

Serial NetEvolve: A flexible simulation program that generates DNA sequences evolved along a tree or recombinant network. It offers a user-friendly Windows graphical interface and a Windows or Linux simulator with a diverse set of parameters to control the evolutionary model. Serial NetEvolve is a modification of the Treevolve program with the following additional features: simulation of serially-sampled data, a choice between a clock-like and a variable-rate model of sequence evolution, sampling from internal nodes, and output of the randomly generated tree or network in our proposed NeTwick format.
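For readers unfamiliar with sequence simulation, the C sketch below evolves a DNA sequence forward in time and copies it at successive sampling times, mimicking serially-sampled data along a single lineage. It uses a deliberately simple discrete-generation, Jukes-Cantor-style mutation model, with a per-lineage rate multiplier standing in for the clock-like versus variable-rate choice. It is not the algorithm used by Serial NetEvolve or Treevolve, and all names and parameters are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

static const char BASES[] = "ACGT";

/* Uniform draw in [0, 1) from the C library generator (adequate for a sketch only). */
static double unif(void) { return rand() / ((double)RAND_MAX + 1.0); }

/* Evolve seq in place for `generations` steps. In each step every site
   mutates with probability mu * rate_multiplier, and the new base is drawn
   uniformly from the three other nucleotides. rate_multiplier = 1.0 gives a
   clock-like lineage; other values mimic a lineage-specific rate. */
void evolve_branch(char *seq, int generations, double mu, double rate_multiplier) {
    size_t len = strlen(seq);
    double p = mu * rate_multiplier;
    for (int g = 0; g < generations; g++) {
        for (size_t i = 0; i < len; i++) {
            if (unif() < p) {
                const char *pos = strchr(BASES, seq[i]);
                if (!pos) continue; /* leave gaps or ambiguous bases untouched */
                int cur = (int)(pos - BASES);
                int shift = 1 + rand() % 3; /* 1..3, so the new base always differs */
                seq[i] = BASES[(cur + shift) % 4];
            }
        }
    }
}

/* Copy the evolving lineage at each sampling time (in generations, non-decreasing).
   Each out[k] must have room for strlen(ancestor) + 1 characters. */
void sample_serially(const char *ancestor, const int *times, int ntimes,
                     double mu, double rate_multiplier, char **out) {
    size_t len = strlen(ancestor);
    char *cur = malloc(len + 1);
    if (!cur) return;
    memcpy(cur, ancestor, len + 1);
    int elapsed = 0;
    for (int k = 0; k < ntimes; k++) {
        evolve_branch(cur, times[k] - elapsed, mu, rate_multiplier);
        elapsed = times[k];
        memcpy(out[k], cur, len + 1);
    }
    free(cur);
}
```

Calling srand() with a fixed seed before sampling makes a simulated data set reproducible.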


Shuffler (expected release date: April 2008)



Patricia Buendia
University of Miami
Last Updated: 1/13/08
