ABSTRACTS
        
        
        Vincent J. Carey, Harvard Medical School
          Genomic EDA and Modeling with R/Bioconductor
          Bioconductor (www.bioconductor.org) 
          pursues the creation of flexible and portable tools for statistical 
          analysis of genomic data. I will describe Bioconductor facilities for 
          exploratory data analysis and flexible statistical inference with microarray 
          data. Particular examples will include classification of gene expression 
          densities, visualization and inference on genomic network structures, 
          and flexible methods for testing hypotheses about the roles of pathways 
          and pathway components in gene expression studies.
        
        Christopher Field, Dalhousie University
          Robustness Issues in Phylogeny
          To estimate the tree structure for a set of taxa, we typically use a 
          statistical model for evolution and compute the maximum likelihood estimate. 
          Molecular biologists recognize that the model is a rough approximation 
          to reality, and there is considerable literature on the effects of model 
          deviations. In this talk, I will examine some of these deviations, paying 
          particular attention to the type of robustness methodology needed to 
          successfully estimate the tree and make reliable inference.
        Greg Gloor, University of Western Ontario
          Co-evolution and mutual information of amino acid positions in protein 
          families
          Proteins are extremely complicated molecular machines that have evolved 
          to perform a particular cellular function. While knowing the structure 
          of a given protein often gives valuable insights into its function, 
          there are also many unanswered questions. This is because each structure 
          is a snapshot of one particular conformation of a protein isolated from 
          one individual species. In many instances functionally important amino 
          acid positions are conserved, but mutation studies show that many 
          non-conserved positions are equally important. We 
          are using mutual information to find these important, yet variable, 
          amino acid positions in protein families. I will describe our progress 
          on this project, and present some strengths and limitations of the current 
          generation of tools used to show the correspondence between structure 
          and sequence.
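
          As a rough illustration of the idea, mutual information between two 
          columns of a multiple sequence alignment can be estimated from observed 
          residue frequencies. The sketch below is not the speaker's tool: the toy 
          alignment and the function names are assumptions made for illustration, 
          and no correction for small samples or phylogenetic background is applied.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return the residues found at position i of every aligned sequence."""
    return [seq[i] for seq in alignment]


def mutual_information(col_x, col_y):
    """Plug-in estimate of mutual information (in bits) between two columns."""
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi


# Hypothetical toy alignment: positions 0 and 1 vary together,
# position 2 is fully conserved.
toy_alignment = ["ARK", "ARK", "GDK", "GDK"]

mi_covarying = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))
mi_conserved = mutual_information(column(toy_alignment, 0), column(toy_alignment, 2))
print(mi_covarying, mi_conserved)
```

          In this toy case the two co-varying positions share one bit of 
          information, while the conserved position shares none with any other 
          column, which is why mutual information can flag variable positions that 
          conservation alone would miss.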
        
        David Sankoff, University of Ottawa
          Far-reaching effects of missing map data and local shuffling on the 
          inference of genome rearrangement history
          Joint work with Phil Trinh. Until recently, algorithms for studying 
          the evolution of gene order could only be applied to small genomes (mitochondria, 
          chloroplasts, prokaryotes), the difficulty with mammalian and other 
          larger eukaryotic nuclear genomes lying not so much in their much greater 
          length but rather in the absence of comprehensive lists of genes and 
          their orthologs. Pavel Pevzner and Glenn Tesler (PNAS 2003) have suggested 
          a way to bypass gene finding and ortholog identification by using the 
          order of syntenic blocks constructed solely from sequence data as input 
          to a genome rearrangement algorithm. The method focuses on major evolutionary 
          events by glossing over small block-internal rearrangements, and neglecting 
          intervening blocks smaller than a threshold length. This use of large 
          "sanitized" blocks and the neglect of short blocks may, however, 
          blur important parts of the historical derivation of the genomes. We 
          model the effects of eliminating and amalgamating short blocks, concentrating 
          on the summary statistic of "breakpoint re-use" introduced 
          by Pevzner and Tesler. They did not conceive of this as an evolutionary 
          distance, but in the context of their protocol it effectively measures 
          to what extent genomes have diverged in becoming random permutations 
          of blocks with respect to each other. We use analytic and simulation 
          methods to investigate breakpoint re-use as a function of threshold 
          size and of rearrangement parameters. We discuss the implication of 
          our findings for the comparison of mammalian genomes and suggest a number 
          of mathematical, algorithmic and statistical lines for further developing 
          the Pevzner-Tesler approach. 
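
          To make the summary statistic concrete, the sketch below counts 
          breakpoints in a signed block order and computes a re-use ratio. 
          The abstract does not give a formula; the ratio 2d/b (d rearrangements, 
          b breakpoints) is assumed here as the usual definition, and the number 
          of reversals applied is used as a stand-in for an inferred rearrangement 
          distance. The function names and example are hypothetical.

```python
def count_breakpoints(signed_perm):
    """Count breakpoints of a signed permutation relative to the identity.

    The permutation is framed by 0 and n + 1; consecutive entries a, b
    form an adjacency only when b - a == 1.
    """
    n = len(signed_perm)
    extended = [0] + list(signed_perm) + [n + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


def apply_reversal(signed_perm, i, j):
    """Reverse the segment i..j (0-based, inclusive) and flip its signs."""
    segment = [-x for x in reversed(signed_perm[i:j + 1])]
    return signed_perm[:i] + segment + signed_perm[j + 1:]


def reuse_rate(n_rearrangements, n_breakpoints):
    """Assumed definition: 2d / b, which is 1 when no breakpoint is re-used."""
    return 2 * n_rearrangements / n_breakpoints


identity = [1, 2, 3, 4, 5, 6]
one_reversal = apply_reversal(identity, 1, 2)       # [1, -3, -2, 4, 5, 6]
two_reversals = apply_reversal(one_reversal, 3, 4)  # [1, -3, -2, -5, -4, 6]

print(count_breakpoints(one_reversal), reuse_rate(1, count_breakpoints(one_reversal)))
print(count_breakpoints(two_reversals), reuse_rate(2, count_breakpoints(two_reversals)))
```

          A single reversal creates two breakpoints and a ratio of 1. The second 
          reversal starts exactly where the first one ended, so the two events 
          share a breakpoint, only three breakpoints result, and the ratio rises 
          above 1.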
        David Tritchler, University of Toronto
          A Spectral Clustering Method for Microarray Data
          Joint work with Shafagh Fallah and Joseph Beyene. Cluster analysis is 
          a commonly used dimension reduction technique. This talk introduces 
          a clustering method motivated by a multivariate analysis of variance 
          model and computationally based on eigenanalysis (thus the term "spectral" 
          in the title). Our focus is on large problems, and we present the method 
          in the context of clustering genes and arrays using microarray expression 
          data. The computational algorithm for the method has complexity linear 
          in the number of genes.
        Of the numerous methods for constructing clusters
          from microarray data, many require that the number of clusters believed 
          present in the data be specified a priori, and in general judgements 
          about the appropriate number of clusters are problematic. We also introduce 
          a method for assessing the number of clusters exhibited in microarray 
          data based on the eigenvalues of a particular matrix.
        
        Jean Yee Hwa Yang, University of California, 
          San Francisco
          Statistical Issues in the Design of Microarray Experiments
          Microarray experiments performed in many areas of biological sciences 
          generate large and complex multivariate datasets. This talk addresses 
          statistical design and analysis issues, which are essential to improve 
          the efficiency and reliability of cDNA microarray experiments. We discuss 
          various considerations unique to the design of cDNA microarrays, and 
          examine how different types of replication affect design decisions. 
          We calculate variances of two classes of estimates of differential gene 
          expression based on log ratios of fluorescence intensities from cDNA 
          microarray experiments: direct estimates, using measurements from the 
          same slide, and indirect estimates, using measurements from different 
          slides. These variances are compared and numerical estimates are obtained 
          from a small experiment. Some qualitative and quantitative conclusions 
          are drawn which have potential relevance to the design of cDNA microarray 
          experiments.
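
          The direct versus indirect comparison can be sketched under a deliberately 
          simple model, which is an assumption and not taken from the talk: every 
          slide yields one log ratio with the same variance sigma2, slides are 
          independent, and an indirect estimate is the difference of two log ratios 
          measured against a common reference on separate slides. The function 
          names and parameter values are hypothetical.

```python
import random
import statistics


def direct_variance(sigma2, n_slides):
    """Variance of the mean of n_slides direct A-versus-B log ratios."""
    return sigma2 / n_slides


def indirect_variance(sigma2, n_slides):
    """Variance of an indirect estimate using n_slides in total:
    half hybridize A against a reference, half hybridize B against it."""
    if n_slides % 2 != 0:
        raise ValueError("indirect design needs an even number of slides")
    pairs = n_slides // 2
    return 2 * sigma2 / pairs


def simulate(sigma=0.5, true_diff=1.0, n_slides=2, reps=20000, seed=1):
    """Monte Carlo check of the two formulas under the assumed model."""
    rng = random.Random(seed)
    pairs = n_slides // 2
    direct, indirect = [], []
    for _ in range(reps):
        d = [rng.gauss(true_diff, sigma) for _ in range(n_slides)]
        direct.append(statistics.fmean(d))
        a_vs_ref = [rng.gauss(true_diff, sigma) for _ in range(pairs)]
        b_vs_ref = [rng.gauss(0.0, sigma) for _ in range(pairs)]
        indirect.append(statistics.fmean(a_vs_ref) - statistics.fmean(b_vs_ref))
    return statistics.variance(direct), statistics.variance(indirect)


sim_direct, sim_indirect = simulate()
print(direct_variance(0.25, 2), indirect_variance(0.25, 2))
print(sim_direct, sim_indirect)
```

          Under these assumptions, and for the same number of slides, the indirect 
          estimate has four times the variance of the direct one. Real designs 
          must also account for the different types of replication discussed in 
          the abstract, which this sketch ignores.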
        Kenny Q Ye and Anil Dhundale, SUNY at 
          Stony Brook
          Pooling or not pooling in microarray experiments - an experimental 
          design point of view
          Microarray experiments are often used to detect differences in gene 
          expression between two populations of cells: a test population versus 
          a control population. However, in many cases, such as when the samples 
          come from individuals in a population, biological variability can produce 
          changes that are irrelevant to the question of interest, and it then 
          becomes important to assay many individual samples to collect statistically 
          meaningful results. Unfortunately, the cost of performing some types of microarray 
          experiments can be prohibitive. A potentially effective but not well 
          publicized alternative is to pool individual RNA samples together for 
          hybridization on a single microarray. This method can dramatically reduce 
          the experimental costs while maintaining high power in detecting the 
          changes in expression levels that relate to the specific question of 
          interest. In this talk, we will discuss why this technique works and 
          the optimal design strategy for pooling. This idea will also be illustrated 
          by a synthetic experiment and a real experiment that studies Afib (cardiac 
          atrial fibrillation), a serious health condition that affects a large 
          percentage of the population but whose mechanism remains poorly 
          understood.
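
          One way to see why pooling can work is a simple variance calculation. 
          The model below is an assumption for illustration, not the speakers' 
          derivation: each array measures the average expression of the individuals 
          in its pool, biological variance sigma_b2 and technical variance sigma_e2 
          are independent, and all names and numbers are hypothetical.

```python
def variance_of_group_mean(sigma_b2, sigma_e2, n_arrays, pool_size=1):
    """Variance of a group's mean expression estimate.

    Each array hybridizes a pool of pool_size individuals, so biological
    variance is divided by pool_size before technical variance is added.
    """
    return (sigma_b2 / pool_size + sigma_e2) / n_arrays


# Hypothetical setting: biological variability dominates technical noise.
sigma_b2, sigma_e2 = 1.0, 0.1

unpooled_12_arrays = variance_of_group_mean(sigma_b2, sigma_e2, n_arrays=12)
unpooled_4_arrays = variance_of_group_mean(sigma_b2, sigma_e2, n_arrays=4)
pooled_4_arrays = variance_of_group_mean(sigma_b2, sigma_e2, n_arrays=4, pool_size=3)

print(unpooled_12_arrays, pooled_4_arrays, unpooled_4_arrays)
```

          With these numbers, four arrays of three pooled individuals each come 
          close to the precision of twelve individual arrays and are far more 
          precise than four individual arrays. The advantage shrinks as technical 
          variance grows relative to biological variance.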