NIH-NSF BBSI Formal Course Work

The formal course work consists of eight modules, described in detail below. Each module has its own lectures and computer laboratory exercises. An additional computer laboratory component provides an introduction to Perl/BioPerl programming.

Module A. Overview of Bioinformatics and Computational Systems Biology (2 Lectures)

An intensive introduction to fundamental methods in bioinformatics and computational systems biology, describing applications to prominent research problems and discussing recent accomplishments in genomics and systems biology. The lectures will introduce basic cell structure, genome structure, protein structure, metabolism, enzymology, and the signaling and regulation of various biological processes. Examples of modeling and integrating these biological components will demonstrate the important roles that the quantitative sciences (mathematics, statistics, and computer science) play in deepening our understanding of biological systems.

Module B. Statistical Foundations (4 Lectures)

This module provides core knowledge of probability and inferential statistics, an important foundation for much of bioinformatics and systems biology. Topics will include basic probability concepts, random variables and distributions, parameter estimation via maximum likelihood, hypothesis testing, and statistical modeling. Through examples, we will demonstrate how to meaningfully quantify the significance of interesting findings, how to predict future observations, and how to develop additional models for more realistic systems. The R statistical software will be introduced in the lab session.

Module C. Algorithmic Foundations (2 Lectures)

The goal of this module is to introduce the fundamentals of sequence comparison methods. Two lectures on pairwise and multiple sequence comparison will cover classic dynamic programming approaches and fast string matching algorithms based on suffix trees and suffix arrays.

Module D. Genomic Sequence Analysis (4 Lectures)

In addition to giving a brief introduction to topics in genome informatics, this module demonstrates how the foundations presented in Modules B and C are applied to solve important research problems in genomics.
We will discuss conventional approaches for annotating gene sequences for protein function, along with extensions of sequence matching such as motif searching. We will also discuss Hidden Markov Model approaches to gene structure prediction in prokaryotes and eukaryotes, and spliced alignment techniques for evaluating gene structure predictions in eukaryotes based on transcription evidence. The laboratory will provide hands-on opportunities to use gene structure annotation tools and tutorials that were developed to annotate plant genes (see http://www.plantgdb.org/).

Module E. Structural Genomics (3 Lectures)

Knowledge of protein structure is critical for understanding many details of protein function. The large number of protein sequences inferred from genomic DNA requires fast methods for interpreting them as structures. This can be achieved either by high-throughput experimental structure determination or by in silico structure prediction. Computation plays an important role in both approaches. In addition, even when the structure and function of a related protein are known, its mechanism of operation may not be, and computations can help interpret structures and functions in terms of functional motions. In this module we will focus on Computations for Structure Determination, Surveying Structures, Protein Structure Predictions, and Molecular Simulations.

Module F. Functional Genomics (5 Lectures)

Functional genomics refers to genome-wide or system-wide experimental approaches that assess gene function using high-throughput experimental data. However, functional genomics requires extensive use of computational and statistical methods to achieve meaningful interpretations. The goal is to show how genome-wide information from various large-scale “omics” data sets is integrated to interpret how genes, proteins, and protein complexes functionally coordinate in biological systems.
An introduction to the various large-scale “omics” technologies will be provided, including transcriptomics, proteomics, and metabolomics. Profiling can incorporate a wide variety of data from a particular set of cases to identify patterns unique to an organism, tissue, or condition. We will discuss how to profile using transcriptomic, proteomic, and metabolomic data, as well as the mapping of gene expression data, protein-protein interaction data, and the phenome. Key ideas for assessing these maps as graphs, such as small-world graphs and connectivity, will also be introduced. The laboratory section will give students an opportunity to explore extant maps of the transcriptome and protein-protein interactions using graph visualization programs such as FCModeler and Cytoscape, and graph analysis packages in R and Leda.

Module G. Computational Analysis of “Omics” Data: Machine Learning Approaches (5 Lectures)

This module focuses on several representative techniques for automated or semi-automated data-driven knowledge acquisition and exploratory data analysis, based on algorithms drawn from artificial intelligence, machine learning, data mining, and statistical inference. Lectures and labs will cover the computational, mathematical, and statistical foundations of these algorithms and their application to representative problems in computational molecular biology, including data-driven discovery of protein sequence-structure/function relationships, automated construction of protein function classifiers, prediction of protein-protein interactions, and inference of genetic networks from gene expression data.

Module H. Integrative Systems Biology and Future Prospects (5 Lectures)

The central theme of Module H will be systems biology: how we can look at the whole, given that we have a lot of information on individual parts. Researchers have yet to establish a sufficiently broad “genetic” picture of the complexities of mechanisms in organisms.
This will require the integrated efforts of many types of scientists conducting research in gene regulatory networks, protein interactions, metabolism, and genomics. A critical element missing from this picture is a full set of mappings between phenotypes and all of the available data. This requires gene mapping that incorporates data from metabolomics, protein interactomes, and measured gene expression levels. A broader and deeper comprehension of the complex interconnections between the function, regulation, and behavior of genes and the various available data will improve the informed selection of traits for agricultural use. Prime examples are the selection of plants for resistance to stress and disease, and tackling problems of significant agricultural and economic value. We will show how these efforts can improve our comprehension of the effects of stress and disease in plants, and the cataloging of the proteins and genes affected by specific external perturbations, as well as the pathways, cascades, and processes affected. This system-wide data analysis will deepen our understanding of the origin of phenotypic behavior and the connections between phenotypes and various data, and lead to better ways of performing cell simulations. This module should encourage participants to embark on new scientific careers tackling leading-edge problems on the new frontiers of biological research.

Programming Class: Introduction to Perl/BioPerl

As an integral part of the instruction, students will have to implement some of the algorithms and grapple with the underlying concepts through hands-on data work. Because some students will not be adept at programming, we will offer a daily programming class during the Short Course. Perl is a scripting language that has gained much visibility with the immense popularity of the internet, for example, in the form of CGI programs.
Perl may be less computationally efficient than some other languages, such as C and Java, but it is fairly mature, easy to learn, and platform-independent (Perl interpreters are available on every major platform). Because of its highly developed text processing and pattern matching capabilities, Perl has become one of the most popular languages for biological data analysis. BioPerl is the first of several open-source projects that aim to provide modular bioinformatics support, the others being BioPython, BioJava, and BioRuby.

"The Bioperl project is a coordinated effort to collect computational methods routinely used in bioinformatics and life science research into a set of standard CPAN-style, well-documented, and freely available Perl modules. Perl provides unparalleled support for many tasks common in bioinformatics and life science research yet there are no standard Perl modules for biology. We hope to help fill this void." (from BioPerl.org, the official BioPerl web site)

In our workshop, we plan to give participants a solid understanding of Perl and BioPerl basics, with exercises aimed at developing the skills to perform common tasks, such as formatting and parsing BLAST output, and to take on more advanced bioinformatics programming.
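
As a rough illustration of the kind of BLAST-parsing task mentioned above, the sketch below reads BLAST tabular output (the standard 12-column format) and keeps the best-scoring hit for each query. It is written in Python rather than Perl purely for illustration and is not taken from the course materials. The function names, the E-value cutoff, and the sample records are all assumptions made up for this example.

```python
import io

# Column names of the standard 12-column BLAST tabular format.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(handle):
    """Yield one dict per hit line, skipping blank and '#' comment lines."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < len(COLUMNS):
            continue  # malformed line; a real script might raise instead
        hit = dict(zip(COLUMNS, fields))
        # Convert the numeric columns used for filtering and ranking.
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        yield hit


def best_hits(handle, max_evalue=1e-5):
    """Return {query id: hit}, keeping the highest bit score per query.

    Hits with an E-value above max_evalue (a hypothetical cutoff) are dropped.
    """
    best = {}
    for hit in parse_hits(handle):
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up sample records, for illustration only.
SAMPLE = (
    "# hypothetical BLAST tabular output\n"
    "geneA\tprot1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t370.0\n"
    "geneA\tprot2\t75.00\t180\t45\t2\t10\t189\t1\t180\t2e-40\t160.0\n"
    "geneB\tprot3\t40.00\t50\t30\t1\t1\t50\t7\t56\t0.5\t25.0\n"
)

if __name__ == "__main__":
    for query, hit in best_hits(io.StringIO(SAMPLE)).items():
        print(query, hit["sseqid"], hit["evalue"], hit["bitscore"])
```

In BioPerl itself, the Bio::SearchIO module handles this kind of parsing for several BLAST report formats, so a hand-written parser like this one is mainly useful as a learning exercise.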
E-mail: mjmccunn@iastate.edu
Copyright © 2002 Iowa State University