L. H. Baker Center for Bioinformatics and Biological Statistics

Bioinformatics and Computational Biology Summer Institute
Iowa State University

 

Research Projects for Fellows, June 20 - August 6, 2005
P1: Evaluation of EST clustering results

Title: Evaluation of EST clustering results.
Mentors: Dr. Qunfeng Dong and Dr. Volker Brendel
Description: ESTs (Expressed Sequence Tags) are valuable data for gene discovery, especially for plant species with large genomes that have not been fully sequenced. ESTs provide a convenient means of accessing the transcriptome of a given species. However, ESTs generally correspond to only partial cDNA sequences, and EST samples are typically highly redundant (especially if the EST sets are not derived from normalized libraries). Therefore, a major task for the PlantGDB database (http://www.plantgdb.org) is to assemble overlapping ESTs into putative unique transcript contigs on a frequent and regular basis. The resulting contig sequences are annotated with putative functions by searching them against protein databases. All assembly and annotation results are then stored in PlantGDB MySQL database tables and made accessible through the web. This project gives the participating student real-world experience in bioinformatics problem solving. The computational problem is definitely solvable, and it is especially suitable for students with a strong computer science background (e.g., knowledge of C and an interest in parallel computing). No strong biology background is required. The work resulting from this project will have an immediate and significant impact on gene discovery in the plant research community. The participant is expected to work with the PaCE computer program (http://bioinformatics.iastate.edu/bioinformatics2go/PaCE/) and is responsible for building an automated pipeline for the assembly, annotation, and storage steps described above. Other training involved in this project includes database and web-interface programming.
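To make the idea of grouping overlapping ESTs concrete, here is a minimal Python sketch. It is not the PaCE algorithm and not the PlantGDB pipeline. It assumes exact suffix-prefix matches, ignores reverse complements and sequencing errors, and compares all pairs, which would be far too slow for real EST sets. The function names, the `min_overlap` threshold, and the toy sequences are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def suffix_prefix_overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def cluster_ests(ests: dict, min_overlap: int = 5) -> list:
    """Group ESTs that are linked by exact overlaps of at least `min_overlap` bases.

    Clusters are tracked with a union-find (disjoint-set) structure, so any
    chain of pairwise overlaps ends up in one cluster.
    """
    parent = {name: name for name in ests}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    # Naive all-pairs comparison (an illustrative simplification).
    for a, b in combinations(ests, 2):
        overlap = max(
            suffix_prefix_overlap(ests[a], ests[b]),
            suffix_prefix_overlap(ests[b], ests[a]),
        )
        if overlap >= min_overlap:
            union(a, b)

    clusters = {}
    for name in ests:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Hypothetical toy data: est1 -> est2 -> est3 form an overlap chain; est4 stands alone.
toy_ests = {
    "est1": "ACGTACGTGG",
    "est2": "CGTGGTTAAC",
    "est3": "TTAACCCGGA",
    "est4": "GGGGGGGGGG",
}

if __name__ == "__main__":
    for cluster in cluster_ests(toy_ests):
        print(sorted(cluster))
```

On the toy data, est1, est2, and est3 fall into one cluster because each consecutive pair shares a 5-base overlap, while est4 remains a singleton. Each such cluster would then be passed to an assembler to produce a contig.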
Web Resources: PlantGDB
References: Kalyanaraman, A., Aluru, S., Kothari, S. & Brendel, V. (2003) Efficient clustering of large EST data sets on parallel computers. Nucl. Acids Res. 31, 2963-2974.