Research Projects for Fellows, June 20 - August 6, 2005
P1: Evaluation of EST clustering results
Title: Evaluation of EST clustering results
Mentors: Dr. Qunfeng Dong and Dr. Volker Brendel
Description:
ESTs (Expressed Sequence Tags) are valuable data for gene discovery, especially for plant species with large genomes
that have not been fully sequenced. ESTs provide a convenient means of accessing the transcriptome of a given species.
However, ESTs generally correspond to only partial cDNA sequences, and EST samples are typically highly redundant
(especially if the ESTs are not derived from normalized libraries). Therefore, a major task for the PlantGDB database
(http://www.plantgdb.org) is to assemble overlapping ESTs
into putative unique transcript contigs on a frequent and regular basis.
The resulting contig sequences are annotated with putative functions by searching against a protein database. All assembly and
annotation results are then stored in PlantGDB MySQL database tables and made accessible through the web.
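The core of the assembly task, grouping ESTs that overlap so that each group can later be assembled into one contig, can be illustrated with a simplified sketch. The Python below is a toy stand-in, not the PaCE algorithm: it assumes that two ESTs belong to the same cluster whenever they share an exact k-mer on either strand, and merges such ESTs with a union-find structure (single-linkage clustering). PaCE itself selects promising EST pairs from long shared substrings and verifies overlaps by alignment before merging clusters, and it runs on parallel computers. The function names, the value of k, and the example sequences are hypothetical, and assembling each cluster into a contig sequence is a separate step not shown here.

```python
from collections import defaultdict

# Complement table; this toy sketch assumes sequences contain only A, C, G, T, N.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Return canonical k-mers (the smaller of a k-mer and its reverse complement),
    so that ESTs sequenced from opposite strands still share k-mers."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" in kmer:  # skip k-mers containing ambiguous bases
            continue
        kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_ests(ests: dict[str, str], k: int = 20) -> list[list[str]]:
    """Hypothetical helper: single-linkage clustering of ESTs that share
    at least one canonical k-mer. Returns clusters of EST names, ordered
    by the first EST in each cluster."""
    names = list(ests)
    uf = UnionFind(len(names))
    first_seen: dict[str, int] = {}  # canonical k-mer -> index of first EST containing it
    for idx, name in enumerate(names):
        for kmer in canonical_kmers(ests[name], k):
            if kmer in first_seen:
                uf.union(first_seen[kmer], idx)
            else:
                first_seen[kmer] = idx
    groups: dict[int, list[str]] = defaultdict(list)
    for idx, name in enumerate(names):
        groups[uf.find(idx)].append(name)
    return list(groups.values())


if __name__ == "__main__":
    ests = {
        "est1": "ACGTACGTTGCAAGGCTTAA",
        "est2": "TTGCAAGGCTTAACCGGATC",  # overlaps the 3' end of est1
        "est3": "GATCCGGTTAAG",          # reverse complement of est2's 3' end
        "est4": "GGGCCCGGGCCC",          # unrelated
    }
    print(cluster_ests(ests, k=8))
    # [['est1', 'est2', 'est3'], ['est4']]
```

With k = 8, est3 joins the first cluster only because k-mers are compared in strand-independent (canonical) form. Real EST data would need a much larger k plus overlap verification, since a single short shared k-mer, for example from a repeat, can falsely merge unrelated transcripts.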
This project gives the participating student hands-on experience solving a real-world bioinformatics problem.
The computational problem is well defined and solvable. It is especially suitable for students with a strong computer
science background (e.g., knowledge of C and an interest in parallel computing). No strong biology background is required. The work
resulting from this project will have an immediate impact on gene discovery in the plant research community. The
participant is expected to work with the PaCE computer program
(http://bioinformatics.iastate.edu/bioinformatics2go/PaCE/).
The student will be responsible for building an automated pipeline for the processes described above: EST clustering and assembly,
functional annotation, and loading the results into the database. Other training in this project includes database and web-interface programming.
Web Resources:
PlantGDB (http://www.plantgdb.org)
References:
Kalyanaraman, A., Aluru, S., Kothari, S. & Brendel, V. (2003)
Efficient clustering of large EST data sets on parallel computers.
Nucl. Acids Res. 31, 2963-2974.