Projects

Random Forest Confidence and Prediction Intervals

Random forests are known to provide excellent point predictions in a variety of contexts, but methods for accurately assessing the uncertainty associated with random forest predictions are lacking. Baker Center personnel (Zimmerman, Nordman, and Nettleton) are working to develop a new approach for constructing intervals that accurately reflect the uncertainty of random forest predictions. A random forest confidence interval provides a range of values that contains (with a specified probability) the mean of the response variable for given values of the predictor va…

Linear Mixed Model Analysis of Proteomic Data with Many Missing Values

Label-free proteomic data can be used to obtain abundance measurements for many proteins that can be compared across multiple biological samples. Abundance measurements of peptides within a protein are combined to provide a measurement of protein abundance. Values for many peptides can be missing in one or more samples. Missing values can be a sign of low abundance, but some missing values occur independently of protein abundance. Thus, care must be taken when interpreting missing values. In a collaboration with Animal Science faculty (Dekkers, Lonergan…

Comparing the Ocular Microbiota of Calves that Do and Do Not Develop Pinkeye

Infectious Bovine Keratoconjunctivitis (IBK), also known as pinkeye, is a disease that has an important impact on the health and welfare of beef calves. In a project led by Veterinary Diagnostic & Production Animal Medicine Professor Annette O'Connor, Baker Center personnel (Lithio and Nettleton) are analyzing microbial community data collected from the eyes of calves prior to disease onset. Samples from the eyes of calves that subsequently develop the disease are compared to samples from the eyes of calves that remain disease free in a case-cohort study. Microbes that…

Using RNA-Seq to Elucidate How Light Influences the Biology of a Plant Pathogen

Light bolsters plant defenses against pathogens, but the role of light in influencing the virulence and behavior of plant pathogens is poorly understood. Photosensory proteins in plant pathogenic bacteria provide an experimental means to link light cues to specific behavioral outcomes. A project led by Professor Gwyn Beattie in the Department of Plant Pathology and Microbiology is using RNA sequencing (RNA-seq) to identify pathogen genes that respond to light of various wavelengths, and the subsets of these genes that depend on each of the photosensory proteins. Beha…

Adjusting for Spatial Effects in Genomic Prediction

Phenotypes measured on plants grown in fields can be spatially correlated. Such correlation can arise because plants growing near each other may share a common microenvironment that differs from the microenvironment experienced by plants in other parts of the field. This microenvironmental variation can induce phenotypic similarity among neighboring plants. When such spatial effects exist but are unaccounted for in analysis, decisions about which plant genotypes are expected to perform best with regard to one or more phenotypic traits can be adversely affected. Baker…

Associating Single Nucleotide Polymorphisms and RNA-seq Measures of Transcript Abundance with Quantitative Traits

Single nucleotide polymorphisms (SNPs) can be used to track genomic variation at millions of loci across genotypes. RNA sequencing (RNA-seq) can be used to measure transcript abundance levels of tens of thousands of genes. Baker Center personnel (Li, Wang, and Nettleton) are developing methods for modeling quantitative traits as additive functions of SNP effects and nonlinear transcript abundance effects. Adaptive group LASSO (AGLASSO) methods are used to select and estimate the effects of important SNPs and transcripts.
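
One way such a model could be written down is sketched here. This is a generic additive model with an adaptive group LASSO penalty, not necessarily the formulation used in the project. All symbols are assumptions introduced for illustration: y_i is the trait value for genotype i, x_ij is the SNP code at locus j, t_ik is the abundance of transcript k, each nonlinear effect f_k is approximated with M basis functions φ_m, and w_j and v_k are adaptive weights computed from initial estimates.

```latex
% Trait as an additive function of linear SNP effects and nonlinear transcript effects
y_i = \mu + \sum_{j=1}^{p} x_{ij}\,\beta_j + \sum_{k=1}^{q} f_k(t_{ik}) + \varepsilon_i,
\qquad
f_k(t) \approx \sum_{m=1}^{M} \phi_m(t)\,\theta_{km}

% Adaptive group LASSO criterion: each SNP is a group of size one,
% each transcript is a group of M basis coefficients \theta_k = (\theta_{k1},\dots,\theta_{kM})
\min_{\mu,\,\beta,\,\theta}\;
\sum_{i=1}^{n} \Bigl( y_i - \mu - \sum_{j=1}^{p} x_{ij}\beta_j
  - \sum_{k=1}^{q}\sum_{m=1}^{M} \phi_m(t_{ik})\,\theta_{km} \Bigr)^{2}
+ \lambda \Bigl( \sum_{j=1}^{p} w_j\,\lvert \beta_j \rvert
  + \sum_{k=1}^{q} v_k\,\lVert \theta_k \rVert_2 \Bigr),
\qquad
w_j = \frac{1}{\lvert \tilde\beta_j \rvert},\quad
v_k = \frac{1}{\lVert \tilde\theta_k \rVert_2}
```

Under this sketch, a transcript is dropped from the model when its whole coefficient group θ_k is shrunk to zero, which is how a group penalty performs selection and estimation at the same time.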

Regression-Enhanced Random Forests

Random forests are a useful statistical learning method for predicting response values (e.g., corn yield) from predictor variables (e.g., soil type, soil moisture, and temperatures). Random forest predictions are weighted averages of response values that occur in a training dataset. The random forest algorithm provides a clever mechanism for determining the weight assigned to each training dataset observation when predicting the response value associated with given values of the predictor variables. However, one limitation of random forest predictions is that, …
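
The weighted-average view of a random forest prediction can be illustrated with a small sketch. It is a deliberately simplified toy, not the project's method: each "tree" is a single-split stump on one predictor, the data are made up, and the names `fit_stump` and `forest_weights` are hypothetical. For a query point `x0`, each tree averages the in-bag training responses that share a leaf with `x0`; averaging over trees gives each training observation a weight.

```python
import random

def fit_stump(x, y, counts):
    """Return the cut point minimizing in-bag squared error.
    counts[i] is how often observation i appears in the bootstrap sample."""
    inbag = [i for i in range(len(x)) if counts[i] > 0]
    values = sorted({x[i] for i in inbag})
    best = None
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        sse = 0.0
        for side in (True, False):
            idx = [i for i in inbag if (x[i] <= cut) == side]
            n = sum(counts[i] for i in idx)
            mean = sum(counts[i] * y[i] for i in idx) / n
            sse += sum(counts[i] * (y[i] - mean) ** 2 for i in idx)
        if best is None or sse < best[0]:
            best = (sse, cut)
    # If the bootstrap sample has one distinct x value, do not split at all.
    return best[1] if best else float("inf")

def forest_weights(x, y, x0, n_trees=200, seed=0):
    """Return (weights, prediction) for query x0 from a toy forest of stumps."""
    rng = random.Random(seed)
    n = len(x)
    weights = [0.0] * n
    tree_preds = []
    for _ in range(n_trees):
        # Bootstrap sample, stored as multiplicities.
        counts = [0] * n
        for _ in range(n):
            counts[rng.randrange(n)] += 1
        cut = fit_stump(x, y, counts)
        # In-bag observations that fall in the same leaf as x0.
        leaf = [i for i in range(n)
                if counts[i] > 0 and (x[i] <= cut) == (x0 <= cut)]
        total = sum(counts[i] for i in leaf)
        tree_preds.append(sum(counts[i] * y[i] for i in leaf) / total)
        # Each tree contributes 1/n_trees, split among its leaf members.
        for i in leaf:
            weights[i] += counts[i] / (total * n_trees)
    return weights, sum(tree_preds) / n_trees

# Made-up training data for illustration only.
x = [0.5 * i for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]
weights, pred = forest_weights(x, y, x0=3.2)
weighted = sum(w * yi for w, yi in zip(weights, y))
print(abs(weighted - pred) < 1e-9, abs(sum(weights) - 1.0) < 1e-9)
```

Because the weights are nonnegative and sum to one, the forest prediction in this sketch always lies between the smallest and largest training responses.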