  • Random forests are a useful statistical learning method for predicting response values (e.g., corn yield) from predictor variables (e.g., soil type, soil moisture, and temperatures).  Random forest predictions are weighted averages of the response values in a training dataset.  The random forest algorithm provides a clever mechanism for determining the weight assigned to each training observation when predicting the response associated with given values of the predictor variables.  However, because each prediction is a weighted average of training dataset responses, all predictions are necessarily bounded by the range of response values in the training dataset.  Thus, random forests may not be able to generate accurate predictions of future response values if a response (lik

  • Single nucleotide polymorphisms (SNPs) can be used to track genomic variation at millions of loci across genotypes.  RNA sequencing (RNA-seq) can be used to measure transcript abundance levels of tens of thousands of genes.  Baker Center personnel (Li, Wang, and Nettleton) are developing methods for modeling quantitative traits as additive functions of SNP effects and nonlinear transcript abundance effects.  Adaptive group LASSO (AGLASSO) methods are used to select and estimate the effects of important SNPs and transcripts.

  • Phenotypes measured on plants grown in fields can be spatially correlated.  Such correlation can arise because plants growing near each other may share a common microenvironment that differs from the microenvironment experienced by plants in other parts of the field.  This microenvironmental variation can induce phenotypic similarity among neighboring plants.  When such spatial effects exist but are unaccounted for in analysis, decisions about which plant genotypes are expected to perform best with regard to one or more phenotypic traits can be adversely affected.  Baker Center personnel (Mao, Dutta, Wong, and Nettleton) are working to develop methods for genomic prediction that properly adjust for spatial effects when using phenotype data from the field to identify the best genotypes.

  • Johnny Pippins, an online graduate student at the University of Idaho and an inmate at the Anamosa State Penitentiary, is an L. H. Baker Center intern for the summer of 2019.  Under the guidance of Baker Center Director Dan Nettleton, Mr. Pippins is applying statistical methods to discover genes differentially expressed among patients with different types of lupus.  The project builds on research published by Rai et al.

  • Light bolsters plant defenses against pathogens, but the role of light in influencing the virulence and behavior of plant pathogens is poorly understood.  Photosensory proteins in plant pathogenic bacteria provide an experimental means to link light cues to specific behavioral outcomes.  A project led by Professor Gwyn Beattie in the Department of Plant Pathology and Microbiology is using RNA sequencing (RNA-seq) to identify pathogen genes that respond to light of various wavelengths, and the subsets of these genes that depend on each of the photosensory proteins.  Behavioral studies of pathogens altered in light-regulated genes are providing insights into the role of photosensing in pathogen biology.

  • Infectious Bovine Keratoconjunctivitis (IBK), also known as pinkeye, is a disease that has an important impact on the health and welfare of beef calves.  In a project led by Veterinary Diagnostic & Production Animal Medicine Professor Annette O'Connor, Baker Center personnel (Lithio and Nettleton) are analyzing microbial community data collected from the eyes of calves prior to disease onset.  In a case-cohort study, samples from the eyes of calves that subsequently develop the disease are compared with samples from the eyes of calves that remain disease-free.  Microbes that differ in abundance between the two types of samples are potential targets for the development of new pinkeye-prevention treatments.

  • Label-free proteomic data can be used to obtain abundance measurements for many proteins that can be compared across multiple biological samples.  Abundance measurements of the peptides within a protein are combined to provide a measurement of protein abundance.  Values for many peptides can be missing in one or more samples.  Missing values can be a sign of low abundance, but some missing values occur independently of protein abundance.  Thus, care must be taken when interpreting missing values.  In a collaboration with Animal Science faculty (Dekkers, Lonergan, and Tuggle), Baker Center personnel (Jeon and Nettleton) are coupling imputation techniques with linear mixed model analyses to identify proteins whose abundance differs across pig genetic lines and/or across infection treatments.

  • Random forests are known to provide excellent point predictions in a variety of contexts, but methods for accurately assessing the uncertainty associated with random forest predictions are lacking.  Baker Center personnel (Zhang, Zimmerman, Nettleton, and Nordman) have developed a new approach for constructing intervals that accurately reflect the uncertainty of random forest predictions.  A random forest prediction interval provides a range of values that covers, with a specified probability, the response value associated with given values of the predictor variables.  The new intervals achieve their advertised (nominal) coverage probability and are narrower (more precise) than those produced by competing methods.
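The bounding property described in the first item (random forest predictions cannot leave the range of training responses) can be checked with a toy example. The sketch below is an assumed, simplified stand-in for a real random forest: it averages one-split regression trees (stumps) fit to bootstrap samples of a single predictor and omits the random feature selection of a full random forest. The helper names `fit_stump` and `fit_forest` are hypothetical. Because every leaf predicts a mean of training responses, the forest's prediction is a weighted average of those responses with nonnegative weights summing to one.

```python
import random
from statistics import mean

# Hypothetical toy "forest" for illustration only: stumps on one predictor.
def fit_stump(xs, ys):
    """Fit a one-split regression tree; each leaf predicts the mean of its training responses."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal predictor values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x is identical: predict the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, t, lo_mean, hi_mean = best
    return lambda x: lo_mean if x <= t else hi_mean

def fit_forest(xs, ys, n_trees=50, seed=0):
    """Average of stumps, each fit to a bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    # Mean of leaf means = weighted average of training responses.
    return lambda x: mean(tree(x) for tree in trees)

xs = list(range(10))
ys = [2.0 * x for x in xs]  # training responses range from 0 to 18
forest = fit_forest(xs, ys)
print(forest(100))  # far outside the training x range
```

The training responses follow y = 2x for x = 0, ..., 9, so a prediction at x = 100 that extended the trend would be near 200; the forest instead returns a value no larger than 18, the largest training response.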
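For the SNP and transcript item, adaptive group LASSO penalizes whole groups of coefficients (for example, the basis-function coefficients that could describe one transcript's nonlinear effect) so that each group is kept or dropped together, with group-specific weights derived from an initial estimate. The sketch below is a generic proximal-gradient solver for a least-squares loss under those assumptions, not the Baker Center implementation; the function names, the toy orthogonal design, the penalty `lam`, and the 1/‖initial estimate‖ weighting are illustrative choices.

```python
import math

def group_soft_threshold(beta_g, threshold):
    """Proximal operator of threshold * ||beta_g||_2: shrink the whole group toward zero."""
    norm = math.sqrt(sum(b * b for b in beta_g))
    if norm <= threshold:
        return [0.0] * len(beta_g)  # the entire group is dropped
    scale = 1.0 - threshold / norm
    return [scale * b for b in beta_g]

def adaptive_weights(initial_groups, eps=1e-8):
    """Assumed weighting: groups with small initial estimates get heavier penalties."""
    return [1.0 / (math.sqrt(sum(b * b for b in g)) + eps) for g in initial_groups]

def adaptive_group_lasso(X, y, groups, lam, weights, step, n_iter=200):
    """Proximal gradient for (1/(2n))||y - X beta||^2 + lam * sum_g weights[g] * ||beta_g||_2.
    X is a list of rows; groups must partition all column indices.
    step should not exceed 1 / (largest eigenvalue of X^T X / n)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        z = [beta[j] - step * grad[j] for j in range(p)]  # gradient step
        for g, cols in enumerate(groups):  # group-wise proximal step
            shrunk = group_soft_threshold([z[j] for j in cols], step * lam * weights[g])
            for j, v in zip(cols, shrunk):
                beta[j] = v
    return beta

# Toy orthogonal design (hypothetical): 8 observations, 4 columns of +/-1 entries.
X = [[(-1) ** (i & 1), (-1) ** ((i >> 1) & 1), (-1) ** ((i >> 2) & 1),
      (-1) ** ((i & 1) + ((i >> 1) & 1))] for i in range(8)]
groups = [[0, 1], [2, 3]]  # two hypothetical features, each with a 2-column basis
y = [3 * row[0] + 2 * row[1] + 0.1 * row[2] for row in X]
n = len(X)
# Because X^T X / n = I here, X^T y / n is the least-squares estimate.
ols = [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(4)]
weights = adaptive_weights([[ols[j] for j in cols] for cols in groups])
beta = adaptive_group_lasso(X, y, groups, lam=0.5, weights=weights, step=1.0)
print(beta)
```

With this toy data, the first group (true coefficients 3 and 2) is kept and shrunk by a common factor, while the second group, whose small initial estimate earns a large adaptive weight, is set exactly to zero.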
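For the spatial item, one classical way to adjust field phenotypes for microenvironmental variation is nearest-neighbor (Papadakis-style) covariate adjustment: the average residual of a plot's neighbors serves as a proxy for its local environment. The sketch below illustrates only that idea; it is not the genomic prediction method being developed by Mao, Dutta, Wong, and Nettleton, and the rectangular grid, rook (up/down/left/right) neighborhood, no-intercept regression, and function names are assumptions.

```python
from collections import defaultdict

def genotype_means(y, geno):
    """Mean phenotype of each genotype over all plots in the grid."""
    totals, counts = defaultdict(float), defaultdict(int)
    for y_row, g_row in zip(y, geno):
        for value, g in zip(y_row, g_row):
            totals[g] += value
            counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}

def nearest_neighbor_adjust(y, geno):
    """Hypothetical Papadakis-style adjustment on a rows x cols grid (at least 2 plots).
    Returns (adjusted phenotypes, estimated neighbor coefficient b)."""
    rows, cols = len(y), len(y[0])
    means = genotype_means(y, geno)
    # Residual from the genotype mean: a rough stand-in for the plot's microenvironment.
    resid = [[y[r][c] - means[geno[r][c]] for c in range(cols)] for r in range(rows)]
    # Covariate: average residual of the rook neighbors inside the grid.
    cov = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = [resid[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols]
            cov[r][c] = sum(nbrs) / len(nbrs)
    # Least-squares slope (no intercept) of each plot's residual on its neighbor covariate.
    num = sum(resid[r][c] * cov[r][c] for r in range(rows) for c in range(cols))
    den = sum(cov[r][c] ** 2 for r in range(rows) for c in range(cols))
    b = num / den if den > 0 else 0.0
    adjusted = [[y[r][c] - b * cov[r][c] for c in range(cols)] for r in range(rows)]
    return adjusted, b

# Toy 4 x 4 field: checkerboard of two genotypes plus a fertility trend down the rows.
geno = [["A" if (r + c) % 2 == 0 else "B" for c in range(4)] for r in range(4)]
effect = {"A": 10.0, "B": 12.0}
y = [[effect[geno[r][c]] + r for c in range(4)] for r in range(4)]
adjusted, b = nearest_neighbor_adjust(y, geno)
print(b)
```

Because `b` is the least-squares slope of each plot's residual on its neighbor covariate, subtracting `b` times the covariate can only reduce the residual sum of squares around the genotype means, which is the sense in which the trend is partly removed.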
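The proteomics item warns that a missing peptide value may reflect low abundance or may be unrelated to abundance. The hypothetical sketch below shows why the distinction matters: it fills each peptide's missing log2 intensities either with that peptide's minimum observed value (assuming low abundance) or with its mean (assuming missingness unrelated to abundance), then summarizes the protein in each sample as the median over its peptides. Both imputation rules, the median summary, the toy intensities, and the function names are illustrative assumptions; the linear mixed model step described in the text is not shown.

```python
from statistics import mean, median

def impute_peptides(peptides, strategy):
    """peptides: one row per peptide of log2 intensities across samples (None = missing).
    Each peptide must be observed in at least one sample."""
    out = []
    for row in peptides:
        observed = [v for v in row if v is not None]
        if strategy == "min":
            fill = min(observed)   # assumes missingness signals low abundance
        elif strategy == "mean":
            fill = mean(observed)  # assumes missingness is unrelated to abundance
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        out.append([fill if v is None else v for v in row])
    return out

def protein_abundance(peptides):
    """Summarize the protein in each sample as the median of its peptides' log2 intensities."""
    n_samples = len(peptides[0])
    return [median(row[s] for row in peptides) for s in range(n_samples)]

# Toy data (made up): 3 peptides of one protein measured in 4 samples.
peptides = [
    [20.0, 21.0, None, 19.5],
    [22.0, None, 18.0, 21.5],
    [21.0, 22.0, 18.5, None],
]
min_version = protein_abundance(impute_peptides(peptides, "min"))
mean_version = protein_abundance(impute_peptides(peptides, "mean"))
print(min_version)
print(mean_version)
```

In the toy data the two rules agree for the first three samples but give protein abundances of 19.5 and 20.5 for the fourth, a one-unit (twofold, on the log2 scale) difference that comes entirely from the assumption about why a value is missing.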
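The final item describes random forest prediction intervals. One simple construction in this spirit, sketched here under stated assumptions, shifts a point prediction by lower and upper empirical quantiles of out-of-bag (OOB) prediction errors, where each training observation's OOB error is its response minus the average prediction of the trees whose bootstrap samples excluded it. This is illustrative and not a statement of the exact Baker Center procedure; the function names, the order-statistic quantile rule, and the toy error values are assumptions.

```python
import math

def empirical_quantile(values, q):
    """Assumed quantile rule: the ceil(q * n)-th smallest value (at least the smallest)."""
    ordered = sorted(values)
    k = max(1, math.ceil(q * len(ordered)))
    return ordered[k - 1]

def oob_prediction_interval(point_prediction, oob_errors, alpha=0.1):
    """Hypothetical helper: [pred + lower error quantile, pred + upper error quantile].
    oob_errors[i] = y_i - OOB prediction of y_i, so the interval targets 1 - alpha coverage."""
    lower = empirical_quantile(oob_errors, alpha / 2)
    upper = empirical_quantile(oob_errors, 1 - alpha / 2)
    return point_prediction + lower, point_prediction + upper

# Toy OOB errors: 21 evenly spaced values from -1.0 to 1.0 (made up for illustration).
oob_errors = [k / 10 for k in range(-10, 11)]
low, high = oob_prediction_interval(50.0, oob_errors, alpha=0.1)
print(low, high)  # approximately 49.1 and 50.9
```

In practice the OOB errors would come from a fitted forest; for example, scikit-learn's `RandomForestRegressor` with `oob_score=True` stores OOB predictions in its `oob_prediction_` attribute, so `y_train - model.oob_prediction_` gives the errors.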