Genomewide_prediction

Material for the Course “GENOME-WIDE PREDICTION OF COMPLEX TRAITS IN HUMANS, PLANTS AND ANIMALS (GWP)”

Instructors: Evangelina Lopez de Maturana, Oscar Gonzalez-Recio

This course introduces students to predicting complex traits from genomic information. Each day, the course starts at 14:00 and ends at 20:00 (CET).

Preparatory_steps:

For computing, we will use our AWS EC2 cloud, where most of the software needed for this course is already installed.

You will, therefore, only need a few applications installed on your laptop:

  • SSH client
    Windows: MobaXterm
    Mac/Linux: not required; a terminal is installed as standard
  • FTP client (transfers files to/from the server)
    Windows/Mac/Linux: FileZilla Client
    This is my recommendation, but any FTP client should be fine, including the ones built into Mac/Linux.

Please make sure that R and RStudio are installed on your laptop.

Once R and RStudio are installed, please install the following packages with this command:

rpkgs <- c("BGLR", "snpReady", "data.table", "pheatmap", "rsample", "coda", "ggplot2", "ROCR", "tidyverse", "rmarkdown", "knitr", "pander", "remotes", "bigreadr", "ggpubr")
install.packages(rpkgs)

remotes::install_github("privefl/bigsnpr")

When you install snpReady, you will likely get a message saying that the ‘impute’ R package is required. You can install it from Bioconductor as follows.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("impute")

The ultimate check of whether a package installation was successful is to load the package into your R session:

library(ggplot2)  # replace ggplot2 with the name of the package you want to check
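To check every course package in one go rather than one at a time, something like the following sketch may help. It assumes the package list from the installation step plus bigsnpr and impute; `requireNamespace()` only tests whether a package can be loaded and does not attach it.

```r
# Packages requested in the installation step, plus bigsnpr and impute.
pkgs <- c("BGLR", "snpReady", "data.table", "pheatmap", "rsample", "coda",
          "ggplot2", "ROCR", "tidyverse", "rmarkdown", "knitr", "pander",
          "remotes", "bigreadr", "ggpubr", "bigsnpr", "impute")

# requireNamespace() returns TRUE or FALSE without attaching the package.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs to be installed.
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) == 0) {
  message("All course packages are available.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

Any package listed as missing can then be reinstalled with the commands above.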

Content of the course

Day 1: Concepts review

  • Presentation (E&O)
  • General Introduction / Overview of the Course [General Introduction]
  • Introduction to Genome-wide Prediction in Human genetics and Animal and Plant breeding. Breeding value vs Polygenic Risk Score. Factors affecting reliability of GWP. (E). Slides
  • Review of Quantitative genetics (personal assignment). Slides
  • Overview of genome-wide prediction in animals. Slides
  • Linear mixed models. Slides
  • Genotype imputation procedures (design the reference population). Slides
  • Lab 1: imputation. code training.ped training.map testing.ped testing.map

Day 2: Imputation

  • Breakout-rooms: Design of analytical approaches. (E&O)
  • The ‘Curse’ of Dimensionality in large p small n problems. Regularization and shrinkage estimation. Slides
  • Resemblance among relatives: Pedigree vs Genomic-based. (E). Slides
  • Lab 2: building relationship matrices (E). code data
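As a rough preview of what building a genomic relationship matrix involves, here is a minimal base-R sketch. It is not the Lab 2 code: the genotypes are simulated, the sizes `n` and `p` are arbitrary, and the scaling follows the commonly used VanRaden (method 1) form, which is an assumption about the approach rather than something stated in the course material.

```r
set.seed(1)
n <- 20    # individuals (arbitrary, for illustration)
p <- 200   # SNP markers (arbitrary, for illustration)

# Simulated genotypes coded 0/1/2, one column per marker.
freq <- runif(p, 0.1, 0.9)
M <- sapply(freq, function(f) rbinom(n, 2, f))

# Observed allele frequencies; drop markers that are monomorphic in the sample.
p_hat <- colMeans(M) / 2
keep <- p_hat > 0 & p_hat < 1
M <- M[, keep, drop = FALSE]
p_hat <- p_hat[keep]

# Centre each marker by twice its allele frequency.
Z <- sweep(M, 2, 2 * p_hat)

# Genomic relationship matrix: G = ZZ' / (2 * sum(p * (1 - p))).
G <- tcrossprod(Z) / (2 * sum(p_hat * (1 - p_hat)))

round(G[1:4, 1:4], 2)
```

Because the allele frequencies are estimated from the same sample, each row of `G` sums to zero, which is a quick sanity check on the centring step.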

Day 3: Kernel and Bayesian regression methods for GWP

Day 4: Machine Learning methods for GWP

Day 5: Practical session

  • Build your own Genome-enabled prediction. Breakout rooms

This is your reference population and the corresponding map file, and these are the candidate individuals and SNP map file to predict their genomic value.

Hackathon steps:

  • Imputation
  • Determine your predictive accuracy (internal), with different methods/models
  • Predict yet-to-be observed phenotypes with your preferred method(s)
  • Submit your results to the instructors for a final check
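The internal predictive accuracy step can be sketched with k-fold cross-validation. The example below is only an illustration on simulated data: it uses ridge regression on marker effects in base R as a stand-in for whichever method you prefer, and `lambda`, `k`, and the simulation settings are hypothetical choices rather than recommended values.

```r
set.seed(42)
n <- 200   # individuals in the reference population (simulated)
p <- 500   # SNP markers (simulated)

# Simulated centred genotypes and a phenotype with heritability near 0.5.
X <- matrix(rbinom(n * p, 2, 0.3), n, p)
X <- scale(X, center = TRUE, scale = FALSE)
g <- drop(X %*% rnorm(p, sd = sqrt(1 / p)))
y <- g + rnorm(n, sd = sd(g))

# Ridge regression solved in its dual form, which is cheaper when p > n.
# The intercept is approximated by the mean of the training phenotypes.
ridge_fit <- function(X, y, lambda) {
  mu <- mean(y)
  alpha <- solve(tcrossprod(X) + lambda * diag(nrow(X)), y - mu)
  list(mu = mu, beta = drop(crossprod(X, alpha)))
}

lambda <- 200   # hypothetical shrinkage parameter
k <- 5          # number of folds
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold: train on the other folds, predict the held-out individuals.
acc <- numeric(k)
for (f in seq_len(k)) {
  test <- fold == f
  fit <- ridge_fit(X[!test, , drop = FALSE], y[!test], lambda)
  y_hat <- fit$mu + drop(X[test, , drop = FALSE] %*% fit$beta)
  acc[f] <- cor(y_hat, y[test])
}

mean(acc)   # average correlation between predicted and observed phenotypes
```

Running the same fold assignment with several methods or models gives a like-for-like comparison before predicting the candidate individuals.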

Organization of the code for the practical Sessions

Day 1

  • Code example to show the infinitesimal model
  • Exercise on solving equations using residual updates
  • Imputation
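To give a flavour of the residual-update exercise, here is a minimal sketch of Gauss-Seidel iteration with residual updates applied to ridge-type equations (X'X + λI)b = X'y. It is not the course code: the data are simulated, and `lambda`, the tolerance, and the problem size are assumptions chosen for illustration. The point is that each effect is updated from the current residual vector, so X'X never has to be formed.

```r
set.seed(7)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% rnorm(p)) + rnorm(n)
lambda <- 1   # hypothetical shrinkage parameter

b <- numeric(p)        # start with all effects at zero
e <- y                 # residuals y - Xb for the starting values
xpx <- colSums(X^2)    # diagonal of X'X, computed once

for (iter in 1:1000) {
  b_old <- b
  for (j in seq_len(p)) {
    # Right-hand side for effect j, adjusted for all other effects via e.
    rhs <- sum(X[, j] * e) + xpx[j] * b[j]
    b_new <- rhs / (xpx[j] + lambda)
    # Update the residuals so they reflect the new value of effect j.
    e <- e - X[, j] * (b_new - b[j])
    b[j] <- b_new
  }
  if (max(abs(b - b_old)) < 1e-10) break
}

# Compare with the direct solution of the same equations.
b_direct <- drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))
max(abs(b - b_direct))
```

The direct solve is included only as a check; with many markers, the residual-update loop avoids building and inverting the full coefficient matrix.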

Day 3

Day 4