1
Efficient Estimation of Breeding Values from Dense Genomic Data
2
Genomic Calculations
  •  Genotypes soon available from BFGL:
    • 50,000 SNPs / animal
    • 3,000 animals, many more possible
    • Need efficient computing algorithms
  • Traditional PTAs available from AIPL:
    • PTAs combine phenotypes and pedigree
    • SNP effects evaluated in second step using deregressed PTAs weighted by reliability
3
Genomic Computer Programs
  • Simulate SNPs and QTLs
    • Compare SNP numbers, size of QTLs
  • Calculate genomic EBVs
    • Use selection index, G instead of A
    • Use iteration on data for SNP effects
  • Form haplotypes from genotypes?
    • Not tested yet, SNP regression used
4
Simulation Program
  • Save memory by processing each chromosome separately
    • 3,000 Holstein bulls to genotype
    • 17,000 ancestors in pedigree file
    • 1 billion (20,000 x 50,000 SNPs) genotypes simulated per replicate
    • Only 150 million (3,000 x 50,000) genotypes stored for evaluation
5
Linear Estimates using Markers
  • Selection index equations for EBV
    • $\hat{u} = \mathrm{Cov}(u, y)\,\mathrm{Var}(y)^{-1}\,(y - Xb)$
    • $\hat{u} = ZZ'\,[ZZ' + R]^{-1}\,(y - Xb)$
    • R has diagonals = (1 / Reliability) - 1
  • BLUP equations for marker effects, sum to get EBV
    • $\hat{u} = Z\,[Z'R^{-1}Z + Ik]^{-1}\,Z'R^{-1}(y - Xb)$
    • k = var(u) / var(m)
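
The two sets of equations on this slide should give the same EBV. A minimal NumPy sketch of that equivalence follows, on small simulated inputs. It assumes that ZZ' in the selection index is scaled by k (so G = ZZ'/k), that k is the sum of 2p(1 - p) over markers, and that genotypes are coded as allele counts minus 2p. The function names, dimensions, and random inputs are hypothetical and are not the programs described in the slides.

```python
import numpy as np


def simulate_inputs(n_animals=30, n_markers=200, seed=1):
    """Simulate small hypothetical inputs (not real genotypes or PTAs)."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(0.1, 0.9, n_markers)           # frequency of 2nd allele
    counts = rng.binomial(2, p, size=(n_animals, n_markers))
    Z = counts - 2.0 * p                           # centered genotypes
    rel = rng.uniform(0.5, 0.9, n_animals)         # reliabilities
    R = np.diag(1.0 / rel - 1.0)                   # diagonals = (1 / REL) - 1
    y_adj = rng.normal(size=n_animals)             # stands in for (y - Xb)
    k = np.sum(2.0 * p * (1.0 - p))                # assumed var(u) / var(m)
    return Z, R, y_adj, k


def ebv_selection_index(Z, R, y_adj, k):
    """u_hat = G [G + R]^-1 (y - Xb), with G = ZZ'/k (assumed scaling)."""
    G = Z @ Z.T / k
    return G @ np.linalg.solve(G + R, y_adj)


def ebv_marker_blup(Z, R, y_adj, k):
    """u_hat = Z [Z'R^-1 Z + I k]^-1 Z'R^-1 (y - Xb)."""
    R_inv = np.diag(1.0 / np.diag(R))
    lhs = Z.T @ R_inv @ Z + k * np.eye(Z.shape[1])
    m_hat = np.linalg.solve(lhs, Z.T @ R_inv @ y_adj)   # marker effects
    return Z @ m_hat                                    # sum to get EBV


Z, R, y_adj, k = simulate_inputs()
u_index = ebv_selection_index(Z, R, y_adj, k)
u_blup = ebv_marker_blup(Z, R, y_adj, k)
print("largest difference between the two EBV vectors:",
      np.max(np.abs(u_index - u_blup)))
```

The selection index solves a system of size animals x animals, while the marker equations are of size markers x markers, which is why the choice between them depends on which dimension is larger.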
6
Non-linear vs Linear Models
7
Marker Effect Prior Distribution
Nonlinear Model
8
Iteration on Data
  • Simple trick to reduce time from quadratic to linear with # SNPs
    • Sum coefficients x solutions once
    • Sum – diagonal = Σ off-diagonals
    • Janss and de Jong, 1999 conference
    • Rediscovered by Legarra and Misztal
  • Elements of Z are –p and (1 – p), where p is frequency of 2nd allele
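
The trick above can be sketched as follows: the off-diagonal contribution for every marker is obtained by summing coefficients x solutions once and then subtracting each marker's own diagonal term, instead of forming all marker-by-marker coefficients. This is an illustrative sketch on simulated data; the R^-1 weights are left out for brevity, and the function names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_animals, n_markers = 50, 300
p = rng.uniform(0.1, 0.9, n_markers)               # frequency of 2nd allele

# Each allele is coded -p or (1 - p); a genotype is the sum of its two alleles.
alleles = rng.binomial(1, p, size=(2, n_animals, n_markers))
Z = (alleles - p).sum(axis=0)
m = rng.normal(scale=0.01, size=n_markers)         # current marker solutions


def offdiag_direct(Z, m):
    """Quadratic in markers: build Z'Z, zero its diagonal, multiply by m."""
    C = Z.T @ Z
    np.fill_diagonal(C, 0.0)
    return C @ m


def offdiag_by_subtraction(Z, m):
    """Linear in markers: one pass for the total, then subtract the diagonal."""
    total = Z.T @ (Z @ m)                  # sum of coefficients x solutions
    diag = np.einsum("ij,ij->j", Z, Z)     # z_j'z_j for each marker j
    return total - diag * m                # total - diagonal = off-diagonals


print(np.allclose(offdiag_direct(Z, m), offdiag_by_subtraction(Z, m)))
```

The direct version needs a markers x markers coefficient matrix, whereas the subtraction version only touches the animals x markers genotypes.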
9
Computer Memory
  • Inversion including G matrix
    • Animals x markers to hold genotypes
    • Animals² to hold elements of G
    • < 1 Gbyte for 50,000 SNPs, 3,000 bulls
  • Iteration on genotype data
    • Markers + animals
    • < 0.1 Gbyte for 50,000 SNPs, 3,000 bulls
    • Little memory required for either
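
As a rough check of these memory figures, the arithmetic can be sketched as below. The storage sizes are assumptions (one byte per genotype, eight bytes per real number), and the iteration figure assumes genotypes are read from disk each round so that only the solution vectors stay in memory.

```python
n_animals = 3_000
n_markers = 50_000

BYTES_PER_GENOTYPE = 1   # assumption: one byte per stored genotype
BYTES_PER_REAL = 8       # assumption: double precision reals


def gbytes(n_bytes):
    """Convert a byte count to Gbytes (10^9 bytes)."""
    return n_bytes / 1e9


# Inversion: animals x markers genotypes plus animals^2 elements of G.
inversion_gb = gbytes(n_animals * n_markers * BYTES_PER_GENOTYPE
                      + n_animals ** 2 * BYTES_PER_REAL)

# Iteration on data: markers + animals values held in memory.
iteration_gb = gbytes((n_markers + n_animals) * BYTES_PER_REAL)

print(f"inversion: {inversion_gb:.3f} Gbyte")
print(f"iteration: {iteration_gb:.6f} Gbyte")
```

Under these assumptions both approaches stay well inside the limits quoted on the slide.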
10
Computing Times
  • Inversion including G matrix
    • Animals² x markers to form G matrix
    • Animals³ to invert selection index
    • 10 hours for 3,000 bulls, 50,000 SNPs
  • Iteration on genotype data
    • Markers x animals x iterations
    • 16 hours for 1000 iterations
    • 0.997 correlation with inversion
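
The operation counts behind these scaling rules can be worked out directly. The sketch below only counts multiplications implied by the rules on this slide; it does not model disk reads or constant factors, so the counts are not expected to be proportional to the reported hours.

```python
n_animals = 3_000
n_markers = 50_000
n_iterations = 1_000

# Inversion including G matrix
form_g_ops = n_animals ** 2 * n_markers        # animals^2 x markers
invert_ops = n_animals ** 3                    # animals^3

# Iteration on genotype data
iteration_ops = n_markers * n_animals * n_iterations

print(f"form G:    {form_g_ops:.2e}")
print(f"invert:    {invert_ops:.2e}")
print(f"iteration: {iteration_ops:.2e}")
```

With these dimensions, forming G dominates the inversion approach, and the iteration cost grows only linearly with the number of animals.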
11
Convergence with Iteration on Data
  • Jacobi iteration
    • Use previous round coefficients x solutions
  • Adaptive under-relaxation
    • Increase relax if convergence improving
    • Decrease relax (each round) if diverging
  • Solution convergence reasonable
    • SD of change < 0.0001 after 350 rounds
    • SD of change < 0.000001 after 1700 rounds
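
A minimal sketch of Jacobi iteration with adaptive under-relaxation for the marker equations [Z'R^-1 Z + I k] m = Z'R^-1 (y - Xb) is given below. The specific choices are assumptions for illustration, not the settings used in the slides: the relaxation factor starts at 0.5, is raised by 5% while the change keeps shrinking, is halved whenever the change grows, and the root-mean-square change stands in for the SD of change. The function name and simulated data are hypothetical.

```python
import numpy as np


def jacobi_marker_effects(Z, r_diag, y_adj, k, relax=0.5,
                          tol=1e-10, max_rounds=5000):
    """Solve the marker equations by Jacobi iteration on the genotypes."""
    w = 1.0 / r_diag                                # diagonal of R^-1
    diag = np.einsum("ij,i,ij->j", Z, w, Z) + k     # diagonal of Z'R^-1 Z + I k
    m = np.zeros(Z.shape[1])
    prev_change = np.inf
    for rounds in range(1, max_rounds + 1):
        # Previous-round solutions are used for every marker (Jacobi).
        resid = y_adj - Z @ m
        step = (Z.T @ (w * resid) - k * m) / diag   # full Jacobi step
        change = np.sqrt(np.mean(step ** 2))        # RMS change before relaxing
        if change < tol:
            break
        if change > prev_change:
            relax *= 0.5                            # diverging: decrease relax
        else:
            relax = min(1.0, relax * 1.05)          # improving: increase relax
        m = m + relax * step
        prev_change = change
    return m, rounds


# Small simulated example (hypothetical sizes, not the real data).
rng = np.random.default_rng(3)
n_animals, n_markers = 50, 100
p = rng.uniform(0.1, 0.9, n_markers)
Z = rng.binomial(2, p, size=(n_animals, n_markers)) - 2.0 * p
r_diag = 1.0 / rng.uniform(0.5, 0.9, n_animals) - 1.0   # (1 / REL) - 1
y_adj = rng.normal(size=n_animals)                       # stands in for (y - Xb)
k = np.sum(2.0 * p * (1.0 - p))                          # assumed var(u)/var(m)

m_iter, rounds = jacobi_marker_effects(Z, r_diag, y_adj, k)

# Direct solution of the same equations, for comparison only.
lhs = Z.T @ (Z / r_diag[:, None]) + k * np.eye(n_markers)
m_direct = np.linalg.solve(lhs, Z.T @ (y_adj / r_diag))
print("rounds used:", rounds)
print("largest difference from direct solve:",
      np.max(np.abs(m_iter - m_direct)))
```

Plain Jacobi (relax fixed at 1) can diverge when markers greatly outnumber animals, which is the situation the under-relaxation is meant to handle.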
12
Potential Results
Simulation of 50,000 SNPs, 100 QTLs
13
Reliability from Genotyping
  • Daughter equivalents
    • $DE_{Total} = DE_{PA} + DE_{Prog} + DE_{YD} + DE_{G}$
    • $DE_{G}$ is additional DE from genotype
    • $REL = DE_{Total} / (DE_{Total} + k)$
  • Gains in reliability
    • $DE_{G}$ could be about 15 for Net Merit
    • More for traits with low heritability
    • Less for traits with high heritability
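
The reliability formula above can be illustrated with a short worked example. The values of k and of the daughter equivalents available without a genotype are hypothetical and chosen only to make the arithmetic easy to follow; only the DE from genotype (about 15) comes from the slide.

```python
def reliability(de_total, k):
    """REL = DE_total / (DE_total + k)."""
    return de_total / (de_total + k)


def daughter_equivalents(rel, k):
    """Inverse of the formula above: DE = k * REL / (1 - REL)."""
    return k * rel / (1.0 - rel)


k = 14.0            # hypothetical value, for illustration only
de_without = 7.0    # hypothetical DE_PA + DE_Prog + DE_YD
de_g = 15.0         # approximate DE from genotype quoted for Net Merit

rel_before = reliability(de_without, k)
rel_after = reliability(de_without + de_g, k)
print(f"REL without genotype: {rel_before:.3f}")
print(f"REL with genotype:    {rel_after:.3f}")
```

Because daughter equivalents add across information sources, the genomic contribution can be reported on the same scale as parent average, progeny, and yield deviation information.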
14
Conclusions
  • Predictions from 50,000 SNPs using:
    • Selection index equations, or
    • Iteration on genotype data
    • Predictions correlated by up to 0.9999
  • Linear and nonlinear costs OK
    • Convergence within 200 to 2500 rounds
    • Nonlinear regression improved reliabilities
  • Real data predictions available soon