6.1 Derivation of Best Prediction

Individual daily yield can be modeled as the expected value of a management group plus a deviation from that mean:

$\displaystyle y_i = E(y_i) + t_i$

where $y_i$ is an individual yield on test day $i$, $E(y_i)$ is the expected yield for an animal in the same management group [Wiggans, Misztal, and Van Vleck 1988] on the same test day, and $t_i$ is a deviation from the group mean on the same test day. Suppose that $\mu$ is a vector of expected values for each day of lactation for a single trait, $t$ is a vector of 305 test-day deviations for the trait, and $t_m$ is a vector of only the measured deviations. The means and variances of $t$ and $t_m$ are assumed known, with $V(t) = V$ and $V(t_m) = V_m$. The covariance between $t$ and $t_m$, $C$, is also assumed known. Predictions for lactations longer than 305 d can also be made [Dematawewa, Pearson, and VanRaden 2007].

Lactation yield. A cow's true 305-d yield ($ y$ ) is the sum of the expected values for each day ($ 1'\mu$ ) plus the sum of her 305 deviations from expectations ($ 1't$ ), where $ 1'$ is a vector of 1s of length 305. The cow's true yield, and the best prediction of that yield ($ \hat{y}$ ), are:

$\displaystyle \begin{array}{lll}
y & = & 1'\mu + 1't \\
\hat{y} & = & 1'\mu + 1'CV^{-1}_{m}t_{m}
\end{array}$

Reliability of $ \hat{y}$ is obtained from variances of $ y$ and $ \hat{y}$ , which are computed from the quadratic forms:

$\displaystyle \begin{array}{lll}
Var(y) & = & 1'V1 \\
Var(\hat{y}) & = & 1'CV^{-1}_{m}C'1 \\
Rel(\hat{y}) & = & Var(\hat{y}) / Var(y) \\
& = & 1'CV^{-1}_{m}C'1 / 1'V1
\end{array}$
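The yield formulas above can be illustrated with a short Python sketch. It is not the BESTPRED implementation: the covariance matrix is a hypothetical stand-in built from per-day standard deviations and first-order autoregressive correlations over a 10-day "lactation" instead of 305 days, and the names ar1_cov, solve, and best_predict_yield are illustrative only. Taking $C$ as the columns of $V$ for the tested days and $V_m$ as the matching rows and columns, it returns $\hat{y}$ and $Rel(\hat{y})$.

```python
# Illustrative sketch of single-trait best prediction of lactation yield.
# Assumption: V is a hypothetical AR(1) stand-in, not the BESTPRED covariance.


def ar1_cov(sds, rho):
    """Stand-in covariance V[i][j] = sds[i] * sds[j] * rho**|i - j|."""
    n = len(sds)
    return [[sds[i] * sds[j] * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def best_predict_yield(mu, V, tested, t_m):
    """Return (y_hat, rel) from deviations t_m measured on day indices `tested`.

    y_hat = 1'mu + 1'C V_m^{-1} t_m and rel = 1'C V_m^{-1} C'1 / 1'V1,
    with C = V[:, tested] and V_m = V[tested, tested].
    """
    n = len(V)
    c1 = [sum(V[i][j] for i in range(n)) for j in tested]  # C'1
    Vm = [[V[i][j] for j in tested] for i in tested]        # V_m
    w = solve(Vm, c1)  # V_m^{-1} C'1; V_m is symmetric, so 1'C V_m^{-1} t_m = w't_m
    y_hat = sum(mu) + sum(wi * ti for wi, ti in zip(w, t_m))
    var_y = sum(sum(row) for row in V)                  # 1'V1
    var_y_hat = sum(wi * ci for wi, ci in zip(w, c1))   # 1'C V_m^{-1} C'1
    return y_hat, var_y_hat / var_y


if __name__ == "__main__":
    V = ar1_cov([2.0] * 10, 0.9)  # hypothetical 10-day lactation
    mu = [30.0] * 10              # hypothetical expected daily yields
    print(best_predict_yield(mu, V, [1, 4, 7], [1.5, 0.5, -1.0]))
```

Testing every day makes $V_m = V$ and $C = V$, so the prediction equals the true yield and the reliability is 1; fewer tests give smaller reliabilities.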

Procedures to adjust $V_m$ when data are from partial days, are computer averages from several days, or are recorded by an owner instead of a supervisor were outlined by VanRaden (1997).

Persistency. Persistency may be measured by multiplying test-day deviations by a linear function of days in milk (DIM) [Cole and VanRaden 2006; VanRaden 1998]. Let $d$ represent a vector whose elements, $d_i$, are the DIM associated with $y_i$. A measure of persistency that is phenotypically uncorrelated with lactation yield may be obtained by defining coefficients $q_i = d_i - d_0$, where $d_0$ is a constant that acts as a balance point between yields in early and late lactation. In matrix form, $q = d - 1d_0$. The covariance between persistency and yield is $Cov(q't,1't) = q'V1 = (d'-1'd_0)V1 = d'V1 - 1'V1d_0$ for all values of $d_0$. The tipping point, $d_0$, that makes yield and persistency of a trait uncorrelated is obtained by setting $Cov(q't,1't)$ to 0 and solving for $d_0$: $d_0 = d'V1 / 1'V1$.
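A minimal Python sketch of the tipping-point calculation follows. It assumes a hypothetical autoregressive stand-in for $V$ whose standard deviations may change across days, and a 5-day lactation with DIM 1 to 5; the function names are illustrative and not part of BESTPRED.

```python
# Illustrative sketch: tipping point d_0 that makes q't uncorrelated with 1't.
# Assumption: V is a hypothetical AR(1) stand-in, not the BESTPRED covariance.


def ar1_cov(sds, rho):
    """Stand-in covariance V[i][j] = sds[i] * sds[j] * rho**|i - j|."""
    n = len(sds)
    return [[sds[i] * sds[j] * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def tipping_point(V, dim):
    """d_0 = d'V1 / 1'V1, where dim[i] is the DIM of day i."""
    v1 = [sum(row) for row in V]  # V1 (row sums of V)
    return sum(d * s for d, s in zip(dim, v1)) / sum(v1)


def cov_persistency_yield(V, dim, d0):
    """Cov(q't, 1't) = q'V1 with q = d - 1 d_0."""
    v1 = [sum(row) for row in V]
    return sum((d - d0) * s for d, s in zip(dim, v1))


if __name__ == "__main__":
    dim = [1, 2, 3, 4, 5]
    # Constant SDs give symmetric row sums, so d_0 is the midpoint, 3 (up to rounding).
    print(tipping_point(ar1_cov([1.0] * 5, 0.8), dim))
    # SDs that rise through lactation move d_0 later; q'V1 is then ~0.
    V = ar1_cov([1.0, 1.2, 1.4, 1.6, 1.8], 0.8)
    d0 = tipping_point(V, dim)
    print(d0, cov_persistency_yield(V, dim, d0))
```

Because only $V1$ is needed, $d_0$ is a weighted mean of DIM with weights equal to the row sums of $V$; with positive covariances, as here, days with larger variances and covariances pull the tipping point toward them.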

VanRaden (1998) originally defined persistency on an across-lactation basis so that estimates were comparable across parities. The same tipping points ($d_0$) were used for each trait for all cows regardless of breed or parity, and they were updated by Cole and VanRaden (2006). New tipping points were estimated when smooth lactation curves were implemented (Section 6.2). The new curves differed enough by parity that no single set of tipping points gave phenotypic correlations of 0 between yield and persistency, so persistency was redefined on a within-parity-group basis. Persistency was originally comparable across cows of all breeds and lactations; under the new definition, cows may be compared across breeds within parity group. Tipping points for smooth and linear interpolated curves for each parity group and trait are presented in Table 6.1. Linear interpolated curves use the same tipping points for both parity groups and should produce persistencies comparable to those calculated by AIPLDCR. Persistencies are not comparable when calculated with different interpolation methods (see parms-intmethod and parms-intmethodscs).

Table 6.1: Persistency tipping points ($d_0$)

Curves   Parity   Milk   Fat   Protein   SCS
Smooth   First     115   115       150   155
         Later     ...
Linear   First     161   159       156   155
         Later     161   159       156   155

Note that the default tipping points are those for the smooth curves. If you use linear interpolated curves, make sure that the correct tipping points are entered in the parms-dim0 array in the parameter file.

Matrix $V$ accounts for changing phenotypic standard deviations and correlations across the lactation. Persistency and 305-d yield would be genetically uncorrelated if $V$ were replaced with a 305-by-305 genetic variance matrix, $G$, which was not available or estimated in this study. Random regression models approximate $G$ and $V$ with polynomials, whereas best prediction approximates $V$ with autoregressive correlations.

True persistency ($ p$ ), predicted persistency ($ \hat{p}$ ) and the expected value of persistency, $ E(p)$ , are given by:

$\displaystyle \begin{array}{lll}
p & = & E(p) + q't \\
\hat{p} & = & E(p) + q'CV^{-1}_{m}t_{m} \\
E(p) & = & q'\mu \\
& = & (d'-1'd_0)\mu \\
& = & d'\mu-d_0E(y)
\end{array}$

Reliability of $ \hat{p}$ is obtained from variances of $ p$ and $ \hat{p}$ , which are computed from the quadratic forms:

$\displaystyle \begin{array}{lll}
Var(p) & = & Var(q't) \\
& = & Var[(d'-1'd_0)t] \\
& = & q'Vq \\
Var(\hat{p}) & = & q'CV^{-1}_{m}C'q \\
Rel(\hat{p}) & = & Var(\hat{p})/Var(p) \\
& = & q'CV^{-1}_{m}C'q / q'Vq
\end{array}$
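The persistency reliability can be sketched the same way. As before, this is a hypothetical Python illustration rather than the BESTPRED code: $V$ is an autoregressive stand-in over 10 days, $d_0$ is set to the midpoint of DIM (which is the tipping point $d'V1/1'V1$ for that symmetric $V$), and the helper names are illustrative.

```python
# Illustrative sketch of Rel(p_hat) = q'C V_m^{-1} C'q / q'Vq.
# Assumption: V is a hypothetical AR(1) stand-in, not the BESTPRED covariance.


def ar1_cov(sds, rho):
    """Stand-in covariance V[i][j] = sds[i] * sds[j] * rho**|i - j|."""
    n = len(sds)
    return [[sds[i] * sds[j] * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def persistency_reliability(V, dim, d0, tested):
    """Reliability of predicted persistency from tests on day indices `tested`."""
    n = len(V)
    q = [d - d0 for d in dim]                                     # q = d - 1 d_0
    cq = [sum(V[i][j] * q[i] for i in range(n)) for j in tested]  # C'q
    Vm = [[V[i][j] for j in tested] for i in tested]              # V_m
    w = solve(Vm, cq)                                             # V_m^{-1} C'q
    var_p = sum(q[i] * V[i][j] * q[j] for i in range(n) for j in range(n))  # q'Vq
    var_p_hat = sum(wi * ci for wi, ci in zip(w, cq))             # q'C V_m^{-1} C'q
    return var_p_hat / var_p


if __name__ == "__main__":
    V = ar1_cov([2.0] * 10, 0.9)
    dim = list(range(1, 11))
    print(persistency_reliability(V, dim, 5.5, [0, 9]))  # one early, one late test
    print(persistency_reliability(V, dim, 5.5, [4, 5]))  # two mid-lactation tests
```

For this stand-in $V$, one early and one late test give a higher persistency reliability than two tests near $d_0$, because tests far from the balance point carry more information about the slope weighted by $q$.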

The accuracy of lactation records from a wide variety of test plans can be compared using a Data Collection Rating (http://aipl.arsusda.gov/reference/datarating.htm), which is calculated as the squared correlation of estimated and true yields multiplied by 104 to give monthly testing a rating of 100 and daily testing a rating of 104.
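The rating can be computed from the same variances used for reliability. The Python sketch below uses illustrative function names and relies on a property of best prediction: because $Cov(y,\hat{y}) = 1'CV^{-1}_{m}C'1 = Var(\hat{y})$, the squared correlation of estimated and true yields reduces to $Var(\hat{y})/Var(y) = Rel(\hat{y})$.

```python
# Illustrative sketch of a Data Collection Rating from best-prediction variances.
# Assumption: y_hat is a best predictor, so Cov(y, y_hat) = Var(y_hat).


def squared_correlation(cov_xy, var_x, var_y):
    """Squared Pearson correlation from a covariance and two variances."""
    return cov_xy ** 2 / (var_x * var_y)


def data_collection_rating(var_y, var_y_hat):
    """104 times the squared correlation of true and predicted yield."""
    r2 = squared_correlation(var_y_hat, var_y, var_y_hat)  # equals var_y_hat / var_y
    return 104.0 * r2


if __name__ == "__main__":
    print(data_collection_rating(1.0, 1.0))  # every day measured
```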

A standardized estimate of persistency, $\hat{s}$, was obtained by subtracting the population mean for persistency ($\mu_p$) and dividing by the within-herd phenotypic standard deviation:

$\displaystyle \hat{s} = \frac{\hat{p}-\mu_p}{\sqrt{Var(p)}}$

The mean and variance of $\hat{s}$ are 0 and 1, respectively. Positive values of $\hat{s}$ indicate greater persistency than that of an average cow, and negative values indicate lower persistency. Compared with cows that have the same level of production and average persistency, cows with high persistency tend to milk less than expected at the beginning of lactation and more than expected at the end.

Predicted persistency represents the component of persistency that is independent of yield. Druet et al. (2005) extracted eigenvalues from genetic covariance matrices and showed that they may be used as proxies for lactation yield and persistency in random regression test-day models. Although the derivation of those proxy traits is quite different from that presented for $\hat{p}$, the two measures of persistency are conceptually similar in that both represent persistency independent of yield.

Expected Values. Lactation curves may differ according to age, parity, breed, time, herd, and their interactions. In the past, adjustment factors were used to standardize lactation records. Let vector $\mu$ contain the mature-equivalent or standard lactation curve, and let vector $b$ contain the expected values for each of the 305 days for the age, parity, season, year, and herd of interest. If $\mu$ and $b$ differ, adjustment factors can be used to standardize yield and persistency. Multiplicative factors should not be used for persistency, both to avoid division by zero and because differences in variance can be removed by creating a unitless trait. For yield, the additive adjustment is $1'(\mu-b)$. For persistency, the additive adjustment is $q'(\mu-b)$, which is equivalent to $(d'-1'd_0)(\mu-b)$. This approach is simple but has the disadvantage of not fully preserving curve shape. The lactation curve is scaled vertically by the yield factor and rotated by the persistency factor. For any group of interest, the assumed curve is $(d'-1'd_0)(\mu-b)$, which is the standard curve minus the persistency factor and then divided by the yield factor.
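The two additive adjustments can be sketched in Python as follows. The curves, DIM values, and $d_0 = 3$ are hypothetical, $\mu$ and $b$ are assumed to be daily vectors of equal length, and the function name is illustrative.

```python
# Illustrative sketch of additive adjustments for yield and persistency.
# Assumption: mu and b are hypothetical daily curves of equal length.


def additive_adjustments(mu, b, dim, d0):
    """Return (1'(mu - b), q'(mu - b)) with q = d - 1 d_0."""
    diff = [m - x for m, x in zip(mu, b)]                    # mu - b
    yield_adj = sum(diff)                                    # 1'(mu - b)
    pers_adj = sum((d - d0) * x for d, x in zip(dim, diff))  # (d' - 1'd_0)(mu - b)
    return yield_adj, pers_adj


if __name__ == "__main__":
    dim = [1, 2, 3, 4, 5]
    mu = [30.0, 32.0, 31.0, 29.0, 27.0]
    # Group curve uniformly 2 units lower: only the yield adjustment is nonzero.
    print(additive_adjustments(mu, [m - 2.0 for m in mu], dim, 3.0))
    # Group curve tilted around day 3: only the persistency adjustment is nonzero.
    tilt = [-2.0, -1.0, 0.0, 1.0, 2.0]
    print(additive_adjustments(mu, [m - t for m, t in zip(mu, tilt)], dim, 3.0))
```

The separation between the two factors is exact here only because $d_0$ is the midpoint of the hypothetical DIM values; with a $d_0$ derived from $V$, a uniform shift would also produce a small persistency adjustment.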

Expansion. Predicted yields and persistencies were expanded to conform to assumptions of commonly used statistical models. The deviation of each predicted value from its expected value was divided by its reliability, holding the mean constant, to produce expanded yields ($\tilde{y}$) and persistencies ($\tilde{p}$). This is analogous to the de-regression step in multiple-trait across-country evaluations. The expanded values contain the corresponding true values plus independent errors [VanRaden, Wiggans, and Ernst 1991]:

$\displaystyle \begin{array}{lll}
\tilde{y} & = & E(y) + [\hat{y}-E(y)] / Rel(\hat{y}) \\
\tilde{p} & = & E(p) + [\hat{p}-E(p)] / Rel(\hat{p})
\end{array}$

These expanded variables have greater variances than the true values, while the predicted values have lower variances:

$\displaystyle \begin{array}{lll}
Var(\tilde{y}) & = & 1'CV^{-1}_{m}C'1 / [Rel(\hat{y})]^2 \\
& = & 1'V1 / Rel(\hat{y}) \\
Var(\tilde{p}) & = & q'CV^{-1}_{m}C'q / [Rel(\hat{p})]^2 \\
& = & q'Vq / Rel(\hat{p})
\end{array}$
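Expansion is simple arithmetic once reliability is known. The following minimal Python sketch uses hypothetical numbers; expand and expanded_variance are illustrative names, and the variance function uses $Var(\hat{y}) = Rel(\hat{y})\,Var(y)$, which follows from the definition of reliability.

```python
# Illustrative sketch of expanding a best prediction by its reliability.


def expand(pred, expected, rel):
    """Expanded value E + (pred - E) / Rel, holding the mean E constant."""
    if not 0.0 < rel <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return expected + (pred - expected) / rel


def expanded_variance(var_true, rel):
    """Var(expanded) = Var(pred) / Rel^2 = Var(true) / Rel."""
    var_pred = rel * var_true  # Var(pred) = Rel * Var(true)
    return var_pred / rel ** 2


if __name__ == "__main__":
    print(expand(310.0, 300.0, 0.5), expanded_variance(100.0, 0.5))
```

For example, a prediction 10 units above its expected value with a reliability of 0.5 expands to 20 units above it, and the expanded variance is twice the variance of the true value.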

Although $Cov(y,p) = 0$, best predictions or expanded estimates of $y$ and $p$ may covary if tests do not represent the entire lactation, as with records in progress. The covariance of $\tilde{y}$ with $\tilde{p}$ is $Cov(\tilde{y},\tilde{p}) = 1'CV^{-1}_{m}C'q / [Rel(\hat{y})Rel(\hat{p})]$. Expanded yield and persistency records contain the normal environmental variance present in the true record plus an additional measurement error term that is independent of the true record [VanRaden, Wiggans, and Ernst 1991]. The total error variances for yield, $Var(\tilde{y}-u_y)$, and persistency, $Var(\tilde{p}-u_p)$, are:

$\displaystyle \begin{array}{lll}
Var(\tilde{y}-u_y) & = & Var(\tilde{y}) + Var(u_y) - 2Cov(\tilde{y},u_y) \\
Var(\tilde{p}-u_p) & = & Var(\tilde{p}) + Var(u_p) - 2Cov(\tilde{p},u_p)
\end{array}$

where $u_y$ and $u_p$ are the sums of random effects other than error contained in the models for yield and persistency, respectively. For example, $u_y$ and $u_p$ might contain genetic effects plus permanent environmental effects. If the measurement errors $\tilde{y}-u_y$ and $\tilde{p}-u_p$ are uncorrelated with $u_y$ and $u_p$, then $Cov(\tilde{y},u_y) = Var(u_y)$ and $Cov(\tilde{p},u_p) = Var(u_p)$, and the variances and covariance reduce to:

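$\displaystyle \begin{array}{lll}
Var(\tilde{y}-u_y) & = & Var(\tilde{y}) + Var(u_y) - 2Var(u_y) \\
& = & Var(\tilde{y}) - Var(u_y) \\
Var(\tilde{p}-u_p) & = & Var(\tilde{p}) + Var(u_p) - 2Var(u_p) \\
& = & Var(\tilde{p}) - Var(u_p) \\
Cov(\tilde{y}-u_y,\tilde{p}-u_p) & = & Cov(\tilde{y},\tilde{p}) - Cov(u_y,u_p)
\end{array}$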
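Under the assumption that the measurement errors are uncorrelated with $u_y$ and $u_p$, the reduced forms are simple arithmetic on variance components. The short Python sketch below uses hypothetical inputs and an illustrative function name.

```python
# Illustrative sketch of error (co)variances of expanded records.
# Assumption: measurement errors are uncorrelated with u_y and u_p.


def error_moments(var_yt, var_pt, cov_ytpt, var_uy, var_up, cov_uyup):
    """Return (Var(y~ - u_y), Var(p~ - u_p), Cov(y~ - u_y, p~ - u_p))."""
    var_ey = var_yt + var_uy - 2.0 * var_uy  # Cov(y~, u_y) = Var(u_y)
    var_ep = var_pt + var_up - 2.0 * var_up  # Cov(p~, u_p) = Var(u_p)
    cov_e = cov_ytpt - cov_uyup              # Cov(y~, p~) - Cov(u_y, u_p)
    return var_ey, var_ep, cov_e


if __name__ == "__main__":
    print(error_moments(10.0, 5.0, 1.0, 4.0, 2.0, 0.5))
```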

Multiple Trait Prediction. VanRaden (1997) presented both single- and multiple-trait approaches for best prediction of yields but not of persistencies. The current version of BESTPRED supports single- and multiple-trait best prediction of both yield and persistency.
