An Optimal Design Criterion for Within-Individual Covariance Matrices Discrimination and Parameter Estimation in Nonlinear Mixed Effects Models

In this paper, we consider the problem of finding optimal population designs for within-individual covariance matrices discrimination and parameter estimation in nonlinear mixed effects models. A compound optimality criterion is provided, which combines an estimation criterion and a discrimination criterion. We used the D-optimality criterion for parameter estimation, which maximizes the determinant of the Fisher information matrix. For discrimination, we propose a generalization of the T -optimality criterion for fixed-effects models. Equivalence theorems are provided for these criteria. We illustrated the application of compound criteria with an example in a pharmacokinetic experiment.


Introduction
In the application of optimal design, one of the basic assumptions is to assume that the model used to describe a given phenomenon or process is the correct model. In practice, however, several candidate models may exist. One way of selecting the most appropriate model from among several candidates is by conducting an experiment designed in such a way that the observations obtained allow us to discriminate between the models in the best possible way. In addition, it is useful and advantageous that this design works well for estimating selected model parameters efficiently. This leads to the problem of finding the optimal experimental conditions by using optimality criteria for both model discrimination and model parameter estimation simultaneously.
In the case of fixed-effects models, a commonly used parameter estimation criterion is D-optimality, which maximizes the determinant of the variancecovariance matrix of estimated parameters (Atkinson, Donev & Tobias 2007). For models with normal errors, Atkinson & Fedorov (1975) proposed the Toptimality criterion for discriminating between two competing-response models, which provides the most powerful F-test for the lack of fit of one model when the other is assumed to be true. The criterion has been generalized for others fixed effects models (e.g., Uciński & Bogacka, 2005). López-Fidalgo, Tommasi & Trandafir (2007) proposed the KL-optimality criterion based on the Kullback-Leibler distance. This criterion is an extension of the T -optimality criterion and it can be used when the rival models are nested or not, homoscedastic or heteroscedastic, and with any error distribution. These criteria have been used in compound criteria for model discrimination and efficient parameter estimation (Atkinson, 2008 andTommasi, 2009).
The nonlinear mixed-effects models are particularly useful in longitudinal studies, such as population pharmacokinetics experiments, assay analyses, and studies of growth in which a limited number of samples can be obtained from each individual. These models distinguish two classes of variation: random variation among observations within a given individual (within-individual) and random variation between-individuals (Davidian & Giltinan 1995). This separation of variability allows the estimation of population characteristics from sparse samples per individual in a set of subjects without requering individual estimation of the parameters. The pattern of within-individual variability is not necessarily homogeneous, and depending on the application and nature of the data, it is possible to incorporate heteroscedasticity by specification of a variance function. In the context of mixed effects models, a population experimental design is defined by the number of individuals to study and the individual designs to be performed in the individuals (number of samples and the sampling times) (Mentré, Burtin, Merlé, Bree, Mallet & Steimer 1995). Assuming that the response function and the between-individuals variability model are correctly specified, it may be of interest the problem of designing an experiment in groups of individuals with sparse samples and different sampling scheme, for discrimination between two competing within-individual covariance models and parameter estimation.
The most common approach to optimal population design is D-optimal (e.g., Baccar, 1997 andGagnon &Leonov, 2005, among others). To discriminate between two models, some authors proposed extensions of Toptimality criterion (e.g., Waterhouse, Redmann, Duffull & Eccleston, 2005, Vajjah & Duffull, 2012and Castañeda & López-Ríos, 2016. Kuczewski, Bogacka & Uciński (2008) proposed an extension of the T -optimality criterion for multivariate mixed models and it can be applied directly when all individuals are observed under the same experimental conditions. In contrast to the above problems, the model discrimination and parameter-estimation problem have not been developed in great detail for mixed models. Waterhouse, Redmann, Duffull & Eccleston (2005) proposed the product D-optimality criterion based on the product of the determinants of the Fisher-information matrices. For nonlinear models, however, such designs may be less efficient for discriminating than the T -optimal designs. Therefore, alternative methods for the dual purposes of model discrimination and parameter estimation are required.
In this paper, we consider the problem of finding optimal population designs for discrimination between two within-individual variance models and efficient parameter estimation. A compound optimality criterion is provided, which combines an estimation criterion and a discrimination criterion. The parameter estimation criterion we used is the D-optimality criterion, and for discrimination, we propose a generalization of the T-optimality criterion for fixed-effects models. Our approach can be applied to population studies for groups of individuals with sparse samples and a different sampling scheme, where each sampling scheme is a point in a finite space of admissible sampling sequences. This paper is organized as follows: In Section 2, we present the nonlinear mixedeffects model, optimal design concepts, and D-optimum designs. In Section 3, we derive a generalization of the T-optimality criterion, and a necessary and sufficient condition for a design to be optimum is given. In Section 4, we propose a compound criterion for discrimination and estimation and provide an equivalence theorem. In Section 5, an illustrative example is presented. Finally, our conclusions and further work are detailed in Section 6.

Model
We assume that for each individual i in a sample of N individuals, from a larger population, the number of different available observations is n. Let y i = (y i1 , . . . , y in ) T be the vector of repeated measurements for the ith individual, and t i = (t i1 , . . . , t in ) T the n × 1 vector of sampling times where t ij belongs to a finite set X . It is assumed that measurements made on different subjects are independent. To model the relationship between y i and t i , we consider the nonlinear mixed model (Demidenko 2004): ) T with f a known nonlinear function of β, Z i is an n × q full-rank matrix of known constants, b i is a q × 1 vector of random effects associated with individual i, and ε i is the n × 1 withinindividual errors vector. It is assumed that the b i are independent and normally distributed with mean 0 and variance-covariance matrix D, and that b i and ε i are independent. It is assumed that matrix and is called the within-individual covariance matrix, which depends on the parameter λ(a × 1). The matrix G takes into account the nature of the variation among observations on a given individual and may be choosen in such a way that reflects the heterogeneity of variance, the correlation structure, or both. In this work, we assume uncorrelated observations and heterogeneous variance for the within-individual variability, so that G(t i , λ) is a diagonal matrix with diagonal elements Var(ε ij ) = g(t ij , λ), where g is a variance function. We assume that the individuals share a common within-individual covariance pattern that only varies between them through the possibly different values of t i , and that this does not depend on the parameter β. The distinct elements of the parameter vector λ and the covariance matrix D can be arranged in a single vector γ ∈ Γ ⊂ R c of variance parameters of dimension c = a+q(q+1)/2. Under the assumption of independence and normality of the distributions of b i and ε i , the marginal distribution of y i is n-variate normal with mean vector and variance-covariance matrix given by In this model, the full parameters vector to be estimated is

Population Designs
Suposse that the sample of N individuals consists of s groups each of size n k and that the individuals in the same group are observed under the conditions vector t k = (t k1 , . . . , t kn ) T . The collection of t k and n k , represented by is a population design. The set X n is the design region, and points t k are the design points. The collection ζ N = {t k , ω k } s 1 where ω k = n k N is the proportion of subjects allocated to each group is the normalized or exact population design with weights vector ω = (ω 1 , . . . , ω s ) (Gagnon & Leonov 2005). For fixed values of the total number of individuals N and the number of sampling times n, the population optimal design problem consists of finding the design by a choice of distinct values for the sampling times vector t ∈ X n and values for the number subjects assigned to vector t in such a way that the resulting design maximizes some optimality criteria. The most commonly used optimality criteria usually depend on the unknown model parameter. One approach is to construct locally optimal designs, which require specifying a prior parameter estimate, then addressing the optimization problem for this specific value (Chernoff 1953). In this article, the approximate design ζ, in which the weights ω k may be any real numbers from the interval [0, 1], is considered. The set of points t k in the design region X n , for which the design ζ has nonzero weights ω k , is the support set of ζ.

D-optimality Criterion
D-optimality is the most widely used criterion for efficient parameter estimation, which determines a design ζ maximizing where M (ζ, θ) is the Fisher information matrix for the population design ζ which can be expressed as the sum of the Fisher information matrix , where ℓ(θ; y i ) is the log-likelihood of the vector of observation y i for the population parameter θ. This matrix can be calculated using the general formula for observations with normal distribution, For the class of approximate population designs Ξ, a population design ζ * ∈ Ξ with a nonsingular Fisher information matrix M (ζ, θ) is a D-optimum design for the parameter estimation θ, if and only if ϕ D (t, ζ * ) ≤ 0, ∀t ∈ X n , where is the directional derivative of the criterion Φ D (M (ζ, θ)) at ζ in the direction of δ ζt = ζ t − ζ, and ζ t is the design which puts the whole mass at point t (Gagnon & Leonov 2005). The D-efficiency of a design ζ relative to the optimum design ζ * . This efficiency is a number in (0,1) which measures the goodness of a design ζ for estimating purposes.

Discrimination Between Two Within-Individual Variance Models
In this section, we propose a generalization of T -optimality criterion for discrimination between two nested within-individual variance models. The criterion determines a design that maximize the power of likelihood ratio test when the largest model is assumed to be the true model. Our approach can be applied to population studies for groups of individuals with sparse samples and a different sampling scheme.
Let G 1 (t, λ 1 ) and G 2 (t, λ 2 ) be two models for the within-individual variability where one model is nested within the other. For example, if G 2 is nested within of G 1 , it would mean that both models involve the same structure G(t, λ), and the parameter space Γ 2 of G 2 is a subset of the parameter space Γ 1 of G 1 , which is defined by the imposition of κ equality constraints. In order to discriminate between these models, we propose the following criterion: assuming that G 1 is the true model for a known parameter vector γ 0 1 , a locally optimum approximate population design ζ * to discriminate between G 1 and G 2 is such that where with and Σ r (t, γ), r = 1, 2, is given by (2). The design ζ * be called T G -optimal.
The justification of criterion is given below. Without loss of generality, consider the exact population design ζ N = {t i , ω i } N 1 where ω i = 1 N . To discriminate between G 1 and G 2 consider the problem of testing the hypothesis: Since y i are N n (f (t i , β), Σ(t i , γ)) random vectors, the likelihood function under the model G r is Thus, the likelihood-ratio test function usually associated to the hypotheses (9) is given by If the model G 1 is the true model for a specific alternative value γ 0 1 ∈ Γ 1 and γ * 2 is an estimate for γ 2 , then From this expression, the power of the likelihood ratio test for the hypotheses (9) increases with T G (ζ N ) and hence can be maximized by the choice of design ζ N .
Finally, the exact design ζ N can be replaced by the corresponding approximate design ζ, thus we obtain the T G -criterion defined in (7).
In order to provide precise conditions for checking whether a particular design is T G -optimum, we say that a design ζ is a T G -regular design if the set Γ 2 (ζ) = γ 2 : γ 2 (ζ) = arg min is singleton; otherwise it is a T G -singular design. Hence, if ζ is a T G -regular design, and γ 2 ∈ Γ 2 (ζ) then γ 2 is the unique solution of the equation Assuming that Γ 2 is a compact set and Σ 2 (t, γ 2 ) is a twice continuously differentiable in Γ 2 , the equivalence theorem for T G -optimality criterion can be formulated as follows. The proof of this theorem is similar to the proof of Theorem 1 in Castañeda & López-Ríos (2016).

Theorem 1. Let ζ * be a T G -regular design.
(i) A necessary and sufficient condition for the design ζ * to be T G -optimal is is the directional derivative of T G (ζ) at ζ in the direction of δ ζt = ζ t − ζ, and ζ t is the design putting all mass at the point t ∈ X n .
(ii) The function ϕ T G (t, ζ * ) achieves its maximum value at the support points of the optimal design ζ * .

(iii) The set of T G -optimum designs is convex.
A measure of efficiency of a design ζ relative to a T G -optimum design is

DT α G -optimality Criterion
In order to provide population designs for both discrimination and efficient parameters estimation, we propose a compound criterion based on the combination of D-optimality for each model and T G -optimality, specifically the criteria used are: • Φ 1 (ζ) = T G for discrimination assuming that G 1 is the true model where M r (ζ) is the information matrix under the model G r for some known θ 0 r . We adopted the approach proposed by López (2008) which consists of maximizing the geometric mean of the efficiencies for a given design ζ with respect to each criterion, weighted by a predefined constant α, where 0 ≤ α ≤ 1. We give equal weighting to the estimation criteria and other weighting for discrimination. Thus, assuming that the model G 1 is the true model, we propose the following criterion: where ζ * j = arg max ζ∈Ξ Φ j (ζ). Taking the logarithm of (14), we obtain where C is a constant that does not depend on the design ζ. The approximate design that maximizes the criterion (15) over the set Ξ is DT α G -optimum. The design criterion (15) is concave because it is a convex combination of concave functions. Therefore, the DT α G -design criterion satisfies the conditions of convex optimum design theory, and the following equivalence theorem may be stated: Theorem 2. Let ζ * be a T G -regular design with a nonsingular Fisher information matrix M r (ζ), r = 1, 2. A necessary and sufficient condition for the design ζ * to be DT α G -optimal is where is the directional derivative of criterion function at ζ in the direction of δ ζt . The function ϕ DT α G (t, ζ * ) achieves its maximum value at the support points of the optimal design ζ * .
A measure of the efficiency of a design ζ relative to a DT α G -optimum design is

An Example
To illustrate the proposed criterion, we consider the data set presented in Racine, Grieve, Fluhler & Smith (1986) on the plasma concentrations of cadralazine in 10 German cardiac-failure patients who were observed between 2 and 32 hours after a single 30mg dose of cadralazine. We propose to fit the nonlinear mixed model shown above where only the intercept is random; thus, for simplicity, Z i = 1 for all i. This model is where b i is the random effect associated with individual i. We assume that b i ∼ N (0, ψ) and ε i ∼ N n (0, G(t i , λ)). Thus, the full parameters vector to be estimate is θ = (β T , γ T ) T where β = (β 1 , β 2 ) T and γ = (ψ, λ T ) T .
We consider two alternative models for the within-individual covariance matrix G(t i , λ): (1) Exponential model We fit the model (18) with each within-individual covariance structure in R Development Core Team (2014) using the nlme function for fitting nonlinear mixed-effects models (Pinheiro & Bates 2000). The results of the estimation are: β 1 = (14.93, 0.21) T ,ψ 1 = 0.23 2 ,σ 2 1 = 0.08 2 ,δ = 0.02 β 2 = (15.01, 0.19) T ,ψ 2 = 0.20 2 ,σ 2 2 = 0.10 2 Since the model G 2 is nested within G 1 , the true model is G 1 . We use the estimates of the parameters obtained previously as the local value of θ to find the locally optimal designs. Thus, The times in X was taken of the set of measurement times for the experiment: , 4, 6, 8, 10, 24} hours We assume that two observations are available for each patient n = 2, which is a sparse sampling situation in a pharmacokinetic study. Therefore, the design region is given by the set of combinations of two sampling times from X ; that is, X 2 = {t = (t 1 , t 2 ); t j ∈ X } contains 15 elements. The DT α G -optimum design was calculated to optimize the respective criterion implemented through an algorithm in R Development Core Team (2014) for different values of α (0.25, 0.50, 0.75). The function nlminb was used for the optimization in the design region X 2 . Additionally we find the D-, T G , and product D-optimum designs (ζ * P ) to obtain an overview of the individual designs. The computing times took approximately 1 hour for the compound designs. The computation complexity depends on the size of the design region and the number of parameters. To check the optimality of the calculated compound designs, we use the Equivalence Theorem. First, we enumerate all candidate sampling sequences (i.e., the elements of X 2 ) and calculate the sensitivity function for each sampling sequence. Then, we plot the sensitivity function as a function of index i. The resulting plot for each design is shown in Figure 1.   Table 1 presents the efficiencies of different optimum designs with respect to the D-and T G -optimum designs. The first and second column list the efficiencies of the optimum designs with respect to the D-optimum designs and corresponds to G 1 and G 2 models, respectively. These efficiencies measure the goodness of optimum designs for parameter estimation. The T G -optimum design does not seem good for estimation purposes; for the G 2 especially, the efficiency is approximately 38%. On the other hand, the efficiencies of ζ P -and DT α G -optimal designs are always high; the worst efficiency (α = 0.75) is approximately 62%. The third column presents the efficiencies of the optimum designs with respect to the T Goptimum design, which measure the goodness of a design for model discrimination. For the DT α G -optimal designs, these efficiencies seem good; the lowest efficiency (α = 0.25) is approximately 78%. The product-optimum design does not seem to be good for discrimination purposes; the efficiency is approximately 44%. Thus, the best strategy for both parameter estimation and model discrimination purposes seems to be the DT α G -optimality criterion. In order to choose an adequate value α to successfully discrimination and estimation, we compute the efficiencies for D-and T G -optimality for different values of α. Figure 2 graphically displays these efficiencies. For α approximately 0.3, the efficiencies are high; therefore, the compound design DT 0.3 G is good for discriminating and estimating.

Conclusions
In this article, a compound optimality criterion for model discrimination and parameter estimation was proposed for two nonlinear mixed effects nested models with respect to the within-individual variance pattern. We used the D-optimality criterion for estimation, and we proposed a generalization of T-optimality criterion (the T G -optimality criterion) for the problem of model discrimination. An equivalence theorem for the criterion was provided. To illustrate the use of proposed methodology, a pharmacokinetic example was presented. To show the good properties of the optimum compound design, the D-, T G -and product D-optimum designs were computed; for discrimination, the efficiencies of the compound optimum designs were higher than the D-and product D-optimal designs. The design efficiencies for different values of the weight in the compound design α show that the design for α = 0.3 is good for estimating and discriminating. Further development of the method for estimating the model parameters, some nonlinear functions of parameters, and simultaneously discrimination between models would also be interesting. In this work, we considered nonlinear mixed model with normality of both within-subject random error and random-effects, however, there is often interest in models with nonnormal distribution. Thus, the KL-optimality criterion could be an interesting methodology to explore in this class of models. Finally, it should be noted that we have found local designs, but the other option is the use of the Bayesian approach.