Method to Obtain a Vector of Hyperparameters: Application in Bernoulli Trials

The main difficulties when using the Bayesian approach are obtaining information from the specialist and obtaining hyperparameters values of the assumed probability distribution as representative of knowledge external to the data. In addition to the fact that a large part of the literature on this subject is characterized by considering prior conjugated distributions for the parameter of interest. An method is proposed to find the hyperparameters of a nonconjugated prior distribution. The following scenarios were considered for Bernoulli trials: four prior distributions (Beta, Kumaraswamy, Truncated Gamma and Truncated Weibull) and four scenarios for the generating process. Two necessary, but not sufficient conditions were identified to ensure the existence of a vector of values for the hyperparameter. The Truncated Weibull prior distribution performed the worst. The methodology was used to estimate the prevalence of two transmitted sexually infections in an Colombian indigenous community.


Introduction
When Bayesian inference methods are used for obtaining information about a phenomenon of interest, the elicitation of the prior probability distribution is a key process. According to Garthwaite, Kadane & O'Hagan (2005), the statistician or facilitator establishes contact with an individual considered to be a specialist (expert) on the research topic. The expert has valuable subjective information about the data-generating process, having acquired experience from repeated exposure to the phenomenon. In order to incorporate the information about the data-generating process provided by the expert in the statistical model proposed for the study, the statistician has to "express" such information as a probability distribution (prior distribution) to establish "communication" with the information contained in the data, which is expressed as a likelihood function.
Expressing the prior information as a probability distribution implies two challenges for the facilitator: to establish a theoretical model of probability that best represents the natural behavior of the indicator assumed to be an expression of the data-generating process conditions in the statistical model and to determine the numerical values of the parameters associated with the theoretical model of probability (hyperparameters), to ensure that this model is an adequate representation of the conditions under which the data-generating process occurs (phenomenon of interest). Different authors have studied the conditions considered in an elicitation process. Some authors such as Tversky & Kahneman (1973), Tversky & Kahneman (1974), Hogarth (1975) and Hogarth (1987), have studied the cognitive and psychological processes present in the individual who performs the experts functions when responding to interrogations by the facilitator related to the generating process. Winkler (1967) and Fowler & Floyd (1995) have studied the problem of how to formulate questions and what to ask the expert concerning the given issue. From a more probabilistic and statistical point of view, Murphy & Winkler (1974), Winkler (1967) and Kadane & Winkler (1988) have developed mathematical methods for obtaining different types of probability distributions that serve as representations of the level of presence-absence of prior information in the generating process. Likewise, Chaloner & Duncan (1983), Sindhu, Feroze & Aslam (2013), Tovar (2012), Penha (2014) and Flórez & Correa (2015) have worked on the development of mathematical procedures for obtaining the hyperparameter values that best represent the conditions of the data-generating process in the prior theoretical distribution model. On the other hand Moala & Penha (2016), have proposed a simulation methodology for evaluating the effectiveness of these procedures.
One of the most used method to obtain the vector of hyperparameters ϕ of prior distribution π, is to evaluate a quantiles interval from the specialist and calculate from this its midpoint α and a fraction β of its rank. Using this, information and the moments of the prior distribution π, the system of equations is constructed The vector ϕ * that satisfies the system defines a single prior distribution π(θ | ϕ * ) as representative of the beliefs of the specialist. Given that the assumed values for the mean and variance of the subjective distribution are fixed, this method ends up forcing the specialist to accept the prior distribution obtained or provide a new quantiles interval that generates a different vector ϕ * .
To avoid these difficulties, some authors have used Prior Predictive Distribution (PPD) to obtain the hyperparameter vector. Chaloner & Duncan (1983), considers Bernoulli trials, the prior Beta distribution, a quantile interval evaluated by the specialist and the PPD. On the other hand, survival times modeled with the Weibull distribution are used by Penha (2014), who assumes the bivariate Gamma distribution as a prior (nonconjugated) and constructs a system of equations with PPD approximations.
Except for Penha (2014), the methods of elicitation proposed by the aforementioned authors are characterized by assuming the conjugate of the likelihood function as prior distribution and obtaining information about the moments or percentiles of said distribution from the specialist to form a system of equations that provides the value of the hyperparameters. In this article this subject is addressed and a method to obtain hyperparameter values of nonconjugated prior distributions is proposed. Using the Laplace (1773) method, we developed asymptotic approximations of the PPD obtaining a system of nonlinear equations, which is solved using some numerical approximation as the Newton-Raphson method. The Laplace Method (LM), which has been studied extensively by Tierney & Kadane (1986), Azevedo-Filho & Shachter (1994) and Vidal (2014) in order to approximate posterior moments, marginal distributions and the Bayes factor, among others. In our literature review, we found one work that implemented the LM to obtain values of the hyperparameters of a prior distribution, (Penha 2014).
To illustrate the proposed method, a realization of a random sample of independent variables with a Bernoulli distribution is considered and four priors (Beta, Kumaraswamy, Truncated Gamma, Truncated Weibull) were assumed. A simulation study using four different values of the parameter to be estimated was carried out and the statistical performance of the Bayesian estimates was compared with those obtained using the classical approach. Data obtained from a study carried to estimate the prevalence of HIV and Syphilis infection in a Colombian indigenous community, were used to illustrate the performance of the proposed method.

Methodological Proposal to Obtaining Hyperparameter Values
In general the Figure 1 presents the steps of the method described in this section. The procedure used to approximate integrals is that developed by Laplace (1773) and formalized by authors such as Erdélyi (1956) and Bruijn (1961). The Laplace method formally indicates: Let h(.) and b(.) be analytical and smooth functions infinitely differentiable over a domain Ω, so that their Taylor series expansion around the local minimum θ of the convex function h (.) in Ω exists. Then, for a considerably high value of n, the integral has the following result and O(n −1 ) is the Landau notation and indicates that the following is fulfilled for a function g(n) = O(n −1 ); namely,

Obtaining the Hyperparameter Values
Consider that an experiment has been performed n times (in a heuristic way, with a computational simulation algorithm, through real realizations) so x (n) = {x 1 , . . . , x n } which is assumed at least exchangeable, i.e. each observation having distribution f (x | θ), where θ is an unknown scalar or vector of parameters that represents the conditions under which a generating process occurs. The information about θ contained in the sample of observations can be incorporated into the model through the likelihood function L(θ | x (n) ), which has the following form, In addition to the sample information, it is possible to consider in the construction of the statistical model that represents the situation under study, information "external" to the sample that has been obtained from the predictions of Expert A in the area of research and from the parametric space Ω that contains all the values of θ. Thus a probability distribution function, π(θ | ϕ) is sought, where ϕ is a hyperparameter vector of l components, whose values will have to be found to represent A's predictions. On some occasions, the Ω domain can be restrictive in selecting the prior; but it can be ignored by using truncated distributions or re-parameterizations.
The information represented in the likelihood function can be combined with the expression through π(θ | ϕ), using Bayes' formula, thereby obtaining the posterior distribution (2).
The denominator of (2) represents the marginal likelihood of (1), also called the PPD. Finding an analytical expression for (2) depends on the prior distribution used. To avoid this constraint, the asymptotic approximation provided by LM was used. The asymptotic approximation of the denominator in (2) is denoted as I L (x (n) | ϕ), where n represents the sampling size and allows determining the approximation accuracy; while ϕ is the constant vector that represents the hyperparameters of the prior distribution.
To implement the LM, the functions b(θ) and h(θ), which represent the PPD, must be chosen; thus b(θ) exp{−nh(θ)} = L(θ | x (n) )π(θ | ϕ). According to the literature, two parameterizations can be considered (selections of b and h). The first parameterization (3) is based on classical techniques, given that maximizing −nh(θ) is equivalent to maximizing the logarithm of the likelihood function. However, Kass & Raftery (1995) indicated that the approximation error increases when the prior distribution is informative.
On the other hand, in the second parameterization (4), the maximum value of −nh(θ) is called the posterior mode because it is the result of maximizing the logarithm of the product between the likelihood and the prior distribution; that is, the logarithm of the non-normalized posterior distribution. The latter parameterization is the one implemented in the proposed method.
Once the prior distribution has been chosen and I L (x (n) | ϕ) has been obtained, the information from Expert "A" is extracted as follows: i) A value m is selected for the number of elements that a hypothetical sample contains.
ii) Using m, characteristics of the expert's subjective distribution are evaluated, as: quantiles x (m),j , probabilities α j , correlation or association coefficient ρ j , mean, among other descriptive measures. The number of characteristics that must be evaluated coincides with the number l of components of ϕ.
iii) Trial samples are generated that satisfy each evaluated characteristic and the following system of l equations with l unknowns (components of ϕ) is constructed.
For example, consider a sample size (hypothetical) m and a vector of hyperparameters of l = 2 components. Two quantiles have been evaluated from the expert, x (m),1 and x (m),2 , with the probabilities α 1 and α 2 , respectively. Using this information, the specialist's Prior Predictive Pseudo-Distribution (PPPD) is defined, as: is the likelihood function defined by the trial sample of size m containing the quantile x (m),j . With this fact, the PPPD depends on the vector of hyperparameters ϕ. Thus, the method of obtaining hyperparameters consists of finding a vector of constants ϕ * that satisfies the following equation: The vector ϕ * representing the beliefs of the expert is obtained as the solution of system (5). Several methods may be used to determine such solutions, such as successive approximations, Newton-Raphson and variants (such as finite differences, Jacobi, Gauss-Sediel, and Quasi-Newton, among others). The Newton-Raphson method will be used as it exhibits quadratic convergence and other properties that were studied in detail by Dennis-Jr & Schnabel (1996).

n Bernoulli Observations
Let y 1 , . . . , y n be a sample of n observations obtained when performing n independent Bernoulli experiments, each with a θ ∈ [0, 1] probability of success (the sample can be only exchangeable). The sufficient statistic x = n i=1 y i is defined. Given the characteristics of the experiment, it is known that X ∼ Binomial(n, θ); thus the likelihood function of the parameter θ related to the sufficient statistic x will be Thus for any prior distribution π(θ | ϕ), the PPD will be To approximate L(x | ϕ), the LM is used, which requires the selection of functions −nh(.) and b(.), the calculation of the minimum value of h(.) and the second derivative of h(.). For the Bernoulli distribution, the parameterization (4) generates If parameterization (3) is used, expression (8) would be obtained without the term log π(θ | ϕ) -that is, only the logarithm of the binomial density-; thus the maximum of −nh(.) coincides with the maximum likelihood estimator, θ = x/n. Determining an analytical expression for the maximum or posterior mode of (8) θ depends on the shape of the selected prior distribution. In some cases, it is impossible to obtain it. Therefore, it is necessary to apply the numerical approach method called Bisection to find the roots of the following expression: Likewise, the asymptotic approximation of (7) by LM depends on n, x = x (n) and ϕ: Its analytical expression is: Four families of probability distribution are considered to model the prior information about the natural behavior of the parameter of interest. The first two (Beta, Kumaraswamy) are characterized by modelling the natural behavior of the random variables whose domain is the interval Ω = [0, 1]. The Beta distribution is the conjugate prior when a Binomial model is assumed for the likelihood function and its density function (11) is characterized by two form parameters, a, b > 0. If θ is a random amount of interest such that θ ∈ Ω, it can be assumed that θ ∼ Beta(a, b) so that ϕ = (a, b), then: When using (11) in (7), the PPD is obtained: The integrand of (12) may be a Beta distribution, and the closed form (13) can be obtained. L Other prior distribution assumed is the Kumaraswamy (Kum), with density It has the advantage of having a probability distribution that is analytically more treatable than Beta. This distribution is applicable to model phenomena, whose results have lower and higher limits, such as the weight of individuals, scores obtained from tests (Sindhu et al. 2013).
Besides of Kum and Beta distributions, we used as prior distributions, the truncated Gamma and the truncated Weibull. Given that, both correspond to theoretical models constructed for random variables that take non-negative values, for effects of the elicitation method when using Bernoulli trials, each one was truncated in the interval Ω = [0, 1] considering the process associated to probability of success θ ∈ Ω. The Gamma distribution, is characterized by having two parameters: one of form a > 0 and the other of inverse scale b > 0, also called a rate parameter or event rate. The expression of the Truncated Gamma (TG) distribution (14) contains a normalization constant (K), which depends of the values of the hyperparameters ϕ = (a, b).
The last prior probability distribution considered is the Weibull, which has two parameters: one of form a > 0 and the other of scale b > 0. The Truncated Weibull (TW) distribution, involves a normalization constant (K) that depends of the hyperparameters ϕ = (a, b).
The last two distributions were considered under the assumption that specialist thinks the probability of success takes values closed to zero in most of times.
With each of these prior distributions, the LM is implemented to determine an analytical approximation of the PPD (10). These expressions are presented in Table 2 and generate a surface of prior predictive probabilities that can be plotted on a region D ⊆ R 2 . For example, consider 10 successes obtained in 50 independent repetitions of a Bernoulli experiment, with D = [0, 100] 2 and the prior Beta as representative of a specialist's beliefs. On the left side of Figure 2, the approximation of the PPD (10) appears in blue and its closed form in red (13); while on the right side the approximation of the PPD (10) is presented, assuming the prior Kumaraswamy. These approximations are implemented to form the system of nonlinear equations (5). c) What is the number of elements x (m),mod that you consider would present the characteristic of interest with more frequency? This last question is related to the mode value in order to verify the consistency of the experts responses; therefore, The expert indicates that the number of elements with the characteristic of interest that will occur with higher frequency is x (m),mod = 8. Then, with the information obtained from the previous items a) and b), this is verified that 8 ∈ [5,10]. Thus the expert's information is coherent) Using the data presented in items (a) and (b), the system of nonlinear equations is defined (5), dependent on the vector ϕ = (a, b).
The expression I L (x (50),j | ϕ) represents the PPPD given by (10) for prior distribution and a pair of hyperparameter values ϕ. To determine the solution vector ϕ * , the Newton-Raphson method is used, and the initial point is obtained graphically through the system-level curves. Table 1 provides the most relevant results, such as the number of iterations, error vector and solution vector (value of hyperparameters). The results of the prior TW distribution are not presented because the PPPD was lower than the one provided by the specialist. ( In the process of obtaining the vector solution ϕ * of the system of nonlinear equations (16), two necessary conditions were identified but were not sufficient to guarantee the existence of the solution. The first indicates that I L (x (m),j | ϕ) must not be exceeded by the specialist's probabilities α j , j = 1, 2; and the second is that the amplitude of the interval evaluated (x (m),1 , x (m),2 ) should not be too width.
In this table the following expressions are used: Given that the hypothetical sample size (m) delivered to the expert can affect the process to obtain hyperparameters, four scenarios of the generating process expressed as probabilities of success were established, and three hypothetical sample values were assumed for each one. These data were presented for the specialist and in the Table 3 presents some of these results, as the maximum values generated by each prior distribution on the surface of the PPPD. For these cases, a mesh D = [0, 100] 2 was used initially. These surface limited the mesh (Table 4). For example, on the right side of Figure 2, the surface of the PPPD generated by the prior Kumaraswamy was presented in the scenario of θ = 0.15 and mesh D = [0, 100] 2 ; however, the region where the surface is far away from the ab plane is [0, 15] × [0, 100]. Other aspects that can be evidenced from Table  3 are: Table 3: Scenario of the generating process and the three sizes of hypothetical samples presented to the specialist.  • The maximum PPPD values increase as the size of the hypothetical sample decreases so that the values obtained from this distribution generate an upper limit for the specialist's α j .
• The information provided by the specialist generates a likelihood of beliefs, L(θ j | x (m)j ), which approximates the maximum PPPD values when the size of the hypothetical sample decreases.
• The prior distribution Beta and TW generated, respectively, the highest and lowest PPPD.
• When the parameter to be estimated had central values (that is, around 0.5) in its domain and the sum of the extremes of the interval evaluated by the specialist coincided with the size of the hypothetical sample considered (x (m),1 + x (m),1 = m), the components of each of the following vectors were equal: the probability of the specialist (α 1 = α 2 ) and maximum prior predictive probabilities, these last amounts appear by pairs of rows from the seventh to the tenth column of Table 3.

Simulation Study
Consider from the previous section the probabilities for item (b) fixed for each scenario of the parameter in Table 3. Figure 3 presents the form of the prior densities elicited and Table 5 the respective hyperparameters, descriptive measures and case from which the specialist's information was obtained. For example, in the scenario where θ takes a value in the middle of its domain (θ = 0.5), based on the prior Beta, the case was considered where the specialist was evaluated based on a hypothetical sample of m = 50 (Case 5). With the Kumaraswamy and TG the hypothetical sample was m = 10 elements (Case 6); whereas with the TW, no solution to system 16 was found in any of the cases.
Assuming that the specialist agrees with the probability contained in the interval [θ 1 , θ 2 ], which was obtained from the prior distribution proposed, It was developed one simulation procedure. Bernoulli samples (s = 1, . . . , 10000) were generated assuming different θ and sample size n k (simulation's scenarios). In each of these scenarios, the probability of success was estimated. The mean, maximum, minimum and MSE (Mean Square Error) of the estimates were computed. The procedure was carried out 56 times, because four prior distributions, four values of θ and four sample sizes n k = {20, 50, 125, 312} were considered. The sample sizes were associated with the first four elements of a geometric serie, whose common ratio was 2.5 and first term, 20. The algebraic expressions of the mean and posterior variance were obtained by approximating the r−th posterior moment using the Laplace Method. See Appendix A.  The results are shown in Figure 4, which is complemented in Appendix B. The precision-defined as the amplitude of the interval formed by the minimum and maximum values of the estimates-was greater in the first, third and fourth scenarios of the subjective approach when considering the Beta, Kumaraswamy and TG distributions obtained by the proposed elicitation method rather than from the classical approach. In the second scenario, the Beta distribution was the only one that generated considerably greater precision than the classical approach and other prior distributions obtained. This was due to the fact that its variabilities were greater, and the probability contained in the specialist's interval was less than 12.6%. The same was true for the reason why the prior TW had poor precision and accuracy (average of estimates) in the first and third scenarios. As expected, in all scenarios and for both approaches, the increase in sample size generated better precision and accuracy. The MSE was calculated and represented with a symbol on each vertical line of Figure 4 in order to compare the dispersion of the estimates generated by the prior distribution obtained and the classical approach.
The Beta distribution obtained the lowest MSE in all scenarios, followed by Kumarswamy and TG in the first and third scenarios, respectively (Table 5).

Aplication: Prevalences in an Indogenous Community
An epidemiological study to estimate the prevalence of three sexually transmitted infections in a Colombian indigenous community was conducted by the University of Antioquia in the eastern region of Colombian. The study enrolled 295 individuals and three HIV infection and eight Syphilis infection cases were observed. The objective of this study was to estimate the prevalence of each disease. This data were used before by Tovar (2012) to obtain the hyperparameter values for a prior Beta distribution using the fixed interval and the Chebyshev inequality. For HIV infection, the author used information presented by the Ministerio de la Protección Social (2010) supported by the criterion of a specialist in the subject, who indicated that the prevalence of this infection should not exceed 1%. According to the Ministerio de la Protección Social, the prevalence of HIV in the population with age between 14 and 49 years was 0.22% and for the entire Colombian population was 0.59%. To illustrate the proposed method, we expressed the prevalences as quotients generating a range of quantiles (0.0022 = 2.2/1000 ≈ 2/1000) and (0.0059 = 5.9/1000 ≈ 6/1000) for the number of individuals infected with HIV in a hypothetical sample of 1000. With this information, it is possible to define a system of non-linear equations whose roots are the hyperparameter values of a selected prior distribution.
For the infection of Syphilis, the specialist provided an interval of values for the prevalence (0.01, 0.03) and the system of non-linear equations for this case was: We did not obtain prior distribution for HIV infection, because the prior interval obtained for this infection was in the limits of the parametric space (very close to zero) and the amplitude was too small, which made difficult to find a region D from the plane ab containing at least one point of intersection between the surfaces of the system (17) and the plane ab. For the Syphilis case, we obtained the prior distributions. In Table 6 the characteristics of these prior distributions are showed and the obtained hyperparameter values are compared with those obtained using the method proposed by Tovar (2012). The Beta prior distribution obtained with the proposed method showed the highest density and the highest contained probability within the interval suggested by the specialist. See the upper part of Figure 5. The posterior densities were obtained with the help of the LM. The posterior densities and the binomial likelihood of the observed data set are showed in Figure 5.  In the infection by Syphilis, we obtained the estimates of the prevalence parameter, using the maximum likelihood and the Bayes estimates, assuming as prior distributions those obtained applying the proposed method and a Beta prior distribution obtained using the Tovar (2012) method (Beta*). Some important indexes as standard deviation, the prior predictive probability and the Bayes Factor to select among prior models and 95% credibility intervals were computed too. See Table 7.

Discussion
The proposed method of elicitation in the same way that those developed by Chaloner & Duncan (1983), Tovar (2012), Sindhu et al. (2013) and Flórez & Correa (2015), to assume any prior distribution that has the same domain of the parameter of interest and that is adequate for expressing the specialist's information. In the model this fact makes possible to reflect the natural behavior of the random quantity, θ, complying with the coherence principle; otherwise it will not be possible to obtain the values of the hyperparameters. This result allow us to conclude that Truncated Weibull is not a good choice for a prior distribution when we have a Binomial likelihood. An interesting feature of the method is that the resulting prior distribution elicited is not unique. Therefore the specialist can choose from a wide variety of prior distribution.
The proposed method is similar that developed by Penha (2014). She used the Laplace Method to obtain the hyperparameters of a bivariate Gamma distribution used as prior for a Weibull likelihood. She considers only four observations (one for parameter) but our approach works with a sample of n observations using a nonlinear system equations. This fact improves the error of approximation O(n −1 ). The initial point for the Newton-Raphson method was identified through the contour lines associated with the surfaces of the nonlinear equations system (5). Questions were asked using hypothetical samples to evaluate information from the specialist, who was allowed to select the prior distribution based on the probability contained in the evaluated interval and the asymmetry of the density obtained. Several scenarios for the specialist's beliefs were considered. This made it possible to determine the behavior of the precision and accuracy of the posterior estimates generated by each family of prior distributions.
The elicitation methods described by Flórez & Correa (2015) used only the prior distribution characteristics involved; i.e., its moments, quantils and probabilities. This allows simulation procedures such as those used by Moala & Penha (2016), which are adequate to evaluate the effectiveness of the foregoing methods but not for the one developed herein. This is due to the fact that the method used by Moala considers the known hyperparameters of the prior distribution that accurately reflects the specialist's knowledge. Then the aforementioned elicitation methods were used, assuming that the evaluated mean and probabilities by the specialist have known errors. As expected, the elicitation methods turn out to be quite precise. Moala reports three specific methods. While the proposed method uses information regarding the likelihood function and the prior distribution considered representative of the specialist's knowledge.

Conclusions
Two necessary, but not sufficient conditions were found for the existence of a solution point of the system of nonlinear equations (5). The first is that according to the prior distribution chosen and the hypothetical sample used to evaluate the specialist's information, the probabilities α j assigned by the expert must not exceed the prior predictive pseudo-probabilities that would be obtained using the prior chosen. If this occurs, the surfaces of the system move away from the ab plane and prevent finding a solution point; that is, a vector of hyperparameters that represents the specialist's behavior. The second condition is that the interval proposed by the specialist should not be too width, because the surfaces of the system will be too far apart and could not be intercepted on plane ab.
Of the four prior distributions considered, the TW was the one that had the lowest prior predictive pseudo-probability, which prevented obtaining values for the hyperparameters in two of the scenarios and consequently, posterior estimates. In general there was no greater difference between the averages of the posterior estimates generated from the prior distributions obtained for each scenario.
With the proposed methodology is possible to obtain a region D in the plane ab where the surfaces of the system of equations is defined using the specialist's information. The obtained level curves are plotted on D and with the Newton-Raphson's method to obtain a vector ϕ * that represents the additional beliefs about the generating process.
For the case applied to the Colombian indigenous community the prior distribution for the prevalence of HIV infection was not obtained due to the computational cost of finding a D region that contain roots for the system of equations. One way to address this problem is to find a maximum point of each surface and build the region D around these points. Appro.
Using the approximated of the r-th posterior moment with r = 1, the approximation of the posterior mean is obtained with c 1 1 = x + a and θ N as the root of −nh ′ 1 (θ).
Appro. It is enough know the approximation of the second posterior moment. Table A3: Approximation of distribution and r-th posterior moment, considering Bernoulli trials and Truncated Gamma prior distribution.
Using the approximated of the r-th posterior moment with r = 1, the approximation of the posterior mean is obtained with c 1 1 = x+a, c 1 3 = n+a+b y θ 1 N as the root of −nh ′ 1 (θ).
Appro. It is enough know the approximation of the second posterior moment.
k pt,s are respectively the s-th estimation of the mean and variance posterior; the prior predictive probability for the s-th sample of size n k is denoted as I k s ; the s-th maximum likelihood estimation for sample n k is denoted as θ k M,s .