Using Copula Functions to Estimate The AUC for Two Dependent Diagnostic Tests

When performing validation studies on diagnostic classification procedures, one or more biomarkers are typically measured in individuals. Some of these biomarkers may provide better information than others; moreover, more than one biomarker may be significant, and the biomarkers may be dependent on one another. This proposal aims to estimate the Area Under the Receiver Operating Characteristic Curve (AUC) for classifying individuals in a screening study. We model the dependence between the test results by means of copula functions (FGM and Gumbel-Barnett) and study the respective AUC under this type of dependence. Three different dependence-level values were evaluated for each copula function considered. In most of the reviewed literature, the authors assume a normal model to represent the performance of the biomarkers used for clinical diagnosis. There are situations in which assuming normality is not possible because that model is not suitable for one or both biomarkers. The proposed statistical model does not depend on any distributional assumption for the biomarkers used in the diagnostic procedure and, additionally, does not require a strong or moderate linear dependence between them.


Introduction
The problem of estimating performance parameters and prevalence in studies for validating diagnostic procedures has been associated with three aspects of interest that are approachable through statistical theory: verification bias, lack of identifiability, and the presence of dependence between the test results (Tovar 2011). The last problem has been addressed by different methods, such as latent variable models and reparametrizations, and many authors have assumed a binary dependence structure using a covariance parameter in the estimation model. Nikoloulopoulos (2018) mentions that composite likelihood is among the computational methods used to estimate the generalized linear mixed model (GLMM) in the context of bivariate meta-analysis of diagnostic test accuracy studies. To synthesize diagnostic test accuracy studies, a copula mixed model has been proposed in the biostatistics literature. This general model includes the GLMM as a special case and also allows for flexible dependence modelling, beyond simple linear correlation structures, normality, and tail independence. Tovar and Achcar (Tovar & Achcar 2012, Tovar & Achcar 2011a, Tovar & Achcar 2013, Tovar & Achcar 2011b) addressed the problem of dependence between diagnostic test results by assuming that the dependence structure between the biological traits (biomarkers), measured on an interval or ratio scale, can be modeled using a copula function. These authors assumed weak linear dependencies (FGM copula function) and weak, but not necessarily linear, dependencies (Gumbel-Barnett copula function) between the results of the biomarkers used as diagnostic tests. The authors estimated the test performance parameters and the prevalence, but they did not estimate the area under the receiver operating characteristic (ROC) curve.
To obtain the ROC curve it is necessary to dichotomize the values of the biomarkers by establishing a threshold value (cutoff point), which can be defined using clinical criteria or a statistical methodology. If the cutoff point is obtained by applying a statistical methodology (such as the ROC curve) to the data obtained from the field work, it is necessary to estimate the area under the ROC curve (AUC) in addition to the performance parameters of the test (sensitivity and specificity). The ROC curve is a graph of sensitivity versus 1−specificity for all possible threshold values, and it is the most commonly used global measure of diagnostic precision. The AUC is also used to choose between two different diagnostic tests. Many authors have studied the statistical properties of the AUC and the methods for estimating it; for example, Faraggi & Reiser (2002) developed and compared some of the procedures used for estimating the AUC under parametric and nonparametric assumptions. Zou, O'Malley & Mauri (2007) reviewed and applied the measures of precision used for ROC curves (sensitivity, specificity and AUC) to evaluate diagnostic tests and predictive models.
Relevant works on clinical diagnostic studies agree on the importance of combining the information about the health state contained in different biomarkers used as diagnostic tests, because these combinations tend to be more accurate than diagnostic procedures based on single tests (Etzioni, Kooperberg, Pepe, Smith & Gann 2003). Thus there is great interest in developing methods to combine multiple tests for disease classification that will result in a deeper and more detailed analysis of the ROC curve (Ma & Huang 2007, Pepe & Thompson 2000, Su & Liu 1993). Generally, the parametric assumptions apply to the distributions of the observed variable in normal and non-normal populations. Maximum likelihood methods under a binormal model assumption have been widely used to estimate the area under the curve and the relevant parameters (DeLong, DeLong & Clarke-Pearson 1988). The normality assumption on the biomarkers, or on monotonic transformations of them, in both diseased and nondiseased populations does not hold in some situations, because there exist many biomarkers expressed in a continuous form (Pundir & Amala 2012). Pundir & Amala (2015) consider the use of two continuous biomarkers as clinical diagnostic tests, and they develop a method to estimate the AUC under the assumption of correlated tests using a log-normal distribution and Pearson's correlation coefficient. On the other hand, DeLong et al. (1988) address a nonparametric comparison of areas under correlated ROC curves using the theory of generalized U-statistics, which takes advantage of the properties of the Mann-Whitney statistic to generate an estimated covariance matrix.
The main goal of this work is to estimate the AUC in screening studies that validate procedures for clinical diagnosis using two biomarkers expressed in a continuous form. The proposed model assumes a dependence structure between the two biomarkers that can be modeled using copula functions. Although both biomarkers are measured in each individual, it is possible that a scatter plot of their data looks very similar to that observed when the test results are independent. We assume either that the dependence between diagnostic tests is linear and weak, so that it can be modeled using an FGM copula, or that the dependence structure is weak but not necessarily linear, in which case we use a Gumbel-Barnett copula to model it.
This document is organized as follows: Section 2 presents background on copula functions and on the ROC curve and AUC for continuous tests. It also presents the analysis of the ROC curve and the AUC under copula-type dependence between diagnostic tests and discusses the steps followed to derive the AUC estimate. Section 3 presents the estimates obtained using the proposed method and the results of a simulation study, together with a comparison with the estimates obtained with the Pundir and Amala method (Pundir & Amala 2015). Section 4 provides a practical example with real data on dengue detection. Finally, Section 5 includes a discussion of aspects found during the implementation process. Calculations, simulations, adjustments and ROC curve tracing were performed using the statistical software R.

Copula Functions
A copula describes the dependence structure of a multivariate random variable. Using copulas, random variables can be transformed through their cumulative distributions into uniformly distributed variables. The dependence structure is determined by the relationships established between the uniform distributions (Gallardo 2010). The copula functions may use these relationships to link marginal distributions with a joint distribution.
Thus, in accordance with Dupuis (2007), a copula is a joint distribution function of random variables with standard uniform marginals:
$$C(u_1, \ldots, u_d) = P(U_1 \le u_1, \ldots, U_d \le u_d),$$
where $U_i \sim U(0, 1)$, $i = 1, \ldots, d$. Thus copula functions allow the characterization of the dependence structure of a set of random variables independently of the form of the marginal distributions. Random variables with uniform distributions are obtained by applying the probability integral transformation to each of the marginals with distributions $F_1(x), \ldots, F_d(x)$, so that $U_1 = F_1(X_1), \ldots, U_d = F_d(X_d)$ (Genest, Quessy & Rémillard 2006). Given a set of random variables $X_1, \ldots, X_d$ with a joint probability distribution $H$ and marginal distribution functions $F_i(x)$, $i = 1, \ldots, d$, a unique copula function $C$ may be written as
$$C(u_1, \ldots, u_d) = H\left(F_1^{-1}(u_1), \ldots, F_d^{-1}(u_d)\right),$$
where $F_i^{-1}$ defines the quantile function. Conversely, if $C$ is a copula function and $F_1, \ldots, F_d$ are arbitrary distribution functions, then the function $H$ defined as $H(x_1, \ldots, x_d) = C[F_1(x_1), \ldots, F_d(x_d)]$ is a multivariate distribution function with marginal distribution functions $F_1, \ldots, F_d$.
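As a minimal sketch of the probability integral transformation when the marginals are unknown, the following Python fragment maps a sample to pseudo-observations in (0, 1) through its empirical distribution function. The function name `empirical_pit`, the rescaling by n + 1 (which keeps the values strictly inside the unit interval) and the averaging of tied ranks are illustrative assumptions, not part of the method described in this paper.

```python
def empirical_pit(sample):
    """Map each observation to rank / (n + 1), a value in (0, 1).

    Tied observations receive the average of the ranks they span.
    """
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    u = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < n and sample[order[j + 1]] == sample[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            u[order[k]] = avg_rank / (n + 1)
        i = j + 1
    return u


# Hypothetical biomarker values, used only to illustrate the call.
biomarker = [164.0, 150.2, 190.5, 171.3]
pseudo_obs = empirical_pit(biomarker)
```

The transformed values keep the ordering of the original data, so any dependence measure based on ranks is unchanged by this step.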

ROC Curve and AUC for Continuous Tests
The ROC curve is a graph in which all sensitivity/specificity pairs resulting from the continuous variation of cutoff points (thresholds) over the full range of the observed results can be found. The proportion of true positives (sensitivity) is located on the y-axis, and the proportion of false positives (1−specificity) is located on the x-axis (Burgueño, García & Gonzáles 1995). Specifically, the use of a threshold (cutoff point) $t$ defines a binary test from one of the continuous biomarkers $Z$ considered in the diagnostic procedure to be evaluated. If $Z \ge t$, the individual is classified as positive, and if $Z < t$, the individual has a negative result (Pepe 2003). Let $X$ and $Y$ be random variables that represent the values of the biomarkers in the nondiseased and diseased groups, respectively. Let $(X_1, X_2, \ldots, X_d)$ be the vector of values taken by the related biomarkers measured in individuals of the nondiseased group, and let $(Y_1, Y_2, \ldots, Y_d)$ be the vector associated with the diseased group, where $d = 1, 2, \ldots, k$ is the number of tests. The corresponding true and false positive rates for threshold $t$, $\mathrm{TPR}(t)$ and $\mathrm{FPR}(t)$, are given by
$$\mathrm{TPR}(t) = P(Y \ge t \mid D = 1), \qquad \mathrm{FPR}(t) = P(X \ge t \mid D = 0),$$
with $D$ being a dichotomous variable representing the true state of the individual; that is, $D = 1$ for a diseased individual and $D = 0$ for a nondiseased individual, and $t$ corresponds to the threshold vector $t = (t_1, t_2, \ldots, t_d)$, $t_i \in (-\infty, \infty)$. Thus, the ROC curve is the complete set of possible fractions of true and false positives found using the dichotomization of $X$ and $Y$ with different thresholds:

$$\mathrm{ROC}(\cdot) = \{(\mathrm{FPR}(t), \mathrm{TPR}(t))\} \tag{4}$$
A perfect diagnostic test completely separates diseased subjects from nondiseased subjects. That is, for some threshold $t$ we must have $\mathrm{TPR}(t) = 1$ and $\mathrm{FPR}(t) = 0$, so that the ROC curve runs along the left and upper borders of the positive unit quadrant.
The AUC is a global measure of accuracy for a diagnostic test and is thus the most commonly used summary index for the ROC curve. The AUC can be shown to be the probability of correctly classifying a pair of individuals, selected from the population at random, as healthy or sick, using the results obtained after applying the diagnostic test (Burgueño et al. 1995, Sumi & Hossain 2012). A perfect test with a perfect ROC curve has a value of AUC = 1.0. Likewise, an uninformative test with $\mathrm{ROC}(t) = t$ has AUC = 0.5. The majority of tests have values that fall between these two values.
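The following Python sketch illustrates the univariate definitions above: it builds the empirical ROC points with the rule "positive if Z ≥ t", integrates them with the trapezoidal rule, and compares the result with the pairwise (Mann-Whitney type) probability that a diseased value exceeds a nondiseased one. The function names and the small data set are hypothetical and serve only as an illustration.

```python
def roc_points(nondiseased, diseased):
    """Empirical (FPR, TPR) pairs using the rule 'positive if Z >= t'."""
    thresholds = sorted(set(nondiseased) | set(diseased), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every observation
    for t in thresholds:
        fpr = sum(x >= t for x in nondiseased) / len(nondiseased)
        tpr = sum(y >= t for y in diseased) / len(diseased)
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def auc_mann_whitney(nondiseased, diseased):
    """Estimate P(Y > X) + 0.5 * P(Y = X) over all (X, Y) pairs."""
    wins = sum((y > x) + 0.5 * (y == x) for x in nondiseased for y in diseased)
    return wins / (len(nondiseased) * len(diseased))


# Hypothetical biomarker values.
x = [1, 2, 3, 4]   # nondiseased
y = [3, 5, 6, 7]   # diseased
area = auc_trapezoid(roc_points(x, y))
prob = auc_mann_whitney(x, y)
```

For this small example both computations give the same value, which reflects the interpretation of the AUC as the probability of correctly ordering a randomly chosen diseased and nondiseased pair.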

Analysis of a ROC Curve with Copula Dependence and d = 2 Continuous Biomarkers
Sometimes two biomarkers can be associated with the presence of a disease; therefore these biomarkers must be considered jointly to classify the subject (Ma & Huang 2007). Let $(X, Y)$ represent the values of the biomarkers in the nondiseased and diseased groups. Let $(X_1, X_2)$ be two related biomarkers measured in the nondiseased group, and let $(Y_1, Y_2)$ be two related biomarkers measured in the diseased group. Thus $(X_1, X_2)$ and $(Y_1, Y_2)$ are independent pairs of bivariate biomarkers in each group of individuals, and a subject is identified as diseased when the values of $Y_1$ and $Y_2$ are sufficiently large (greater than a given threshold or cutoff point) (Wang & Li 2012). The cumulative distribution functions for the random variables that define the biomarker results are defined as $F_Y(t_1, t_2) = P(Y_1 \le t_1, Y_2 \le t_2)$ and $F_X(t_1, t_2) = P(X_1 \le t_1, X_2 \le t_2)$, respectively, where $t_1$ and $t_2$ correspond to the cutoff points for each test. The method develops an iterative procedure that takes all the possible permutations between both biomarkers within each group and, for each permutation, evaluates the individual's health condition and classifies it as positive or negative. For each pair $(t_1, t_2)$ it is possible to obtain a 2 × 2 table, with the results shown in Table 1.
Table 1: Final classification obtained after applying the diagnostic procedure.

For the construction of the ROC curve, the false positive rate (FPR) and true positive rate (TPR) can be defined as $P(X_1 > t_1, X_2 > t_2)$ and $P(Y_1 > t_1, Y_2 > t_2)$, respectively, according to this bivariate criterion. We assume that the dependence between the biomarker results can be modeled using copula functions, and that it is possible to estimate the AUC including that fact in the estimation model. Our methodological approach assumes two copula functions as candidates for modeling the dependence structure between the biomarkers: the Farlie-Gumbel-Morgenstern (FGM) and the Gumbel-Barnett. The FGM copula function has the following analytical form:
$$C(u, v) = uv\,[1 + \varphi(1 - u)(1 - v)],$$
where φ is the dependence parameter, with ρ = −1/3 ⇔ φ = −1 and ρ = 1/3 ⇔ φ = 1; ρ is the Pearson correlation coefficient (Nelsen 2006). The Gumbel-Barnett copula function has the form:
$$C(u, v) = u + v - 1 + (1 - u)(1 - v)\exp\{-\phi \ln(1 - u)\ln(1 - v)\},$$
where ϕ is the dependence parameter, with ρ = 0 ⇔ ϕ = 0 and ρ = −0.41 ⇔ ϕ = 1; ρ is the Pearson correlation coefficient (Portilla & Tovar 2018).
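The two candidate copulas can be written as short functions. The sketch below is illustrative: the FGM form is the standard one (Nelsen 2006), while the Gumbel-Barnett form used here, C(u, v) = u + v − 1 + (1 − u)(1 − v) exp{−ϕ ln(1 − u) ln(1 − v)}, is assumed because it is the copula of Gumbel's Type I bivariate exponential distribution; other texts state the equivalent survival form. The helper `joint_exceedance`, which returns P(U > u, V > v) = 1 − u − v + C(u, v) for uniform margins, is a hypothetical name.

```python
import math


def fgm_copula(u, v, phi):
    """FGM copula C(u, v) = u v [1 + phi (1 - u)(1 - v)], -1 <= phi <= 1."""
    return u * v * (1.0 + phi * (1.0 - u) * (1.0 - v))


def gumbel_barnett_copula(u, v, phi):
    """Gumbel-Barnett copula in the form assumed here, 0 <= phi <= 1."""
    if u >= 1.0:          # boundary: C(1, v) = v
        return v
    if v >= 1.0:          # boundary: C(u, 1) = u
        return u
    a, b = 1.0 - u, 1.0 - v
    return u + v - 1.0 + a * b * math.exp(-phi * math.log(a) * math.log(b))


def joint_exceedance(u, v, copula, theta):
    """P(U > u, V > v) for uniform margins linked by the given copula."""
    return 1.0 - u - v + copula(u, v, theta)


# With a zero dependence parameter both copulas reduce to independence.
independent = fgm_copula(0.3, 0.7, 0.0)
positive_dep = fgm_copula(0.5, 0.5, 1.0)             # above 0.25
negative_dep = gumbel_barnett_copula(0.5, 0.5, 1.0)  # below 0.25
```

The last two values show the direction of the dependence each family can represent: the FGM copula with φ > 0 puts more mass on concordant pairs than independence does, whereas the Gumbel-Barnett copula with ϕ > 0 puts less.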
For the random variables associated with the biomarker results, we have $X_i \sim G_{X_i}(x_i)$ and $Y_i \sim G_{Y_i}(y_i)$, $i = 1, 2$. Once the distribution $G(\cdot)$ has been determined (goodness-of-fit tests can be performed to determine the corresponding distribution), we proceed to estimate the respective parameters jointly (see Appendix A). The explicit forms of the TPR and FPR in the bivariate case are analytically complex. The corresponding true and false positive rates for threshold $t = (t_1, t_2)$ are
$$\mathrm{TPR}(t) = P(Y_1 > t_1, Y_2 > t_2) \quad \text{and} \quad \mathrm{FPR}(t) = P(X_1 > t_1, X_2 > t_2).$$
After applying the probability integral transformation (PIT), $\mathrm{TPR}(t)$ and $\mathrm{FPR}(t)$ are expressed in terms of $t_1^*$ and $t_2^*$, the cutoff points for each test after the respective PIT.
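The bivariate rates can be estimated directly from data by counting the individuals whose two biomarkers both exceed their cutoff points. The following Python sketch illustrates this empirical counterpart; the function name `joint_positive_rate` and the numerical values are hypothetical.

```python
def joint_positive_rate(marker1, marker2, t1, t2):
    """Proportion of individuals with marker1 > t1 and marker2 > t2."""
    hits = sum(1 for a, b in zip(marker1, marker2) if a > t1 and b > t2)
    return hits / len(marker1)


# Hypothetical values of two biomarkers in each group.
x1, x2 = [150, 160, 170, 180], [150, 155, 165, 190]   # nondiseased
y1, y2 = [200, 210, 230, 175], [195, 220, 205, 185]   # diseased

t1, t2 = 172, 180
fpr = joint_positive_rate(x1, x2, t1, t2)   # estimates P(X1 > t1, X2 > t2)
tpr = joint_positive_rate(y1, y2, t1, t2)   # estimates P(Y1 > t1, Y2 > t2)
```

Evaluating these two functions over a grid of cutoff pairs gives the set of (FPR, TPR) points from which a joint ROC curve can be traced.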
Given that the cumulative distribution functions $G_X$ and $G_Y$ can be written in terms of copula functions, the AUC for the bivariate dependent copula ROC curve can also be written in terms of copula functions, as in expression (10). In the univariate case, expression (10) is proportional to the statistic of the traditional nonparametric Mann-Whitney test (Bamber 1975). Then, considering the uniform variables obtained after applying the PIT using the marginal distributions (or the empirical cumulative distributions), we obtain an expression for the AUC that cannot be written in closed form; it can be estimated using numerical methods, such as the trapezoidal rule or Simpson's rule (Pundir & Amala 2015).
If we assume that the biomarker results have a dependence structure that can be modeled using an FGM copula, the AUC takes a specific form in terms of φ. Similarly, if we use an estimation model that assumes a Gumbel-Barnett structure for the dependence between the test results, the AUC takes a specific form in terms of ϕ. It is possible to use copula functions to model the dependence structure between two random variables in a statistical procedure developed to estimate the AUC, whether or not the marginal distributions are known. If the marginal distributions are not known, it is possible to apply the probability integral transformation to the observed data using the respective empirical cumulative distribution (Achcar, Tovar & Moala 2019). The ROC curve and its area can be estimated using the transformed data, preserving the dependence structure of the biomarkers. The importance of determining the data distribution lies in being able to estimate θ (the copula dependence parameter), where θ is φ or ϕ; that is, the marginal distributions of the data are used so that all the information is employed when estimating the dependence parameter, using expression (A3) in Appendix A (Bouyé, Durrleman, Nikeghbali, Riboulet & Roncalli 2000).
We developed the procedure for the case in which the validation study includes two continuous biomarkers as validation tests, but it is possible to generalize our proposed approximation to cases with $d$ biomarkers, and the analytical form of the AUC is as follows:

Simulation Study
We simulated a validation study that includes two biomarkers with continuous expression as indicators of the disease status and a confirmatory test, called the gold standard, which classifies the individuals without error. Given that we needed to compare our results with those obtained using methods reported in the literature, we generated pairs of observations from a bivariate normal distribution, using the R package.
To have reference values, we used the clinical values reported for triglycerides and LDL cholesterol on the MedlinePlus web page as motivation (National Institutes of Health and others 2004). Within the nondiseased individuals, data from Test 1 (triglycerides) were simulated using $\mu_{X_1} = 164$ and $\sigma_{X_1} = 24.6$, and data from Test 2 (LDL cholesterol) using $\mu_{X_2} = 160$ and $\sigma_{X_2} = 24$. Within the diseased individuals, data from Test 1 were simulated with $\mu_{Y_1} = 224$ and $\sigma_{Y_1} = 33.6$, while for Test 2 the values used were $\mu_{Y_2} = 209$ and $\sigma_{Y_2} = 31.35$. The standard deviations were obtained by setting the coefficient of variation of the normal data to 15%. We simulated a validation study carried out using a sample of 10000 individuals in a population with a 10% prevalence (1000 diseased individuals and 9000 nondiseased individuals).
We simulated pairs of data $(u_{jk}, v_{jk})$, $j = 1, 2$; $k = 1, 2, \ldots, n_j$, with copula dependence (FGM or Gumbel-Barnett). Three different dependence-level values were evaluated (θ = 0.2, θ = 0.5 and θ = 0.9) for each copula function considered. Next, we transformed each pair considering that $\Phi^{-1}(u_{jk}) = z_{jk}$ and then $x_{jk} = z_{jk}\sigma_{X_j} + \mu_{X_j}$, where $\Phi$ is the standard normal cumulative distribution function. In that way, we simulated data with a copula dependence structure and normal marginal distributions.
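The transformation from uniform values to normal marginals can be sketched in a few lines of Python. The function name `to_normal_margin` is hypothetical; the default coefficient of variation of 15% and the mean of 164 in the example are the values stated for Test 1 in the nondiseased group.

```python
from statistics import NormalDist


def to_normal_margin(u_values, mean, cv=0.15):
    """Apply x = Phi^{-1}(u) * sigma + mu with sigma = cv * mean."""
    sigma = cv * mean
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    return [std_normal.inv_cdf(u) * sigma + mean for u in u_values]


# Uniform values as they would come out of a copula sampler.
u = [0.10, 0.50, 0.975]
x = to_normal_margin(u, mean=164.0)   # sigma = 24.6
```

Because the transformation is strictly increasing in each coordinate, the copula of the transformed pairs is the same as the copula of the uniform pairs.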
The data with an FGM copula dependence structure were generated using the algorithm proposed by Johnson (1987) as follows:
1. Generate independent values $v_1$ and $v_2$ from a Uniform(0, 1) distribution.
2. Set $u_1 = v_1$.
3. For a given value of θ, compute $A = \theta(2u_1 - 1) - 1$ and $B = [1 - 2\theta(2u_1 - 1) + \theta^2(2u_1 - 1)^2 + 4\theta v_2(2u_1 - 1)]^{1/2}$.
4. Set $u_2 = 2v_2/(B - A)$.
The pairs of data with a Gumbel-Barnett dependence structure were simulated using an algorithm based on the inversion of data from a Gumbel Type I distribution (see Gumbel (1960) for details). The algorithm requires solving a nonlinear equation; in that way, pairs $(w_1, w_2)$ from a Gumbel Type I distribution are obtained (Gumbel 1960). Applying a transformation to these pairs, it is possible to obtain a vector of pairs of observations $(V_1, V_2)$ with uniform marginal distributions and dependence that can be modeled using a Gumbel-Barnett copula function.
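Both samplers can be sketched in Python. The FGM part follows the conditional-inversion formulas with $A$ and $B$; the closing step $u_2 = 2v_2/(B - A)$ is the root of the conditional distribution equation. The Gumbel-Barnett part is an assumption about the details: it draws $w_1$ from a unit exponential, solves $1 - (1 + \theta w_2)\exp\{-w_2(1 + \theta w_1)\} = v_2$ for $w_2$ by bisection (the conditional distribution of Gumbel's Type I bivariate exponential), and maps to uniforms with $V = 1 - e^{-w}$. Function names and tolerances are hypothetical.

```python
import math
import random


def sample_fgm(theta, n, rng):
    """Draw n pairs from an FGM copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        v1, v2 = rng.random(), rng.random()
        u1 = v1
        a = theta * (2.0 * u1 - 1.0) - 1.0
        b_sq = (1.0 - 2.0 * theta * (2.0 * u1 - 1.0)
                + theta ** 2 * (2.0 * u1 - 1.0) ** 2
                + 4.0 * theta * v2 * (2.0 * u1 - 1.0))
        b = math.sqrt(max(b_sq, 0.0))  # guard against rounding below zero
        pairs.append((u1, 2.0 * v2 / (b - a)))
    return pairs


def sample_gumbel_barnett(theta, n, rng, tol=1e-10):
    """Draw n pairs via Gumbel's Type I bivariate exponential (assumed form)."""
    pairs = []
    for _ in range(n):
        v1, v2 = rng.random(), rng.random()
        w1 = -math.log(1.0 - v1)      # unit exponential margin
        c = 1.0 + theta * w1

        def cond_cdf(w2):
            # P(W2 <= w2 | W1 = w1) for the Type I distribution
            return 1.0 - (1.0 + theta * w2) * math.exp(-c * w2)

        lo, hi = 0.0, 1.0
        while cond_cdf(hi) < v2:      # bracket the root
            hi *= 2.0
        while hi - lo > tol:          # bisection on the nonlinear equation
            mid = 0.5 * (lo + hi)
            if cond_cdf(mid) < v2:
                lo = mid
            else:
                hi = mid
        w2 = 0.5 * (lo + hi)
        pairs.append((1.0 - math.exp(-w1), 1.0 - math.exp(-w2)))
    return pairs


rng = random.Random(2024)
fgm_pairs = sample_fgm(0.9, 20000, rng)
gb_pairs = sample_gumbel_barnett(0.9, 5000, rng)
```

With these settings the sample correlation of the FGM pairs should be close to θ/3 = 0.3, and the correlation of the Gumbel-Barnett pairs should be negative, in line with the ranges of ρ stated for each copula.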
With the aim of showing that the simulated data exhibit a weak, or not necessarily linear, dependence, we computed the estimated value of the coefficient ρ using the formulas that appear in Appendix B, depending on the chosen copula function. We also estimated Pearson's correlation coefficient directly from the simulated data, where $\hat{\rho}_x$ is the estimate in the group of nondiseased individuals and $\hat{\rho}_y$ is the estimate obtained in the other group. In accordance with the results shown in Table 2, it was possible to conclude that the simulated data have the desired dependence characteristics. The scatter plots of the simulated data for each dependence level in the diseased and nondiseased groups are shown in Figures 1 and 2. Given that the AUC does not have a closed form, this indicator cannot be obtained analytically; however, it may be approximated using simulation techniques such as the bootstrap or Monte Carlo (MC) methods.
The proposed algorithm was run 1000 times as follows:
1. Determine the parametric model (i.e., find the distribution function that best fits the data for each variable). Goodness-of-fit tests can be performed to determine the corresponding distribution. Thanks to Sklar's theorem (Nelsen 2006) and the probability integral transformation, the method is developed with the $u_i$ values and copula functions. Once the model is chosen, estimate the parameters related to this model using maximum likelihood or the method of moments.
2. Using the estimates of the parameters in Step 1 as parametric values, generate Markov chain Monte Carlo (MCMC) samples from the copula function (with the respective marginal values) of size m for the nondiseased group and n for the diseased group.
3. Using the generated values, calculate $\widehat{AUC}$ using equation (11).
4. For each generated sample, keep the respective sample sizes for each group (i.e., m and n), perform bootstrapping, and calculate the FPR and TPR. For each stage of resampling, calculate the AUC, which will serve as the input for calculating the interval given by the 2.5 and 97.5 percentiles.
5. Repeat Step 4 many times (at least 1000 times).
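The resampling part of the procedure can be illustrated with a percentile bootstrap. The Python sketch below is a simplification under stated assumptions: it resamples within each group, keeping the group sizes fixed, and it uses the univariate pairwise (Mann-Whitney type) AUC as a stand-in for the joint copula-based AUC of equation (11). The function names and the nearest-rank percentile rule are hypothetical choices.

```python
import random


def mann_whitney_auc(nondiseased, diseased):
    """Estimate P(Y > X) + 0.5 * P(Y = X) over all (X, Y) pairs."""
    wins = sum((y > x) + 0.5 * (y == x) for x in nondiseased for y in diseased)
    return wins / (len(nondiseased) * len(diseased))


def bootstrap_auc_interval(nondiseased, diseased, n_boot=1000, seed=0):
    """Percentile bootstrap interval (2.5 and 97.5 percentiles) for the AUC."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample within each group so that m and n stay fixed.
        xb = rng.choices(nondiseased, k=len(nondiseased))
        yb = rng.choices(diseased, k=len(diseased))
        stats.append(mann_whitney_auc(xb, yb))
    stats.sort()
    lower = stats[int(0.025 * (n_boot - 1))]
    upper = stats[int(0.975 * (n_boot - 1))]
    return lower, upper


# Hypothetical, overlapping biomarker values.
x = list(range(1, 11))    # nondiseased
y = list(range(6, 16))    # diseased
low, high = bootstrap_auc_interval(x, y, n_boot=200, seed=1)
```

The length high − low plays the role of the interval length reported in the simulation tables; it shrinks as the group sizes grow.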

ROC Curves with Simulated Data Assuming FGM Dependence
According to the method proposed in Section 2.4, a prevalence of 10% was assumed for this case, which, for a population of 10000 individuals, gives 9000 nondiseased and 1000 diseased individuals; that is, (m, n) = (9000, 1000). Given the context of this work (clinical diagnostic tests), only the positive part of the FGM copula function should be used (Dendukuri & Joseph 2001, Georgiadis, Johnson, Gardner & Singh 2003). The AUC estimation error and bootstrap confidence intervals for the AUC were estimated, in addition to the specificity and sensitivity (of the joint test) and the cutoff points for each test (with their respective standard errors). The performance parameters and cutoff point estimates were obtained using the Youden index (Specificity + Sensitivity − 1) (Youden 1950), taking the respective values that maximized that index.
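The selection of cutoff points by the Youden index can be sketched as a grid search over pairs $(t_1, t_2)$. The Python fragment below is illustrative only: the joint rule "positive if both biomarkers exceed their cutoff", the candidate grids and the data are assumptions made for the example, and the function name `youden_cutoffs` is hypothetical.

```python
def youden_cutoffs(x1, x2, y1, y2, grid1, grid2):
    """Return (J, t1, t2, sensitivity, specificity) maximizing Youden's J.

    x1, x2: biomarkers in the nondiseased group; y1, y2: diseased group.
    An individual is positive when both biomarkers exceed their cutoffs.
    """
    best = None
    for t1 in grid1:
        for t2 in grid2:
            sens = sum(a > t1 and b > t2 for a, b in zip(y1, y2)) / len(y1)
            fpr = sum(a > t1 and b > t2 for a, b in zip(x1, x2)) / len(x1)
            spec = 1.0 - fpr
            j = sens + spec - 1.0
            if best is None or j > best[0]:
                best = (j, t1, t2, sens, spec)
    return best


# Hypothetical data and candidate cutoffs.
x1, x2 = [1, 2, 5], [5, 1, 2]   # nondiseased
y1, y2 = [4, 5, 6], [4, 6, 5]   # diseased
best = youden_cutoffs(x1, x2, y1, y2, grid1=[0, 3], grid2=[0, 3])
```

In this toy example the pair (3, 3) separates the groups completely, so the index reaches its maximum value of 1.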
According to Table 3, the effect of the dependence level on the AUC estimates is small, although a decreasing trend can be observed as the dependence increases. The observed values for the AUC estimation errors are small, which is expected because the sample sizes are very large. The AUC values are high and similar to one another across the levels of dependence. The interval lengths are very small, considering that the AUC lies between 0 and 1, so the maximum length of the interval is 1. In the case of specificity and sensitivity, good values (or at least values within those typically expected) are seen for each performance parameter, with low estimation errors. This scenario is very similar to the situation with the AUC estimates. The estimates of the cutoff points show little variability and tend to increase when the dependence increases.
A prevalence of 60% was then assumed (unlike the initial 10%); for a population of 10000 individuals, this gives 4000 nondiseased and 6000 diseased individuals, that is, (m, n) = (4000, 6000). The purpose was to verify that the prevalence does not actually affect the AUC estimate, since the latter depends on the performance parameters (sensitivity and specificity) of the test (Table 4). It is important to consider this situation since the prevalence is similar to that of the case study presented later (dengue data). As in the previous section, the FGM copula dependence simulation was also performed for different sample sizes (N = 100, N = 500 and N = 1000), considering a dependence level φ = 0.2 and 10% prevalence (see Table 5). Finally, Table 6 shows the average and standard error of the AUC estimates, interval length, performance parameters and cutoff points for data simulated with φ = 0 in the FGM copula function, with a population of 1000 individuals and 10% prevalence.
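The claim that prevalence does not affect the AUC can be checked on a toy example: replicating each group a different number of times changes the proportion of diseased individuals but leaves the within-group distributions, and therefore the pairwise AUC, unchanged. The Python sketch below uses the univariate pairwise AUC as a stand-in for the joint AUC; the data and the function name are hypothetical.

```python
def pairwise_auc(nondiseased, diseased):
    """Estimate P(Y > X) + 0.5 * P(Y = X) over all (X, Y) pairs."""
    wins = sum((y > x) + 0.5 * (y == x) for x in nondiseased for y in diseased)
    return wins / (len(nondiseased) * len(diseased))


x = [1, 2, 3, 4, 5]   # nondiseased
y = [3, 4, 6, 7]      # diseased

# Replicating a group changes the prevalence, not the group's distribution.
auc_low_prev = pairwise_auc(x * 9, y)        # few diseased individuals
auc_high_prev = pairwise_auc(x * 4, y * 6)   # many diseased individuals
```

Both values coincide with the AUC computed from the original samples, because sensitivity and specificity are conditional on the true disease status.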

ROC Curves With Simulated Data Assuming Gumbel Barnett Dependence
Upon fitting the model with the Gumbel-Barnett copula function, the results are similar to those observed with the other copula function (FGM), in that the AUC estimates are high, with low errors, and are nearly constant. A slight increase in the estimates occurred as the level of dependence increased. The average interval length became increasingly narrow as the level of dependence increased. The estimates of the test performance parameters and the AUC showed similar behavior. As with the other copula dependence, the cutoff point estimates increased slightly when the dependence level increased (Table 7).

Comparison of Estimation Methods Using the Simulated Data
We estimated the parameters using our methodology and the method proposed by Pundir & Amala (2015). It is important to point out that although both methods estimate the AUC in the presence of a dependence structure in the data, Pundir and Amala work under the assumption that both biomarkers can be modeled as random variables with a bivariate normal distribution, so the dependence between biomarkers is assumed to be linear and can be expressed in the estimation model using Pearson's correlation coefficient. The proposed method does not depend on the marginal distributions, and the dependence between biomarkers is not necessarily linear. Thus, to compare the methods, we simulated sets of $N_2(\mu, \sigma, \rho)$ data using the procedure in Section 2.4, and we fitted Pundir and Amala's model and ours. For both cases the bootstrap confidence intervals were obtained, while the performance parameters and cutoff point estimates were obtained using the Youden index (Youden 1950).
According to the results in Table 8, Pundir and Amala's estimation method (AUC2) gives lower estimates than those obtained using the proposed method (AUC1). The AUC2 estimates in nearly all scenarios are outside the confidence intervals estimated for AUC1, except for the AUC2 estimate under Gumbel-Barnett dependence with θ = 0.5, which is within the given interval. The model considering the Gumbel-Barnett copula function is capable of capturing nonlinear dependencies and/or negative dependencies, which were present in the data used for the estimates (data simulated with Gumbel-Barnett copula-type dependence). This may explain the difference between the AUC estimates, given that a linear dependence is assumed for AUC2.
The model considering the FGM copula function captures weak dependencies. Given that the AUC2 estimates use a linear correlation coefficient to measure the dependence present in the data, and that the dependencies with which the data were constructed (data simulated with an FGM copula-type dependence) are very low in terms of this coefficient, these dependencies cannot be clearly perceived. Thus the results are slightly lower than those found for AUC1. On the other hand, the confidence intervals obtained assuming GB dependence are narrower than those observed when we assumed the other dependence structure. The estimates of the test performance parameters obtained using the proposed method (subindex 1 in Table 9) were higher than those obtained using Pundir and Amala's method (subindex 2 in Table 9). If a GB dependence structure is considered, the estimates of the specificity and cutoff points increase for high values of the dependence level when the estimates are obtained using the proposed method. The estimates of the test performance parameters assuming an FGM dependence structure do not show differences when the dependence level changes (see Table 9).
We obtained the AUC estimates for each biomarker both assuming independence between tests and considering the dependence structure. In both cases, the estimated AUC was lower than the estimate of the joint AUC (Table 10). The AUC estimates and their respective 95% confidence intervals were obtained using the R pROC package.

Dengue Data
The proposed method was applied to diagnostic tests for detecting dengue, an acute viral disease transmitted by mosquitoes and characterized by high fever, headache, pain in muscles and joints, and skin rash. The data set was obtained from the Colombian network for studying dengue (AEDES). In this study, 1380 individuals with symptoms suggestive of dengue were evaluated clinically by a specialist and using the results of a hemogram. For each patient an algorithm was run starting with reverse transcription polymerase chain reaction (RT-PCR) test results; the NS1 antigen and antibodies against dengue (IgG and IgM) were applied as the gold standard tests. The tests consist of a leukocyte count (white blood cells: Test 1) and a platelet count (Test 2) in randomly selected individuals. Of these individuals, 744 were diagnosed as having dengue, and 636 were diagnosed as not sick (discarded for dengue but with symptoms of some other condition).
The density plots of both variables had an asymmetric shape in both the diseased and nondiseased groups of individuals, which can be an indicator of lack of fit to a normal distribution (see Figure 3). The scatter plots and the estimates of the correlation or concordance indexes commonly computed for the biomarkers in both groups of patients showed the presence of a weak dependence structure (see Figures 4 and 5).
The ROC curves of each test, as well as their respective area under the curve (AUC), are presented in Figure 6. The cutoff points found were t1 = 3590 for test 1 and t2 = 158250 for test 2. The sensitivity and specificity were 0.5362903 and 0.7893082, respectively for test 1. Sensitivity and specificity were 0.6854839 and 0.7374214, respectively for test 2.
Even though we observed an asymmetric form in the distribution of the biomarkers used to diagnose dengue, we decided to assume normal distributions in order to compare the results between the procedures. We also fitted our procedure assuming a gamma(α, β) distribution for each biomarker, using the parametrization 1/β = λ. The estimates of the FGM dependence parameter and the parameters for the marginal distributions are found in Table 11. To evaluate the fit of the data set to the copula functions, the multiplier method for the goodness-of-fit (GOF) test was used, as introduced by Kojadinovic, Yan & Holmes (2011). This method consists of comparing and validating the distance between the empirical copula function and the copula function under consideration. The FGM copula function shows the better fit. Table 12 shows the AUC estimates, performance parameters and cutoff points obtained using the methods explained in the previous section. The data used for these estimates were the original data (no transformations) and the data after logarithmic transformation. The latter was applied with the goal of putting the data on the same scale and obtaining the greatest differences between the averages for the sick and not sick populations, so as to obtain the highest performance of the ROC curve (Pundir & Amala 2015). We obtained the estimates of the parameters assuming Normal(µ, σ) and Gamma(α, β) marginal distributions and an FGM dependence structure.
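To make the gamma parametrization concrete, the following Python sketch computes moment-based estimates of the shape α, the rate λ and the scale β = 1/λ from a sample. This is an illustrative alternative under stated assumptions, not the joint estimation of Appendix A: it uses the relations mean = α/λ and variance = α/λ², and the function name and data are hypothetical.

```python
from statistics import fmean, variance


def gamma_moment_estimates(sample):
    """Method-of-moments estimates for a gamma distribution.

    Returns (alpha, lam, beta), where lam is the rate and beta = 1 / lam
    is the scale, using mean = alpha / lam and variance = alpha / lam**2.
    """
    m = fmean(sample)
    v = variance(sample)   # sample variance with n - 1 in the denominator
    alpha = m * m / v
    lam = m / v
    return alpha, lam, 1.0 / lam


# Hypothetical positive-valued biomarker readings.
alpha, lam, beta = gamma_moment_estimates([2.0, 4.0, 4.0, 6.0])
```

Such estimates can serve as starting values when the marginal parameters are later estimated together with the dependence parameter.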
According to the results in Table 12, the estimates obtained using Pundir and Amala's method are far from what was expected; an analysis based on them would lead to the conclusion that the two biomarkers together could only identify nondiseased individuals. The proposed method gives a better approximation for the set of parameters to be estimated when applied to the transformed data with normal marginal distributions (see Table 12). Figure 7 shows the AUC curves based on the nontransformed data for both marginal distributions, obtained using the proposed estimation method.

Conclusion and Remarks
Many procedures for the clinical classification of individuals include two biological traits (biomarkers whose natural behavior is modified in the presence of disease) and an error-free test known as the gold standard (which classifies individuals without error). Given that both biomarkers are measured in the same individual, it is necessary to include a dependence structure in the statistical model associated with the situation. This dependence structure may not be perceptible in scatter plots of the data or in commonly used indexes such as Pearson's rho, Spearman's rho or Kendall's tau. In this paper we studied the situation in which two biomarkers are expressed on a continuous scale and are assumed to have a very weak linear dependence structure or a very weak, but not necessarily linear, dependence structure. To model the dependence structure, we used two copula functions, the FGM and the Gumbel-Barnett copulas, within an iterative procedure that yields the AUC for the joint ROC curve.
It is important to point out that under the bivariate normality assumption, Pundir and Amala's method works very well; but when the marginal distributions are not normal, this approach does not give reliable results. In that scenario, the proposed methodology allowed us to obtain good-quality estimates, because the method does not require specific marginal distributions, a feature of copula functions. The proposed method performs the estimation using the normalized data obtained after applying the inverse probability transformation, which eliminates the need for normally distributed marginals.
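One reading of this transformation step can be sketched as follows: each observation is mapped through its marginal distribution function to the unit interval and then through the standard normal quantile function. The exponential marginal (a gamma distribution with shape 1) is assumed here only because its distribution function has a closed form; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def exponential_cdf(x, rate):
    """CDF of an exponential marginal, used only as an illustrative choice."""
    return 1.0 - math.exp(-rate * x)


def normal_scores(xs, cdf):
    """Map x -> u = F(x) -> z = Phi^{-1}(u).

    Each u must lie strictly between 0 and 1, otherwise
    NormalDist.inv_cdf raises StatisticsError.
    """
    std_normal = NormalDist()
    return [std_normal.inv_cdf(cdf(x)) for x in xs]
```

Whatever the shape of the original marginal, the resulting scores are approximately standard normal when the assumed distribution function fits the data, so only the dependence structure remains to be modeled.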
Our simulation study allowed us to see, in a general way, the effect of the dependence structure between the biomarkers on the AUC estimates, controlling for the effect of the marginal distributions. FGM dependence changes the AUC estimates very little; that is, the dependence effect is very weak. For GB-type dependence, the effect is more evident, and the specificity and AUC estimates are modified slightly.
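Dependent pairs of this kind can be simulated by conditional inversion. The sketch below is one standard way to do so for the FGM copula and is not necessarily the scheme used in the simulation study. It relies on the conditional distribution C(v | u) = v[1 + θ(1 − 2u)(1 − v)], which is solved for v in closed form. The pairs are returned on the uniform scale; the chosen marginal quantile functions would be applied afterwards.

```python
import math
import random


def fgm_conditional_inverse(u, w, theta):
    """Solve v [1 + b (1 - v)] = w for v, with b = theta (1 - 2u)."""
    b = theta * (1.0 - 2.0 * u)
    # max() guards against a tiny negative value caused by rounding.
    s = math.sqrt(max(0.0, (1.0 + b) ** 2 - 4.0 * b * w))
    denom = 1.0 + b + s
    return 0.0 if denom == 0.0 else 2.0 * w / denom


def sample_fgm(n, theta, rng):
    """Draw n pairs (u, v) from the FGM copula with parameter theta."""
    pairs = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        pairs.append((u, fgm_conditional_inverse(u, w, theta)))
    return pairs
```

A call such as `sample_fgm(1000, 0.5, random.Random(1))` returns reproducible pairs whose dependence is weak even at the extreme values θ = ±1, which is consistent with the small effect of FGM dependence on the AUC noted above.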
Given that the purpose is to estimate the joint AUC for the ROC curve, it is necessary to estimate the dependence parameter from the data set and supply the resulting estimate to the algorithm developed to estimate the AUC. In this work we obtained the maximum likelihood estimate, but estimates obtained using the method of moments or Bayesian methods could also be considered.
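A minimal sketch of maximum likelihood estimation for the FGM dependence parameter is given below. It assumes the data have already been transformed to values strictly between 0 and 1, and uses the FGM copula density c(u, v) = 1 + θ(1 − 2u)(1 − 2v). The log-likelihood is concave in θ, so the score function is decreasing and its root on [−1, 1] can be found by bisection. This illustrates the idea only and is not the estimation algorithm of the paper.

```python
import math


def fgm_loglik(theta, us, vs):
    """Log-likelihood of the FGM copula density at theta."""
    return sum(
        math.log(1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v))
        for u, v in zip(us, vs)
    )


def fgm_mle(us, vs, tol=1e-10):
    """Maximize the FGM log-likelihood over theta in [-1, 1].

    Inputs must lie strictly inside (0, 1) so that every term
    1 + theta * a_i stays positive on the whole interval.
    """
    a = [(1.0 - 2.0 * u) * (1.0 - 2.0 * v) for u, v in zip(us, vs)]

    def score(t):
        # Derivative of the log-likelihood with respect to theta.
        return sum(ai / (1.0 + t * ai) for ai in a)

    lo, hi = -1.0, 1.0
    if score(hi) >= 0.0:  # likelihood still increasing at the upper bound
        return hi
    if score(lo) <= 0.0:  # likelihood already decreasing at the lower bound
        return lo
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Because the parameter space is bounded, the estimate can fall on the boundary −1 or 1 in small samples; the two early returns handle those cases.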
It is important to note that when working with two biomarkers, the diagnostic procedure must be developed by joining these tests and observing the joint AUC (probability of properly classifying an individual with both tests at the same time). The proposed method is presented as an aid to this process, since once it detects the dependency between the biomarkers, it works correctly. Furthermore, it is proposed as an alternative to the study of weak or not necessarily linear dependency for this type of case.
It was observed that the biomarkers in the dengue detection case study have a weak but positive correlation, so the FGM copula is a better option than the Gumbel-Barnett copula, since the latter considers weak but negative linear correlations. Generally, as mentioned by several authors (Achcar et al. 2019, Dendukuri & Joseph 2001, Georgiadis et al. 2003, Pundir & Amala 2015, Tovar & Achcar 2011a), the biomarkers used for clinical diagnosis are positively correlated. However, the method developed in this work for fitting a copula-type dependence model considered both positive and negative correlation scenarios, which is why the copula functions already presented were chosen.

Appendix B. Relationship Between the Correlation Coefficients and Dependency Copula Parameters
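Spearman's correlation coefficient can be written in terms of Pearson's correlation coefficient (Kruskal 1958):

$$\rho_s = \frac{6}{\pi}\arcsin\left(\frac{\rho}{2}\right), \tag{B1}$$

where $\rho_s$ represents Spearman's correlation coefficient and $\rho$ denotes Pearson's correlation coefficient. Thus

$$\rho = 2\sin\left(\frac{\pi\rho_s}{6}\right).$$

From Nelsen (2006) it follows that for FGM, with dependence parameter $\theta$:

$$\rho_s = \frac{\theta}{3}.$$

Similarly (Kruskal 1958):

$$\tau = \frac{2}{\pi}\arcsin(\rho), \tag{B2}$$

where $\tau$ represents Kendall's correlation coefficient. Thus

$$\rho = \sin\left(\frac{\pi\tau}{2}\right).$$

It follows that for FGM (Nelsen 2006):

$$\tau = \frac{2\theta}{9}.$$

The relationships shown in formulas B1 and B2 are valid only for bivariate normal distributions.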
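These relationships can be collected in a few helper functions. The sketch assumes the standard results attributed to Kruskal (1958) for the bivariate normal case, ρ = 2 sin(πρ_s/6) and ρ = sin(πτ/2), and the FGM results in Nelsen (2006), ρ_s = θ/3 and τ = 2θ/9, where θ denotes the FGM dependence parameter. The function names are hypothetical.

```python
import math


def pearson_from_spearman(rho_s):
    """Pearson's rho implied by Spearman's rho under bivariate normality."""
    return 2.0 * math.sin(math.pi * rho_s / 6.0)


def pearson_from_kendall(tau):
    """Pearson's rho implied by Kendall's tau under bivariate normality."""
    return math.sin(math.pi * tau / 2.0)


def fgm_spearman(theta):
    """Spearman's rho of the FGM copula."""
    return theta / 3.0


def fgm_kendall(theta):
    """Kendall's tau of the FGM copula."""
    return 2.0 * theta / 9.0


def fgm_theta_from_spearman(rho_s):
    """Moment-type estimate theta = 3 rho_s, defined only for |rho_s| <= 1/3."""
    theta = 3.0 * rho_s
    if abs(theta) > 1.0:
        raise ValueError("FGM cannot reach a Spearman's rho outside [-1/3, 1/3]")
    return theta
```

Since θ lies in [−1, 1], the FGM copula can only represent Spearman's rho in [−1/3, 1/3] and Kendall's tau in [−2/9, 2/9], which is why it is suited to weak dependence.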