Optimal Detection of Bilinear Dependence in Short Panels of Regression Data

In this paper, we propose parametric and nonparametric locally and asymptotically optimal tests for regression models with superdiagonal bilinear time series errors in short panel data (large n, small T). We establish a local asymptotic normality property, with respect to the intercept μ, the regression coefficient β, the scale parameter σ of the error, and the parameter b of the panel superdiagonal bilinear model (which is the parameter of interest), for a given density f1 of the error terms. This result allows, by Hájek’s representation theorem, the construction of locally asymptotically optimal rank-based tests for the null hypothesis b = 0 (absence of panel superdiagonal bilinear dependence); rank-based versions of the optimal parametric tests are thus provided. These tests are optimal (most stringent) at specified innovation densities f1, but remain valid under any actual underlying density. From contiguity, we obtain the limiting distribution of our test statistics under the null and under local sequences of alternatives. The asymptotic relative efficiencies, with respect to the pseudo-Gaussian parametric tests, are derived. A Monte Carlo study confirms the good performance of the proposed tests.


Introduction
Recent developments in theory and applications have provided powerful and convenient tools for the modelling of time series data, and the last decades have seen a growing interest in nonlinear models. It has been shown that simple nonlinear time series models give better approximations than higher-order linear ones in modelling nonlinear dynamic systems. One of the approaches to nonlinear time series modelling is the class of bilinear processes, introduced by Granger & Andersen (1978). Given an i.i.d. (0, σ 2 ) sequence (ε t ), such a model defines the bilinear process (X t ) of order (p, q; P, Q), or BL (p, q; P, Q) for short. The interest in this class is due to its widespread use in various fields; see, for example, Maravall (1983), Rao & Gabr (1984), Weiss (1986). Despite theoretical difficulties, the fundamental probabilistic properties have been established for several particular cases; for example, stationarity and invertibility of the first-order superdiagonal model were treated by Guegan (1981). Testing problems and their power properties have been treated for the null hypothesis of white noise against bilinear dependence in Hallin & Mélard (1988), Saikkonen & Luukkonen (1991), Benghabrit & Hallin (1992), Benghabrit & Hallin (1996) and Guegan & Pham (1992). The estimation of the parameters of some simple models has been considered in Pham & Tran (1981), Grahn (1995), Hristova (2005) and Tan & Wang (2015).
Regression models with correlated errors have been the focus of considerable attention in econometrics and statistics. Various papers treat the problem of correlated errors in regression models in which the errors follow linear models such as autoregressive (AR) or moving average (MA) models (e.g., Baltagi & Li, 1995) and mixed autoregressive moving average (ARMA) models (e.g., Allal & El Melhaoui, 2006), or nonlinear models such as RCAR, ARCH, fractional ARIMA and bilinear models (e.g., Hwang & Basawa, 1993; Dutta, 1999; Hallin, Taniguchi, Serroukh & Choy, 1999; and Elmezouar, Kadi & Gabr, 2012). Consider the following panel data regression model in Pesaran (2015):
$$y_{i,t} = \mu + \beta' x_{i,t} + e_{i,t}, \qquad i = 1, \ldots, n, \quad t = 1, \ldots, T, \tag{1}$$
where $y_{i,t}$ is the observation on the i th cross-sectional unit for the t th time period, $x_{i,t}$ denotes the K × 1 vector of observations on the non-stochastic regressors, and $(\mu, \beta')' \in \mathbb{R}^{K+1}$ is the corresponding vector of regression coefficients. Here, the error terms $e_{i,t}$ are assumed to follow a simple panel version of the superdiagonal bilinear model, referred to as model (2) throughout, with innovations $\varepsilon_{i,t}$ ∼ i.i.d.(0, σ 2 ) for all i and t.
Tests of homogeneity for panel bilinear time series models have been treated in Lee, Kim, & Lee (2013) and Kim (2014). Furthermore, the probabilistic properties studied in Quinn (1982), such as stationarity and invertibility, remain valid in the panel bilinear model (2). Denote by $\mathcal{F}_{i,t}(\varepsilon)$ and $\mathcal{F}_{i,t}(e)$ the σ-algebras generated by $\{\varepsilon_{i,s} \mid s \le t\}$ and $\{e_{i,s} \mid s \le t\}$, respectively. Then,
1. Equation (2) admits a unique stationary solution $e_{i,t}$, measurable with respect to $\mathcal{F}_{i,t}(\varepsilon)$, if and only if $b^2\sigma^2 < 1$.
2. Equation (2) is invertible, that is, $\varepsilon_{i,t}$ is measurable with respect to $\mathcal{F}_{i,t}(e)$, if and only if $2b^2\sigma^2 < 1$.
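The two conditions above can be explored numerically. The following is a minimal sketch only: it assumes that model (2) has the first-order superdiagonal form e_t = ε_t + b e_{t-2} ε_{t-1} with Gaussian innovations, which is an assumption made here for illustration and may differ from the lag structure actually used in (2). The function names (`is_stationary`, `is_invertible`, `simulate_panel_errors`) and the burn-in length are likewise hypothetical.

```python
import random


def is_stationary(b, sigma2):
    """Stationarity condition quoted in the text: b^2 * sigma^2 < 1."""
    return b * b * sigma2 < 1.0


def is_invertible(b, sigma2):
    """Invertibility condition quoted in the text: 2 * b^2 * sigma^2 < 1."""
    return 2.0 * b * b * sigma2 < 1.0


def simulate_panel_errors(n, T, b, sigma, seed=0, burn_in=200):
    """Simulate n independent error series of length T.

    ASSUMPTION (not taken from the paper): the recursion is
    e_t = eps_t + b * e_{t-2} * eps_{t-1}, with Gaussian innovations.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        e_lag1, e_lag2, eps_lag1 = 0.0, 0.0, 0.0
        series = []
        for step in range(burn_in + T):
            eps = rng.gauss(0.0, sigma)
            e = eps + b * e_lag2 * eps_lag1
            if step >= burn_in:  # discard the burn-in period
                series.append(e)
            e_lag2, e_lag1, eps_lag1 = e_lag1, e, eps
        panel.append(series)
    return panel


# Example: a short panel (n = 100, T = 14) inside the invertibility region.
errors = simulate_panel_errors(n=100, T=14, b=0.3, sigma=1.0, seed=1)
```

With b = 0.8 and σ² = 1, for instance, the first condition holds while the second fails, which illustrates that invertibility is the more restrictive requirement.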

Clearly, model (1) reduces to the classical multiple regression model with constant coefficients µ and β if and only if b = 0. The detection problem we are addressing consists of testing the null hypothesis H 0 : b = 0, with unspecified µ, β, σ 2 and f 1 , against the alternative H 1 : b ≠ 0. This testing problem thus corresponds to testing serial independence against bilinear serial dependence in model (1).
Our statistical tests are based on the uniform local asymptotic normality (ULAN) property. These tests are shown to be asymptotically efficient, and their asymptotic power is also derived.
ULAN plays a fundamental role in this treatment and leads us to construct locally and asymptotically optimal parametric tests. The special case of the pseudo-Gaussian tests (optimal under Gaussian densities but valid under finite-variance non-Gaussian ones) is derived. Unfortunately, their local asymptotic power under non-Gaussian g 1 (especially skewed and heavy-tailed ones) can be extremely poor, which leads us to construct optimal rank-based tests (van der Waerden, Wilcoxon, Laplace, data-driven scores, etc.) based on the Hájek-Le Cam approach.
Asymptotic relative efficiencies with respect to the pseudo-Gaussian procedure show that the van der Waerden version of our rank-based tests uniformly dominates its pseudo-Gaussian counterpart.
The paper is organized as follows. In Section 2, we introduce notation and assumptions, and state the ULAN property for model (1)-(2). Section 3 is devoted to proving a local asymptotic linearity property. These results are used in the derivation of locally asymptotically optimal (most stringent) tests and in the computation of their asymptotic powers. The particular case of the pseudo-Gaussian tests is investigated in Section 3.2. Optimal rank tests are derived in Section 4, and some special cases (van der Waerden, Wilcoxon and Laplace scores) are considered in Section 4.3. Section 5.1 provides asymptotic relative efficiencies, and simulations are carried out in Section 5.2 to investigate the finite-sample performance of our tests. Finally, we provide some conclusions.

Notations and Main Assumptions
Denote by P (n) µ,β ′ ,σ 2 ,b;f1 the probability distribution of the observations (1) and (2), and by P (n) µ,β ′ ,σ 2 ,0;f1 the probability distribution under the null hypothesis where f 1 belongs to some adequate class of standardized densities (3). We suppose that the vector of starting values e (n) Throughout this paper, we consider the class of standardized densities Note that, for f such that f 1 ∈ F 0 , the median and median absolute deviation are 0 and σ, respectively. This standardization, contrary to the usual one based on the mean and the standard deviation, avoids all moment assumptions; it plays the role of an identification constraint, and has no impact on subsequent results.
The main technical tool in our derivation of optimal tests is the uniform local asymptotic normality (ULAN), with respect to $(\mu, \beta', \sigma^2, b)'$, at $(\mu, \beta', \sigma^2, 0)'$, of the families of distributions $P^{(n)}_{\mu,\beta',\sigma^2,b;f_1}$. Establishing the ULAN property requires some technical assumptions on the innovation density $f_1$ (Assumption (A)) and on the asymptotic behavior of the regressors (Assumption (B)).

Assumption (A)
Denote by F A the set of all densities satisfying Assumption (A).
The following assumption concerns the asymptotic behavior of the regressors; it is standard in the context of rank-based inference.
Interesting special cases are
(i) The Student distributions (with ν > 2 degrees of freedom), with standardized density
(ii) The Gaussian distributions, with standardized density
$$f_N(z) := \sqrt{a/(2\pi)}\, \exp(-a z^2/2)$$
(with mean zero and variance 1/a), for which $I(f_1) = a \approx 0.4549$ and $J(f_1) = 3$; these values can also be obtained by taking limits, as ν → ∞, of the corresponding Student values, since $a_\nu \to a$ as ν → ∞.
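The Gaussian constants quoted above can be checked numerically. The sketch below assumes the usual definitions I(f1) = ∫ φ² f1 and J(f1) = ∫ z² φ² f1 with score function φ = −f1′/f1 (these definitions belong to Assumption (A) and are assumed here, not quoted), and it reads the median-absolute-deviation standardization as requiring that the interval [−1, 1] carries probability 1/2. All function names are illustrative.

```python
import math
from statistics import NormalDist


def gaussian_scale_constant():
    """Value of a such that N(0, 1/a) has median absolute deviation 1."""
    upper_quartile = NormalDist().inv_cdf(0.75)
    return upper_quartile ** 2


def density(z, a):
    """Gaussian density with mean zero and variance 1/a."""
    return math.sqrt(a / (2.0 * math.pi)) * math.exp(-a * z * z / 2.0)


def score(z, a):
    """Score function -f'/f of the density above, equal to a * z."""
    return a * z


def integrate(fn, lo=-30.0, hi=30.0, steps=60000):
    """Midpoint rule; accurate enough for these smooth integrands."""
    h = (hi - lo) / steps
    return h * sum(fn(lo + (k + 0.5) * h) for k in range(steps))


a = gaussian_scale_constant()
info_I = integrate(lambda z: score(z, a) ** 2 * density(z, a))
info_J = integrate(lambda z: (z * score(z, a)) ** 2 * density(z, a))
mass_in_unit_interval = integrate(lambda z: density(z, a), -1.0, 1.0, 20000)
```

Under these assumptions, `a` is the squared upper quartile of the standard normal distribution, which is where the value 0.4549 comes from.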

Uniform Local Asymptotic Normality
In this section, we state the uniform local asymptotic normality property for model (1), with respect to the intercept µ, the regression coefficient β, the scale parameter σ 2 and the parameter of interest b, for a fixed density f 1 ∈ F A . For background on local asymptotic normality, the reader is referred to Le Cam & Yang (2000).
Let $\tau^{(n)}$ be a sequence of real vectors in $\mathbb{R}^{K+3}$ such that $\tau^{(n)\prime}\tau^{(n)}$ is uniformly bounded as n → ∞, and let $\theta := (\mu, \beta', \sigma^2, b = 0)'$. In addition, we consider sequences of local alternatives of the form $\theta + \nu^{(n)}\tau^{(n)}$. Denote by $Z^{(n)}_{i,t}$ the standardized residuals, for i = 1, 2, . . . , n and t = 1, 2, . . . , T, and note that, under the null hypothesis, they coincide with $\varepsilon_{i,t}/\sigma$. We then have the following result: the central sequence $\Delta^{(n)}_{f_1}(\theta^{(n)})$ converges in distribution to a (K + 3)-variate normal distribution with mean zero and covariance matrix $\Gamma_{f_1}(\theta)$.

Proof. See appendix.
From this result, we have under P Consequently, since the hypotheses P (n)

Locally Asymptotically Optimal Tests
In this section, we are interested in testing the null hypothesis b = 0 of randomness of the errors of the regression model (1), with unspecified error density f 1 ∈ F 0 , unspecified µ, β and unspecified error scale σ; formally, this can be written as Parametric alternatives take the form (for some fixed standardized The parameters µ, β and σ 2 thus are nuisance parameters, while b is the parameter of interest. Before turning to this semiparametric hypothesis H (n) 0 (unspecified density), let us first investigate the parametric problem of testing H

Optimal Parametric Tests
In this subsection, we construct locally asymptotically optimal (namely, most stringent) tests in the presence of nuisance parameters for testing serial independence in model (1). Stringency is a concept of optimality (see e.g., Wald (1943)). We suppose that the innovation density f 1 is specified; the main consequence of the ULAN results of Proposition 1 is that for each θ, and for given The classical theory of hypothesis testing in Gaussian shifts (see Section 11.9 of Le Cam, 1986) provides the general form for locally asymptotically most stringent tests of hypotheses in ULAN models. In this case, the null hypothesis H where M (Ω) is the linear subspace of dimension K + 2 of R K+3 generated by the matrix Ω ′ := (I K+2 , 0). Such tests should be based on As θ remains unspecified under the null, we will need to replace it with some estimate. For this purpose, we assume the existence of θ := θ n satisfying the following assumption. Assumption (C). The estimate θ n is such that (ii) θ n is locally asymptotically discrete, i.e., for all fixed values s > 0, the number of possible values of θ n in Note that condition (i) on the rate of convergence in probability of the estimates is satisfied by several estimates such as the maximum likelihood estimates, the Yule-Walker estimates, the M-estimates and the least squares estimates; part (ii) has few practical implications.
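For orientation, the following is a sketch of the standard quadratic form taken by most stringent tests of a linear hypothesis θ ∈ M(Ω) in ULAN families, specialized to Ω′ = (I_{K+2}, 0). It is written under the assumption that the information matrix Γ is partitioned conformably with (nuisance parameters, b); the block names Γ_11, Γ_14, Γ_41, Γ_44, the subvector ∆_{f1;1:3} and the efficient central sequence ∆* are notation introduced here for illustration and need not coincide with the authors' statistic (11).

```latex
\[
\begin{aligned}
Q^{(n)}_{f_1}(\theta)
  &:= \Delta^{(n)\prime}_{f_1}(\theta)
      \left[ \Gamma_{f_1}^{-1}(\theta)
             - \Omega \left( \Omega' \Gamma_{f_1}(\theta)\, \Omega \right)^{-1} \Omega' \right]
      \Delta^{(n)}_{f_1}(\theta) \\
  &= \frac{\left( \Delta^{*(n)}_{f_1;4}(\theta) \right)^{2}}
          {\Gamma_{44} - \Gamma_{41} \Gamma_{11}^{-1} \Gamma_{14}},
\qquad
\Delta^{*(n)}_{f_1;4}(\theta)
  := \Delta^{(n)}_{f_1;4}(\theta)
     - \Gamma_{41} \Gamma_{11}^{-1} \Delta^{(n)}_{f_1;1:3}(\theta).
\end{aligned}
\]
```

The second equality follows from the block-inverse (Schur complement) formula: the bracketed matrix has rank one, which is consistent with the single degree of freedom of the limiting chi-square distribution.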
The following proposition shows that substituting θ n for θ does not influence the asymptotic behavior of the test statistic (11).

Proposition 2 (Asymptotic linearity). Suppose that Assumptions (A),(B) and (C) hold. Let θ n be a deterministic sequence satisfying n
Proof. See appendix.
The following proposition then follows from classical results on ULAN families (see Le Cam, 1986, Chapter 11).

Proposition 3. Suppose that Assumptions (A), (B) and (C) hold. Then,
θ;f1 , and asymptotically noncentral chi-square, still with 1 degree of freedom but with noncentrality parameter λ f1 : ) denotes the noncentral chi-square distribution function with one degree of freedom and noncentrality parameter λ f1 .
Proof. See appendix.
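Since the limiting distributions have one degree of freedom, the asymptotic critical value and local power can be computed from the standard normal distribution alone: if X is N(√λ, 1), then X² is noncentral chi-square with one degree of freedom and noncentrality parameter λ. The sketch below uses this fact; the function names are illustrative, and the noncentrality value passed in is whatever the relevant proposition provides.

```python
from statistics import NormalDist


def chi2_1_critical_value(alpha):
    """Upper-alpha quantile of the central chi-square law with 1 d.f."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def local_asymptotic_power(noncentrality, alpha=0.05):
    """P(X^2 > critical value) for X ~ N(sqrt(noncentrality), 1)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = noncentrality ** 0.5
    upper_tail = 1.0 - std_normal.cdf(z - shift)
    lower_tail = std_normal.cdf(-z - shift)
    return upper_tail + lower_tail


# Example: asymptotic power at a few noncentrality values.
powers = [local_asymptotic_power(lam) for lam in (0.0, 1.0, 4.0, 9.0)]
```

At noncentrality zero the function returns the nominal level α, and the power increases with the noncentrality parameter.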
Unfortunately, the Gaussian tests Q (n) N (θ) are valid under normal densities only: they require f 1 to be specified as the standardized Gaussian density, so that the parameter a also has to be fixed. In the following section, we show that a suitably modified version, namely the pseudo-Gaussian tests, which remain valid under a broad class of non-Gaussian densities with finite variance while retaining optimality under Gaussian ones, is in general preferable.

Pseudo-Gaussian Test
Herein, we construct a pseudo-Gaussian version of the Gaussian test Q N ;4 (θ) allows us to construct asymptotically optimal tests under f 1 = f N , hence for efficient detection of bilinear dependence in the parametric Gaussian model characterized by Gaussian disturbances. Extending the validity of the Gaussian optimal test to densities g 1 in a broad class of densities is of course highly desirable. Let us show that this is indeed possible and that a slight modification, ∆ * (n) N ;4 , say, of the efficient central sequence ∆ (n) N ;4 leads to a pseudo-Gaussian test which remains valid when the actual density On the other hand, it is easy to see that, still under P (n) θ+ν (n) τ (n) ;g1 , ∆ * (n) N ;4 (θ) and the log-likelihood Λ (n) θ+ν (n) τ (n) /θ;g1 are jointly binormal; the desired result then follows from a routine application of Le Cam's third lemma. Since the intercept µ, the regression coefficients β, and the scale parameter σ 2 under the null hypothesis remain unspecified, some care has to be taken with the asymptotic impact of estimating µ, β, and σ 2 under unspecified density g 1 .
Define the non-standardized centered residuals A pseudo-Gaussian test may then be based on a statistic of the form where g is defined by g(u) = (1/σ)g 1 (u/σ). Clearly, Q * (n) N ;g (β) depends only on β, which justifies the notation.
In practice, the pseudo-Gaussian tests will be based on the statistics where β is an arbitrary n 1/2 (K (n) ) −1 -consistent estimator of β and A of all densities g 1 ∈ F A such that σ 2 g1 < ∞. Then under P (n) θ;g1 , and for any bounded sequence τ (n) = τ The following result is immediate from (19). Let Assumption (B) holds, assume that θ n satisfies Assumptions (C) and fix θ ∈ R K+3 , we have Showing that, under P The following result summarizes the asymptotic properties of the pseudo-Gaussian tests.

Proposition 4. Let Assumptions (A), (B) and (C) hold. Then, for any g 1 ∈ F (2) A , (i) Q †(n) N ( β̂) is asymptotically chi-square with 1 degree of freedom under P (n) µ,β,σ 2 ,0;g1 , and asymptotically noncentral chi-square, still with 1 degree of freedom but with noncentrality parameter λ N = (T − l)σ 2 g τ 2 4 under P (n) θ+n −1/2 ν (n) τ ;g1 ; (ii) the sequence of tests rejecting the null hypothesis whenever Q † N > χ 2 1,1−α is locally asymptotically most stringent, at asymptotic probability level α, for H (n) 2 A against alternatives of the form
The test statistic Q †(n) N ( β̂) thus defines a pseudo-Gaussian test, that is, a test which is optimal under Gaussian assumptions but remains valid under a much broader class of densities.

Optimal Rank Tests
General results by Hallin & Werker (2003) indicate that semiparametrically efficient rank-based procedures can be obtained in relation to ranks being maximal invariants under model-generating groups of transformations (G (nT ) , ⋆). More precisely, note that the null hypothesis H

Rank-Based Versions of Central Sequences
A maximal invariant for the group (G (nT ) , ⋆) is known to be the vector n,T (β). Moreover, µ and σ 2 have no impact on residual ranks, hence we can assume that they are specified, which justifies the notation Z General results on semiparametric efficiency indicate that in such context, the expectation (under the null hypothesis) of the central sequence ∆ (n) f1;4 (θ) conditional on those ranks R (n) yields a version of the semiparametrically efficient central sequence (at f 1 and θ) given by: In practice, the conditional expectation definition (22)  (θ) (an exactscore linear rank statistic) is not convenient, and the explicit approximate-score form (for simplicity, we are using the same notation as for the exact-score version) is preferable and given by (the notation ∆ ∼ (n) f1;4 (β, σ) reflects the fact that it only depends on β and σ) The following asymptotic representation result (25) shows that both (22) and (23) yield rank-based version of the central sequence ∆ (n) f1;4 (θ).

Optimal Rank Tests
The parameters µ, β and σ 2 remain unspecified under the null. Since only β has an influence on the ranks, a consistent estimator β̂ := β̂ (n) has to be substituted for the actual value of β, yielding aligned ranks R (n) i,t ( β̂ (n) ). The effect of this alignment procedure is taken care of in a similar way as in Section 3, via the asymptotic linearity results of Propositions 6 and 7 below.
Proposition 6. Let Assumption (B) hold and fix µ ∈ R, β ∈ R K , σ 2 > 0, f 1 and g 1 ∈ F A . Then, for any bounded sequence τ
The following proposition then is an immediate corollary of Proposition 6 and Lemma 4.4 in Kreiss (1987).
Proposition 7. Let Assumption (B) hold, assume that β̂ satisfies Assumption (C) and fix µ ∈ R, β ∈ R K , σ 2 > 0, f 1 and g 1 ∈ F A . Then, under P
Local asymptotic optimality at density f 1 is achieved by the test based on More precisely, we have the following result.

Important Particular Cases
The statistic Q ∼ (n) f1 ( β̂) provides a general form for the optimal rank tests of the null hypothesis of serial independence in model (1). The three most important particular cases are the van der Waerden (normal scores), Wilcoxon (logistic scores) and Laplace (double-exponential scores) test statistics, which are optimal at normal, logistic and double-exponential distributions, respectively.

(i) van der Waerden's test statistic is given by
where Ψ is the standard normal distribution function.

(ii) Wilcoxon's test statistic is given by
(iii) Laplace's test statistic is given by
where F 1 is the distribution function of the double-exponential distribution.
It is worth noting that the scale factors a (for van der Waerden), b (for Wilcoxon) and d (for Laplace) disappear in the final expression of the test statistics, due to the exact standardization by the respective constants (such as s De for the Laplace scores). This confirms that the choice of the median absolute deviation as a scale parameter in the definition of F 0 has no impact on the results.
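To make the three score choices concrete, the sketch below computes (aligned) residual ranks and the corresponding approximate scores, evaluated at R/(N + 1). Only the score transformations are shown, up to the scale factors that, as noted above, cancel in the final statistics; the way the scores are combined across lags into the test statistics is not reproduced here. The function names and the toy residuals are illustrative, and ties are assumed absent.

```python
from statistics import NormalDist


def ranks(values):
    """Ranks 1..N of the entries of `values` (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def van_der_waerden_scores(rank_list, total):
    """Normal scores: standard normal quantile of R / (N + 1)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (total + 1)) for r in rank_list]


def wilcoxon_scores(rank_list, total):
    """Logistic scores, proportional to 2u - 1 at u = R / (N + 1)."""
    return [2.0 * r / (total + 1) - 1.0 for r in rank_list]


def laplace_scores(rank_list, total):
    """Double-exponential scores: sign of u - 1/2 at u = R / (N + 1)."""
    def sign(x):
        return (x > 0) - (x < 0)
    return [float(sign(r / (total + 1) - 0.5)) for r in rank_list]


residuals = [0.3, -1.2, 2.5, 0.0]  # toy residuals, for illustration only
r = ranks(residuals)
vdw = van_der_waerden_scores(r, len(residuals))
wil = wilcoxon_scores(r, len(residuals))
lap = laplace_scores(r, len(residuals))
```

All three score vectors depend on the residuals only through their ranks, so any monotone increasing transformation of the residuals leaves them unchanged.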

Asymptotic Relative Efficiencies
The asymptotic relative efficiencies (AREs) of the rank-based tests with respect to the pseudo-Gaussian tests Q †(n) N follow directly as ratios of noncentrality parameters under local alternatives. We use them to compare the performance of the parametric and nonparametric tests presented above.
The AREs obtained are favorable to the rank-based tests, particularly under heavy tails. Note that the AREs of the proposed van der Waerden tests with respect to the pseudo-Gaussian tests are larger than or equal to one for all distributions considered in Table 1, and are equal to one in the Gaussian case only (this result is proved in Chernoff & Savage (1958)), which means that the van der Waerden tests are asymptotically at least as powerful as the pseudo-Gaussian tests. Note also that the largest ARE in each column of Table 1 is attained by one of the rank tests. Thus, at each of the densities considered, the best of the van der Waerden, Wilcoxon and Laplace tests performs at least as well as the pseudo-Gaussian test.

Results of Monte Carlo Simulations
In this section, we conduct a Monte Carlo experiment to investigate the finite-sample performance of the proposed procedures and the behavior of our rank tests under a variety of error distributions. More precisely, we considered the model (32) with i = 1, 2, . . . , 100 and t = 1, 2, . . . , 14. In order to examine the finite-sample performance of the proposed procedures, we generated 2500 independent replications of samples of size N = n(T − 4) = 1000 from (32). For each replication, we performed the following tests at the asymptotic level α = 5%: the pseudo-Gaussian test based on Q †(n) N in (18), and the rank tests based on van der Waerden, Wilcoxon, Laplace, Student-t5 and data-driven scores.
The data-driven scores rely on a choice of the reference density that adapts, for instance, to the actual skewness and kurtosis of f. Hallin & Mehta (2015) propose selecting the reference density f by fitting a skew-t distribution (see Azzalini & Capitanio, 2003) with location zero, scale one, and density
$$f_{\delta,\nu}(z) = 2\, t_\nu(z)\, T_{\nu+1}\!\left( \delta z \left( \frac{\nu + 1}{\nu + z^2} \right)^{1/2} \right),$$
where δ ∈ R is a skewness parameter, ν > 0 is the number of degrees of freedom governing the tails, and $t_\nu$ and $T_{\nu+1}$ are the density and cumulative distribution functions of the Student-t distributions with ν and ν + 1 degrees of freedom, respectively. Estimators $\hat\delta$ and $\hat\nu$ are obtained from the residuals $Z^{(n)}_{i,t}$ by maximum likelihood (namely, by maximizing a skew-t likelihood with respect to (δ, ν)). The f-score functions to be used in the testing procedure then are those associated with the skew-t density $f_{\hat\delta,\hat\nu}$. This approach is also related to the theory of efficient (adaptive) estimation. Additionally, the resulting data-driven-score tests, as adaptive tests, are valid and asymptotically optimal.
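The skew-t density above is straightforward to evaluate. The following is a minimal standard-library sketch; the Student-t distribution function is approximated by Simpson's rule, which is a choice made here for self-containedness (a statistical library would normally be used instead). The function names and the example values δ = 10, ν = 5 are illustrative, and the maximum likelihood fit of (δ, ν) is not shown.

```python
import math


def t_pdf(z, nu):
    """Student-t density with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - (nu + 1) / 2.0 * math.log1p(z * z / nu))


def t_cdf(x, nu, steps=2000):
    """Student-t distribution function via Simpson's rule on [0, |x|].

    `steps` must be even; symmetry of the density gives the sign handling.
    """
    if x == 0.0:
        return 0.5
    upper = abs(x)
    h = upper / steps
    total = t_pdf(0.0, nu) + t_pdf(upper, nu)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * t_pdf(k * h, nu)
    area = total * h / 3.0
    return 0.5 + area if x > 0 else 0.5 - area


def skew_t_pdf(z, delta, nu):
    """Skew-t density f_{delta,nu}(z) with location zero and scale one."""
    argument = delta * z * math.sqrt((nu + 1.0) / (nu + z * z))
    return 2.0 * t_pdf(z, nu) * t_cdf(argument, nu + 1.0)


# Example: a strongly right-skewed member of the family.
density_at_one = skew_t_pdf(1.0, delta=10.0, nu=5.0)
```

Setting δ = 0 recovers the symmetric Student-t density, which gives a quick sanity check of any implementation.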
Rejection frequencies are reported in Table 2; they confirm the excellent overall performance of our rank-based procedures with data-driven scores. It also appears from the skew normal and skew Student simulations that asymmetry significantly increases the advantage of the rank tests over the pseudo-Gaussian one.
Table 2: Rejection frequencies (out of 2500 replications), for b = 0 (null hypothesis) and various non-zero values of b (alternative hypotheses), of the pseudo-Gaussian test and of the rank tests based on van der Waerden, Wilcoxon, Laplace, Student-t5 and data-driven scores, under error densities g1 that are Gaussian (N), logistic (l), double exponential (De), Student (t5), skew normal (sN(10)) and skew Student t5 (st5(10)).
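When reading rejection frequencies under the null, it helps to know how much Monte Carlo variability to expect. The sketch below computes the binomial standard error of an empirical rejection frequency and the corresponding normal-approximation band around the nominal level, using the design stated above (2500 replications, α = 5%). The function name and the use of a 95% band are choices made here for illustration.

```python
import math


def rejection_frequency_band(level, replications, z=1.959963984540054):
    """Normal-approximation band for an empirical rejection frequency.

    Returns (lower, upper, standard_error) when the true rejection
    probability equals `level`; z defaults to the 97.5% normal quantile.
    """
    se = math.sqrt(level * (1.0 - level) / replications)
    return level - z * se, level + z * se, se


lower, upper, standard_error = rejection_frequency_band(0.05, 2500)
```

Empirical sizes falling inside this band are compatible with the nominal 5% level at the usual 95% confidence level.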

Conclusions
In the present article, we derive pseudo-Gaussian and rank-based tests for testing white noise against panel superdiagonal bilinear dependence in a multiple regression model, for specified and unspecified error densities. The pseudo-Gaussian test appears to perform quite poorly under skewed and heavy-tailed distributions, which leads us to consider rank-based tests. These tests are nonparametric, and they achieve better empirical power with van der Waerden, Wilcoxon, Laplace, Student t and data-driven scores.
f1,g1;4 (θ) and the log-likelihood Λ (n) θ+ν (n) τ /θ;g1 are jointly multinormal. Then, the desired result follows from an application of Le Cam's third lemma.