In multivariate time series analysis, the second-order behavior of a multivariate time series is studied by means of the autocovariance matrices in the time domain, or the Fourier spectral density matrix in the frequency domain. A nondegenerate Fourier spectrum is necessarily a curve (or surface) of Hermitian positive definite (HPD) matrices, and one generally constrains a spectral estimator to preserve this property. This in order to ensure interpretability of the estimator as a covariance or spectral density matrix, but also to avoid computational issues in e.g., simulation or bootstrapping.
The package pdSpecEst
contains several useful functions
to generate vector-valued time series observations and perform standard
nonparametric Fourier spectral matrix estimation. First, the function
rExamples1D()
generates stationary vector-valued time
series observations based on a Cramér representation in terms of the
transfer function of several example target spectral matrices ((Brillinger 1981)). The orthogonal increment
process is generated by means of independent complex normal random
variates. Another function that can be used to generate stationary
vector-valued time series observations is rARMA()
, which
generates observations from a vARMA (vector
autoregressive-moving-average) process.
library(pdSpecEst)
## Generate 3-dim. time series with 'bumps' spectrum
set.seed(123)
n <- 2^10
bumps <- rExamples1D(n, example = "bumps", return.ts = T, noise = "periodogram")
str(bumps, digits.d = 1)
#> List of 3
#> $ f : cplx [1:3, 1:3, 1:1024] 9e+00+0e+00i 2e-15+2e-14i -1e-14+1e-14i ...
#> $ P : cplx [1:3, 1:3, 1:1024] 10+0i -7+4i -7+2i ...
#> $ ts: cplx [1:2048, 1:3] -9+3i 11+15i 5-19i ...
If we set return.ts = T
, the function
rExamples1D()
outputs a list with three components: (i) the
(d × d)-dimensional
target HPD spectral matrix f; (ii) a multitaper HPD periodogram of the
generated time series, by default calculated with B = d DPSS (discrete
prolate spheroidal sequence) taper functions; and (iii) the generated
time series observations, with the d columns corresponding to the
components of the time series. Figure 1 displays the true generating
spectrum and a noisy multitaper HPD spectral estimator, with B = 3 DPSS tapers and time-bandwidth
parameter nw = 3, based on
a generated time series of length T = 1024. In general, to compute a
raw or smoothed periodogram of a stationary vector-valued time series,
one can use the function pdPgram()
. By specifying the
argument B, the function
directly computes a multitaper spectral estimator, by default using DPSS
tapering functions, with time-bandwidth parameter nw = 3.
## Multitaper spectral estimator with `pdPgram`
f.hat <- pdPgram(bumps$ts, B = 75); str(f.hat, digits.d = 1)
#> List of 2
#> $ freq: num [1:1024] 0 0.003 0.006 0.009 0.012 ...
#> $ P : cplx [1:3, 1:3, 1:1024] 14+0i 5+1i -6+0.2i ...
Figure 2 displays the multitaper HPD spectral estimator based on the
same generated time series of length T = 1024 using B = 75 DPSS tapers and
time-bandwidth parameter nw = 3. Observe
that a standard multitaper spectral estimator has difficulties adapting
to the varying degrees of smoothness in the target spectrum as the
estimation procedure is based on a single global smoothness parameter
B, i.e., the number of
tapers.
Although standard nonparametric spectral estimation methods, such as multitapering, are suitable to estimate smooth Fourier spectra with globally smooth component curves across frequency in each individual matrix component, they are not able to capture localized features, such as local peaks in the spectral matrix at certain frequencies or frequency bands, or varying degrees of smoothness across components of the spectral matrix. (Chau and Sachs 2019) develops intrinsic wavelet transforms for curves or surfaces in the space of HPD matrices, with applications to denoising, dimension reduction or compression, and more. The intrinsic wavelet transforms are computationally fast and denoising through nonlinear wavelet shrinkage captures localized features, such as peaks or throughs, in the matrix-valued curves or surfaces, always guaranteeing an estimate in the space of HPD matrices. Moreover, and in contrast to existing approaches, wavelet-based spectral estimation in the space of HPD matrices equipped with a specific invariant Riemannian metric is equivariant under a change or basis (i.e., change of coordinate system) of the given time series. For more details, see (Chau and Sachs 2019) or (Chau 2018).
pdSpecEst1D()
In this section, we demonstrate how to use the pdSpecEst
package to denoise curves of HPD matrices corrupted by noise via linear
or nonlinear thresholding of the intrinsic wavelet coefficients. In
particular, we consider the application to spectral matrix estimation of
sta- tionary time series through wavelet-based denoising of HPD
periodogram matrices. Consider again the generated time series
bumps$ts
with generating spectral matrix
bumps$f
and noisy HPD periodogram matrix
bumps$P
as displayed in Figure 1. The function
WavTransf1D()
transforms the generating spectral matrix or
the noisy HPD periodograms (i.e., curves of HPD matrices) to the
intrinsic AI (average-interpolation) wavelet domain.
## Forward intrinsic AI wavelet transform
wt.f <- WavTransf1D(bumps$f, periodic = T)
wt.per <- WavTransf1D(bumps$P, periodic = T)
By default, the order of the intrinsic AI subdivision scheme is
order = 5
, and the space of HPD matrices is equipped with:
1. the affine-invariant Riemannian metric as detailed in e.g., [Bhatia (2009)][Chapter 6] or (Pennec, Fillard, and Ayache 2006), i.e.,
metric = "Riemannian"
. Instead, the metric can also be
specified to be:
metric = "logEuclidean"
);metric = "Cholesky"
);metric = "Euclidean"
);metric = "rootEuclidean"
).The default choice of metric (affine-invariant Riemannian) satisfies
several useful properties not shared by the other metrics, see (Chau and Sachs 2019) or (Chau 2018) for more details. Note that this
comes at the cost of increased computation time in comparison to one of
the other metrics, (due to the frequent use of matrix square roots,
logarithms and exponentials). If the average-interpolation order
satisfies order <= 9
, the predicted midpoints in the
intrinsic AI subdivision scheme are calculated efficiently by a weighted
intrinsic average with pre-determined filter weights. If
order > 9
, the midpoint prediction is computationally
more expensive as it relies on intrinsic polynomial interpolation via
Neville’s algorithm with the function pdNeville()
.
In Figures 3 and 4 below, we plot the Frobenius norms of the
matrix-valued whitened AI wavelet coefficients of the true spectral
matrix bumps$f
and the noisy HPD periodograms
bumps$P
across scales 1 ≤ j ≤ J and locations
0 ≤ k ≤ 2j − 1 − 1,
with maximum wavelet scale J = log2(n) = 10.
#> Warning: Use of `longData$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longData$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longData$value` is discouraged.
#> ℹ Use `value` instead.
#> Warning: Use of `longData$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longData$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longData$value` is discouraged.
#> ℹ Use `value` instead.
Given as input the curve of noisy HPD periodograms
bumps$P
, the function pdSpecEst1D()
computes a
HPD wavelet-denoised spectral matrix estimator by applying the following
steps:
WavTransf1D()
,pdCART()
,InvWavTransf1D()
.The complete estimation procedure is described in more detail in (Chau and Sachs 2019) or Chapter 3 of (Chau 2018). By default, the function applies the intrinsic wavelet transform based on the affine-invariant Riemannian metric, and corresponding (asymptotic) bias-correction as in Chapter 3 of (Chau 2018).
f.hat <- pdSpecEst1D(bumps$P, return.D = "D.white"); str(f.hat, max.level = 1)
#> List of 6
#> $ f : cplx [1:3, 1:3, 1:1024] 5.97-1.35e-12i 2.69+6.11i -6.35+2.40i ...
#> $ D :List of 9
#> $ M0 : cplx [1:3, 1:3, 1:5] 11.132+2.08e-16i -0.335-1.86i 0.964-4.98e-01i ...
#> $ tree.weights:List of 8
#> $ D.raw :List of 9
#> $ D.white :List of 9
The function outputs a list with several components, among which the
most important are the denoised HPD matrix curve f
, and the
pyramid of thresholded wavelet coefficients D
. See the
complete package documentation for additional information on the
functions used in the intermediate steps and more details on the
specifics of the function pdSpecEst1D()
.
By default, the noise is removed by tree-structured thresholding of
the wavelet coefficients based on the trace of the whitened coefficients
through minimization of a complexity penalized residual sum of
squares (CPRESS) criterion via the fast tree-pruning algorithm in
(Donoho 1997). The penalty or sparsity
parameter in the optimization procedure is set equal to
alpha
times the universal threshold, where the noise
standard deviation (homogeneous across scales for independent Wishart
matrices) of the traces of the whitened wavelet coefficients is
determined via the median absolute deviation (MAD) of the coefficients
at the finest wavelet scale.
It is also possible to perform linear thresholding of wavelet scales
using pdSpecEst1D()
by setting the argument
alpha = 0
, (i.e., no nonlinear thresholding), and the
argument jmax
to the maximum wavelet scale we wish to keep
in the intrinsic inverse AI wavelet transform. For instance, if
jmax = 5
, the wavelet coefficients at scales j with j ≤ 5 remain untouched, but all
wavelet coefficients at scales j > 5 will be set to zero. By
default, the function pdSpecEst1D()
sets the maximum
nonzero wavelet scale to J − 2, two scales below the finest
wavelet scale J.
In Figure 5, we displaye the Frobenius norms of the Hermitian
matrix-valued whitened AI wavelet coefficients of the denoised HPD
spectral matrix estimator across scale-locations (j, k), and in Figure 6, we
plot the wavelet-denoised HPD spectral estimator obtained from
f.hat$f
together with the generating spectral matrix
bumps$f
in the same fashion as in Figure 2 above. The
spectral matrix estimator captures both the smooth spectral matrix
behavior in the second half of the frequency domain and the localized
peaks in the low-frequency range, while always guaranteeing positive
definiteness of the estimator.
#> Warning: Use of `longData$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longData$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longData$value` is discouraged.
#> ℹ Use `value` instead.
pdSpecClust1D()
The intrinsic AI wavelet transforms can also be used for fast
clustering of multivariate spectral matrices based on their sparse
representations in the wavelet domain. The function
pdSpecClust1D()
performs clustering of HPD spectral
matrices corrupted by noise (e.g., multitaper HPD periodograms) by
combining wavelet thresholding and fuzzy clustering in the intrinsic
wavelet coefficient domain. In particular, the HPD spectral matrices are
assigned to a number different clusters in a probabilistic fashion
according to the following steps:
pdSpecEst1D()
.j = 0
across subjects, see e.g., (Bezdek 1981). Note that the distance function
in the intrinsic c-means algorithm relies on the chosen metric on the
space of HPD matrices.j = 1
up to
j = jmax
. The tuning parameter tau
controls
the weight given to the cluster assignments obtained in the first step
of the clustering algorithm.As an illustrating example, we simulate stationary two-dimensional
time series data for ten different subjects from two vARMA(2,2)
(vector-autoregressive-moving-average) processes with slightly different
coefficient matrices, and thus non-equal spectral matrices, using the
function rARMA()
. Here, the first group of five subjects is
simulated from a first vARMA-process and the second group of five
subjects is simulated from a second slightly different vARMA-process. We
use pdSpecClust1D()
to assign the different subjects to
K = 2
clusters in a probabilistic fashion. Note that the
true clusters are formed by the first group of five subjects and the
last group of five subjects.
## Fix parameter matrices
Phi1 <- array(c(0.5, 0, 0, 0.6, rep(0, 4)), dim = c(2, 2, 2))
Phi2 <- array(c(0.7, 0, 0, 0.4, rep(0, 4)), dim = c(2, 2, 2))
Theta <- array(c(0.5, -0.7, 0.6, 0.8, rep(0, 4)), dim = c(2, 2, 2))
Sigma <- matrix(c(1, 0.71, 0.71, 2), nrow = 2)
## Generate periodogram data for 10 subjects in 2 groups
pgram <- function(Phi) pdPgram(rARMA(2^9, 2, Phi, Theta, Sigma)$X)$P
P <- array(c(replicate(5, pgram(Phi1)), replicate(5, pgram(Phi2))), dim=c(2,2,2^8,10))
pdSpecClust1D(P, K = 2)$cl.prob
#> Cluster1 Cluster2
#> Subject1 0.0005513815 0.99944862
#> Subject2 0.0040373900 0.99596261
#> Subject3 0.0013578573 0.99864214
#> Subject4 0.0013609546 0.99863905
#> Subject5 0.0001770947 0.99982291
#> Subject6 0.9665993050 0.03340070
#> Subject7 0.9870530643 0.01294694
#> Subject8 0.9874425798 0.01255742
#> Subject9 0.9698785211 0.03012148
#> Subject10 0.9452907061 0.05470929
pdSpecEst2D()
Instead of denoising curves of HPD matrices, the
pdSpecEst
package can also be used to denoise surfaces of
HPD matrices corrupted by noise. This is done through linear or
nonlinear shrinkage of wavelet coefficients obtained by means of an
intrinsic 2D AI wavelet transform as explained in Chapter 5 of (Chau 2018). In particular, we focus on
denoising of HPD surfaces of random Wishart matrices, with in mind the
application to time-varying spectral matrix estimation based on wavelet
denoising of time-varying periodogram matrices.
The primary tool to perform intrinsic wavelet denoising of noisy
surfaces of HPD matrices is the function pdSpecEst2D()
. The
functions below are highly similar to the previously demonstrated
functions, where the suffix -1D
in the context of curves of
HPD matrices is replaced by the suffix -2D
in the context
of surfaces of HPD matrices.
rExamples2D()
First, to generate a noisy surface of size n1 × n2
in the space of (d × d)-dimensional HPD
matrices ℙd × d, we use
the function rExamples2D()
, with the argument
noise = 'wishart'
, which generates HPD matrix observations
from a discretized intrinsic signal plus i.i.d. noise model with respect
to the Riemannian metric: Pij = fij1/2Eijfij1/2 ∈ ℙd × d, 1 ≤ i ≤ n1, 1 ≤ i ≤ n2,
with target surface (fij)ij ∈ ℙd × d
and noise (Eij)ij ∈ ℙd × d
generated from a complex Wishart distribution WdC(B, Id/B),
with B degrees of freedom and
Euclidean mean equal to the (d × d)-dimensional
identity matrix Id, such that the
Euclidean mean of Pij
equals fij.
Informally, such random matrix behavior corresponds to the asymptotic
behavior of localized or segmented HPD periodogram observations (e.g.,
obtained with pdPgram2D()
) of a locally stationary time
series, with generating time-varying spectral matrix equal to the target
surface (fij)ij,
see e.g., Example 5.4.1 in (Chau
2018).
## Generate noisy HPD surface
set.seed(17)
d <- 2; B <- 6; n <- c(2^7, 2^7)
smiley <- rExamples2D(n, d, example = "smiley", noise = "wishart", df.wishart = B)
str(smiley)
#> List of 2
#> $ f: cplx [1:2, 1:2, 1:128, 1:128] 0.607+0i 0.478-0.148i 0.478+0.148i ...
#> $ P: cplx [1:2, 1:2, 1:128, 1:128] 0.66+1.39e-17i 0.797-3.93e-01i 0.797+3.93e-01i ...
Figures 7 and 8 display the matrix-logarithms Log(fij) ∈ ℍd × d
of the (true) HPD target surface and the matrix-logarithms of the
generated HPD matrix observations Log(Pij) ∈ ℍd × d
corresponding to the target surface corrupted by Wishart noise, where
ℍd × d
denotes the space of (d × d)-dimensional
Hermitian matrices.
#> Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
#> 3.5.0.
#> ℹ Please use the `legend.position.inside` argument of `theme()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> Warning: Use of `longdata$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longdata$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longdata$value` is discouraged.
#> ℹ Use `value` instead.
#> Warning: Use of `longdata1$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longdata1$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longdata1$value` is discouraged.
#> ℹ Use `value` instead.
#> Warning: Use of `longdata2$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longdata2$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longdata2$value` is discouraged.
#> ℹ Use `value` instead.
#> Warning: Use of `longdata$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longdata$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longdata$value` is discouraged.
#> ℹ Use `value` instead.
#> Warning: Use of `longdata$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longdata$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longdata$value` is discouraged.
#> ℹ Use `value` instead.
#> Warning: Use of `longdata1$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longdata1$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longdata1$value` is discouraged.
#> ℹ Use `value` instead.
#> Warning: Use of `longdata2$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longdata2$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longdata2$value` is discouraged.
#> ℹ Use `value` instead.
#> Warning: Use of `longdata$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longdata$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longdata$value` is discouraged.
#> ℹ Use `value` instead.
The function pdSpecEst2D()
computes a wavelet-denoised
HPD surface estimator by applying the following steps to an initial
noisy HPD surface (obtained with e.g., pdPgram2D()
):
WavTransf2D()
,pdCART()
,InvWavTransf2D()
.For more details on the functions used in the intermediate steps, we
refer to the complete package documentation. Currently, the intrinsic
forward and inverse 2D AI wavelet transforms act only on dyadic grids
using a natural dyadic refinement pyramid as explained in Chapter 5 of
(Chau 2018). By default, the marginal
average-interpolation orders of the forward and inverse 2D wavelet
transform are set to order = c(3, 3)
, and the predicted
midpoints in the 2D AI subdivision scheme are calculated efficiently by
means of local intrinsic weighted averages using pre-determined filter
weights. As in the 1D forward and inverse AI wavelet transforms above,
by default, the intrinsic wavelet transform is computed with respect to
the affine-invariant Riemannian metric, i.e.,
metric = "Riemannian"
, but this can also be one of:
"logEuclidean"
, "Cholesky"
,
"rootEuclidean"
or "Euclidean"
.
f.hat <- pdSpecEst2D(smiley$P, order = c(1, 1), jmax = 6, B = B)
str(f.hat, max.level = 1)
#> List of 5
#> $ f : cplx [1:2, 1:2, 1:128, 1:128] 0.647+6.94e-18i 0.522-1.80e-01i 0.522+1.80e-01i ...
#> $ D :List of 7
#> $ M0 : cplx [1:2, 1:2, 1, 1] 0.312+2.78e-17i 0.145-3.15e-02i 0.145+3.15e-02i ...
#> $ tree.weights:List of 6
#> $ D.raw :List of 7
The function pdSpecEst2D()
outputs a list with several
components, among which the most important are the denoised HPD surface
f
at the given rectangular observation grid, and the
pyramid of thresholded wavelet coefficients D
. See the
function documentation for more information about its specific arguments
and outputs.
By default, shrinkage of the wavelet coefficients in the function
pdSpecEst2D()
is carried out through nonlinear
tree-structured thresholding of entire matrix-valued wavelet
coefficients based on the trace of the whitened coefficients. This is
done in the same way as before through minimization of a CPRESS
criterion via a fast tree-pruning algorithm as in (Donoho 1997). For thresholding of 2D wavelet
coefficients on a non-square time-frequency grid, there is a discrepancy
between the constant noise variance of the traces of the whitened
coefficients at the first |J1 − J2|
coarse scales and the remaining finer scales, where J1 = log2(n1)
and J2 = log2(n2)
with n1 and n2 the (dyadic) number of
observations in each marginal direction of the (n1 × n2)-dimensional
observation grid. To correct for this discrepancy, the variances are
normalized to a unit noise variance across wavelet scales via the
semiparametric method described in Chapter 5 of (Chau 2018). Note that if the observation grid
is square, i.e., n1 = n2,
the variances of the traces of the whitened coefficients are homogeneous
across all wavelet scales as in the 1D context described above. The
penalty parameter λ in the
CPRESS criterion is set equal to the universal threshold rescaled by a
factor α, i.e., $\lambda = \alpha \sqrt{2 \log(N)}$, where
N is the total number of
pooled coefficients, and by default the rescaling factor α is equal to 1,
(alpha = 1
), such that λ equals the universal
threshold.
Analogous to the function pdSpecEst1D()
, linear
thresholding of wavelet scales is performed by specifying the argument
alpha = 0
, (i.e., zero nonlinear threshold), and setting
the argument jmax
to the maximum nonzero wavelet scale to
be kept in the inverse wavelet transform. By default, the function
pdSpecEst2D()
sets the maximum nonzero wavelet scale
jmax
to J − 2,
two scales below the finest wavelet scale J. Figure 9 displays the
matrix-logarithms of the nonlinear wavelet-thresholded HPD surface
f.hat$f
in the same fashion as plotted in Figures 7 and 8
above.
#> Warning: Use of `longdata$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longdata$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longdata$value` is discouraged.
#> ℹ Use `value` instead.
#> Warning: Use of `longdata1$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longdata1$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longdata1$value` is discouraged.
#> ℹ Use `value` instead.
#> Warning: Use of `longdata2$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longdata2$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longdata2$value` is discouraged.
#> ℹ Use `value` instead.
#> Warning: Use of `longdata$Var1` is discouraged.
#> ℹ Use `Var1` instead.
#> Warning: Use of `longdata$Var2` is discouraged.
#> ℹ Use `Var2` instead.
#> Warning: Use of `longdata$value` is discouraged.
#> ℹ Use `value` instead.