c = cov (x)
c = cov (x, y)
c = cov (…, opt)
c = cov (…, nanflag)
¶Compute the covariance matrix.
The covariance between two variable vectors a and b is calculated as:
cov (a,b) = 1/(N-1) * SUM_i (a(i) - mean (a)) * (b(i) - mean (b))
where N is the length of the vectors a and b.
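As a minimal sketch (Python, standard library only; the name cov_pair is hypothetical and this is not Octave's implementation), the formula above translates directly: subtract each vector's mean, sum the products of the deviations, and divide by N-1.

```python
def cov_pair(a, b):
    """Covariance of two equal-length sequences, normalized by N - 1."""
    n = len(a)
    if n != len(b):
        raise ValueError("a and b must have the same length")
    if n < 2:
        raise ValueError("need at least two observations")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # SUM_i (a(i) - mean (a)) * (b(i) - mean (b))
    total = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return total / (n - 1)


print(cov_pair([1, 2, 3, 4], [2, 4, 6, 9]))  # 11.5 / 3, about 3.8333
```

Passing the same vector twice gives its sample variance.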
If called with one argument, compute the covariance of x with itself. If x is a vector, this is the scalar variance of x. If x is a matrix, each row of x is treated as an observation and each column as a variable, and the (i, j)-th entry of cov (x) is the covariance between the i-th and j-th columns in x. If x has dimensions n x m, the output c will be an m x m square covariance matrix.
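The matrix case can be sketched in Python as follows, assuming x is given as a list of rows (one row per observation). The function name cov_matrix is hypothetical; each entry applies the pairwise formula to two columns, and symmetry is used to fill the lower triangle.

```python
def cov_matrix(x):
    """m x m covariance matrix of an n x m data matrix given as a list of rows.

    Each row is an observation and each column a variable; entry (i, j) is
    the covariance between columns i and j, normalized by n - 1.
    """
    n = len(x)
    m = len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(m)]
    c = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            s = sum((row[i] - means[i]) * (row[j] - means[j]) for row in x)
            c[i][j] = c[j][i] = s / (n - 1)  # covariance matrices are symmetric
    return c


data = [[1, 2], [2, 4], [3, 7]]  # 3 observations (rows) of 2 variables (columns)
print(cov_matrix(data))          # a 2 x 2 matrix
```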
If called with two arguments, compute cov (x, y), the covariance between two random variables x and y. x and y must have the same number of elements and are treated as vectors, so the result is the same as cov (x(:), y(:)). The output is a 2 x 2 covariance matrix, equivalent to cov ([x(:), y(:)]): the diagonal holds the variances of x and y, and the off-diagonal entries hold their covariance.
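A small Python sketch of the two-argument behavior (the names cov_two, _flatten, and _cov are hypothetical): both inputs are flattened and the 2 x 2 matrix is assembled from two variances and one covariance. Flattening order does not matter as long as x and y use the same order, because the elements are paired position by position, exactly as x(:) and y(:) pair them.

```python
def _flatten(v):
    # Accept a flat list or a list of rows; any ordering shared by x and y
    # pairs the same elements as x(:) and y(:) do.
    if v and isinstance(v[0], (list, tuple)):
        return [e for row in v for e in row]
    return list(v)


def _cov(a, b):
    # Sample covariance normalized by N - 1.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


def cov_two(x, y):
    """2 x 2 covariance matrix of x and y treated as two univariate samples."""
    a, b = _flatten(x), _flatten(y)
    if len(a) != len(b):
        raise ValueError("x and y must have the same number of elements")
    cxy = _cov(a, b)
    return [[_cov(a, a), cxy], [cxy, _cov(b, b)]]


print(cov_two([[1, 2], [3, 4]], [[2, 4], [6, 9]]))
```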
The optional argument opt determines the type of normalization to use. Valid values are:
0: Normalize with N-1. This provides the best unbiased estimator of the covariance.
1: Normalize with N. This provides the second moment around the mean. When N = 1, opt is set to 1 automatically.
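The effect of the normalization choice can be sketched in Python (the function cov_norm is hypothetical and only mirrors the rule described here: opt = 0 divides by N-1, opt = 1 divides by N, and a single observation forces the N divisor so there is no division by zero).

```python
def cov_norm(a, b, opt=0):
    """Covariance of two sequences with selectable normalization."""
    if opt not in (0, 1):
        raise ValueError("opt must be 0 or 1")
    n = len(a)
    if n == 1:
        opt = 1  # N - 1 would be zero, so normalize with N instead
    ma, mb = sum(a) / n, sum(b) / n
    s = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    denom = n if opt == 1 else n - 1
    return s / denom


v = [1, 2, 3, 4]
print(cov_norm(v, v), cov_norm(v, v, opt=1))  # 5/3 versus 5/4
```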
The optional argument nanflag must appear last in the argument list and controls how cov handles NaN values. The three valid values are:
"includenan": Leave NaN values in x and y. The output follows the normal rules for handling NaN values in arithmetic operations.
"omitrows": Rows containing NaN values are removed from both x and y before the covariance is calculated. A NaN in one variable removes that row from both x and y.
"partialrows": Rows containing NaN values are ignored independently for each (i, j) covariance calculation. This may result in a different number of observations, N, being used to calculate each element of the covariance matrix.
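The difference between dropping rows once for the whole matrix and dropping them per entry is easiest to see in a sketch. This Python version is illustrative only: cov_nan and its helpers are hypothetical, the flag strings "includenan", "omitrows", and "partialrows" are assumed to match the MATLAB-compatible names, and it does not guard against too few rows remaining after NaN removal.

```python
import math


def _cov(a, b):
    # Sample covariance normalized by N - 1 (NaN propagates through arithmetic).
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


def _column(rows, j):
    return [row[j] for row in rows]


def cov_nan(x, nanflag="includenan"):
    """Covariance matrix of rows-as-observations data x under a NaN policy."""
    m = len(x[0])
    if nanflag == "includenan":
        # NaN stays in: any entry involving a column with a NaN becomes NaN.
        return [[_cov(_column(x, i), _column(x, j)) for j in range(m)]
                for i in range(m)]
    if nanflag == "omitrows":
        # One shared row set: drop every row that has a NaN in any column.
        rows = [r for r in x if not any(math.isnan(v) for v in r)]
        return [[_cov(_column(rows, i), _column(rows, j)) for j in range(m)]
                for i in range(m)]
    if nanflag == "partialrows":
        # Per entry: drop only rows with a NaN in column i or column j.
        c = [[0.0] * m for _ in range(m)]
        for i in range(m):
            for j in range(m):
                rows = [r for r in x if not (math.isnan(r[i]) or math.isnan(r[j]))]
                c[i][j] = _cov(_column(rows, i), _column(rows, j))
        return c
    raise ValueError("unknown nanflag: " + nanflag)


nan = float("nan")
data = [[1, 2], [2, nan], [3, 6], [4, 8]]
print(cov_nan(data, "omitrows"))
print(cov_nan(data, "partialrows"))  # entry (0, 0) uses all four rows
```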
Compatibility Note: Previous versions of cov treated x and y as multivariate random variables, with each row an observation. This version attempts to maintain full compatibility with MATLAB by treating x and y as two univariate distributions regardless of shape, resulting in a 2 x 2 output matrix. Code relying on Octave’s previous definition will need to be modified when running this newer version of cov. The previous behavior can be obtained by using the NaN package’s covm function as covm (x, y, "D").
See also: corr.
r = corr (x)
r = corr (x, y)
¶Compute matrix of correlation coefficients.
If each row of x and y is an observation and each column is a variable, then the (i, j)-th entry of corr (x, y) is the correlation between the i-th variable in x and the j-th variable in y. x and y must have the same number of rows (observations).
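corr (x,y) = cov (x,y) / (std (x) * std (y))
Here, for vectors x and y, cov (x,y) stands for the scalar covariance of x and y, i.e., the off-diagonal entry of the 2 x 2 matrix that cov returns.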
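For two vectors, the formula can be sketched in Python (corr_vec is a hypothetical name, not Octave's code). Because the covariance and both standard deviations use the same N-1 normalization, those factors cancel, so the result does not depend on that choice.

```python
import math


def corr_vec(x, y):
    """Pearson correlation of two equal-length vectors:
    cov (x, y) / (std (x) * std (y)), all normalized by N - 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return sxy / (sx * sy)


print(corr_vec([1, 2, 3, 4], [2, 4, 6, 9]))  # close to, but below, 1
```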
If called with one argument, compute corr (x, x), the correlation between the columns of x.
See also: cov.
r = corrcoef (x)
r = corrcoef (x, y)
r = corrcoef (…, param, value, …)
[r, p] = corrcoef (…)
[r, p, lci, hci] = corrcoef (…)
¶Compute a matrix of correlation coefficients.
x is an array where each column contains a variable and each row is an observation.
If a second input y (of the same size as x) is given, then calculate the correlation coefficients between x and y.
param and value are optional parameter/value pairs that modify the calculation. Valid options are:
"alpha"
Significance level used for the bounds of the confidence interval, lci and hci. The confidence level is 1 - alpha; the default of 0.05 gives a 95% confidence interval.
"rows"
Determine processing of NaN values. Acceptable values are "all", "complete", and "pairwise". Default is "all".
With "complete", only the rows without NaN values are considered.
With "pairwise", the selection of NaN-free rows is made for each pair of variables.
Output r is a matrix of Pearson’s product-moment correlation coefficients for each pair of variables.
Output p is a matrix of pair-wise p-values testing the null hypothesis that the correlation coefficient is zero.
Outputs lci and hci are matrices containing, respectively, the lower and upper bounds of the confidence interval of each correlation coefficient (95% by default; see "alpha").
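One standard way to obtain such bounds is Fisher's z-transform: atanh (r) is approximately normal with variance 1 / (n - 3). The Python sketch below assumes that approach; it is not taken from corrcoef's source, and the name corr_ci is hypothetical. It fails for r = ±1, where atanh is undefined.

```python
import math
from statistics import NormalDist


def corr_ci(r, n, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for a Pearson correlation r
    estimated from n observations, via Fisher's z-transform."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)  # maps (-1, 1) onto the whole real line
    half_width = NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(n - 3)
    # Build a symmetric interval in z-space, then map back with tanh.
    return math.tanh(z - half_width), math.tanh(z + half_width)


print(corr_ci(0.5, 103))  # lower and upper bounds around 0.5
```

Mapping back with tanh keeps both bounds inside (-1, 1), and the resulting interval is asymmetric around r, wider on the side closer to zero.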
rho = spearman (x)
rho = spearman (x, y)
¶Compute Spearman’s rank correlation coefficient rho.
For two data vectors x and y, Spearman’s rho is the correlation coefficient of the ranks of x and y.
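The definition translates into "rank each vector, then take the Pearson correlation of the ranks." The Python sketch below assumes tied values receive the average of the positions they occupy, a common convention; the names ranks, pearson, and spearman_rho are hypothetical, and inputs with zero variance are not handled.

```python
import math


def ranks(v):
    """1-based ranks of v; tied values get the average of their positions."""
    order = sorted(range(len(v)), key=lambda k: v[k])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    return pearson(ranks(x), ranks(y))


print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # monotone, so rho is 1
```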
If x and y are drawn from independent distributions, rho has zero mean and variance 1 / (N - 1), where N is the length of the x and y vectors, and is asymptotically normally distributed.
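The null variance gives a simple large-sample test: divide rho by sqrt (1 / (N - 1)) and compare with a standard normal. This Python sketch (spearman_z is a hypothetical name) is only an approximation that is reasonable for large N, not a function provided by Octave.

```python
import math
from statistics import NormalDist


def spearman_z(rho, n):
    """Standardize rho with the null variance 1 / (N - 1) and return
    (z, two-sided p-value) from the normal approximation."""
    z = rho / math.sqrt(1 / (n - 1))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


print(spearman_z(0.5, 17))  # z = 2.0, p a little below 0.05
```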
spearman (x) is equivalent to spearman (x, x).
tau = kendall (x)
tau = kendall (x, y)
¶Compute Kendall’s tau.
For two data vectors x, y of common length N, Kendall’s tau is the correlation of the signs of all rank differences of x and y; i.e., if both x and y have distinct entries, then
tau = 1 / (N (N-1)) * SUM_i,j sign (q(i) - q(j)) * sign (r(i) - r(j))
in which the q(i) and r(i) are the ranks of x and y, respectively.
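A direct Python transcription of the double sum (kendall_tau is a hypothetical name; this O(N^2) sketch is not Octave's code). Ranking preserves order, so sign (q(i) - q(j)) equals sign (x(i) - x(j)); the sketch therefore compares the raw values instead of computing ranks. Terms with i = j contribute zero, so dividing by N (N-1) averages over the ordered pairs with i ≠ j.

```python
def sign(v):
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau from the sum over all ordered pairs (i, j),
    assuming x and y each have distinct entries."""
    n = len(x)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i in range(n) for j in range(n))
    return s / (n * (n - 1))


print(kendall_tau([1, 2, 3], [1, 3, 2]))  # two concordant pairs, one discordant
```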
If x and y are drawn from independent distributions, Kendall’s tau is asymptotically normal with mean 0 and variance (2 * (2N+5)) / (9 * N * (N-1)).
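As with Spearman's rho, the asymptotic variance yields an approximate z-test for independence. The Python sketch below (kendall_z is a hypothetical helper, not part of Octave) standardizes tau with (2 * (2N+5)) / (9 * N * (N-1)) and reports a two-sided normal p-value; it is only a large-sample approximation.

```python
import math
from statistics import NormalDist


def kendall_z(tau, n):
    """Standardized Kendall's tau under independence and its two-sided
    p-value from the normal approximation."""
    var = 2 * (2 * n + 5) / (9 * n * (n - 1))
    z = tau / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


print(kendall_z(0.5, 10))
```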
kendall (x) is equivalent to kendall (x, x).