
Octave supports various helpful statistical functions. Many are useful as initial steps to prepare a data set for further analysis. Others provide different measures from those of the basic descriptive statistics.

- : `y` = **center** (`x`)
- : `y` = **center** (`x`, `dim`)

Center data by subtracting its mean.

If `x` is a vector, subtract its mean.

If `x` is a matrix, do the above for each column.

If the optional argument `dim` is given, operate along this dimension.

Programming Note: `center` has obvious application for normalizing statistical data. It is also useful for improving the precision of general numerical calculations. Whenever there is a large value that is common to a batch of data, the mean can be subtracted off, the calculation performed, and then the mean added back to obtain the final answer.

**See also:** zscore.
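As a minimal sketch of the column-wise default and of the `dim` argument, consider a small 2-by-3 matrix. The variable names `yc` and `yr` are illustrative only.

```octave
x = [1, 2, 3; 4, 6, 8];

## Default: subtract the mean of each column (column means are 2.5, 4, 5.5).
yc = center (x);
## yc = [-1.5, -2, -2.5; 1.5, 2, 2.5]

## With dim = 2: subtract the mean of each row (row means are 2 and 6).
yr = center (x, 2);
## yr = [-1, 0, 1; -2, 0, 2]
```

After centering, the mean along the chosen dimension is zero.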

- : `z` = **zscore** (`x`)
- : `z` = **zscore** (`x`, `opt`)
- : `z` = **zscore** (`x`, `opt`, `dim`)
- : [`z`, `mu`, `sigma`] = **zscore** (…)

Compute the Z score of `x`.

If `x` is a vector, subtract its mean and divide by its standard deviation. If the standard deviation is zero, divide by 1 instead.

The optional parameter `opt` determines the normalization to use when computing the standard deviation and has the same definition as the corresponding parameter for `std`.

If `x` is a matrix, calculate along the first non-singleton dimension. If the third optional argument `dim` is given, operate along this dimension.

The optional outputs `mu` and `sigma` contain the mean and standard deviation.
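The following sketch illustrates the three outputs and the zero-deviation fallback. It assumes `opt` = 1 selects normalization by N, as it does for `std`; the sample data and the name `zc` are made up for illustration.

```octave
x = [2, 4, 4, 4, 5, 5, 7, 9];

## opt = 1 normalizes the standard deviation by N (as for std),
## so here mu = 5 and sigma = 2.
[z, mu, sigma] = zscore (x, 1);
## z = [-1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2]

## Constant data has zero standard deviation; the divisor falls back to 1.
zc = zscore ([3, 3, 3]);
## zc = [0, 0, 0]
```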

- : `z` = **normalize** (`x`)
- : `z` = **normalize** (`x`, `dim`)
- : `z` = **normalize** (…, `method`)
- : `z` = **normalize** (…, `method`, `option`)
- : `z` = **normalize** (…, `scale`, `scaleoption`, `center`, `centeroption`)
- : [`z`, `c`, `s`] = **normalize** (…)

Return a normalization of the data in `x` using one of several available scaling and centering methods.

`normalize` by default will return the `zscore` of `x`, defined as the number of standard deviations each element is from the mean of `x`. This is equivalent to centering at the mean of the data and scaling by the standard deviation.

The returned value `z` will have the same size as `x`. The optional return variables `c` and `s` are the centering and scaling factors used in the normalization such that:

`z` = (`x` - `c`) ./ `s`

If `x` is a vector, `normalize` will operate on the data in `x`.

If `x` is a matrix, `normalize` will operate independently on each column in `x`.

If `x` is an N-dimensional array, `normalize` will operate independently along the first non-singleton dimension in `x`.

If the optional second argument `dim` is given, operate along this dimension.

The optional inputs `method` and `option` can be used to specify the type of normalization performed on `x`. Note that only the `scale` and `center` options may be specified together using any of the methods defined below. Valid normalization methods are:

`zscore`

(Default) Normalizes the elements in `x` to the scaled distance from a central value. Valid options:

- `std` (default): Data is centered at `mean (x)` and scaled by the standard deviation.
- `robust`: Data is centered at `median (x)` and scaled by the median absolute deviation.

`norm`

`x` is normalized by its general vector norm, with `option` being the normalization factor `p` that determines the vector norm type. The norm is computed as:

[sum (abs (`x`) .^ `p`)] ^ (1/`p`)

`p` can be any positive scalar, specific values being:

- `p` = 1: `x` is normalized by `sum (abs (x))`.
- `p` = 2 (default): `x` is normalized by the Euclidean norm, or vector magnitude, of the elements.
- `p` = Inf: `x` is normalized by `max (abs (x))`.

`scale`

`x` is scaled by a factor determined by `option`, which can be a numeric scalar or one of the following:

- `std` (default): `x` is scaled by its standard deviation.
- `mad`: `x` is scaled by its median absolute deviation.
- `first`: `x` is scaled by its first element.
- `iqr`: `x` is scaled by its interquartile range.

`range`

`x` is scaled to fit the range specified by `option` as a two-element row vector. The default range is [0, 1].

`center`

`x` is shifted by an amount determined by `option`, which can be a numeric scalar or one of the following:

- `mean` (default): `x` is shifted by `mean (x)`.
- `median`: `x` is shifted by `median (x)`.

`medianiqr`

`x` is shifted by `median (x)` and scaled by the interquartile range.

Known MATLAB incompatibilities:

- The option `DataVariables` is not yet implemented for Table class `x` inputs.
- Certain arrays containing NaN elements may not return MATLAB compatible output.

**See also:** zscore, iqr, norm, rescale, std, median, mean, mad.
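A short sketch of several of the methods described above, applied to a five-element row vector. The output names (`r01`, `r11`, `n1`, `cm`) are illustrative, and the example assumes an Octave release that provides `normalize`.

```octave
x = [1, 2, 3, 4, 5];

## Default z-score: c is the mean (3), s is the sample standard deviation.
[z, c, s] = normalize (x);

## Rescale to the default range [0, 1], then to a custom range [-1, 1].
r01 = normalize (x, "range");            # [0, 0.25, 0.5, 0.75, 1]
r11 = normalize (x, "range", [-1, 1]);   # [-1, -0.5, 0, 0.5, 1]

## Divide by the 1-norm, sum (abs (x)) = 15.
n1 = normalize (x, "norm", 1);

## Shift by the median (3) without scaling.
cm = normalize (x, "center", "median");  # [-2, -1, 0, 1, 2]
```

In each case the result can be reproduced from the optional outputs as `(x - c) ./ s`.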

- : `n` = **histc** (`x`, `edges`)
- : `n` = **histc** (`x`, `edges`, `dim`)
- : [`n`, `idx`] = **histc** (…)

Compute histogram counts.

When `x` is a vector, the function counts the number of elements of `x` that fall in the histogram bins defined by `edges`. `edges` must be a vector of monotonically increasing values that define the edges of the histogram bins. `n`(k) contains the number of elements in `x` for which `edges`(k) <= `x` < `edges`(k+1). The final element of `n` contains the number of elements of `x` exactly equal to the last element of `edges`.

When `x` is an *N*-dimensional array, the computation is carried out along dimension `dim`. If not specified, `dim` defaults to the first non-singleton dimension.

When a second output argument is requested, an index matrix is also returned. The `idx` matrix has the same size as `x`. Each element of `idx` contains the index of the histogram bin in which the corresponding element of `x` was counted.

**See also:** hist.
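The bin rules above can be checked on a small made-up data set in which every value lies within the span of `edges`:

```octave
x = [0.5, 1, 1.2, 2.5, 3, 3];
edges = [0, 1, 2, 3];

[n, idx] = histc (x, edges);
## n   = [1, 2, 1, 2]        bins [0,1), [1,2), [2,3), and x == 3
## idx = [1, 2, 2, 3, 4, 4]  bin index of each element of x
```

Note that the value 1 falls in the second bin because each bin includes its left edge, and the two values equal to 3 are counted in the final element of `n`.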

The `unique` function, documented at unique, is often useful for statistics.

- : `c` = **nchoosek** (`n`, `k`)
- : `c` = **nchoosek** (`set`, `k`)

Compute the binomial coefficient of `n` or list all possible combinations of a `set` of items.

If `n` is a scalar then calculate the binomial coefficient of `n` and `k`, which is defined as

$$\binom{n}{k} = \frac{n (n-1) (n-2) \cdots (n-k+1)}{k!} = \frac{n!}{k! \, (n-k)!}$$

This is the number of combinations of `n` items taken in groups of size `k`.

If the first argument is a vector, `set`, then generate all combinations of the elements of `set`, taken `k` at a time, with one row per combination. The result `c` has `k` columns and `nchoosek (length (set), k)` rows.

For example:

How many ways can three items be grouped into pairs?

    nchoosek (3, 2)
       ⇒ 3

What are the possible pairs?

    nchoosek (1:3, 2)
       ⇒  1   2
           1   3
           2   3

Programming Note: When calculating the binomial coefficient `nchoosek` works only for non-negative, integer arguments. Use `bincoeff` for non-integer and negative scalar arguments, or for computing many binomial coefficients at once with vector inputs for `n` or `k`.
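As a rough illustration, the scalar form can be compared against the product formula, and the vector form against the stated output size. The names `items` and `combos` are hypothetical.

```octave
n = 5;
k = 2;

## Binomial coefficient via nchoosek and via the product formula.
c1 = nchoosek (n, k);                          # 10
c2 = prod (n:-1:(n - k + 1)) / factorial (k);  # 5 * 4 / 2 = 10

## All 3-element combinations of a 4-element vector, one per row.
items = [10, 20, 30, 40];
combos = nchoosek (items, 3);
## combos has nchoosek (4, 3) = 4 rows and k = 3 columns
```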

- : `P` = **perms** (`v`)
- : `P` = **perms** (`v`, "unique")

Generate all permutations of vector `v` with one row per permutation.

Results are returned in inverse lexicographic order. The result has `factorial (n)` rows and `n` columns, where `n` is the length of `v`. Any repeated elements are included in the output.

If the optional argument `"unique"` is given then only unique permutations are returned, using less memory and generally taking less time than calling `unique (perms (v), "rows")`.

Example 1

    perms ([1, 2, 3])
    ⇒
      3   2   1
      3   1   2
      2   3   1
      2   1   3
      1   3   2
      1   2   3

Example 2

    perms ([1, 1, 2, 2], "unique")
    ⇒
      2   2   1   1
      2   1   2   1
      2   1   1   2
      1   2   2   1
      1   2   1   2
      1   1   2   2

Programming Note: If the `"unique"` option is not used, the length of `v` should be no more than 10-12 to limit memory consumption. Even with `"unique"`, there should be no more than 10-12 unique elements in `v`.
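The effect of the `"unique"` option on the number of rows can be sketched with a vector containing one repeated element. The variable names are illustrative, and the example assumes an Octave release whose `perms` accepts `"unique"`.

```octave
v = [1, 1, 2];

## Without "unique", repeated elements yield repeated rows: 3! = 6 rows.
p_all = perms (v);

## With "unique", only the 3 distinct orderings remain.
p_unq = perms (v, "unique");

## Same set of rows as the two-step approach (compared after sorting,
## since unique (..., "rows") returns its rows in ascending order).
same = isequal (sortrows (p_unq), unique (p_all, "rows"));
```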

- : `y` = **ranks** (`x`)
- : `y` = **ranks** (`x`, `dim`)
- : `y` = **ranks** (`x`, `dim`, `rtype`)

Return the ranks (in the sense of order statistics) of `x` along the first non-singleton dimension adjusted for ties.

If the optional `dim` argument is given, operate along this dimension.

The optional parameter `rtype` determines how ties are handled. All examples below assume an input of `[ 1, 2, 2, 4 ]`.

- 0 or `"fractional"`: (default) for fractional ranking (1, 2.5, 2.5, 4);
- 1 or `"competition"`: for competition ranking (1, 2, 2, 4);
- 2 or `"modified"`: for modified competition ranking (1, 3, 3, 4);
- 3 or `"ordinal"`: for ordinal ranking (1, 2, 3, 4);
- 4 or `"dense"`: for dense ranking (1, 2, 2, 3).
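The tie-handling modes listed above can be tried directly on the same input. This sketch passes `dim` = 2 explicitly because `x` is a row vector; the output names are illustrative.

```octave
x = [1, 2, 2, 4];

r_frac  = ranks (x);                    # [1, 2.5, 2.5, 4] (default)
r_comp  = ranks (x, 2, "competition");  # [1, 2, 2, 4]
r_dense = ranks (x, 2, 4);              # [1, 2, 2, 3], numeric code for "dense"
```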

- : `cnt` = **run_count** (`x`, `n`)
- : `cnt` = **run_count** (`x`, `n`, `dim`)

Count the upward runs along the first non-singleton dimension of `x` of length 1, 2, …, `n`-1 and greater than or equal to `n`.

If the optional argument `dim` is given then operate along this dimension.

**See also:** runlength.
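As a small sketch, the following counts runs in a ten-element vector with `n` = 4. It assumes that an upward run is a maximal stretch in which no element is smaller than its predecessor; the data are made up and deliberately contain no ties.

```octave
x = [1, 2, 3, 1, 2, 0, 5, 6, 7, 8];

## The upward runs are [1 2 3], [1 2], and [0 5 6 7 8],
## with lengths 3, 2, and 5.
cnt = run_count (x, 4);
## cnt = [0, 1, 1, 1]: no run of length 1, one run each of length 2
## and 3, and one run of length 4 or more.
```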

- :
*count =***runlength***(*¶`x`) - :
*[count, value] =***runlength***(*¶`x`) Find the lengths of all sequences of common values.

`count`is a vector with the lengths of each repeated value.The optional output

`value`contains the value that was repeated in the sequence.runlength ([2, 2, 0, 4, 4, 4, 0, 1, 1, 1, 1]) ⇒ 2 1 3 1 4

**See also:**run_count.
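The two outputs together form a run-length encoding of the input. The sketch below requests both and then rebuilds the original vector; the reconstruction step assumes `repelem` is available (Octave 5 and later) and is not part of `runlength` itself.

```octave
x = [5, 5, 5, 7, 9, 9];

[count, value] = runlength (x);
## count = [3, 1, 2]
## value = [5, 7, 9]

## Rebuild the original sequence from the encoding.
y = repelem (value, count);
```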