26.2 Statistics on Sliding Windows of Data

It is often useful to calculate descriptive statistics over a subsection (i.e., window) of a full dataset. Octave provides the function movfun, which calls an arbitrary function handle on successive windows of data and accumulates the results. Many of the most commonly desired functions, such as the moving average over a window of data (movmean), are already provided.

 
: y = movfun (fcn, x, wlen)
: y = movfun (fcn, x, [nb, na])
: y = movfun (…, "property", value)

Apply function fcn to a moving window of length wlen on data x.

The moving window length input wlen can either be a numeric scalar or a 2-element numeric array [nb, na]. The elements included in the moving window depend on the size and value of wlen as well as whether the "SamplePoints" option has been specified. For full details of element inclusion, see movslice.

During calculations the data input x is reshaped into a 2-dimensional wlen-by-N matrix and fcn is called on this new matrix. Therefore, fcn must accept an array input argument and apply the computation along dimension 1, i.e., down the columns of the array.
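As a minimal sketch of this calling convention, the following applies a custom range function (maximum minus minimum) over a window of three elements. The handle name rng_fcn and the data are made up for illustration. The reductions are written with an explicit dimension argument so that they always work down the columns, as required above.

```octave
x = [1 4 2 8 5];

## Illustrative handle: range (max - min) computed down each column.
rng_fcn = @(v) max (v, [], 1) - min (v, [], 1);

## Window of length 3 with the default "shrink" endpoint handling.
y = movfun (rng_fcn, x, 3);   # [3 3 6 6 3]

## The same call with a built-in reduction gives a moving average.
m = movfun (@mean, x, 3);     # [5/2 7/3 14/3 5 13/2]
```

The boundary values use only two elements each, for example y(1) is the range of x(1:2).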

When applied to an array (possibly multi-dimensional) with n columns, fcn may return a result in either of two formats: Format 1) an array of size 1-by-n-by-dim3-by-…-by-dimN. This is the typical output format from Octave core functions. Type demo ("movfun", 5) for an example of this use case. Format 2) a row vector of length n * numel_higher_dims where numel_higher_dims is prod (size (x)(3:end)). The output of fcn for the i-th input column must be found in the output at indices i:n:(n*numel_higher_dims). This format is useful when concatenating functions into arrays, or when using nthargout. Type demo ("movfun", 6) for an example of this case.

The calculation can be controlled by specifying property/value pairs. Valid properties are

"dim"

Operate along the dimension specified, rather than the default of the first non-singleton dimension.

"SamplePoints"

This property specifies a sorted, numeric vector of unique coordinate positions of the data points in x. The default value is the vector [1 : numel (x)]. When a non-default SamplePoints vector is specified, the moving window length wlen is measured against the SamplePoints positions to determine which points are included in each window slice. SamplePoints need not be uniformly spaced. This can result in window slices containing different numbers of points.

"Endpoints"

This property controls how results are calculated at the boundaries (endpoints) of the window. Possible values are:

"shrink" (default)

The window is truncated at the beginning and end of the array to exclude elements for which there is no source data. For example, with a window of length 3, y(1) = fcn (x(1:2)), and y(end) = fcn (x(end-1:end)).

"discard"

Any y values that use a window extending beyond the original data array are deleted. For example, with a 10-element data vector and a window of length 3, the output will contain only 8 elements. The first element would require calculating the function over indices [0, 1, 2] and is therefore discarded. The last element would require calculating the function over indices [9, 10, 11] and is therefore discarded.

"fill"

Any window elements outside the data array are replaced by NaN. For example, with a window of length 3, y(1) = fcn ([NaN, x(1:2)]), and y(end) = fcn ([x(end-1:end), NaN]). This option usually results in y having NaN values at the boundaries, although it is influenced by how fcn handles NaN, and also by the property "nancond".

user_value

Any window elements outside the data array are replaced by the specified value user_value which must be a numeric scalar. For example, with a window of length 3, y(1) = fcn ([user_value, x(1:2)]), and y(end) = fcn ([x(end-1:end), user_value]). A common choice for user_value is 0.

"same"

Any window elements outside the data array are replaced by the value of x at the boundary. For example, with a window of length 3, y(1) = fcn ([x(1), x(1:2)]), and y(end) = fcn ([x(end-1:end), x(end)]).

"periodic"

The window is wrapped so that any missing data elements are taken from the other side of the data. For example, with a window of length 3, y(1) = fcn ([x(end), x(1:2)]), and y(end) = fcn ([x(end-1:end), x(1)]).

Note 1: For non-uniform "SamplePoints" spacing, the only permitted value for "Endpoints" is "shrink".

Note 2: For some "Endpoints" options, the window size at the boundaries may not be the same as for the central part, and fcn must work in these cases.
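The "Endpoints" options can be compared side by side on a short vector. This is only a sketch: the data, the handle fcn, and the fill value 10 are arbitrary choices, and the values in the comments are worked out by hand from the descriptions above for a window of length 3.

```octave
x = 1:5;
fcn = @(v) sum (v, 1);   # sum down the columns of each window

y_shrink   = movfun (fcn, x, 3);                           # [3 6 9 12 9]
y_discard  = movfun (fcn, x, 3, "Endpoints", "discard");   # [6 9 12]
y_fill     = movfun (fcn, x, 3, "Endpoints", "fill");      # [NaN 6 9 12 NaN]
y_value    = movfun (fcn, x, 3, "Endpoints", 10);          # [13 6 9 12 19]
y_same     = movfun (fcn, x, 3, "Endpoints", "same");      # [4 6 9 12 14]
y_periodic = movfun (fcn, x, 3, "Endpoints", "periodic");  # [8 6 9 12 10]
```

Only the first and last outputs differ between the options, because every interior window lies fully inside the data.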

"nancond"

Controls how NaN and NA values affect the output of "movfun". The value "includenan" (default) causes NaN and NA values to be included in the moving window, and any window slice containing NaN or NA values will return NaN for that element. The value "omitnan" causes "movfun" to ignore any NaN or NA values, resulting in fewer elements being used to calculate the result for that window slice. If "omitnan" is specified and a window slice contains all NaN or NA values, "movfun" returns the value specified by the "nanval" property for that element. The values "includemissing" and "omitmissing" may be used synonymously with "includenan" and "omitnan", respectively.

"nanval"

Specifies the value to return when "nancond" is set to "omitnan" or "omitmissing" and all elements in a window are NaN or NA. "nanval" must be a numeric scalar value or NaN (default).
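A short sketch of how "nancond" and "nanval" interact, assuming an Octave version that implements the properties as described above. The data vector is made up, and the commented values follow from those descriptions for a window of length 3 with the default "shrink" endpoints.

```octave
x = [NaN NaN NaN 4 5];

## Default "includenan": any window containing NaN yields NaN.
y_inc = movfun (@mean, x, 3);                        # [NaN NaN NaN NaN 4.5]

## "omitnan": NaN values are ignored; all-NaN windows return "nanval",
## which defaults to NaN.
y_omit = movfun (@mean, x, 3, "nancond", "omitnan"); # [NaN NaN 4 4.5 4.5]

## A custom "nanval" replaces the result for all-NaN windows only.
y_zero = movfun (@mean, x, 3, "nancond", "omitnan", "nanval", 0);
                                                     # [0 0 4 4.5 4.5]
```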

"outdim"

A row vector that selects which dimensions of the calculation will appear in the output y. This is only useful when fcn returns an N-dimensional array in Format 1. The default is to return all output dimensions.

Programming Note: The property "outdim" can be used to save memory when the output of fcn has many dimensions, or when a wrapper to the base function that selects the desired outputs is too costly. When memory is not an issue, the easiest way to select output dimensions is to first calculate the complete result with movfun and then filter that result with indexing. If code complexity is not an issue then a wrapper can be created using anonymous functions. For example, if basefcn is a function returning a K-dimensional row output, and only dimension D is desired, then the following wrapper could be used.

fcn = @(x) basefcn (x)(:,columns(x) * (D-1) + (1:columns(x)));
y = movfun (fcn, ...);

See also: movslice, prepad, postpad, permute, reshape.

 
: slcidx = movslice (N, wlen)
: slcidx = movslice (N, wlen, samplepoints)
: [slcidx, C, Cpre, Cpost, win, wlen, scalar_wlen] = movslice (…)

Generate indices to slice a vector of length N into windows of length wlen.

The input N must be a positive integer.

The moving window length input wlen can either be a numeric scalar or a 2-element numeric array. The elements included in the moving window will depend on the size and value of wlen as well as whether the samplepoints input was specified.

The optional input samplepoints is a sorted, numeric vector of unique positions of the N data points. The default value is the vector [1 : N]. When a non-default samplepoints vector is specified, the moving window length wlen is measured against the samplepoints positions to determine which points are included in each window slice. The positions in samplepoints need not be uniformly spaced, so window slices may contain different numbers of points. For this reason, the shape and content of some movslice outputs differ when a non-default samplepoints is used, as specified below.

The moving window size and included elements will be defined as follows:

  • If samplepoints has the default value of 1:N (or has not been specified):
    • For integer-valued wlen:
      • For odd, integer-valued, scalar wlen the window is symmetric and includes (wlen - 1) / 2 elements on either side of the central element. For example, the window slice at index 5 with a window length of 3 will include the elements [4, 5, 6].
      • For even, integer-valued, scalar wlen the window is asymmetric and has wlen/2 elements to the left of the central element and wlen/2 - 1 elements to the right of the central element. For example, the window slice at index 5 with a window length of 4 will include the elements [3, 4, 5, 6].
      • For integer-valued vector wlen of the form [nb, na], where nb and na are integer valued, the window includes nb elements to the left of the central element and na elements to the right of the central element. For example, given wlen = [3, 1], the window slice at index 5 will include the elements [2, 3, 4, 5, 6].
    • For non-integer-valued wlen:
      • Non-integer-valued scalar wlen will be converted to two-element vector form with nb = na = fix (wlen / 2), and then processed as stated above for integer-valued vectors. For example, the window slice at index 5 with wlen = 2.5 (nb = na = 1) will include the elements [4, 5, 6].
      • Non-integer-valued vector wlen will be truncated to integer values with wlen = fix (wlen) and then processed as stated above for integer-valued vectors. For example, the window slice at index 5 with wlen = [1.2, 2.3] will include the elements [4, 5, 6, 7].
  • If samplepoints has been specified with a non-default vector:
    • For vector wlen specified as [nb, na], the window will include all points within a distance less than or equal to nb before and na after the central element’s position, with point positions defined by the elements of samplepoints. For example, at index 5 with wlen = [2, 3] and the 3rd-8th elements of samplepoints being [1, 3, 5, 7, 8, 9], the window slice will include the elements [4, 5, 6, 7] corresponding to samplepoints [3, 5, 7, 8].
    • Scalar wlen will be converted to two-element vector form with nb = na = wlen / 2. The window will then include all points within a distance of less than or equal to nb before and less than, but not equal to, na after the central element’s position, i.e., the half-open interval [nb, na). For example, at index 5 with wlen = 5 (nb = na = 2.5) and the 3rd-8th elements of samplepoints being [1, 3, 5, 7, 8, 9], the window slice will include the elements [4, 5, 6] corresponding to samplepoints [3, 5, 7].
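The integer-valued rules for default sample points can be checked directly with movslice. The sketch below uses the second output C (the window center positions) to pick out the slice centered at index 5. The vector length N = 10 and the variable names are arbitrary.

```octave
N = 10;

## Odd scalar wlen = 3: one element on either side of the center.
[slc, C] = movslice (N, 3);
w_odd = slc(:, C == 5).';     # [4 5 6]

## Even scalar wlen = 4: two elements before, one after.
[slc, C] = movslice (N, 4);
w_even = slc(:, C == 5).';    # [3 4 5 6]

## Vector wlen = [3, 1]: three elements before, one after.
[slc, C] = movslice (N, [3, 1]);
w_vec = slc(:, C == 5).';     # [2 3 4 5 6]
```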

The output slcidx is an array of indices of the slices of the vector.

  • If samplepoints is default or unspecified, slcidx will contain only the indices of the slices that fit fully within the vector. Each column will be the indices of one slice as the window moves from left to right. The slices will have fix (wlen) elements for scalar wlen, or nb + na + 1 elements for array valued wlen.
  • If a non-default samplepoints has been specified, slcidx will be a 2xN array with the first and second rows containing the first and last elements of each slice, respectively.
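For non-default sample points, the two-row form of slcidx might be used as sketched here. This assumes an Octave version whose movslice accepts the samplepoints input as documented above. The position vector sp is hypothetical and is chosen so that its 3rd to 8th elements are [1, 3, 5, 7, 8, 9].

```octave
## Hypothetical, non-uniformly spaced sample positions.
sp = [-3, -1, 1, 3, 5, 7, 8, 9];
N  = numel (sp);

## Vector wlen = [2, 3]: both window bounds are inclusive.
slc_vec = movslice (N, [2, 3], sp);
first_last_vec = slc_vec(:, 5);   # first and last index of slice 5

## Scalar wlen = 5: nb = na = 2.5 and the right bound is exclusive.
slc_scl = movslice (N, 5, sp);
first_last_scl = slc_scl(:, 5);
```

Under the inclusion rules stated earlier, the slice centered at index 5 should span indices 4 to 7 for the vector form and 4 to 6 for the scalar form.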

Optional output C is a row vector of window center positions where the window stays fully within the vector.

Optional outputs Cpre and Cpost contain the vector elements at the start and end of the vector, respectively, that result in the window extending beyond the ends of the vector.

Optional output win contains information for creating the moving window.

  • If samplepoints is default or unspecified, win is a column vector with the same number of rows as slcidx that contains the moving window defined as a center relative position stencil.
  • If a non-default samplepoints has been specified, win will be a 2xN array with the first and second rows containing the left and right bounds of each window slice, respectively, using the same coordinates as samplepoints. These bounds may lie outside of the position vector specified by samplepoints.

Optional output wlen returns the window length used by movslice in two-element [nb, na] form.

Optional logical output scalar_wlen returns the scalar or vector state of the input wlen so that calling functions can determine whether the moving window should be inclusive or exclusive of the right window endpoints, i.e., inclusive [nb, na] for vector wlen or exclusive [nb, na) for scalar wlen.
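As an illustration of how the outputs fit together for default sample points, the following sketch computes a moving sum by hand over the windows that lie fully inside the data. The data and variable names are arbitrary, and boundary positions are simply left as NaN here.

```octave
x = 1:10;
[slcidx, C, Cpre, Cpost, win] = movslice (numel (x), 3);

## slcidx is 3-by-8: one column of indices per fully contained window.
## C     = 2:9        (centers whose window fits inside the vector)
## Cpre  = 1          (center whose window extends past the start)
## Cpost = 10         (center whose window extends past the end)
## win   = [-1; 0; 1] (offsets relative to the window center)

## Moving sum over the fully contained windows only.
y = NaN (size (x));
y(C) = sum (x(slcidx), 1);   # y(2:9) = 3 * (2:9)
```

Indexing x with the matrix slcidx returns one window per column, so a reduction along dimension 1 yields one result per center in C.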

See also: movfun.

 
: y = movmad (x, wlen)
: y = movmad (x, [nb, na])
: y = movmad (…, dim)
: y = movmad (…, nancond)
: y = movmad (…, property, value)

Calculate the moving median or mean absolute deviation over a sliding window of length wlen on data x.

The moving window length input wlen can either be a numeric scalar or a 2-element numeric array [nb, na]. The elements included in the moving window depend on the size and value of wlen as well as whether the "SamplePoints" option has been specified. For full details of element inclusion, see movslice.

If the optional argument dim is given, operate along this dimension.

The optional argument nancond is a string that controls how NaN and NA values affect the output of "movmad". The value "includenan" causes NaN and NA values to be included in the moving window, and any window slice containing NaN or NA values will return NaN for that element. The value "omitnan" (default) causes "movmad" to ignore any NaN or NA values resulting in fewer elements being used to calculate the mad for that window slice. If "omitnan" is specified and a window slice contains all NaN or NA values, "movmad" returns NaN for that element. The values "includemissing" and "omitmissing" may be used synonymously with "includenan" and "omitnan", respectively.

The calculation can be controlled by specifying property/value pairs:

  • The "method" property can take the value "median" (default) or "mean" to control whether "movmad" performs median or mean absolute deviation calculations on the data.
  • Additional valid properties are "Endpoints" and "SamplePoints". For full descriptions of these properties and valid options, see movfun.
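A brief sketch of the "method" property, assuming an Octave version that provides it as described above. The data are arbitrary, and the reference values are computed by hand for the window centered at index 3.

```octave
x = [1 2 4 8 16];

y_med  = movmad (x, 3, "method", "median");
y_mean = movmad (x, 3, "method", "mean");

## Hand check for the window centered at index 3, i.e., x(2:4) = [2 4 8].
w = x(2:4);
ref_med  = median (abs (w - median (w)));   # 2
ref_mean = mean (abs (w - mean (w)));       # 20/9
```

Passing "method" explicitly keeps the result independent of which default a given Octave version uses.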

Programming Note: This function is a wrapper which calls movfun. For full documentation of inputs and options, see movfun.

Compatibility Note: Prior to Octave 10 this function only calculated mean absolute deviation. For MATLAB compatibility, the default has been changed to median absolute deviation. The "method" property is now provided to enable access to both "mad" calculation methods. This property should not be expected to be functional outside of Octave code.

See also: mad, movfun, movslice, movmax, movmean, movmedian, movmin, movprod, movstd, movsum, movvar.

 
: y = movmax (x, wlen)
: y = movmax (x, [nb, na])
: y = movmax (…, dim)
: y = movmax (…, nancond)
: y = movmax (…, property, value)

Calculate the moving maximum over a sliding window of length wlen on data x.

The moving window length input wlen can either be a numeric scalar or a 2-element numeric array [nb, na]. The elements included in the moving window depend on the size and value of wlen as well as whether the "SamplePoints" option has been specified. For full details of element inclusion, see movslice.

If the optional argument dim is given, operate along this dimension.

The optional argument nancond is a string that controls how NaN and NA values affect the output of "movmax". The value "includenan" causes NaN and NA values to be included in the moving window, and any window slice containing NaN or NA values will return NaN for that element. The value "omitnan" (default) causes "movmax" to ignore any NaN or NA values resulting in fewer elements being used to calculate the maximum for that window slice. If "omitnan" is specified and a window slice contains all NaN or NA values, "movmax" returns NaN for that element. The values "includemissing" and "omitmissing" may be used synonymously with "includenan" and "omitnan", respectively.

The calculation can be controlled by specifying property/value pairs. Valid properties are "Endpoints" and "SamplePoints". For full descriptions of these properties and valid options, see movfun.
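The effect of nancond on the moving maximum can be sketched as follows. The data vector is made up, and the commented values follow from the description above for a window of length 3 with the default "shrink" endpoints.

```octave
x = [1 NaN 3 2 5];

y_omit = movmax (x, 3);                 # default "omitnan": [1 3 3 5 5]
y_inc  = movmax (x, 3, "includenan");   # [NaN NaN NaN 5 5]
```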

Programming Note: This function is a wrapper which calls movfun. For full documentation of inputs and options, see movfun.

See also: max, movfun, movslice, movmad, movmean, movmedian, movmin, movprod, movstd, movsum, movvar.

 
: y = movmean (x, wlen)
: y = movmean (x, [nb, na])
: y = movmean (…, dim)
: y = movmean (…, nancond)
: y = movmean (…, property, value)

Calculate the moving average over a sliding window of length wlen on data x.

The moving window length input wlen can either be a numeric scalar or a 2-element numeric array [nb, na]. The elements included in the moving window depend on the size and value of wlen as well as whether the "SamplePoints" option has been specified. For full details of element inclusion, see movslice.

If the optional argument dim is given, operate along this dimension.

The optional argument nancond is a string that controls how NaN and NA values affect the output of "movmean". The value "includenan" (default) causes NaN and NA values to be included in the moving window, and any window slice containing NaN or NA values will return NaN for that element. The value "omitnan" causes "movmean" to ignore any NaN or NA values resulting in fewer elements being used to calculate the mean for that window slice. If "omitnan" is specified and a window slice contains all NaN or NA values, "movmean" returns NaN for that element. The values "includemissing" and "omitmissing" may be used synonymously with "includenan" and "omitnan", respectively.

The calculation can be controlled by specifying property/value pairs. Valid properties are "Endpoints" and "SamplePoints". For full descriptions of these properties and valid options, see movfun.
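For illustration, the sketch below contrasts a centered window, a trailing window given in [nb, na] form, and the "discard" endpoint option. The data are arbitrary and the commented values are worked out by hand from the window rules.

```octave
x = [1 2 3 4 5];

## Centered window of length 3.
y_center = movmean (x, 3);                            # [1.5 2 3 4 4.5]

## Trailing window: two elements before the current one, none after.
y_trailing = movmean (x, [2, 0]);                     # [1 1.5 2 3 4]

## Keep only the windows that lie fully inside the data.
y_full = movmean (x, 3, "Endpoints", "discard");      # [2 3 4]
```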

Programming Note: This function is a wrapper which calls movfun. For full documentation of inputs and options, see movfun.

See also: mean, movfun, movslice, movmad, movmax, movmedian, movmin, movprod, movstd, movsum, movvar.

 
: y = movmedian (x, wlen)
: y = movmedian (x, [nb, na])
: y = movmedian (…, dim)
: y = movmedian (…, nancond)
: y = movmedian (…, property, value)

Calculate the moving median over a sliding window of length wlen on data x.

The moving window length input wlen can either be a numeric scalar or a 2-element numeric array [nb, na]. The elements included in the moving window depend on the size and value of wlen as well as whether the "SamplePoints" option has been specified. For full details of element inclusion, see movslice.

If the optional argument dim is given, operate along this dimension.

The optional argument nancond is a string that controls how NaN and NA values affect the output of "movmedian". The value "includenan" (default) causes NaN and NA values to be included in the moving window, and any window slice containing NaN or NA values will return NaN for that element. The value "omitnan" causes "movmedian" to ignore any NaN or NA values resulting in fewer elements being used to calculate the median for that window slice. If "omitnan" is specified and a window slice contains all NaN or NA values, "movmedian" returns NaN for that element. The values "includemissing" and "omitmissing" may be used synonymously with "includenan" and "omitnan", respectively.

The calculation can be controlled by specifying property/value pairs. Valid properties are "Endpoints" and "SamplePoints". For full descriptions of these properties and valid options, see movfun.
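One common reason to prefer a moving median is its insensitivity to isolated outliers. The following sketch, on made-up data, compares it with the moving average over the same window.

```octave
x = [1 100 3 4 5];    # 100 is an isolated outlier

y_med  = movmedian (x, 3);   # [50.5 3 4 4 4.5]
y_mean = movmean (x, 3);     # [50.5 104/3 107/3 4 4.5]
```

The outlier still dominates the first median because the shrunken boundary window holds only two elements.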

Programming Note: This function is a wrapper which calls movfun. For full documentation of inputs and options, see movfun.

See also: median, movfun, movslice, movmad, movmax, movmean, movmin, movprod, movstd, movsum, movvar.

 
: y = movmin (x, wlen)
: y = movmin (x, [nb, na])
: y = movmin (…, dim)
: y = movmin (…, nancond)
: y = movmin (…, property, value)

Calculate the moving minimum over a sliding window of length wlen on data x.

The moving window length input wlen can either be a numeric scalar or a 2-element numeric array [nb, na]. The elements included in the moving window depend on the size and value of wlen as well as whether the "SamplePoints" option has been specified. For full details of element inclusion, see movslice.

If the optional argument dim is given, operate along this dimension.

The optional argument nancond is a string that controls how NaN and NA values affect the output of "movmin". The value "includenan" causes NaN and NA values to be included in the moving window, and any window slice containing NaN or NA values will return NaN for that element. The value "omitnan" (default) causes "movmin" to ignore any NaN or NA values resulting in fewer elements being used to calculate the minimum for that window slice. If "omitnan" is specified and a window slice contains all NaN or NA values, "movmin" returns NaN for that element. The values "includemissing" and "omitmissing" may be used synonymously with "includenan" and "omitnan", respectively.

The calculation can be controlled by specifying property/value pairs. Valid properties are "Endpoints" and "SamplePoints". For full descriptions of these properties and valid options, see movfun.
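The dim argument can be sketched on a small matrix. The data are arbitrary, and the call form with a positional dim follows the synopsis above.

```octave
A = [4 2 7 1 9;
     1 5 3 8 2];

## Operate along dimension 2, i.e., slide the window along each row.
y = movmin (A, 3, 2);
## y = [2 2 1 1 1;
##      1 1 3 2 2]
```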

Programming Note: This function is a wrapper which calls movfun. For full documentation of inputs and options, see movfun.

See also: min, movfun, movslice, movmad, movmax, movmean, movmedian, movprod, movstd, movsum, movvar.

 
: y = movprod (x, wlen)
: y = movprod (x, [nb, na])
: y = movprod (…, dim)
: y = movprod (…, nancond)
: y = movprod (…, property, value)

Calculate the moving product over a sliding window of length wlen on data x.

The moving window length input wlen can either be a numeric scalar or a 2-element numeric array [nb, na]. The elements included in the moving window depend on the size and value of wlen as well as whether the "SamplePoints" option has been specified. For full details of element inclusion, see movslice.

If the optional argument dim is given, operate along this dimension.

The optional argument nancond is a string that controls how NaN and NA values affect the output of "movprod". The value "includenan" (default) causes NaN and NA values to be included in the moving window, and any window slice containing NaN or NA values will return NaN for that element. The value "omitnan" causes "movprod" to ignore any NaN or NA values resulting in fewer elements being used to calculate the product for that window slice. If "omitnan" is specified and a window slice contains all NaN or NA values, "movprod" returns 1 for that element. The values "includemissing" and "omitmissing" may be used synonymously with "includenan" and "omitnan", respectively.

The calculation can be controlled by specifying property/value pairs. Valid properties are "Endpoints" and "SamplePoints". For full descriptions of these properties and valid options, see movfun.
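A small sketch with an even window length, where the window covers the previous and the current element, followed by the all-NaN case under "omitnan". The data are made up and the commented values follow from the rules above.

```octave
x = [1 2 3 4];

## Even wlen = 2: one element before the center, none after.
y = movprod (x, 2);                          # [1 2 6 12]

## With "omitnan", a window holding only NaN values yields 1.
z = movprod ([NaN NaN 2 3], 2, "omitnan");   # [1 1 2 6]
```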

Programming Note: This function is a wrapper which calls movfun. For full documentation of inputs and options, see movfun.

See also: prod, movfun, movslice, movmad, movmax, movmean, movmedian, movmin, movstd, movsum, movvar.

 
: y = movstd (x, wlen)
: y = movstd (x, [nb, na])
: y = movstd (…, opt)
: y = movstd (…, opt, dim)
: y = movstd (…, nancond)
: y = movstd (…, property, value)

Calculate the moving standard deviation over a sliding window of length wlen on data x.

The moving window length input wlen can either be a numeric scalar or a 2-element numeric array [nb, na]. The elements included in the moving window depend on the size and value of wlen as well as whether the "SamplePoints" option has been specified. For full details of element inclusion, see movslice.

The optional argument opt determines the type of normalization to use. Valid values are

0:

normalize with N-1; this provides the square root of the best unbiased estimator of the variance [default]

1:

normalize with N; this provides the square root of the second moment around the mean

If the optional argument dim is given, operate along this dimension. The normalization argument opt must be given before the dimension. To use the default value for opt you may pass an empty input argument [].

The optional argument nancond is a string that controls how NaN and NA values affect the output of "movstd". The value "includenan" (default) causes NaN and NA values to be included in the moving window, and any window slice containing NaN or NA values will return NaN for that element. The value "omitnan" causes "movstd" to ignore any NaN or NA values resulting in fewer elements being used to calculate the standard deviation for that window slice. If "omitnan" is specified and a window slice contains all NaN or NA values, "movstd" returns NaN for that element. The values "includemissing" and "omitmissing" may be used synonymously with "includenan" and "omitnan", respectively.

The calculation can be controlled by specifying property/value pairs. Valid properties are "Endpoints" and "SamplePoints". For full descriptions of these properties and valid options, see movfun.
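The normalization argument and its ordering relative to dim might be used as in this sketch. The data are arbitrary, and the reference values are computed with std on the window centered at index 2.

```octave
x = [1 2 4 8];

s0 = movstd (x, 3);       # opt = 0 (default): normalize with N-1
s1 = movstd (x, 3, 1);    # opt = 1: normalize with N

## Reference values for the window centered at index 2, x(1:3) = [1 2 4].
ref0 = std (x(1:3));      # sqrt (7/3)
ref1 = std (x(1:3), 1);   # sqrt (14/9)

## opt must precede dim; pass [] to keep the default normalization.
A  = [x; 2*x];
sA = movstd (A, 3, [], 2);   # row 1 equals s0, row 2 equals 2*s0
```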

Programming Note: This function is a wrapper which calls movfun. For full documentation of inputs and options, see movfun.

See also: std, movfun, movslice, movmad, movmax, movmean, movmedian, movmin, movprod, movsum, movvar.

 
: y = movsum (x, wlen)
: y = movsum (x, [nb, na])
: y = movsum (…, dim)
: y = movsum (…, nancond)
: y = movsum (…, property, value)

Calculate the moving sum over a sliding window of length wlen on data x.

The moving window length input wlen can either be a numeric scalar or a 2-element numeric array [nb, na]. The elements included in the moving window depend on the size and value of wlen as well as whether the "SamplePoints" option has been specified. For full details of element inclusion, see movslice.

If the optional argument dim is given, operate along this dimension.

The optional argument nancond is a string that controls how NaN and NA values affect the output of "movsum". The value "includenan" (default) causes NaN and NA values to be included in the moving window, and any window slice containing NaN or NA values will return NaN for that element. The value "omitnan" causes "movsum" to ignore any NaN or NA values resulting in fewer elements being used to calculate the sum for that window slice. If "omitnan" is specified and a window slice contains all NaN or NA values, "movsum" returns 0 for that element. The values "includemissing" and "omitmissing" may be used synonymously with "includenan" and "omitnan", respectively.

The calculation can be controlled by specifying property/value pairs. Valid properties are "Endpoints" and "SamplePoints". For full descriptions of these properties and valid options, see movfun.
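The difference between the two nancond settings, including the all-NaN case that returns 0, can be sketched on a made-up vector. The commented values follow from the description above for a window of length 3.

```octave
x = [1 NaN NaN NaN 5];

## Default "includenan": every window here contains at least one NaN.
y_inc = movsum (x, 3);               # [NaN NaN NaN NaN NaN]

## "omitnan": the middle window is all NaN and therefore sums to 0.
y_omit = movsum (x, 3, "omitnan");   # [1 1 0 5 5]
```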

Programming Note: This function is a wrapper which calls movfun. For full documentation of inputs and options, see movfun.

See also: sum, movfun, movslice, movmad, movmax, movmean, movmedian, movmin, movprod, movstd, movvar.

 
: y = movvar (x, wlen)
: y = movvar (x, [nb, na])
: y = movvar (…, opt)
: y = movvar (…, opt, dim)
: y = movvar (…, nancond)
: y = movvar (…, property, value)

Calculate the moving variance over a sliding window of length wlen on data x.

The moving window length input wlen can either be a numeric scalar or a 2-element numeric array [nb, na]. The elements included in the moving window depend on the size and value of wlen as well as whether the "SamplePoints" option has been specified. For full details of element inclusion, see movslice.

The optional argument opt determines the type of normalization to use. Valid values are:

0:

normalize with N-1; this provides the best unbiased estimator of the variance [default]

1:

normalize with N; this provides the second moment around the mean

If the optional argument dim is given, operate along this dimension. The normalization argument opt must be given before the dimension. To use the default value for opt you may pass an empty input argument [].

The optional argument nancond is a string that controls how NaN and NA values affect the output of "movvar". The value "includenan" (default) causes NaN and NA values to be included in the moving window, and any window slice containing NaN or NA values will return NaN for that element. The value "omitnan" causes "movvar" to ignore any NaN or NA values resulting in fewer elements being used to calculate the variance for that window slice. If "omitnan" is specified and a window slice contains all NaN or NA values, "movvar" returns NaN for that element. The values "includemissing" and "omitmissing" may be used synonymously with "includenan" and "omitnan", respectively.

The calculation can be controlled by specifying property/value pairs. Valid properties are "Endpoints" and "SamplePoints". For full descriptions of these properties and valid options, see movfun.
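As a quick consistency check, the moving variance should equal the square of the moving standard deviation computed with the same normalization. The sketch below uses arbitrary data, and max_diff is just an illustrative name.

```octave
x = [1 2 4 8 16];

v0 = movvar (x, 3);       # normalize with N-1 (default)
v1 = movvar (x, 3, 1);    # normalize with N
s0 = movstd (x, 3);

## Largest deviation between the variance and the squared deviation;
## expected to be zero up to rounding error.
max_diff = max (abs (v0 - s0 .^ 2));

## Window centered at index 3 is [2 4 8]: variance 28/3 (N-1) or 56/9 (N).
```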

Programming Note: This function is a wrapper which calls movfun. For full documentation of inputs and options, see movfun.

See also: var, movfun, movslice, movmad, movmax, movmean, movmedian, movmin, movprod, movstd, movsum.