27 Sets

Octave has a number of functions for managing sets of data. A set is defined as a collection of unique elements and is typically represented by a vector of numbers sorted in ascending order. Any vector or matrix can be converted to a set by removing duplicates with the unique function. However, it is not necessary to create a set explicitly, because all functions that operate on sets convert their input to a set before proceeding.
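
As a brief illustration of the paragraph above, the following sketch converts a vector to a set explicitly and then passes vectors that contain duplicates directly to several set functions. The variable names are arbitrary and chosen only for this example.

```octave
a = [3, 1, 1, 2];
b = [2, 2, 5];

s = unique (a);        # explicit conversion to a set      ⇒ [1, 2, 3]
u = union (a, b);      # duplicates in a and b are ignored ⇒ [1, 2, 3, 5]
n = intersect (a, b);  # ⇒ 2
d = setdiff (a, b);    # ⇒ [1, 3]
```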

: y = unique (x)
: y = unique (x, "rows")
: y = unique (…, "sorted")
: y = unique (…, "stable")
: [y, i, j] = unique (…)
: [y, i, j] = unique (…, "first")
: [y, i, j] = unique (…, "last")
: [y, i, j] = unique (…, "legacy")

Return the unique elements of x.

If the input x is a column vector then return a column vector; otherwise, return a row vector. x may also be a cell array of strings.

If the optional argument "rows" is given then return the unique rows of x. The input must be a 2-D numeric matrix to use this option.
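
A minimal sketch of the "rows" option, using a small matrix made up for this example in which the first and third rows are identical:

```octave
m = [1, 2; 3, 4; 1, 2];
r = unique (m, "rows");   # the duplicate row [1, 2] appears only once
## r ⇒ [1, 2; 3, 4]
```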

The optional argument "sorted"/"stable" controls the order in which unique values appear in the output. The default is "sorted" and values in the output are placed in ascending order. The alternative "stable" preserves the order found in the input x.

If requested, return column index vectors i and j such that y = x(i) and x = y(j).
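
The two index relations can be checked directly. The sketch below uses an arbitrary example vector; the helper variables from_x and from_y are introduced here only for illustration.

```octave
x = [3, 1, 1, 2];
[y, i, j] = unique (x);

from_x = x(i);   # selects one representative of each unique value
from_y = y(j);   # rebuilds the original vector from the unique values

isequal (y, from_x)   # true: y = x(i)
isequal (x, from_y)   # true: x = y(j)
```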

Additionally, if i is a requested output then one of the flags "first" or "last" may be given. If "last" is specified, return the highest possible indices in i; if "first" is specified, return the lowest. The default is "first".

Example 1 : sort order

unique ([3, 1, 1, 2])
⇒ [1, 2, 3]
unique ([3, 1, 1, 2], "stable")
⇒ [3, 1, 2]

Example 2 : index selection

[~, i] = unique ([3, 1, 1, 2], "first")
⇒ i = [2; 4; 1]
[~, i] = unique ([3, 1, 1, 2], "last")
⇒ i = [3; 4; 1]

Programming Notes: The input flag "legacy" changes the algorithm to be compatible with MATLAB releases prior to R2012b. Specifically, the index ordering flag is changed to "last", and the shape of the outputs i and j will follow the shape of the input x rather than always being column vectors.

The third output, j, has not been implemented yet when the sort order is "stable".

See also: union, intersect, setdiff, setxor, ismember.

: c = uniquetol (A)
: c = uniquetol (A, tol)
: c = uniquetol (…, property, value)
: [c, ia, ic] = uniquetol (…)

Return the unique elements of A within tolerance tol.

Two values, x and y, are within relative tolerance if abs (x - y) <= tol * max (abs (A(:))).

The input A must be a real (non-complex) floating point type (double or single).

If tol is unspecified, the default tolerance is 1e-12 for double precision input or 1e-6 for single precision input.
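
To make the effect of tol concrete, here is a small sketch with values chosen for this example. The largest magnitude in A is 2, so the comparison threshold is tol * 2.

```octave
A = [1, 1.001, 2];

## Default tolerance for double input (1e-12): threshold is 2e-12,
## far smaller than the 0.001 gap, so all three values are kept.
c_default = uniquetol (A);

## tol = 0.01: threshold is 0.02, so 1 and 1.001 count as the same value.
c_loose = uniquetol (A, 0.01);

numel (c_default)   # 3
numel (c_loose)     # 2
```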

The function may also be called with the following optional property/value pairs. Property/value pairs must be passed after other input arguments:

"ByRows" (default: false)

When true, return the unique rows of A. A must be a 2-D array to use this option. For rows, the criterion for uniqueness is changed to all (abs (x - y) <= tol*max (abs (A),[],1)) which compares each column component of a row against a column-specific tolerance.

"DataScale"

The tolerance test is changed to abs (x - y) <= tol*DS where DS is a scalar unless the property "ByRows" is true. In that case, DS can either be a scalar or a vector with a length equal to the number of columns in A. Using a value of 1.0 for DS will change the tolerance from a relative one to an absolute tolerance. Using a value of Inf will disable testing.

"OutputAllIndices" (default: false)

When true, ia is a cell array (not a vector) that contains the indices for all elements in A that are within tolerance of a value in c. That is, each cell in ia corresponds to a single unique value in c, and the values in each cell correspond to locations in A.
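
The following sketch illustrates the "DataScale" and "ByRows" properties described above. The data values are invented for this example and are chosen so that the thresholds are easy to verify by hand.

```octave
A = [100, 100.5, 200];

## Relative tolerance: threshold is 0.01 * 200 = 2, so 100 and 100.5 merge.
c_rel = uniquetol (A, 0.01);

## "DataScale" of 1 makes the tolerance absolute: threshold is 0.01,
## so all three values remain distinct.
c_abs = uniquetol (A, 0.01, "DataScale", 1);

## "ByRows": per-column thresholds are 0.01 * [3, 4] = [0.03, 0.04],
## so the first two rows are treated as the same row.
M = [1, 2; 1.001, 2; 3, 4];
r = uniquetol (M, 0.01, "ByRows", true);

numel (c_rel)   # 2
numel (c_abs)   # 3
rows (r)        # 2
```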

The output c is a row vector if the input A is a row vector. For all other cases, a column vector is returned.

The optional output ia is a column index vector such that c = A(ia). If the "ByRows" property is true, the condition is c = A(ia, :). If the "OutputAllIndices" property is true, then the values A(ia{i}) are all within tolerance of the unique value c(i).

The optional output ic is a column index vector such that A = c(ic) when A is a vector. When A is a matrix, A(:) = c(ic). If the "ByRows" property is true then A = c(ic,:).
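
A short sketch of how ia and ic relate to the input, again with made-up data. Note that because elements within tolerance are merged, c(ic) should be expected to reproduce A only to within the tolerance rather than exactly; the helper variables picked and recon exist only for this example.

```octave
A = [1, 1.001, 2];
tol = 0.01;
[c, ia, ic] = uniquetol (A, tol);

picked = A(ia);   # the elements of A chosen as unique representatives
recon = c(ic);    # A rebuilt from the representatives

ok_c = isequal (c(:), picked(:));
ok_a = all (abs (A(:) - recon(:)) <= tol * max (abs (A(:))));

## With "OutputAllIndices", ia holds one cell per unique value.
[c2, ia_all] = uniquetol (A, tol, "OutputAllIndices", true);
ok_cell = iscell (ia_all) && numel (ia_all) == numel (c2);
```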

Example: small round-off errors require uniquetol, not unique

x = [1:5];
## Inverse_Function (Function (x)) should return exactly x
y = exp (log (x));
D = unique ([x, y])
⇒ [1   2   3   3   4   5   5]
C = uniquetol ([x, y])
⇒ [1   2   3   4   5]

See also: unique, union, intersect, setdiff, setxor, ismember.