
## 27 Sets

Octave has a number of functions for managing sets of data. A set is defined as a collection of unique elements and is typically represented by a vector of numbers sorted in ascending order. Any vector or matrix can be converted to a set by removing duplicates with the `unique` function. However, it is not necessary to create a set explicitly, because all of the functions that operate on sets convert their input to a set before proceeding.
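The following is a minimal sketch of the idea. `unique` turns an unsorted vector with repeated values into a set, and `union` (another Octave set function, used here only for illustration) accepts the same kind of raw input without a prior call to `unique`.

```
## An unsorted vector containing duplicates
x = [5, 3, 5, 1, 3];

## Explicit conversion to a set: duplicates removed, ascending order
s = unique (x)                 # s = [1, 3, 5]

## Set functions convert their inputs first, so no explicit
## call to unique is needed before using them
u = union ([3, 1, 1], [2, 3])  # u = [1, 2, 3]
```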

: y = unique (x)
: y = unique (x, "rows")
: y = unique (…, "sorted")
: y = unique (…, "stable")
: [y, i, j] = unique (…)
: [y, i, j] = unique (…, "first")
: [y, i, j] = unique (…, "last")
: [y, i, j] = unique (…, "legacy")

Return the unique elements of x.

If the input x is a column vector, return a column vector; otherwise, return a row vector. x may also be a cell array of strings.
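A short sketch of the output-shape rule and of cell array input. The example values are illustrative only; the cell array result is checked by content rather than by orientation.

```
## A column vector input yields a column vector
yc = unique ([3; 1; 3])                     # yc = [1; 3]

## A row vector input yields a row vector
yr = unique ([3, 1, 3])                     # yr = [1, 3]

## Cell arrays of strings are also accepted and sorted lexically
ys = unique ({"pear", "apple", "pear"})     # ys holds "apple" and "pear"
```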

If the optional argument `"rows"` is given then return the unique rows of x. The input must be a 2-D numeric matrix to use this option.
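With `"rows"`, each row is compared as a whole. In this small sketch with made-up data, the duplicate row is dropped and the remaining rows come back in ascending (lexicographic) order.

```
x = [3, 4;
     1, 2;
     3, 4];
## Duplicate rows are removed and the remaining rows are sorted
y = unique (x, "rows")    # y = [1, 2; 3, 4]
```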

The optional argument `"sorted"`/`"stable"` controls the order in which unique values appear in the output. The default is `"sorted"` and values in the output are placed in ascending order. The alternative `"stable"` preserves the order found in the input x.

If requested, return column index vectors i and j such that `y = x(i)` and `x = y(j)`.
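The two index identities can be checked directly. This sketch reuses the vector from the examples below; because x is a row vector, indexing it with the column vectors i and j still yields row vectors.

```
x = [3, 1, 1, 2];
[y, i, j] = unique (x);   # y = [1, 2, 3], i = [2; 4; 1], j = [3; 1; 1; 2]

## i selects one occurrence of each unique value from x ...
isequal (x(i), y)         # true

## ... and j maps every element of x back to its unique value
isequal (y(j), x)         # true
```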

Additionally, if i is requested, one of the flags `"first"` or `"last"` may be given. If `"last"` is specified, return the highest possible indices in i; if `"first"` is specified, return the lowest. The default is `"first"`.

Example 1: sort order

```
unique ([3, 1, 1, 2])
⇒ [1, 2, 3]
unique ([3, 1, 1, 2], "stable")
⇒ [3, 1, 2]
```

Example 2: index selection

```
[~, i] = unique ([3, 1, 1, 2], "first")
⇒ i = [2; 4; 1]
[~, i] = unique ([3, 1, 1, 2], "last")
⇒ i = [3; 4; 1]
```

Programming Notes: The input flag `"legacy"` changes the algorithm to be compatible with MATLAB releases prior to R2012b. Specifically, the index ordering flag is changed to `"last"`, and the index outputs i and j take the shape of the input x rather than always being column vectors.

The third output, j, has not been implemented yet when the sort order is `"stable"`.
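A brief sketch of what the `"legacy"` flag changes, based only on the behavior described in the notes above: index selection defaults to `"last"`, and the index output follows the orientation of the row-vector input.

```
x = [3, 1, 1, 2];

## Default behavior: "first" indices, returned as a column vector
[~, i_new] = unique (x)              # i_new = [2; 4; 1]

## Legacy behavior: "last" indices, shaped like the row vector x
[~, i_old] = unique (x, "legacy")    # i_old = [3, 4, 1]
```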

: c = uniquetol (A)
: c = uniquetol (A, tol)
: c = uniquetol (…, property, value)
: [c, ia, ic] = uniquetol (…)

Return the unique elements of A within tolerance tol.

Two values, x and y, are within relative tolerance if `abs (x - y) <= tol * max (abs (A(:)))`.
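The relative test can be evaluated by hand. In this sketch (the vector A is illustrative), the threshold is tol scaled by the largest magnitude in A. Two values that `unique` treats as distinct fall well inside that threshold, so `uniquetol` merges them.

```
A = [1, 1 + 1e-14, 2];
tol = 1e-12;

## Threshold for every comparison: 1e-12 * max (abs (A)) = 2e-12
thresh = tol * max (abs (A(:)));

## 1 and 1 + 1e-14 differ by about 1e-14, below the threshold
abs (A(1) - A(2)) <= thresh     # true

numel (unique (A))              # 3: exact comparison keeps both
numel (uniquetol (A, tol))      # 2: the near-duplicates are merged
```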

The input A must be a real (non-complex) floating point type (double or single).

If tol is unspecified, the default tolerance is 1e-12 for double precision input or 1e-6 for single precision input.
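Because the default tolerance depends on the input class, the same data can give different results in double and single precision. The sketch below uses an illustrative vector whose first two entries differ by roughly 1e-7.

```
A = [1, 1 + 1e-7, 2];

## Double input: default tol 1e-12, threshold 2e-12, so the
## difference of about 1e-7 keeps all three values distinct
nd = numel (uniquetol (A))            # nd = 3

## Single input: default tol 1e-6, threshold 2e-6, so the first
## two values (about 1.2e-7 apart after rounding to single) merge
ns = numel (uniquetol (single (A)))   # ns = 2
```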

The function may also be called with the following optional property/value pairs. Property/value pairs must be passed after other input arguments:

`"ByRows"` (default: `false`)

When true, return the unique rows of A. A must be a 2-D array to use this option. For rows, the criterion for uniqueness changes to `all (abs (x - y) <= tol*max (abs (A),[],1))`, which compares each column component of a row against a column-specific tolerance.

`"DataScale"`

The tolerance test is changed to `abs (x - y) <= tol*DS` where DS is a scalar unless the property `"ByRows"` is true. In that case, DS can either be a scalar or a vector with a length equal to the number of columns in A. Using a value of `1.0` for DS will change the tolerance from a relative one to an absolute tolerance. Using a value of `Inf` will disable testing.

`"OutputAllIndices"` (default: `false`)

When true, ia is a cell array (not a vector) that contains the indices of all elements in A that are within tolerance of a value in c. That is, each cell in ia corresponds to a single unique value in c, and the values in each cell are locations in A.
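The three properties above are sketched below on small, made-up inputs. The counts in the comments follow from the tolerance formulas given above. The sketch does not rely on which member of a merged group is kept as the representative, and it does not rely on the order of indices inside a cell.

```
## "ByRows": rows 1 and 2 differ only by 1e-14 in the first column,
## well inside that column's tolerance of 1e-12 * 3
A = [1,         2;
     1 + 1e-14, 2;
     3,         4];
C = uniquetol (A, "ByRows", true);
rows (C)                                     # 2

## "DataScale": relative versus absolute tolerance
B = [100, 100.5, 200];
## Relative: threshold 0.1 * max (abs (B)) = 20, so 100 and 100.5 merge
numel (uniquetol (B, 0.1))                   # 2
## Absolute (DataScale = 1): threshold 0.1, so all three values remain
numel (uniquetol (B, 0.1, "DataScale", 1))   # 3

## "OutputAllIndices": every index belonging to each unique value
D = [1, 2, 1 + 1e-14, 3];
[c, ia] = uniquetol (D, "OutputAllIndices", true);
iscell (ia)                                  # true
k = find (abs (c - 1) <= 1e-12);             # the group near 1
sort (ia{k}(:))                              # [1; 3]
```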

The output c is a row vector if the input A is a row vector. For all other cases, a column vector is returned.

The optional output ia is a column index vector such that `c = A(ia)`. If the `"ByRows"` property is true, the condition is `c = A(ia, :)`. If the `"OutputAllIndices"` property is true, then the values `A(ia{i})` are all within tolerance of the unique value `c(i)`.

The optional output ic is a column index vector such that `A = c(ic)` when A is a vector. When A is a matrix, `A(:) = c(ic)`. If the `"ByRows"` property is true then `A = c(ic,:)`.
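The index relations can be checked on a row vector containing a near-duplicate. This is a sketch with illustrative data. Since `1` and `1 + 1e-14` share one representative in c, `c(ic)` reproduces A only to within the tolerance, so both checks use a tolerance in `assert`.

```
A = [3, 1 + 1e-14, 1, 2];
[c, ia, ic] = uniquetol (A);

## c is a row vector because A is a row vector
isrow (c)                    # true

## ia picks one representative element of A for each value in c
assert (A(ia), c, 1e-12);

## ic maps every element of A back to a value of c
assert (c(ic), A, 1e-12);
```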

Example: small round-off errors require `uniquetol`, not `unique`

```
x = [1:5];
## Inverse_Function (Function (x)) should return exactly x
y = exp (log (x));
D = unique ([x, y])
⇒ [1   2   3   3   4   5   5]
C = uniquetol ([x, y])
⇒ [1   2   3   4   5]
```