GNU Octave: Audio Data Processing

This is an old version of the Octave manual. Find the latest version at: https://docs.octave.org/latest.

33.5 Audio Data Processing

Octave provides a few functions for dealing with audio data. An audio ‘sample’ is a single output value from an A/D converter, i.e., a small integer number (usually 8 or 16 bits), and audio data is just a series of such samples. It can be characterized by three parameters: the sampling rate (measured in samples per second or Hz, e.g., 8000 or 44100), the number of bits per sample (e.g., 8 or 16), and the number of channels (1 for mono, 2 for stereo, etc.).
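
As a rough illustration of how the three parameters combine, the following sketch computes the sample count and uncompressed size of a short clip. The clip length and all variable names are chosen for illustration only.

```octave
fs    = 44100;   # sampling rate in samples per second (Hz)
nbits = 16;      # bits per sample
nchan = 2;       # number of channels (stereo)
dur   = 3;       # clip length in seconds (illustrative value)

n_samples = fs * dur;                      # samples per channel
n_bytes   = n_samples * nchan * nbits / 8; # raw size without any file header

printf ("%d samples per channel, %d bytes of raw data\n", n_samples, n_bytes);
```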

There are many different formats for representing such data. Currently, Octave supports only the two most popular: linear encoding and mu-law encoding. There is an excellent FAQ on audio formats by Guido van Rossum <guido@cwi.nl>, which can be found at any FAQ ftp site, in particular in the directory /pub/usenet/news.answers/audio-fmts of the archive site rtfm.mit.edu.

Octave simply treats audio data as vectors of samples (non-mono data are not supported yet). It is assumed that audio files using linear encoding have one of the extensions lin or raw, and that files holding data in mu-law encoding end in au, mu, or snd.

Function File: lin2mu (x, n)

Convert audio data from linear to mu-law.

Mu-law values are 8-bit unsigned integers. Linear values are n-bit signed integers or, if n is 0, floating point values in the range -1 ≤ x ≤ 1.

If n is not specified, it defaults to 0, 8, or 16 depending on the range of values in x.

See also: mu2lin.
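
A minimal usage sketch, assuming floating point input in the range [-1, 1] and passing n = 0 explicitly rather than relying on the default. The test vector is made up for illustration.

```octave
x  = [-1; -0.5; 0; 0.5; 1];   # linear samples, floating point in [-1, 1]
mu = lin2mu (x, 0);           # n = 0: treat x as floating point data

disp (mu');                   # mu-law codes, each one fits in 8 bits (0 to 255)
```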

Function File: mu2lin (x, n)

Convert audio data from mu-law to linear.

Mu-law values are 8-bit unsigned integers. Linear values are n-bit signed integers or, if n is 0, floating point values in the range -1 ≤ y ≤ 1.

If n is not specified, it defaults to 0.

See also: lin2mu.
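
The two conversions can be combined into a round trip. The sketch below, using an illustrative test vector, encodes floating point samples to mu-law and decodes them again. Because mu-law stores only 8 bits per sample, the decoded values are expected to be close to the originals but not identical.

```octave
x  = linspace (-1, 1, 11)';   # linear samples in [-1, 1]
mu = lin2mu (x, 0);           # encode to 8-bit mu-law codes
y  = mu2lin (mu, 0);          # decode back to floating point in [-1, 1]

err = max (abs (x - y));      # largest quantization error over the vector
printf ("largest round-trip error: %g\n", err);
```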

Function File: record (sec)

Function File: record (sec, fs)

Record sec seconds of audio from the system’s default audio input. By default, the sampling rate is 8000 samples per second.

If the optional argument fs is given, it specifies the sampling rate for recording.

For more control over audio recording, use the audiorecorder class.

See also: sound, soundsc.

Function File: sound (y)

Function File: sound (y, fs)

Function File: sound (y, fs, nbits)

Play audio data y at sample rate fs to the default audio device.

The audio signal y can be a vector or a two-column array, representing mono or stereo audio, respectively.

If fs is not given, a default sample rate of 8000 samples per second is used.

The optional argument nbits specifies the bit depth to play to the audio device and defaults to 8 bits.

For more control over audio playback, use the audioplayer class.

See also: soundsc, record.
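
A short sketch of mono playback, assuming a working default audio device. The tone frequency, amplitude, and duration are arbitrary illustrative values, and the try block is only there so that the example still runs on a machine without audio output.

```octave
fs  = 8000;                    # sample rate in samples per second
dur = 0.25;                    # tone length in seconds (illustrative)
t   = (0 : fs*dur - 1)' / fs;  # time axis as a column vector
y   = 0.5 * sin (2*pi*440*t);  # 440 Hz tone, amplitude kept within [-1, 1]

try
  sound (y, fs);               # mono playback; use [y, y] for two channels
catch err
  printf ("playback skipped: %s\n", err.message);
end_try_catch
```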

Function File: soundsc (y)

Function File: soundsc (y, fs)

Function File: soundsc (y, fs, nbits)

Function File: soundsc (…, [ymin, ymax])

Scale the audio data y and play it at sample rate fs to the default audio device.

The audio signal y can be a vector or a two-column array, representing mono or stereo audio, respectively.

If fs is not given, a default sample rate of 8000 samples per second is used.

The optional argument nbits specifies the bit depth to play to the audio device and defaults to 8 bits.

By default, y is automatically normalized to the range [-1, 1]. If the range [ymin, ymax] is given, then elements of y that fall within the range ymin ≤ y ≤ ymax are scaled to the range [-1, 1] instead.

For more control over audio playback, use the audioplayer class.

See also: sound, record.
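
The scaling to [-1, 1] can be pictured as a linear map. The sketch below illustrates the arithmetic only and is not necessarily how soundsc is implemented. In particular, clipping values outside [ymin, ymax] is an assumption made here so that the result stays within [-1, 1].

```octave
y    = [-4; -2; 0; 2; 4; 6];   # illustrative signal values
ymin = -2;
ymax = 4;

## Assumption: values outside [ymin, ymax] are clipped to the range limits.
y_clipped = min (max (y, ymin), ymax);

## Linear map: ymin -> -1, ymax -> +1.
y_scaled = 2 * (y_clipped - ymin) / (ymax - ymin) - 1;

disp (y_scaled');
```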

Function File: y = wavread (filename)

Function File: [y, fs, nbits] = wavread (filename)

Function File: […] = wavread (filename, n)

Function File: […] = wavread (filename, [n1 n2])

Function File: […] = wavread (…, datatype)

Function File: sz = wavread (filename, "size")

Function File: [n_samp, n_chan] = wavread (filename, "size")

Read the audio signal y from the RIFF/WAVE sound file filename.

If the file contains multichannel data, then y is a matrix with the channels represented as columns.

If n is specified, only the first n samples of the file are returned. If [n1 n2] is specified, only the range of samples from n1 to n2 is returned. A value of Inf can be used to represent the total number of samples in the file.

If the option "size" is given, then the size of the audio signal is returned instead of the data. The size is returned in a row vector of the form [samples channels]. If there are two output arguments, the number of samples is assigned to the first and the number of channels is assigned to the second.

The optional return value fs is the sample rate of the audio file in Hz. The optional return value nbits is the number of bits per sample as encoded in the file.

See also: audioread, audiowrite, wavwrite.

Function File: wavwrite (y, filename)

Function File: wavwrite (y, fs, filename)

Function File: wavwrite (y, fs, nbits, filename)

Write the audio signal y to the RIFF/WAVE sound file filename.

If y is a matrix, the columns represent multiple audio channels.

The optional argument fs specifies the sample rate of the audio signal in Hz.

The optional argument nbits specifies the number of bits per sample to write to filename.

The default sample rate is 8000 Hz and the default bit depth is 16 bits per sample.

See also: audiowrite, audioread, wavread.
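
The following sketch writes a short two-channel signal and reads it back, using the argument order described above. The scratch file name and signal are illustrative. The fallback branch using audiowrite and audioread is an assumption added so that the example also runs on installations where wavwrite and wavread are not available.

```octave
fs    = 8000;                    # sample rate in Hz
nbits = 16;                      # bits per sample
t     = (0 : fs/4 - 1)' / fs;    # 0.25 s time axis
y     = 0.5 * [sin(2*pi*440*t), sin(2*pi*660*t)];  # channels as columns

fname = [tempname() ".wav"];     # scratch file, removed below

if (exist ("wavwrite") && exist ("wavread"))
  wavwrite (y, fs, nbits, fname);
  [y2, fs2, nbits2] = wavread (fname);
else
  ## Fallback (assumption): use the functions listed under "See also".
  audiowrite (fname, y, fs, "BitsPerSample", nbits);
  [y2, fs2] = audioread (fname);
endif

delete (fname);

printf ("read %d samples x %d channels at %d Hz\n", rows (y2), columns (y2), fs2);
```

The data read back is expected to match y only up to the quantization of the chosen bit depth.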