newstr = strrep (str, ptn, rep)
newstr = strrep (cellstr, ptn, rep)
newstr = strrep (…, "overlaps", val)
Replace all occurrences of the pattern ptn in the string str with the string rep and return the result.
The optional argument "overlaps" determines whether the pattern can match at every position in str (true), or only for unique occurrences of the complete pattern (false). The default is true.
str may also be a cell array of strings, in which case the replacement is done for each element and a cell array is returned.
Example:
strrep ("This is a test string", "is", "&%$") ⇒ "Th&%$ &%$ a test string"
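The effect of "overlaps" is easiest to see when the pattern can overlap itself. The following is a minimal sketch; the input strings and variable names are made up for illustration.

```octave
## Default ("overlaps", true): "abab" matches at positions 1 and 3,
## and both matches are replaced.
a = strrep ("abababc", "abab", "xyz");                     # ⇒ "xyzxyzc"

## "overlaps", false: only non-overlapping matches are replaced.
b = strrep ("abababc", "abab", "xyz", "overlaps", false);  # ⇒ "xyzabc"

## Cell array input: the replacement is applied to each element
## and a cell array is returned.
c = strrep ({"this is", "his"}, "is", "IS");               # ⇒ {"thIS IS", "hIS"}
```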
newstr = erase (str, ptn)
Delete all occurrences of ptn within str.
str and ptn can be ordinary strings, cell arrays of strings, or character arrays.
Examples
## string, single pattern
erase ("Hello World!", " World")
⇒ "Hello!"
## cellstr, single pattern
erase ({"Hello", "World!"}, "World")
⇒ {"Hello", "!"}
## string, multiple patterns
erase ("The Octave interpreter is fabulous", ...
       {"interpreter ", "The "})
⇒ "Octave is fabulous"
## cellstr, multiple patterns
erase ({"The ", "Octave interpreter ", "is fabulous"}, ...
       {"interpreter ", "The "})
⇒ {"", "Octave ", "is fabulous"}
Programming Note: erase deletes the first instance of a pattern in a string when there are overlapping occurrences. For example:
erase ("abababa", "aba") ⇒ "b"
For processing overlaps, see strrep.
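A typical use of erase is stripping fixed, literal fragments from text. The sketch below uses hypothetical file names and markup chosen only for illustration.

```octave
## Remove one literal fragment from every element of a cellstr.
names = {"report_draft.txt", "notes_draft.md"};
clean = erase (names, "_draft");

## Remove several literal fragments from a string in one call
## by passing the patterns as a cellstr.
plain = erase ("<b>bold</b>", {"<b>", "</b>"});

disp (clean);
disp (plain);
```

Because the patterns are treated as literal text, no regular expression escaping is needed here.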
[s, e, te, m, t, nm, sp] = regexp (str, pat)
[…] = regexp (str, pat, "opt1", …)
Regular expression string matching.
Search for pat in UTF-8 encoded str and return the positions and substrings of any matches, or empty values if there are none.
The matched pattern pat can include any of the standard regex operators, including:
.
Match any character
* + ? {}
Repetition operators, representing
*
Match zero or more times
+
Match one or more times
?
Match zero or one times
{n}
Match exactly n times
{n,}
Match n or more times
{m,n}
Match between m and n times
[…] [^…]
List operators. The pattern will match any character listed between "[" and "]". If the first character is "^" then the pattern is inverted and any character except those listed between brackets will match.
Escape sequences defined below can also be used inside list operators. For example, a template for a floating point number might be [-+.\d]+.
() (?:)
Grouping operator. The first form, parentheses only, also creates a token.
|
Alternation operator. Match one of a choice of regular expressions. The alternatives must be delimited by the grouping operator () above.
^ $
Anchoring operators. Requires pattern to occur at the start (^) or end ($) of the string.
In addition, the following escaped characters have special meaning.
\d
Match any digit
\D
Match any non-digit
\s
Match any whitespace character
\S
Match any non-whitespace character
\w
Match any word character
\W
Match any non-word character
\<
Match the beginning of a word
\>
Match the end of a word
\B
Match within a word
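The operators and escape sequences listed above can be combined freely. The following is a small sketch with made-up input strings; the patterns are written in single quotes so that the backslashes reach regexp unchanged.

```octave
str = "x = -3.14 and y = +2";

## List operator containing an escape sequence (the floating point
## template mentioned above), returning matched text and start indices.
[m, s] = regexp (str, '[-+.\d]+', 'match', 'start');
## m ⇒ {"-3.14", "+2"},  s ⇒ [5 19]

## Repetition count plus word anchors: words of exactly three letters.
w = regexp ("one three six", '\<\w{3}\>', 'match');
## w ⇒ {"one", "six"}

## Alternation delimited by the grouping operator.
g = regexp ("cat, dog, cow", '(cat|cow)', 'match');
## g ⇒ {"cat", "cow"}
```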
Implementation Note: For compatibility with MATLAB, escape sequences in pat (e.g., "\n" => newline) are expanded even when pat has been defined with single quotes. To disable expansion use a second backslash before the escape sequence (e.g., "\\n") or use the regexptranslate function.
The outputs of regexp default to the order given below
s
The start indices of each matching substring
e
The end indices of each matching substring
te
The extents of each matched token surrounded by (…) in pat
m
A cell array of the text of each match
t
A cell array of the text of each token matched
nm
A structure containing the text of each matched named token, with the name being used as the fieldname. A named token is denoted by (?<name>…).
sp
A cell array of the text not returned by match, i.e., what remains if you split the string based on pat.
Particular output arguments, or the order of the output arguments, can be selected by additional opt arguments. These are strings, and the correspondence between the optional arguments and the output arguments is
'start' | s
'end' | e
'tokenExtents' | te
'match' | m
'tokens' | t
'names' | nm
'split' | sp
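As an illustration of the default output order and of selecting outputs by name, here is a minimal sketch. The input string, the pattern, and the token names word1 and word2 are chosen for this example only.

```octave
str = "short test string";
pat = '(?<word1>\w*t)\s*(?<word2>\w*t)';

## Default order: start, end, token extents, match, tokens, names, split.
[s, e, te, m, t, nm, sp] = regexp (str, pat);
## s ⇒ 1, e ⇒ 10, m ⇒ {"short test"}
## t{1} ⇒ {"short", "test"}, nm.word1 ⇒ "short", nm.word2 ⇒ "test"

## Request only the named tokens and the split text, in that order.
[nm2, sp2] = regexp (str, pat, 'names', 'split');
## sp2{2} ⇒ " string" (sp2{1} is the empty text before the match)
```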
Additional arguments are summarized below.
'once'
Return only the first occurrence of the pattern.
'matchcase'
Make the matching case sensitive. (default)
Alternatively, use (?-i) in the pattern.
'ignorecase'
Ignore case when matching the pattern to the string.
Alternatively, use (?i) in the pattern.
'stringanchors'
Match the anchor characters at the beginning and end of the string. (default)
Alternatively, use (?-m) in the pattern.
'lineanchors'
Match the anchor characters at the beginning and end of the line.
Alternatively, use (?m) in the pattern.
'dotall'
The pattern . matches all characters including the newline character. (default)
Alternatively, use (?s) in the pattern.
'dotexceptnewline'
The pattern . matches all characters except the newline character.
Alternatively, use (?-s) in the pattern.
'literalspacing'
All characters in the pattern, including whitespace, are significant and are used in pattern matching. (default)
Alternatively, use (?-x) in the pattern.
'freespacing'
The pattern may include arbitrary whitespace and also comments beginning with the character ‘#’.
Alternatively, use (?x) in the pattern.
'noemptymatch'
Zero-length matches are not returned. (default)
'emptymatch'
Return zero-length matches.
regexp ('a', 'b*', 'emptymatch') returns [1 2] because there are zero or more 'b' characters at positions 1 and end-of-string.
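A few of these options in use, as a hedged sketch with invented input strings:

```octave
## 'once' returns the first match directly rather than in a cell array.
first = regexp ("abc123def456", '\d+', 'match', 'once');
## first ⇒ "123"

## 'ignorecase' (or (?i) inside the pattern) disables case sensitivity.
ci = regexp ("Hello hello", 'hello', 'match', 'ignorecase');
## ci ⇒ {"Hello", "hello"}

## 'lineanchors' lets ^ and $ match at line breaks. The double-quoted
## input contains a real newline character.
per_line = regexp ("one\ntwo", '^\w+$', 'match', 'lineanchors');
## per_line ⇒ {"one", "two"}

## 'emptymatch' keeps zero-length matches (the call quoted in the text).
z = regexp ('a', 'b*', 'emptymatch');
## z ⇒ [1 2]
```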
Stack Limitation Note: Pattern searches are done with a recursive function which can overflow the program stack when there are a high number of matches. For example,
regexp (repmat ('a', 1, 1e5), '(a)+')
may lead to a segfault. As an alternative, consider constructing pattern searches that reduce the number of matches (e.g., by creatively using set complement), and then further processing the return variables (now reduced in size) with successive regexp searches.
Octave’s regexp implementation is based on the Perl Compatible Regular Expressions library (https://www.pcre.org/). For a more comprehensive list of regexp operator syntax see the "PCRE Syntax quick-reference summary".
[s, e, te, m, t, nm, sp] = regexpi (str, pat)
[…] = regexpi (str, pat, "opt1", …)
Case insensitive regular expression string matching.
Search for pat in UTF-8 encoded str and return the positions and substrings of any matches, or empty values if there are none.
See regexp, for details on the syntax of the search pattern.
See also: regexp.
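The difference from regexp can be seen on a string that contains the same word in several capitalizations. This is a minimal sketch with a made-up input string.

```octave
str = "Octave, OCTAVE and octave";

## regexp is case sensitive by default and finds only the exact spelling.
m_cs = regexp (str, 'octave', 'match');    # ⇒ {"octave"}

## regexpi ignores case and finds all three occurrences.
m_ci = regexpi (str, 'octave', 'match');   # ⇒ {"Octave", "OCTAVE", "octave"}
```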
outstr = regexprep (string, pat, repstr)
outstr = regexprep (string, pat, repstr, "opt1", …)
Replace occurrences of pattern pat in string with repstr.
The pattern is a regular expression as documented for regexp. See regexp.
All strings must be UTF-8 encoded.
The replacement string may contain $i, which substitutes for the ith set of parentheses in the match string. For example,
regexprep ("Bill Dunn", '(\w+) (\w+)', '$2, $1')
returns "Dunn, Bill"
Options in addition to those of regexp are
'once'
Replace only the first occurrence of pat in the result.
'warnings'
This option is present for compatibility but is ignored.
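The following sketch shows token substitution together with the 'once' option and an option inherited from regexp. Apart from the "Bill Dunn" call, the inputs are invented for illustration.

```octave
## Swap two captured words using $1 and $2.
a = regexprep ("Bill Dunn", '(\w+) (\w+)', '$2, $1');   # ⇒ "Dunn, Bill"

## Replace every run of digits, then only the first run with 'once'.
b = regexprep ("a1b22c333", '\d+', '#');                # ⇒ "a#b#c#"
c = regexprep ("a1b22c333", '\d+', '#', 'once');        # ⇒ "a#b22c333"

## Options of regexp, such as 'ignorecase', are accepted as well.
d = regexprep ("Foo foo", 'foo', 'bar', 'ignorecase');  # ⇒ "bar bar"
```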
Implementation Note: For compatibility with MATLAB, escape sequences in pat (e.g., "\n" => newline) are expanded even when pat has been defined with single quotes. To disable expansion use a second backslash before the escape sequence (e.g., "\\n") or use the regexptranslate function.
str = regexptranslate (op, s)
Translate a string for use in a regular expression.
This may include either wildcard replacement or special character escaping.
The behavior is controlled by op which can take the following values
"wildcard"
The wildcard characters ., *, and ? are replaced with wildcards that are appropriate for a regular expression. For example:
regexptranslate ("wildcard", "*.m") ⇒ '.*\.m'
"escape"
The characters $.?[], that have special meaning for regular expressions are escaped so that they are treated literally. For example:
regexptranslate ("escape", "12.5") ⇒ '12\.5'
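The translated strings are meant to be passed on to regexp or regexprep. The sketch below shows one way this might look; the sample text and file names are hypothetical, and anchoring the wildcard pattern with ^ and $ is a choice made for this example.

```octave
## Escape a literal so that "." no longer matches any character.
lit = regexptranslate ("escape", "12.5");              # ⇒ '12\.5'
m_raw = regexp ("12.5 and 1235", '12.5', 'match');     # ⇒ {"12.5", "1235"}
m_lit = regexp ("12.5 and 1235", lit, 'match');        # ⇒ {"12.5"}

## Turn a file-name wildcard into an anchored regular expression
## and use it to select matching names from a cellstr.
pat = ['^', regexptranslate("wildcard", "*.m"), '$'];  # ⇒ '^.*\.m$'
files = {"foo.m", "bar.txt", "baz.m"};
is_m = ! cellfun (@isempty, regexp (files, pat, 'match', 'once'));
m_files = files(is_m);                                 # ⇒ {"foo.m", "baz.m"}
```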