module Xstr_match : sig .. end
Copyright 1999 by Gerd Stolpmann
type variable
A 'variable' can record matched regions.
type charset
Sets of characters.
type matcher =
  | Literal of string
  | Anystring
  | Lazystring
  | Anychar
  | Anystring_from of charset
  | Lazystring_from of charset
  | Anychar_from of charset
  | Nullstring
  | Alternative of matcher list list
  | Optional of matcher list
  | Record of (variable * matcher list)
  | Scanner of (string -> int)
Literal s: matches literally s and nothing else
Anystring/Lazystring: matches a string of arbitrary length with arbitrary contents
Anystring_from s/Lazystring_from s: matches a string of arbitrary length with characters from charset s
Anychar: matches an arbitrary character
Anychar_from s: matches a character from charset s
Nullstring: matches the empty string
Alternative [ml1; ml2; ...]: first tries the sequence ml1, then ml2, and so on, until one of the sequences leads to a match of the whole string
Optional ml: first tries the sequence ml, then the empty string. Equivalent to Alternative [ml; [Nullstring]]
Record (v, ml): matches the same as ml, but the matched region of the string is recorded in v
Scanner f: f s is called, where s is the rest of the string still to be matched. The function should return the number of characters it can match, or raise Not_found
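The constructors above can be combined as in the following minimal sketch. It assumes the xstr library is installed and linked (the findlib package name "xstr" is an assumption), and the helper names is_colour_word, leading_digits and is_numbered_id are hypothetical, not part of the module.

```ocaml
(* Sketch only: assumes the xstr library is available, e.g. through the
   findlib package "xstr" (the package name is an assumption). *)
open Xstr_match

(* Alternative takes a list of matcher sequences; Optional takes a single
   sequence. Intended to accept "color", "colour", "gray" and "grey". *)
let is_colour_word s =
  match_string
    [ Alternative
        [ [ Literal "colo"; Optional [ Literal "u" ]; Literal "r" ];
          [ Literal "gr"; Anychar_from (mkset "ae"); Literal "y" ] ] ]
    s

(* Hypothetical scanner function: counts the decimal digits at the start
   of the rest of the string and raises Not_found if there are none. *)
let leading_digits rest =
  let n = String.length rest in
  let rec count i =
    if i < n && rest.[i] >= '0' && rest.[i] <= '9' then count (i + 1) else i
  in
  match count 0 with
  | 0 -> raise Not_found
  | k -> k

(* Accepts "id" followed by at least one digit and nothing else. *)
let is_numbered_id s =
  match_string [ Literal "id"; Scanner leading_digits ] s
```

Because the whole string has to be matched, is_colour_word rejects "colours" and is_numbered_id rejects "id4x".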
val match_string : matcher list -> string -> bool
match_string ml s: Tries to match 'ml' against the string 's'; returns true on success, and false otherwise. As a side effect, the variables in 'ml' are set. Matching proceeds from left to right, and some of the matchers have their own matching order. The first match found in this order is the one returned (i.e. the variables get their values from this match). Notes:
type replacer =
  | ReplaceLiteral of string
  | ReplaceVar of variable
  | ReplaceFunction of (unit -> string)
type rflag =
  | Anchored
  | Limit of int
  (* | RightToLeft *)
val replace_matched_substrings : matcher list ->
replacer list -> rflag list -> string -> string * int
replace_matched_substrings ml rl fl s:
All substrings of 's' are matched against 'ml' in turn, and all non-overlapping matches are replaced according to 'rl'. The standard behaviour is to test from left to right, and to replace all occurrences of substrings. This can be modified by 'fl':
val var : string -> variable
var s: creates a new variable with initial value s. If this variable is used in a subsequent matching, and a value is found, the value is overwritten; otherwise the old value persists.
Note on thread safety: variables must not be shared by multiple threads.
val var_matched : variable -> bool
returns true if the variable matched a value in the last match_string
val string_of_var : variable -> string
returns the current value of the variable
val found_string_of_var : variable -> string
returns the current value of the variable only if there was a match for this variable in the last match_string; otherwise raises Not_found
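A minimal sketch of how var and found_string_of_var fit together, assuming the xstr library is linked. The helper value_of_key and the "key=" prefix are hypothetical and serve only as illustration.

```ocaml
(* Sketch only: assumes the xstr library is available. *)
open Xstr_match

(* Hypothetical helper: returns the text after "key=" if s has that form,
   and None otherwise. *)
let value_of_key s =
  (* A fresh variable per call, so no variable is shared between threads. *)
  let v = var "" in
  ignore (match_string [ Literal "key="; Record (v, [ Anystring ]) ] s);
  (* found_string_of_var raises Not_found when v did not match. *)
  try Some (found_string_of_var v) with Not_found -> None
```

Creating the variable inside the function is one way to respect the thread-safety note: every call works on its own variable.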
val mkset : string -> charset
creates a set from a readable description. The string simply enumerates the characters of the set; the range notation "x-y" is also possible. To include '-' in the set, put it at the beginning or end.
val mknegset : string -> charset
creates the complement of the set that mkset would create
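As an illustration of the set notation, the following sketch builds two sets with mkset and one with mknegset. It assumes the xstr library is linked; the names first_char, other_chars, non_blank, is_identifier and is_single_word are hypothetical.

```ocaml
(* Sketch only: assumes the xstr library is available. *)
open Xstr_match

(* Letters and underscore; then letters, digits and underscore. *)
let first_char = mkset "a-zA-Z_"
let other_chars = mkset "a-zA-Z0-9_"

(* One character from first_char, then any number from other_chars. *)
let is_identifier s =
  match_string [ Anychar_from first_char; Anystring_from other_chars ] s

(* Every character except the space character. *)
let non_blank = mknegset " "

(* A non-empty string that contains no space character. *)
let is_single_word s =
  match_string [ Anychar_from non_blank; Anystring_from non_blank ] s
```

The leading Anychar_from in both matchers forces at least one character, since Anystring_from alone would also accept the empty string.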
----------------------------------------------------------------------
EXAMPLE:
let v = var "" in
let _ = match_string [ Literal "("; Record (v, [ Anystring ]); Literal ")" ]
                     s
in found_string_of_var v
VARIANT I:
let v = var "" in
let _ = match_string [ Lazystring;
                       Literal "("; Record (v, [ Lazystring ]); Literal ")";
                       Anystring ]
                     s
in found_string_of_var v
To get the last substring, swap Lazystring and Anystring at the beginning and end, respectively.
VARIANT II:
let v = var "" in
let _ = match_string [ Lazystring;
                       Literal "("; Record (v, [ Anystring ]); Literal ")";
                       Anystring ]
                     s
in found_string_of_var v
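The variants above can be compared on a concrete input with the following sketch. It assumes the xstr library is linked, and that Anystring prefers the longest and Lazystring the shortest possible match, as the names and the remark about swapping them suggest. The helpers recorded, first_lazy, first_greedy and last_lazy are hypothetical.

```ocaml
(* Sketch only: assumes the xstr library is available. *)
open Xstr_match

(* Hypothetical helper: runs ml against s and returns what v recorded. *)
let recorded ml v s =
  if match_string ml s then Some (found_string_of_var v) else None

(* VARIANT I: first parenthesised substring, recorded lazily. *)
let first_lazy s =
  let v = var "" in
  recorded
    [ Lazystring; Literal "("; Record (v, [ Lazystring ]); Literal ")";
      Anystring ]
    v s

(* VARIANT II: starts at the first "(", but records greedily. *)
let first_greedy s =
  let v = var "" in
  recorded
    [ Lazystring; Literal "("; Record (v, [ Anystring ]); Literal ")";
      Anystring ]
    v s

(* VARIANT I with Lazystring and Anystring swapped: last substring. *)
let last_lazy s =
  let v = var "" in
  recorded
    [ Anystring; Literal "("; Record (v, [ Lazystring ]); Literal ")";
      Lazystring ]
    v s
```

Under those assumptions, the input "x(a)(b)y" should give Some "a" for first_lazy, Some "a)(b" for first_greedy and Some "b" for last_lazy; a string without parentheses gives None in all three cases.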
----------------------------------------------------------------------
EXAMPLE:
let v = var "" in
let digits = mkset "0-9" in
let digits_re = Record (v, [ Anychar_from digits; Anystring_from digits ])
in
replace_matched_substrings [ digits_re ] [ ReplaceLiteral "D" ]
                           [] "ab012cd456fg"
yields: ("abDcdDfg", 2)
VARIANT I:
replace_matched_substrings [ digits_re ] [ ReplaceLiteral "D" ]
                           [ Limit 1 ]
                           "ab012cd456fg"
yields: ("abDcd456fg", 1)
VARIANT II:
replace_matched_substrings [ digits_re ] [ ReplaceLiteral "D" ]
                           [ Anchored ]
                           "ab012cd456fg"
yields: ("ab012cd456fg", 0)
VARIANT III:
replace_matched_substrings [ digits_re ] [ ReplaceLiteral "D" ]
                           [ Anchored ]
                           "012"
yields: ("D", 1)
VARIANT IV:
let f () = string_of_int (1 + int_of_string (string_of_var v)) in
replace_matched_substrings [ digits_re ] [ ReplaceFunction f ]
                           [] "ab012cd456fg"
yields: ("ab13cd457fg", 2)
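The variants above use ReplaceLiteral and ReplaceFunction; a ReplaceVar can be sketched in the same way. The example assumes the xstr library is linked, that ReplaceVar v inserts the text recorded in v for the current match, and that the replacers in the list are concatenated in order. The helper bracket_numbers is hypothetical.

```ocaml
(* Sketch only: assumes the xstr library is available, and that the
   replacers in the list are applied in order for every match. *)
open Xstr_match

(* Hypothetical helper: wraps every run of digits in angle brackets and
   returns the new string together with the number of replacements. *)
let bracket_numbers s =
  let v = var "" in
  let digits = mkset "0-9" in
  let digits_re =
    Record (v, [ Anychar_from digits; Anystring_from digits ])
  in
  replace_matched_substrings [ digits_re ]
    [ ReplaceLiteral "<"; ReplaceVar v; ReplaceLiteral ">" ]
    [] s
```

Under those assumptions, bracket_numbers "ab012cd456fg" should give ("ab<012>cd<456>fg", 2).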