From 1f15026060293b28e35d9037f3145476dbedc083 Mon Sep 17 00:00:00 2001
From: John Maddock Copyright (c) 1998-2001 Dr John Maddock Permission to use, copy, modify,
- distribute and sell this software and its documentation
- for any purpose is hereby granted without fee, provided
- that the above copyright notice appear in all copies and
- that both that copyright notice and this permission
- notice appear in supporting documentation. Dr John
- Maddock makes no representations about the suitability of
- this software for any purpose. It is provided "as is"
- without express or implied warranty. This is the first port of regex++ to the boost library, and is
-based on regex++ 2.x, see changes.txt for a full list of changes
-from the previous version. There are no known functionality bugs
-except that POSIX style equivalence classes are only guaranteed
-correct if the Win32 localization model is used (the default for
-Win32 builds of the library). There are some aspects of the code that C++ puritans will
-consider to be poor style, in particular the use of goto in some
-of the algorithms. The code could be cleaned up, by changing to a
-recursive implementation, although it is likely to be slower in
-that case. The performance of the algorithms should be satisfactory in
-most cases. For example the times taken to match the ftp response
-expression "^([0-9]+)(\-| |$)(.*)$" against the string
-"100- this is a line of ftp response which contains a
-message string" are: BSD implementation 450 micro seconds,
-GNU implementation 271 micro seconds, regex++ 127 micro seconds (Pentium
-P90, Win32 console app under MS Windows 95). However it should be noted that there are some "pathological"
-expressions which may require exponential time for matching;
-these all involve nested repetition operators, for example
-attempting to match the expression "(a*a)*b" against N
-letter a's requires time proportional to 2^N.
-These expressions can (almost) always be rewritten in such a way
-as to avoid the problem, for example "(a*a)*b" could be
-rewritten as "a*b" which requires only time linearly
-proportional to N to solve. In the general case, non-nested
-repeat expressions require time proportional to N^2,
-however if the clauses are mutually exclusive then they can be
-matched in linear time - this is the case with "a*b",
-for each character the matcher will either match an "a"
-or a "b" or fail, where as with "a*a" the
-matcher can't tell which branch to take (the first "a"
-or the second) and so has to try both. Be careful how you
-write your regular expressions and avoid nested repeats if you
-can! New to this version, some previously pathological cases have
-been fixed - in particular searching for expressions which
-contain leading repeats and/or leading literal strings should be
-much faster than before. Literal strings are now searched for
-using the Knuth/Morris/Pratt algorithm (this is used in
-preference to the Boyer/Moore algorithm because it allows the
-tracking of newline characters). Some aspects of the POSIX regular expression syntax are
-implementation defined: Class reg_expression<> and its typedefs regex and wregex
-are thread safe, in that compiled regular expressions can safely
-be shared between threads. The matching algorithms regex_match,
-regex_search, regex_grep, regex_format and regex_merge are all re-entrant
-and thread safe. Class match_results is now thread safe, in that
-the results of a match can be safely copied from one thread to
-another (for example one thread may find matches and push
-match_results instances onto a queue, while another thread pops
-them off the other end), otherwise use a separate instance of
-match_results per thread. The POSIX API functions are all re-entrant and thread safe,
-regular expressions compiled with regcomp can also be
-shared between threads. The class RegEx is only thread safe if each thread gets its
-own RegEx instance (apartment threading) - this is a consequence
-of RegEx handling both compiling and matching regular expressions.
- Finally note that changing the global locale invalidates all
-compiled regular expressions, therefore calling set_locale
-from one thread while another uses regular expressions will
-produce unpredictable results. There is also a requirement that there is only one thread
-executing prior to the start of main(). Regex++ provides extensive support for run-time
-localization, the localization model used can be split into two
-parts: front-end and back-end. Front-end localization deals with everything which the user
-sees - error messages, and the regular expression syntax itself.
-For example a French application could change [[:word:]] to [[:mot:]]
-and \w to \m. Modifying the front end locale requires active
-support from the developer, by providing the library with a
-message catalogue to load, containing the localized strings.
-Front-end locale is affected by the LC_MESSAGES category only. Back-end localization deals with everything that occurs after
-the expression has been parsed - in other words everything that
-the user does not see or interact with directly. It deals with
-case conversion, collation, and character class membership. The
-back-end locale does not require any intervention from the
-developer - the library will acquire all the information it
-requires for the current locale from the underlying operating
-system / run time library. This means that if the program user
-does not interact with regular expressions directly - for example
-if the expressions are embedded in your C++ code - then no
-explicit localization is required, as the library will take care
-of everything for you. For example embedding the expression [[:word:]]+
-in your code will always match a whole word, if the program is
-run on a machine with, for example, a Greek locale, then it will
-still match a whole word, but in Greek characters rather than
-Latin ones. The back-end locale is affected by the LC_TYPE and
-LC_COLLATE categories. There are three separate localization mechanisms supported by
-regex++: Win32 localization model. This is the default model when the library is compiled under
-Win32, and is encapsulated by the traits class w32_regex_traits.
-When this model is in effect there is a single global locale as
-defined by the user's control panel settings, and returned by
-GetUserDefaultLCID. All the settings used by regex++ are acquired
-directly from the operating system bypassing the C run time
-library. Front-end localization requires a resource dll,
-containing a string table with the user-defined strings. The
-traits class exports the function: static std::string set_message_catalogue(const std::string&
-s); which needs to be called with a string identifying the name of
-the resource dll, before your code compiles any regular
-expressions (but not necessarily before you construct any reg_expression
-instances): boost::w32_regex_traits<char>::set_message_catalogue("mydll.dll");
- Note that this API sets the dll name for both the
-narrow and wide character specializations of w32_regex_traits. This model does not currently support thread specific locales
-(via SetThreadLocale under Windows NT), the library provides full
-Unicode support under NT, under Windows 9x the library degrades
-gracefully - characters 0 to 255 are supported, the remainder are
-treated as "unknown" graphic characters. C localization model. This is the default model when the library is compiled under
-an operating system other than Win32, and is encapsulated by the
-traits class c_regex_traits,
-Win32 users can force this model to take effect by defining the
-pre-processor symbol BOOST_REGEX_USE_C_LOCALE. When this model is
-in effect there is a single global locale, as set by setlocale.
-All settings are acquired from your run time library,
-consequently Unicode support is dependent upon your run time
-library implementation. Front end localization requires a POSIX
-message catalogue. The traits class exports the function: static std::string set_message_catalogue(const std::string&
-s); which needs to be called with a string identifying the name of
-the message catalogue, before your code compiles any
-regular expressions (but not necessarily before you construct any
-reg_expression instances): boost::c_regex_traits<char>::set_message_catalogue("mycatalogue");
- Note that this API sets the dll name for both the
-narrow and wide character specializations of c_regex_traits. If
-your run time library does not support POSIX message catalogues,
-then you can either provide your own implementation of
-<nl_types.h> or define BOOST_RE_NO_CAT to disable front-end
-localization via message catalogues. Note that calling setlocale invalidates all compiled
-regular expressions, calling setlocale(LC_ALL, "C")
-will make this library behave equivalent to most traditional
-regular expression libraries including version 1 of this library.
- C++ localization model.
- This model is only in effect if the library is built with the
-pre-processor symbol BOOST_REGEX_USE_CPP_LOCALE defined. When
-this model is in effect each instance of reg_expression<>
-has its own instance of std::locale, class reg_expression<>
-also has a member function imbue which allows the locale
-for the expression to be set on a per-instance basis. Front end
-localization requires a POSIX message catalogue, which will be
-loaded via the std::messages facet of the expression's locale,
-the traits class exports the symbol: static std::string set_message_catalogue(const std::string&
-s); which needs to be called with a string identifying the name of
-the message catalogue, before your code compiles any
-regular expressions (but not necessarily before you construct any
-reg_expression instances): boost::cpp_regex_traits<char>::set_message_catalogue("mycatalogue");
- Note that calling reg_expression<>::imbue will
-invalidate any expression currently compiled in that instance of
-reg_expression<>. This model is the one which closest fits
-the ethos of the C++ standard library, however it is the model
-which will produce the slowest code, and which is the least well
-supported by current standard library implementations, for
-example I have yet to find an implementation of std::locale which
-supports either message catalogues, or locales other than "C"
-or "POSIX". Finally note that if you build the library with a non-default
-localization model, then the appropriate pre-processor symbol (BOOST_REGEX_USE_C_LOCALE
-or BOOST_REGEX_USE_CPP_LOCALE) must be defined both when you
-build the support library, and when you include <boost/regex.hpp>
-or <boost/cregex.hpp> in your code. The best way to ensure
-this is to add the #define to <boost/regex/detail/regex_options.hpp>.
- Providing a message catalogue: In order to localize the front end of the library, you need to
-provide the library with the appropriate message strings
-contained either in a resource dll's string table (Win32 model),
-or a POSIX message catalogue (C or C++ models). In the latter
-case the messages must appear in message set zero of the
-catalogue. The messages and their id's are as follows: Custom error messages are loaded as follows: Custom character class names are loaded as follows: Finally, custom collating element names are loaded starting
-from message id 400, and terminating when the first load
-thereafter fails. Each message looks something like: "tagname
-string" where tagname is the name used inside [[.tagname.]]
-and string is the actual text of the collating element.
-Note that the value of collating element [[.zero.]] is used for
-the conversion of strings to numbers - if you replace this with
-another value then that will be used for string parsing - for
-example use the Unicode character 0x0660 for [[.zero.]] if you
-want to use Unicode Arabic-Indic digits in your regular
-expressions in place of Latin digits. Note that the POSIX defined names for character classes and
-collating elements are always available - even if custom names
-are defined, in contrast, custom error messages, and custom
-syntax messages replace the default ones. There are three demo applications that ship with this library,
-they all come with makefiles for Borland, Microsoft and gcc
-compilers, otherwise you will have to create your own makefiles. A regression test application that gives the matching/searching
-algorithms a full workout. The presence of this program is your
-guarantee that the library will behave as claimed - at least as
-far as those items tested are concerned - if anyone spots
-anything that isn't being tested I'd be glad to hear about it. Files: parse.cpp, regress.cpp, tests.cpp. A simple grep implementation, run with no command line options
-to find out its usage. Look at fileiter.cpp/fileiter.hpp
-and the mapfile class to see an example of a "smart"
-bidirectional iterator that can be used with regex++ or any other
-STL algorithm. A simple interactive expression matching application, the
-results of all matches are timed, allowing the programmer to
-optimize their regular expressions where performance is critical.
- Files: regex_timer.cpp.
- The snippets examples contain the code examples used in the
-documentation: regex_match_example.cpp:
-ftp based regex_match example. regex_search_example.cpp:
-regex_search example: searches a cpp file for class definitions. regex_grep_example_1.cpp:
-regex_grep example 1: searches a cpp file for class definitions. regex_merge_example.cpp:
-regex_merge example: converts a C++ file to syntax highlighted
-HTML. regex_grep_example_2.cpp:
-regex_grep example 2: searches a cpp file for class definitions,
-using a global callback function. regex_grep_example_3.cpp:
-regex_grep example 3: searches a cpp file for class definitions,
-using a bound member function callback. regex_grep_example_4.cpp:
-regex_grep example 4: searches a cpp file for class definitions,
-using a C++ Builder closure as a callback. regex_split_example_1.cpp:
-regex_split example: split a string into tokens. regex_split_example_2.cpp:
-regex_split example: spit out linked URL's. There are two main headers used by this library: <boost/regex.hpp>
-provides full access to the entire library, while <boost/cregex.hpp>
-provides access to just the high level class RegEx, and the POSIX
-API functions. If you are using Microsoft or Borland C++ and link to a
-dll version of the run time library, then you will also link to
-one of the dll versions of regex++. While these dll's are
-redistributable, there are no "standard" versions, so
-when installing on the users PC, you should place these in a
-directory private to your application, and not in the PC's
-directory path. Note that if you link to a static version of your
-run time library, then you will also link to a static version of
-regex++ and no dll's will need to be distributed. The possible
-regex++ dll and library names are computed according to the
-following formula: "boost_regex_" Note: you can disable automatic library selection by defining
-the symbol BOOST_REGEX_NO_LIB when compiling, this is useful if
-you want to statically link even though you're using the dll
-version of your run time library, or if you need to debug regex++.
- This version of regex++ is the first to be ported to the boost project, and as a result
-has a number of changes to comply with the boost coding
-guidelines. Headers have been changed from <header> or <header.h>
-to <boost/header.hpp> The library namespace has changed from "jm", to
-"boost". The reg_xxx algorithms have been renamed regex_xxx (to improve
-naming consistency). Algorithm query_match has been renamed regex_match, and only
-returns true if the expression matches the whole of the input
-string (think input data validation). Compiling existing code: The directory, libs/regex/old_include contains a set of
-headers that make this version of regex++ compatible with
-previous ones, either add this directory to your include path, or
-copy these headers to the root directory of your boost
-installation. The contents of these headers are deprecated and
-undocumented - really these are just here for existing code - for
-new projects use the new header forms. The author can be contacted at John_Maddock@compuserve.com,
-the home page for this library is at http://ourworld.compuserve.com/homepages/John_Maddock/regexpp.htm,
-and the official boost version can be obtained from www.boost.org/libraries.htm. I am indebted to Robert Sedgewick's "Algorithms in C++"
-for forcing me to think about algorithms and their performance,
-and to the folks at boost for forcing me to think, period.
-The following people have all contributed useful comments or
-fixes: Dave Abrahams, Mike Allison, Edan Ayal, Jayashree
-Balasubramanian, Jan Bölsche, Beman Dawes, Paul Baxter, David
-Bergman, David Dennerline, Edward Diener, Peter Dimov, Robert
-Dunn, Fabio Forno, Tobias Gabrielsson, Rob Gillen, Marc Gregoire,
-Chris Hecker, Nick Hodapp, Jesse Jones, Martin Jost, Boris
-Krasnovskiy, Jan Hermelink, Max Leung, Wei-hao Lin, Jens Maurer,
-Richard Peters, Heiko Schmidt, Jason Shirk, Gerald Slacik, Scobie
-Smith, Mike Smyth, Alexander Sokolovsky, Hervé Poirier, Michael
-Raykh, Marc Recht, Scott VanCamp, Bruno Voigt, Alexey Voinov,
-Jerry Waldorf, Rob Ward, Lealon Watts, Thomas Witt and Yuval
-Yosef. I am also grateful to the manuals supplied with the Henry
-Spencer, Perl and GNU regular expression libraries - wherever
-possible I have tried to maintain compatibility with these
-libraries and with the POSIX standard - the code however is
-entirely my own, including any bugs! I can absolutely guarantee
-that I will not fix any bugs I don't know about, so if you have
-any comments or spot any bugs, please get in touch. Useful further information can be found at: A short tutorial on regular expressions can
-be found here. The Open
-Unix Specification contains a wealth of useful material,
-including the regular expression syntax, and specifications for <regex.h>
-and <nl_types.h>.
- The Pattern
-Matching Pointers site is a "must visit" resource
-for anyone interested in pattern matching. Glimpse and Agrep,
-use a simplified regular expression syntax to achieve faster
-search times. Udi Manber
-and Ricardo Baeza-Yates
-both have a selection of useful pattern matching papers available
-from their respective web sites. Copyright Dr
-John Maddock 1998-2000 all rights reserved.
+
-
-
-
-
-
-
- 
- Regex++, Appendices.
-
-
-Appendix 1: Implementation notes
-
-
-
-
-
-
-Appendix 2: Thread safety
-
-
-
-Appendix 3: Localization
-
-
-
-
-
-
-
-
- Message id
- Meaning
- Default value
-
-
-
-
- 101
- The character used to start
- a sub-expression.
- "("
-
-
-
-
- 102
- The character used to end a
- sub-expression declaration.
- ")"
-
-
-
-
- 103
- The character used to denote
- an end of line assertion.
- "$"
-
-
-
-
- 104
- The character used to denote
- the start of line assertion.
- "^"
-
-
-
-
- 105
- The character used to denote
- the "match any character expression".
- "."
-
-
-
-
- 106
- The match zero or more times
- repetition operator.
- "*"
-
-
-
-
- 107
- The match one or more
- repetition operator.
- "+"
-
-
-
-
- 108
- The match zero or one
- repetition operator.
- "?"
-
-
-
-
- 109
- The character set opening
- character.
- "["
-
-
-
-
- 110
- The character set closing
- character.
- "]"
-
-
-
-
- 111
- The alternation operator.
- "|"
-
-
-
-
- 112
- The escape character.
- "\\"
-
-
-
-
- 113
- The hash character (not
- currently used).
- "#"
-
-
-
-
- 114
- The range operator.
- "-"
-
-
-
-
- 115
- The repetition operator
- opening character.
- "{"
-
-
-
-
- 116
- The repetition operator
- closing character.
- "}"
-
-
-
-
- 117
- The digit characters.
- "0123456789"
-
-
-
-
- 118
- The character which when
- preceded by an escape character represents the word
- boundary assertion.
- "b"
-
-
-
-
- 119
- The character which when
- preceded by an escape character represents the non-word
- boundary assertion.
- "B"
-
-
-
-
- 120
- The character which when
- preceded by an escape character represents the word-start
- boundary assertion.
- "<"
-
-
-
-
- 121
- The character which when
- preceded by an escape character represents the word-end
- boundary assertion.
- ">"
-
-
-
-
- 122
- The character which when
- preceded by an escape character represents any word
- character.
- "w"
-
-
-
-
- 123
- The character which when
- preceded by an escape character represents a non-word
- character.
- "W"
-
-
-
-
- 124
- The character which when
- preceded by an escape character represents a start of
- buffer assertion.
- "`A"
-
-
-
-
- 125
- The character which when
- preceded by an escape character represents an end of
- buffer assertion.
- "'z"
-
-
-
-
- 126
- The newline character.
- "\n"
-
-
-
-
- 127
- The comma separator.
- ","
-
-
-
-
- 128
- The character which when
- preceded by an escape character represents the bell
- character.
- "a"
-
-
-
-
- 129
- The character which when
- preceded by an escape character represents the form feed
- character.
- "f"
-
-
-
-
- 130
- The character which when
- preceded by an escape character represents the newline
- character.
- "n"
-
-
-
-
- 131
- The character which when
- preceded by an escape character represents the carriage
- return character.
- "r"
-
-
-
-
- 132
- The character which when
- preceded by an escape character represents the tab
- character.
- "t"
-
-
-
-
- 133
- The character which when
- preceded by an escape character represents the vertical
- tab character.
- "v"
-
-
-
-
- 134
- The character which when
- preceded by an escape character represents the start of a
- hexadecimal character constant.
- "x"
-
-
-
-
- 135
- The character which when
- preceded by an escape character represents the start of
- an ASCII escape character.
- "c"
-
-
-
-
- 136
- The colon character.
- ":"
-
-
-
-
- 137
- The equals character.
- "="
-
-
-
-
- 138
- The character which when
- preceded by an escape character represents the ASCII
- escape character.
- "e"
-
-
-
-
- 139
- The character which when
- preceded by an escape character represents any lower case
- character.
- "l"
-
-
-
-
- 140
- The character which when
- preceded by an escape character represents any non-lower
- case character.
- "L"
-
-
-
-
- 141
- The character which when
- preceded by an escape character represents any upper case
- character.
- "u"
-
-
-
-
- 142
- The character which when
- preceded by an escape character represents any non-upper
- case character.
- "U"
-
-
-
-
- 143
- The character which when
- preceded by an escape character represents any space
- character.
- "s"
-
-
-
-
- 144
- The character which when
- preceded by an escape character represents any non-space
- character.
- "S"
-
-
-
-
- 145
- The character which when
- preceded by an escape character represents any digit
- character.
- "d"
-
-
-
-
- 146
- The character which when
- preceded by an escape character represents any non-digit
- character.
- "D"
-
-
-
-
- 147
- The character which when
- preceded by an escape character represents the end quote
- operator.
- "E"
-
-
-
-
- 148
- The character which when
- preceded by an escape character represents the start
- quote operator.
- "Q"
-
-
-
-
- 149
- The character which when
- preceded by an escape character represents a Unicode
- combining character sequence.
- "X"
-
-
-
-
- 150
- The character which when
- preceded by an escape character represents any single
- character.
- "C"
-
-
-
-
- 151
- The character which when
- preceded by an escape character represents end of buffer
- operator.
- "Z"
-
-
-
-
- 152
- The character which when
- preceded by an escape character represents the
- continuation assertion.
- "G"
-
-
-
-
- 153
- The character which when preceded by (? indicates a
- zero width negated forward lookahead assert.
- "!"
-
-
-
-
-
-
-
-
-
- Message ID
- Error message ID
- Default string
-
-
-
-
- 201
- REG_NOMATCH
- "No match"
-
-
-
-
- 202
- REG_BADPAT
- "Invalid regular
- expression"
-
-
-
-
- 203
- REG_ECOLLATE
- "Invalid collation
- character"
-
-
-
-
- 204
- REG_ECTYPE
- "Invalid character
- class name"
-
-
-
-
- 205
- REG_EESCAPE
- "Trailing backslash"
-
-
-
-
-
- 206
- REG_ESUBREG
- "Invalid back reference"
-
-
-
-
-
- 207
- REG_EBRACK
- "Unmatched [ or [^"
-
-
-
-
-
- 208
- REG_EPAREN
- "Unmatched ( or \\("
-
-
-
-
-
- 209
- REG_EBRACE
- "Unmatched \\{"
-
-
-
-
- 210
- REG_BADBR
- "Invalid content of
- \\{\\}"
-
-
-
-
- 211
- REG_ERANGE
- "Invalid range end"
-
-
-
-
-
- 212
- REG_ESPACE
- "Memory exhausted"
-
-
-
-
-
- 213
- REG_BADRPT
- "Invalid preceding
- regular expression"
-
-
-
-
- 214
- REG_EEND
- "Premature end of
- regular expression"
-
-
-
-
- 215
- REG_ESIZE
- "Regular expression too
- big"
-
-
-
-
- 216
- REG_ERPAREN
- "Unmatched ) or \\)"
-
-
-
-
-
- 217
- REG_EMPTY
- "Empty expression"
-
-
-
-
-
- 218
- REG_E_UNKNOWN
- "Unknown error"
-
-
-
-
-
-
-
-
-
- Message ID
- Description
- Equivalent default class
- name
-
-
-
-
- 300
- The character class name for
- alphanumeric characters.
- "alnum"
-
-
-
-
- 301
- The character class name for
- alphabetic characters.
- "alpha"
-
-
-
-
- 302
- The character class name for
- control characters.
- "cntrl"
-
-
-
-
- 303
- The character class name for
- digit characters.
- "digit"
-
-
-
-
- 304
- The character class name for
- graphics characters.
- "graph"
-
-
-
-
- 305
- The character class name for
- lower case characters.
- "lower"
-
-
-
-
- 306
- The character class name for
- printable characters.
- "print"
-
-
-
-
- 307
- The character class name for
- punctuation characters.
- "punct"
-
-
-
-
- 308
- The character class name for
- space characters.
- "space"
-
-
-
-
- 309
- The character class name for
- upper case characters.
- "upper"
-
-
-
-
- 310
- The character class name for
- hexadecimal characters.
- "xdigit"
-
-
-
-
- 311
- The character class name for
- blank characters.
- "blank"
-
-
-
-
- 312
- The character class name for
- word characters.
- "word"
-
-
-
-
- 313
- The character class name for
- Unicode characters.
- "unicode"
-
-
-
-
-Appendix 4: Example Applications
-
-regress.exe:
-
-jgrep.exe
-
-timer.exe
-
-
-
-Appendix 5: Header Files
-
-
-
-Appendix 6: Redistributables
-
-
-
-+ BOOST_LIB_TOOLSET
-+ "_"
-+ BOOST_LIB_THREAD_OPT
-+ BOOST_LIB_RT_OPT
-+ BOOST_LIB_LINK_OPT
-+ BOOST_LIB_DEBUG_OPT
-
-These are defined as:
-
-BOOST_LIB_TOOLSET: The compiler toolset name (vc6, vc7, bcb5 etc).
-
-BOOST_LIB_THREAD_OPT: "s" for single thread builds,
-"m" for multithread builds.
-
-BOOST_LIB_RT_OPT: "s" for static runtime,
-"d" for dynamic runtime.
-
-BOOST_LIB_LINK_OPT: "s" for static link,
-"i" for dynamic link.
-
-BOOST_LIB_DEBUG_OPT: nothing for release builds,
-"d" for debug builds,
-"dd" for debug-diagnostic builds (_STLP_DEBUG).
-
-Notes for upgraders
-
-
-
-Further Information (Contacts and
-Acknowledgements)
-
-
-
-
+
+
+
+
+
+
+ 
+
+ Boost.Regex
+ Standards Conformance
+
+
+
+Boost.regex is intended to conform to the regular expression standardization proposal, which will appear in a future C++ standard technical report (and hopefully in a future version of the standard). Currently there are some differences in how the regular expression traits classes are defined; these will be fixed in a future release.
+All of the ECMAScript regular expression syntax features are supported, except that:
+Negated class escapes (\S, \D and \W) are not permitted inside character class definitions ( [...] ).
+The escape sequence \u matches any upper case character (the same as [[:upper:]]) rather than a Unicode escape sequence; use \x{DDDD} for Unicode escape sequences.
+Almost all Perl features are supported, except for:
+\N{name} Use [[:name:]] instead.
+\pP and \PP
+(?imsx-imsx)
+(?<=pattern)
+(?<!pattern)
+(?{code})
+(??{code})
+(?(condition)yes-pattern) and (?(condition)yes-pattern|no-pattern)
+These embarrassments / limitations will be removed in due course, mainly dependent upon user demand.
+All the POSIX basic and extended regular expression features are supported, except that:
+No character collating names are recognized except those specified in the POSIX standard for the C locale, unless they are explicitly registered with the traits class.
+Character equivalence classes ( [[=a=]] etc) are probably buggy except on Win32. Implementing this feature requires knowledge of the format of the string sort keys produced by the system; if you need this, and the default implementation doesn't work on your platform, then you will need to supply a custom traits class.
+Revised 17 May 2003
+© Copyright John Maddock 1998-2003
+Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation. Dr John Maddock makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.
diff --git a/doc/Attic/sub_match.html b/doc/Attic/sub_match.html
new file mode 100644
index 00000000..db995312
--- /dev/null
+++ b/doc/Attic/sub_match.html
@@ -0,0 +1,426 @@
+ Boost.Regex
+ sub_match
+#include <boost/regex.hpp>
+Regular expressions are different from many simple pattern-matching algorithms in that as well as finding an overall match they can also produce sub-expression matches: each sub-expression being delimited in the pattern by a pair of parentheses (...). There has to be some method for reporting sub-expression matches back to the user: this is achieved by defining a class match_results that acts as an indexed collection of sub-expression matches, each sub-expression match being contained in an object of type sub_match.
+Objects of type sub_match may only be obtained by subscripting an object of type match_results.
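Subscripting a match_results object to reach each sub_match can be sketched as follows. This is a minimal illustration, not code from the library: it assumes std::regex from the C++ standard library as a stand-in for boost::regex (its match_results and sub_match expose the members used here), and the helper name capture_all and the sample pattern are hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of every sub-expression
// (index 0 is the whole match), or an empty vector if `input`
// does not match `pattern` in full.
std::vector<std::string> capture_all(const std::string& pattern,
                                     const std::string& input)
{
    const std::regex re(pattern);
    std::smatch what;                 // match_results over std::string iterators
    std::vector<std::string> out;
    if (std::regex_match(input, what, re)) {
        for (std::size_t i = 0; i < what.size(); ++i) {
            // what[i] is a sub_match; str() copies the matched characters.
            out.push_back(what[i].str());
        }
    }
    return out;
}
```

Calling capture_all("([0-9]+)(-| )(.*)", "100- ready") yields four entries: the whole match followed by one string per parenthesised sub-expression.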
+When the marked sub-expression denoted by an object of type sub_match<>
+ participated in a regular expression match then member matched evaluates
+ to true, and members first and second denote the
+ range of characters [first,second) which formed that match.
+ Otherwise matched is false, and members first and second
+ contain undefined values.
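The rule about matched, first and second can be illustrated with an alternation in which only one branch takes part in the match. This is a sketch under assumptions: std::regex stands in for boost::regex, and the names GroupInfo and inspect_group are hypothetical. The code reads first and second only after checking matched, since the text above says they are otherwise undefined.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical summary of one sub-expression, copied out of a sub_match.
struct GroupInfo {
    bool matched;           // did this sub-expression take part in the match?
    std::string text;       // characters in [first, second); empty if unmatched
    std::ptrdiff_t length;  // number of characters matched; 0 if unmatched
};

// Matches `input` against `pattern` and reports on sub-expression `index`.
// matched is false if the overall match fails, the index is out of range,
// or the sub-expression did not participate.
GroupInfo inspect_group(const std::string& pattern,
                        const std::string& input,
                        std::size_t index)
{
    const std::regex re(pattern);
    std::smatch what;
    GroupInfo info{false, std::string(), 0};
    if (!std::regex_match(input, what, re) || index >= what.size())
        return info;

    const std::ssub_match& sub = what[index];
    info.matched = sub.matched;
    if (sub.matched) {
        // Only touch first/second once matched is known to be true.
        info.text.assign(sub.first, sub.second);
        info.length = sub.length();
    }
    return info;
}
```

For the pattern "(a)|(b)" and the input "b", sub-expression 1 reports matched as false while sub-expression 2 holds the single character "b".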
If an object of type sub_match<> represents sub-expression 0
+ - that is to say the whole match - then member matched is always
+ true, unless a partial match was obtained as a result of the flag match_partial
+ being passed to a regular expression algorithm, in which case member matched
+ is false, and members first and second represent the
+ character range that formed the partial match.
+namespace boost{
+
+template <class BidirectionalIterator>
+class sub_match : public std::pair<BidirectionalIterator, BidirectionalIterator>
+{
+public:
+ typedef typename iterator_traits<BidirectionalIterator>::value_type value_type;
+ typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type;
+ typedef BidirectionalIterator iterator;
+
+ bool matched;
+
+ difference_type length()const;
+ operator basic_string<value_type>()const;
+ basic_string<value_type> str()const;
+
+ int compare(const sub_match& s)const;
+ int compare(const basic_string<value_type>& s)const;
+ int compare(const value_type* s)const;
+};
+
+template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator == (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator != (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator < (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator > (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator >= (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator <= (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+
+template <class BidirectionalIterator>
+bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+template <class BidirectionalIterator>
+bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+template <class charT, class traits, class BidirectionalIterator>
+basic_ostream<charT, traits>&
+ operator << (basic_ostream<charT, traits>& os,
+ const sub_match<BidirectionalIterator>& m);
+
+} // namespace boost
+ typedef typename std::iterator_traits<iterator>::value_type value_type;+
The type pointed to by the iterators.
+typedef typename std::iterator_traits<iterator>::difference_type difference_type;+
A type that represents the difference between two iterators.
+typedef iterator iterator_type;+
The iterator type.
+iterator first+
An iterator denoting the position of the start of the match.
+iterator second+
An iterator denoting the position of the end of the match.
+bool matched+
A Boolean value denoting whether this sub-expression participated in the match.
+difference_type length()const;+ +
+ Effects: returns (matched ? distance(first, second) : 0).
operator basic_string<value_type>()const;+ +
+ Effects: returns (matched ? basic_string<value_type>(first,
+ second) : basic_string<value_type>()).
basic_string<value_type> str()const;+ +
+ Effects: returns (matched ? basic_string<value_type>(first,
+ second) : basic_string<value_type>()).
int compare(const sub_match& s)const;+ +
+ Effects: returns str().compare(s.str()).
int compare(const basic_string<value_type>& s)const;+ +
+ Effects: returns str().compare(s).
int compare(const value_type* s)const;+ +
+ Effects: returns str().compare(s).
template <class BidirectionalIterator> +bool operator == (const sub_match<BidirectionalIterator>& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs.compare(rhs) == 0.
template <class BidirectionalIterator> +bool operator != (const sub_match<BidirectionalIterator>& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs.compare(rhs) != 0.
template <class BidirectionalIterator> +bool operator < (const sub_match<BidirectionalIterator>& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs.compare(rhs) < 0.
template <class BidirectionalIterator> +bool operator <= (const sub_match<BidirectionalIterator>& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs.compare(rhs) <= 0.
template <class BidirectionalIterator> +bool operator >= (const sub_match<BidirectionalIterator>& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs.compare(rhs) >= 0.
template <class BidirectionalIterator> +bool operator > (const sub_match<BidirectionalIterator>& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs.compare(rhs) > 0.
template <class BidirectionalIterator> +bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs == rhs.str().
template <class BidirectionalIterator> +bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs != rhs.str().
template <class BidirectionalIterator> +bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs < rhs.str().
template <class BidirectionalIterator> +bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs > rhs.str().
template <class BidirectionalIterator> +bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs >= rhs.str().
template <class BidirectionalIterator> +bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs <= rhs.str().
template <class BidirectionalIterator> +bool operator == (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const* rhs);+ +
+ Effects: returns lhs.str() == rhs.
template <class BidirectionalIterator> +bool operator != (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const* rhs);+ +
+ Effects: returns lhs.str() != rhs.
template <class BidirectionalIterator> +bool operator < (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const* rhs);+ +
+ Effects: returns lhs.str() < rhs.
template <class BidirectionalIterator> +bool operator > (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const* rhs);+ +
+ Effects: returns lhs.str() > rhs.
template <class BidirectionalIterator> +bool operator >= (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const* rhs);+ +
+ Effects: returns lhs.str() >= rhs.
template <class BidirectionalIterator> +bool operator <= (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const* rhs);+ +
+ Effects: returns lhs.str() <= rhs.
template <class BidirectionalIterator> +bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs == rhs.str().
template <class BidirectionalIterator> +bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs != rhs.str().
template <class BidirectionalIterator> +bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs < rhs.str().
template <class BidirectionalIterator> +bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs > rhs.str().
template <class BidirectionalIterator> +bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs >= rhs.str().
template <class BidirectionalIterator> +bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs <= rhs.str().
template <class BidirectionalIterator> +bool operator == (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const& rhs);+ +
+ Effects: returns lhs.str() == rhs.
template <class BidirectionalIterator> +bool operator != (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const& rhs);+ +
+ Effects: returns lhs.str() != rhs.
template <class BidirectionalIterator> +bool operator < (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const& rhs);+ +
+ Effects: returns lhs.str() < rhs.
template <class BidirectionalIterator> +bool operator > (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const& rhs);+ +
+ Effects: returns lhs.str() > rhs.
template <class BidirectionalIterator> +bool operator >= (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const& rhs);+ +
+ Effects: returns lhs.str() >= rhs.
template <class BidirectionalIterator> +bool operator <= (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const& rhs);+ +
+ Effects: returns lhs.str() <= rhs.
template <class charT, class traits, class BidirectionalIterator> +basic_ostream<charT, traits>& + operator << (basic_ostream<charT, traits>& os, + const sub_match<BidirectionalIterator>& m);+ +
+ Effects: returns (os << m.str()).
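As an illustration of the members described above, the following sketch inspects individual sub-matches after a successful match. It is written against std::sub_match and std::smatch from the C++11 <regex> header, which descend from this interface, rather than against boost::sub_match itself. That substitution and the helper names describe and check_sub_match are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: summarises sub-expression i of a completed match
// as "<text>/<length>", or "unmatched" if it took no part in the match.
std::string describe(const std::smatch& m, std::size_t i)
{
    const std::ssub_match& sub = m[i];
    if (!sub.matched)
        return "unmatched";
    // str() copies [first, second); length() is distance(first, second).
    return sub.str() + "/" + std::to_string(sub.length());
}

// Exercises describe() on an alternation where only one branch is taken.
void check_sub_match()
{
    const std::string text = "cd";
    const std::regex re("(ab)|(cd)");
    std::smatch m;
    const bool ok = std::regex_match(text, m, re);
    (void)ok;
    assert(ok);
    assert(describe(m, 0) == "cd/2");       // sub-expression 0 is the whole match
    assert(describe(m, 1) == "unmatched");  // first alternative was not taken
    assert(describe(m, 2) == "cd/2");
    assert(m[2].compare("cd") == 0);        // compare(const value_type*)
}
```

Testing matched before touching first and second is the point of the sketch: for a sub-expression that did not participate, only matched carries useful information.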
+
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
diff --git a/doc/Attic/syntax.html b/doc/Attic/syntax.html
new file mode 100644
index 00000000..f776cd3c
--- /dev/null
+++ b/doc/Attic/syntax.html
@@ -0,0 +1,773 @@
+Boost.Regex
+Regular Expression Syntax
This section covers the regular expression syntax used by this library. This is a programmer's guide; the actual syntax presented to your program's users will depend upon the flags used during expression compilation.
+All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{", + "}", "[", "]", "^", "$" and "\". These characters are literals when preceded by + a "\". A literal is a character that matches itself, or matches the result of + traits_type::translate(), where traits_type is the traits template parameter to + class basic_regex.
+The dot character "." matches any single character, with two exceptions: when match_not_dot_null is passed to the matching algorithms, the dot does not match a null character; when match_not_dot_newline is passed to the matching algorithms, the dot does not match a newline character.
+A repeat is an expression that is repeated an arbitrary number of times. An expression followed by "*" can be repeated any number of times, including zero. An expression followed by "+" can be repeated any number of times, but at least once; if the expression is compiled with the flag regex_constants::bk_plus_qm then "+" is an ordinary character and "\+" represents a repeat of once or more. An expression followed by "?" may be repeated zero or one times only; if the expression is compiled with the flag regex_constants::bk_plus_qm then "?" is an ordinary character and "\?" represents the repeat zero or once operator. When it is necessary to specify the minimum and maximum number of repeats explicitly, the bounds operator "{}" may be used: "a{2}" is the letter "a" repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2 and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with no upper limit. Note that there must be no white-space inside the {}, and there is no upper limit on the values of the lower and upper bounds. When the expression is compiled with the flag regex_constants::bk_braces then "{" and "}" are ordinary characters and "\{" and "\}" are used to delimit bounds instead. All repeat expressions refer to the shortest possible previous sub-expression: a single character, a character set, or a sub-expression grouped with "()", for example.
+Examples: +
+"ba*" will match all of "b", "ba", "baaa" etc. +
+"ba+" will match "ba" or "baaaa" for example but not "b". +
+"ba?" will match "b" or "ba". +
+"ba{2,4}" will match "baa", "baaa" and "baaaa". +
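The repeat examples above can be checked mechanically. The sketch below is an assumption-laden illustration rather than this library's own test code: it uses std::regex from the C++11 <regex> header with its default ECMAScript syntax (where "*", "+", "?" and "{}" behave as described here, but the bk_plus_qm and bk_braces flags do not exist), and full_match and check_repeats are hypothetical helper names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern`.
bool full_match(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);
    return std::regex_match(text, re);
}

// Replays the examples from the text, plus the cases just outside each bound.
void check_repeats()
{
    assert(full_match("ba*", "b") && full_match("ba*", "ba") && full_match("ba*", "baaa"));
    assert(full_match("ba+", "ba") && full_match("ba+", "baaaa") && !full_match("ba+", "b"));
    assert(full_match("ba?", "b") && full_match("ba?", "ba") && !full_match("ba?", "baa"));
    assert(full_match("ba{2,4}", "baa") && full_match("ba{2,4}", "baaaa"));
    assert(!full_match("ba{2,4}", "ba") && !full_match("ba{2,4}", "baaaaa"));
}
```

Because each repeat binds to the shortest previous sub-expression, every pattern here repeats only the letter "a", never the sequence "ba".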
+Whenever the "extended" regular expression syntax is in use (the default) then + non-greedy repeats are possible by appending a '?' after the repeat; a + non-greedy repeat is one which will match the shortest possible string. +
+For example to match html tag pairs one could use something like: +
+"<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>" +
+In this case $1 will contain the text between the tag pairs, and will be the + shortest possible matching string. +
+Parentheses serve two purposes: to group items together into a sub-expression, and to mark what generated the match. For example the expression "(ab)*" would match all of the string "ababab". The matching algorithms regex_match and regex_search each take an instance of match_results that reports what caused the match; on exit from these functions the match_results contains information both on what the whole expression matched and on what each sub-expression matched. In the example above match_results[1] would contain a pair of iterators denoting the final "ab" of the matching string. It is permissible for sub-expressions to match null strings. If a sub-expression takes no part in a match - for example if it is part of an alternative that is not taken - then both of the iterators that are returned for that sub-expression point to the end of the input string, and the matched member for that sub-expression is false. Sub-expressions are indexed from left to right starting from 1; sub-expression 0 is the whole expression.
+Sometimes you need to group sub-expressions with parentheses, but don't want the parentheses to generate another marked sub-expression; in this case a non-marking parenthesis (?:expression) can be used. For example the following expression creates no sub-expressions:
+"(?:abc)*"
+There are two forms of forward lookahead assert: one for positive forward lookahead asserts, and one for negative lookahead asserts:
+"(?=abc)" matches zero characters only if they are followed by the expression + "abc".
+"(?!abc)" matches zero characters only if they are not followed by the + expression "abc".
+"(?>expression)" matches "expression" as an independent atom (the algorithm + will not backtrack into it if a failure occurs later in the expression).
+Alternatives occur when the expression can match either one sub-expression or another. Each alternative is separated by a "|", or a "\|" if the flag regex_constants::bk_vbar is set, or by a newline character if the flag regex_constants::newline_alt is set. Each alternative is the largest possible previous sub-expression; this is the opposite behavior from repetition operators.
+Examples: +
+"a(b|c)" could match "ab" or "ac". +
+"abc|def" could match "abc" or "def". +
+A set is a set of characters that can match any single character that is a member of the set. Sets are delimited by "[" and "]" and can contain literals, character ranges, character classes, collating elements and equivalence classes. Set declarations that start with "^" contain the complement of the elements that follow.
+Examples: +
+Character literals: +
+"[abc]" will match either of "a", "b", or "c". +
+"[^abc]" will match any character other than "a", "b", or "c".
+Character ranges: +
+"[a-z]" will match any character in the range "a" to "z". +
+"[^A-Z]" will match any character other than those in the range "A" to "Z". +
+Note that character ranges are highly locale dependent if the flag regex_constants::collate is set: they match any character that collates between the endpoints of the range, and ranges will only behave according to ASCII rules when the default "C" locale is in effect. For example if the library is compiled with the Win32 localization model, then [a-z] will match the ASCII characters a-z, and also 'A', 'B' etc, but not 'Z' which collates just after 'z'. This locale specific behavior is disabled by default (in perl mode), which forces ranges to collate according to ASCII character code.
+Character classes are denoted using the syntax "[:classname:]" within a set
+ declaration, for example "[[:space:]]" is the set of all whitespace characters.
+ Character classes are only available if the flag regex_constants::char_classes
+ is set. The available character classes are:
+
+
+
+
| + | alnum | +Any alpha numeric character. | ++ |
| + | alpha | +Any alphabetical character a-z and A-Z. Other + characters may also be included depending upon the locale. | ++ |
| + | blank | +Any blank character, either a space or a tab. | ++ |
| + | cntrl | +Any control character. | ++ |
| + | digit | +Any digit 0-9. | ++ |
| + | graph | +Any graphical character. | ++ |
| + | lower | +Any lower case character a-z. Other characters may + also be included depending upon the locale. | ++ |
| + | print | +Any printable character. | ++ |
| + | punct | +Any punctuation character. | ++ |
| + | space | +Any whitespace character. | ++ |
| + | upper | +Any upper case character A-Z. Other characters may + also be included depending upon the locale. | ++ |
| + | xdigit | +Any hexadecimal digit character, 0-9, a-f and A-F. | ++ |
| + | word | +Any word character - all alphanumeric characters plus + the underscore. | ++ |
| + | Unicode | +Any character whose code is greater than 255, this + applies to the wide character traits classes only. | ++ |
There are some shortcuts that can be used in place of the character classes; provided the flag regex_constants::escape_in_lists is set, you can use:
+\w in place of [:word:] +
+\s in place of [:space:] +
+\d in place of [:digit:] +
+\l in place of [:lower:] +
+\u in place of [:upper:] +
+Collating elements take the general form [.tagname.] inside a set declaration, + where tagname is either a single character, or a name of a collating + element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is + equivalent to [,]. The library supports all the standard POSIX collating + element names, and in addition the following digraphs: "ae", "ch", "ll", "ss", + "nj", "dz", "lj", each in lower, upper and title case variations. + Multi-character collating elements can result in the set matching more than one + character, for example [[.ae.]] would match two characters, but note that + [^[.ae.]] would only match one character. +
+Equivalence classes take the general form [=tagname=] inside a set declaration, where tagname is either a single character, or a name of a collating element, and match any character that is a member of the same primary equivalence class as the collating element [.tagname.]. An equivalence class is a set of characters that collate the same; a primary equivalence class is a set of characters whose primary sort keys are all the same (for example strings are typically collated by character, then by accent, and then by case; the primary sort key then relates to the character, the secondary to the accentation, and the tertiary to the case). If there is no equivalence class corresponding to tagname, then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no locale independent method of obtaining the primary sort key for a character, except under Win32. For other operating systems the library will "guess" the primary sort key from the full sort key (obtained from strxfrm), so equivalence classes are probably best considered broken under any operating system other than Win32.
+To include a literal "-" in a set declaration, do one of the following: make it the first character after the opening "[" or "[^"; make it the endpoint of a range; write it as a collating element; or, if the flag regex_constants::escape_in_lists is set, precede it with an escape character as in "[\-]". To include a literal "[" or "]" or "^" in a set, make it the endpoint of a range, write it as a collating element, or precede it with an escape character if the flag regex_constants::escape_in_lists is set.
+An anchor is something that matches the null string at the start or end of a + line: "^" matches the null string at the start of a line, "$" matches the null + string at the end of a line. +
+A back reference is a reference to a previous sub-expression that has already been matched; the reference is to what the sub-expression matched, not to the expression itself. A back reference consists of the escape character "\" followed by a digit "1" to "9": "\1" refers to the first sub-expression, "\2" to the second etc. For example the expression "(.*)\1" matches any string that is repeated about its mid-point, for example "abcabc" or "xyzxyz". A back reference to a sub-expression that did not participate in any match matches the null string: NB this is different to some other regular expression matchers. Back references are only available if the expression is compiled with the flag regex_constants::bk_refs set.
+This is an extension to the algorithm that is not available in other libraries: it consists of the escape character followed by the digit "0" followed by the octal character code. For example "\023" represents the character whose octal code is 23. Where ambiguity could occur use parentheses to break the expression up: "\0103" represents the character whose code is 103, "(\010)3" represents the character 10 followed by "3". To match characters by their hexadecimal code, use \x followed by a string of hexadecimal digits, optionally enclosed inside {}, for example \xf0 or \x{aff}; notice the latter example is a Unicode character.
+The following operators are provided for compatibility with the GNU regular + expression library. +
+"\w" matches any single character that is a member of the "word" character + class, this is identical to the expression "[[:word:]]". +
+"\W" matches any single character that is not a member of the "word" character + class, this is identical to the expression "[^[:word:]]". +
+"\<" matches the null string at the start of a word. +
+"\>" matches the null string at the end of the word. +
+"\b" matches the null string at either the start or the end of a word. +
+"\B" matches a null string within a word. +
+The start of the sequence passed to the matching algorithms is considered to be + a potential start of a word unless the flag match_not_bow is set. The end of + the sequence passed to the matching algorithms is considered to be a potential + end of a word unless the flag match_not_eow is set. +
+The following operators are provided for compatibility with the GNU regular + expression library, and Perl regular expressions: +
+"\`" matches the start of a buffer. +
+"\A" matches the start of the buffer. +
+"\'" matches the end of a buffer. +
+"\z" matches the end of a buffer. +
+"\Z" matches the end of a buffer, or possibly one or more new line characters + followed by the end of the buffer. +
+A buffer is considered to consist of the whole sequence passed to the matching + algorithms, unless the flags match_not_bob or match_not_eob are set. +
+The escape character "\" has several meanings. +
+Inside a set declaration the escape character is a normal character unless the + flag regex_constants::escape_in_lists is set in which case whatever follows the + escape is a literal character regardless of its normal meaning. +
+The escape operator may introduce an operator for example: back references, or + a word operator. +
+The escape operator may make the following character normal, for example "\*" + represents a literal "*" rather than the repeat operator. +
+The following escape sequences are aliases for single characters:
+
+
+
+
| + | Escape sequence + | +Character code + | +Meaning + | ++ |
| + | \a + | +0x07 + | +Bell character. + | ++ |
| + | \f + | +0x0C + | +Form feed. + | ++ |
| + | \n + | +0x0A + | +Newline character. + | ++ |
| + | \r + | +0x0D + | +Carriage return. + | ++ |
| + | \t + | +0x09 + | +Tab character. + | ++ |
| + | \v + | +0x0B + | +Vertical tab. + | ++ |
| + | \e + | +0x1B + | +ASCII Escape character. + | ++ |
| + | \0dd + | +0dd + | +An octal character code, where dd is one or + more octal digits. + | ++ |
| + | \xXX + | +0xXX + | +A hexadecimal character code, where XX is one or more + hexadecimal digits. + | ++ |
| + | \x{XX} + | +0xXX + | +A hexadecimal character code, where XX is one or more + hexadecimal digits, optionally a Unicode character. + | ++ |
| + | \cZ + | +z-@ + | +An ASCII escape sequence control-Z, where Z is any + ASCII character greater than or equal to the character code for '@'. + | ++ |
The following are provided mostly for perl compatibility, but note that there
+ are some differences in the meanings of \l \L \u and \U:
+
+
+
+
| + | \w + | +Equivalent to [[:word:]]. + | ++ |
| + | \W + | +Equivalent to [^[:word:]]. + | ++ |
| + | \s + | +Equivalent to [[:space:]]. + | ++ |
| + | \S + | +Equivalent to [^[:space:]]. + | ++ |
| + | \d + | +Equivalent to [[:digit:]]. + | ++ |
| + | \D + | +Equivalent to [^[:digit:]]. + | ++ |
| + | \l + | +Equivalent to [[:lower:]]. + | ++ |
| + | \L + | +Equivalent to [^[:lower:]]. + | ++ |
| + | \u + | +Equivalent to [[:upper:]]. + | ++ |
| + | \U + | +Equivalent to [^[:upper:]]. + | ++ |
| + | \C + | +Any single character, equivalent to '.'. + | ++ |
| + | \X + | +Match any Unicode combining character sequence, for + example "a\x 0301" (a letter a with an acute). + | ++ |
| + | \Q + | +The begin quote operator, everything that follows is + treated as a literal character until a \E end quote operator is found. + | ++ |
| + | \E + | +The end quote operator, terminates a sequence begun + with \Q. + | ++ |
+ When the expression is compiled as a Perl-compatible regex then the matching + algorithms will perform a depth first search on the state machine and report + the first match found.
++ When the expression is compiled as a POSIX-compatible regex then the matching + algorithms will match the first possible matching string. If more than one + string starting at a given location can match, then the longest + possible string is matched, unless the flag match_any is set, in which case the first + match encountered is returned. Use of the match_any option can reduce the time + taken to find the match, but it is only useful if the user is less concerned + about what matched; for example, it would not be suitable for search and + replace operations. In cases where there are multiple possible matches all + starting at the same location, and all of the same length, the match + chosen is the one with the longest first sub-expression; if that is the same + for two or more matches, then the second sub-expression will be examined, and so + on. +
+ The following examples illustrate the main differences between Perl and + POSIX regular expression matching rules: +
++
|
+ Expression + |
+
+ Text + |
+
+ POSIX leftmost longest match + |
+
+ ECMAScript depth first search match + |
+
|
+
|
+
+
|
+
+
|
+
+
|
+
|
+
|
+
+
|
+
+ $0 = " abc def xyz " |
+
+ $0 = " abc def xyz " |
+
|
+
|
+
+
|
+
+
|
+
+
|
+
These differences between Perl matching rules and POSIX matching rules mean + that these two regular expression syntaxes differ not only in the features + offered, but also in the form that the state machine takes and/or the + algorithms used to traverse the state machine.
+Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + + diff --git a/doc/Attic/syntax_option_type.html b/doc/Attic/syntax_option_type.html new file mode 100644 index 00000000..532d6386 --- /dev/null +++ b/doc/Attic/syntax_option_type.html @@ -0,0 +1,332 @@ + + + ++
|
+ |
+
+ Boost.Regex+syntax_option_type+ |
+
+ |
+
Type syntax_option_type is an implementation defined bitmask type that controls + how a regular expression string is to be interpreted. For convenience, + note that all the constants listed here are also duplicated within the scope + of class template basic_regex.
+namespace std{ namespace regex_constants{
+
+typedef bitmask_type syntax_option_type;
+// these flags are standardized:
+static const syntax_option_type normal;
+static const syntax_option_type icase;
+static const syntax_option_type nosubs;
+static const syntax_option_type optimize;
+static const syntax_option_type collate;
+static const syntax_option_type ECMAScript = normal;
+static const syntax_option_type JavaScript = normal;
+static const syntax_option_type JScript = normal;
+static const syntax_option_type basic;
+static const syntax_option_type extended;
+static const syntax_option_type awk;
+static const syntax_option_type grep;
+static const syntax_option_type egrep;
+static const syntax_option_type sed = basic;
+static const syntax_option_type perl;
// these are boost.regex specific:
static const syntax_option_type escape_in_lists;
static const syntax_option_type char_classes;
static const syntax_option_type intervals;
static const syntax_option_type limited_ops;
static const syntax_option_type newline_alt;
static const syntax_option_type bk_plus_qm;
static const syntax_option_type bk_braces;
static const syntax_option_type bk_parens;
static const syntax_option_type bk_refs;
static const syntax_option_type bk_vbar;
static const syntax_option_type use_except;
static const syntax_option_type failbit;
static const syntax_option_type literal;
static const syntax_option_type nocollate;
static const syntax_option_type perlex;
static const syntax_option_type emacs;
+} // namespace regex_constants
+} // namespace std
+ The type syntax_option_type is an implementation defined bitmask
+ type (17.3.2.1.2). Setting its elements has the effects listed in the table
+ below. A valid value of type syntax_option_type will always have
+ exactly one of the elements normal, basic, extended, awk, grep, egrep, sed
+ or perl set.
Note that for convenience all the constants listed here are duplicated within + the scope of class template basic_regex, so you can use any of:
+boost::regex_constants::constant_name+
or
+boost::regex::constant_name+
or
+boost::wregex::constant_name+
in an interchangeable manner.
++
|
+ Element + |
+
+ Effect if set + |
+
|
+ normal + |
+
+ Specifies that the grammar recognized by the regular expression engine uses its + normal semantics: that is the same as that given in the ECMA-262, ECMAScript + Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects + (FWD.1). +boost.regex also recognizes most perl-compatible extensions in this mode. + |
+
|
+ icase + |
+
+ Specifies that matching of regular expressions against a character container + sequence shall be performed without regard to case. + |
+
|
+ nosubs + |
+
+ Specifies that when a regular expression is matched against a character + container sequence, then no sub-expression matches are to be stored in the + supplied match_results structure. + |
+
|
+ optimize + |
+
+ Specifies that the regular expression engine should pay more attention to the + speed with which regular expressions are matched, and less to the speed with + which regular expression objects are constructed. Otherwise it has no + detectable effect on the program output. This currently has no effect for + boost.regex. + |
+
|
+ collate + |
+
+ Specifies that character ranges of the form "[a-b]" should be locale sensitive. + |
+
|
+ ECMAScript + |
+
+ The same as normal. + |
+
|
+ JavaScript + |
+
+ The same as normal. + |
+
|
+ JScript + |
+
+ The same as normal. + |
+
|
+ basic + |
+
+ Specifies that the grammar recognized by the regular expression engine is the + same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001, + Portable Operating System Interface (POSIX ), Base Definitions and Headers, + Section 9, Regular Expressions (FWD.1). + + |
+
|
+ extended + |
+
+ Specifies that the grammar recognized by the regular expression engine is the + same as that used by POSIX extended regular expressions in IEEE Std + 1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and + Headers, Section 9, Regular Expressions (FWD.1). + |
+
|
+ awk + |
+
+ Specifies that the grammar recognized by the regular expression engine is the + same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable + Operating System Interface (POSIX ), Shells and Utilities, Section 4, awk + (FWD.1). +That is to say: the same as POSIX extended syntax, but with escape sequences in + character classes permitted. + |
+
|
+ grep + |
+
+ Specifies that the grammar recognized by the regular expression engine is the + same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable + Operating System Interface (POSIX ), Shells and Utilities, Section 4, + Utilities, grep (FWD.1). +That is to say, the same as POSIX basic syntax, but with the newline character + acting as an alternation character in addition to "|". + |
+
|
+ egrep + |
+
+ Specifies that the grammar recognized by the regular expression engine is the + same as that used by POSIX utility grep when given the -E option in IEEE Std + 1003.1-2001, Portable Operating System Interface (POSIX ), Shells and + Utilities, Section 4, Utilities, grep (FWD.1). +That is to say, the same as POSIX extended syntax, but with the newline + character acting as an alternation character in addition to "|". + |
+
|
+ sed + |
+
+ The same as basic. + |
+
|
+ perl + |
+
+ The same as normal. + |
+
The following constants are specific to this particular regular expression + implementation and do not appear in the + regular expression standardization proposal:
++
| regbase::escape_in_lists | +Allows the use of the escape "\" character in sets of + characters, for example [\]] represents the set of characters containing only + "]". If this flag is not set then "\" is an ordinary character inside sets. | +
| regbase::char_classes | +When this bit is set, character classes [:classname:] + are allowed inside character set declarations, for example "[[:word:]]" + represents the set of all characters that belong to the character class "word". | +
| regbase::intervals | +When this bit is set, repetition intervals are + allowed, for example "a{2,4}" represents a repeat of between 2 and 4 letter + a's. | +
| regbase::limited_ops | +When this bit is set all of "+", "?" and "|" are + ordinary characters in all situations. | +
| regbase::newline_alt | +When this bit is set, then the newline character "\n" + has the same effect as the alternation operator "|". | +
| regbase::bk_plus_qm | +When this bit is set then "\+" represents the one or + more repetition operator and "\?" represents the zero or one repetition + operator. When this bit is not set then "+" and "?" are used instead. | +
| regbase::bk_braces | +When this bit is set then "\{" and "\}" are used for + bounded repetitions and "{" and "}" are normal characters. This is the opposite + of default behavior. | +
| regbase::bk_parens | +When this bit is set then "\(" and "\)" are used to + group sub-expressions and "(" and ")" are ordinary characters. This is the + opposite of default behavior. | +
| regbase::bk_refs | +When this bit is set then back references are + allowed. | +
| regbase::bk_vbar | +When this bit is set then "\|" represents the + alternation operator and "|" is an ordinary character. This is the opposite of + default behavior. | +
| regbase::use_except | +When this bit is set then a bad_expression + exception will be thrown on error. Use of this flag is deprecated - + basic_regex will always throw on error. | +
| regbase::failbit | +This bit is set on error. If regbase::use_except is + not set, then this bit should be checked to see if a regular expression is + valid before usage. | +
| regbase::literal | +All characters in the string are treated as literals; + there are no special characters or escape sequences. | +
| regbase::emacs | +Provides compatibility with the emacs + editor, equivalent to: bk_braces | bk_parens | bk_refs | bk_vbar. | +
Revised + + 17 May 2003 +
+© Copyright John Maddock 1998- 2003
+ + + + diff --git a/doc/Attic/thread_safety.html b/doc/Attic/thread_safety.html new file mode 100644 index 00000000..eeda681d --- /dev/null +++ b/doc/Attic/thread_safety.html @@ -0,0 +1,68 @@ + + + ++
|
+ |
+
+ Boost.Regex+Thread Safety+ |
+
+ |
+
Class basic_regex<> and its typedefs regex + and wregex are thread safe, in that compiled regular expressions can safely be + shared between threads. The matching algorithms regex_match, + regex_search, regex_grep, + regex_format and regex_merge + are all re-entrant and thread safe. Class match_results + is now thread safe, in that the results of a match can be safely copied from + one thread to another (for example one thread may find matches and push + match_results instances onto a queue, while another thread pops them off the + other end), otherwise use a separate instance of match_results + per thread. +
+The POSIX API functions are all re-entrant and + thread safe, regular expressions compiled with regcomp can also be + shared between threads. +
+The class RegEx is only thread safe if each thread + gets its own RegEx instance (apartment threading) - this is a consequence of + RegEx handling both compiling and matching regular expressions. +
+Finally note that changing the global locale invalidates all compiled regular + expressions, therefore calling set_locale from one thread while another + uses regular expressions will produce unpredictable results. +
++ There is also a requirement that there is only one thread executing prior to + the start of main().
+Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+ + + diff --git a/doc/Attic/uarrow.gif b/doc/Attic/uarrow.gif new file mode 100644 index 0000000000000000000000000000000000000000..6afd20c3857127c21fc9bcd52ec347e32c21578c GIT binary patch literal 1666
|
+ |
+
+ Boost.Regex+Standards Conformance+ |
+
+ |
+
Boost.regex is intended to conform to the + regular expression standardization proposal, which will appear in a + future C++ standard technical report (and hopefully in a future version of the + standard). Currently there are some differences in how the regular + expression traits classes are defined; these will be fixed in a future release.
+All of the ECMAScript regular expression syntax features are supported, except + that:
+Negated class escapes (\S, \D and \W) are not permitted inside character class + definitions ( [...] ).
+The escape sequence \u matches any upper case character (the same as + [[:upper:]]) rather than a Unicode escape sequence; use \x{DDDD} for + Unicode escape sequences.
+Almost all Perl features are supported, except for:
+\N{name} Use [[:name:]] instead.
+\pP and \PP
+(?imsx-imsx)
+(?<=pattern)
+(?<!pattern)
+(?{code})
+(??{code})
+(?(condition)yes-pattern) and (?(condition)yes-pattern|no-pattern)
+These embarrassments / limitations will be removed in due course, mainly + dependent upon user demand.
+All the POSIX basic and extended regular expression features are supported, + except that:
+No character collating names are recognized except those specified in the POSIX + standard for the C locale, unless they are explicitly registered with the + traits class.
+Character equivalence classes ( [[=a=]] etc) are probably buggy except on + Win32. Implementing this feature requires knowledge of the format of the + string sort keys produced by the system; if you need this, and the default + implementation doesn't work on your platform, then you will need to supply a + custom traits class.
+Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+ + + + diff --git a/doc/sub_match.html b/doc/sub_match.html new file mode 100644 index 00000000..db995312 --- /dev/null +++ b/doc/sub_match.html @@ -0,0 +1,426 @@ + + + ++
|
+ |
+
+ Boost.Regex+sub_match+ |
+
+ |
+
#include <boost/regex.hpp> +
+Regular expressions are different from many simple pattern-matching algorithms + in that as well as finding an overall match they can also produce + sub-expression matches: each sub-expression being delimited in the pattern by a + pair of parentheses (...). There has to be some method for reporting + sub-expression matches back to the user: this is achieved by defining a + class match_results that acts as an + indexed collection of sub-expression matches, each sub-expression match being + contained in an object of type sub_match + . +
Objects of type sub_match may only be obtained by subscripting an object + of type match_results + . +
When the marked sub-expression denoted by an object of type sub_match<>
+ participated in a regular expression match then member matched evaluates
+ to true, and members first and second denote the
+ range of characters [first,second) which formed that match.
+ Otherwise matched is false, and members first and second
+ contain undefined values.
If an object of type sub_match<> represents sub-expression 0
+ - that is to say the whole match - then member matched is always
+ true, unless a partial match was obtained as a result of the flag match_partial
+ being passed to a regular expression algorithm, in which case member matched
+ is false, and members first and second represent the
+ character range that formed the partial match.
+namespace boost{
+
+template <class BidirectionalIterator>
+class sub_match : public std::pair<BidirectionalIterator, BidirectionalIterator>
+{
+public:
+ typedef typename iterator_traits<BidirectionalIterator>::value_type value_type;
+ typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type;
+ typedef BidirectionalIterator iterator;
+
+ bool matched;
+
+ difference_type length()const;
+ operator basic_string<value_type>()const;
+ basic_string<value_type> str()const;
+
+ int compare(const sub_match& s)const;
+ int compare(const basic_string<value_type>& s)const;
+ int compare(const value_type* s)const;
+};
+
+template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator == (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator != (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator < (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator > (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator >= (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator <= (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+
+template <class BidirectionalIterator>
+bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+template <class BidirectionalIterator>
+bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+template <class charT, class traits, class BidirectionalIterator>
+basic_ostream<charT, traits>&
+ operator << (basic_ostream<charT, traits>& os,
+ const sub_match<BidirectionalIterator>& m);
+
+} // namespace boost
+ typedef typename std::iterator_traits<iterator>::value_type value_type;+
The type pointed to by the iterators.
+typedef typename std::iterator_traits<iterator>::difference_type difference_type;+
A type that represents the difference between two iterators.
+typedef iterator iterator_type;+
The iterator type.
+iterator first+
An iterator denoting the position of the start of the match.
+iterator second+
An iterator denoting the position of the end of the match.
+bool matched+
A Boolean value denoting whether this sub-expression participated in the match.
+difference_type length()const;+ +
+ Effects: returns (matched ? distance(first, second) : 0).
operator basic_string<value_type>()const;+ +
+ Effects: returns (matched ? basic_string<value_type>(first,
+ second) : basic_string<value_type>()).
basic_string<value_type> str()const;+ +
+ Effects: returns (matched ? basic_string<value_type>(first,
+ second) : basic_string<value_type>()).
int compare(const sub_match& s)const;+ +
+ Effects: returns str().compare(s.str()).
int compare(const basic_string<value_type>& s)const;+ +
+ Effects: returns str().compare(s).
int compare(const value_type* s)const;+ +
+ Effects: returns str().compare(s).
template <class BidirectionalIterator> +bool operator == (const sub_match<BidirectionalIterator>& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs.compare(rhs) == 0.
template <class BidirectionalIterator> +bool operator != (const sub_match<BidirectionalIterator>& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs.compare(rhs) != 0.
template <class BidirectionalIterator> +bool operator < (const sub_match<BidirectionalIterator>& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs.compare(rhs) < 0.
template <class BidirectionalIterator> +bool operator <= (const sub_match<BidirectionalIterator>& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs.compare(rhs) <= 0.
template <class BidirectionalIterator> +bool operator >= (const sub_match<BidirectionalIterator>& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs.compare(rhs) >= 0.
template <class BidirectionalIterator> +bool operator > (const sub_match<BidirectionalIterator>& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs.compare(rhs) > 0.
template <class BidirectionalIterator> +bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs == rhs.str().
template <class BidirectionalIterator> +bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs != rhs.str().
template <class BidirectionalIterator> +bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs < rhs.str().
template <class BidirectionalIterator> +bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs > rhs.str().
template <class BidirectionalIterator> +bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs >= rhs.str().
template <class BidirectionalIterator> +bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs <= rhs.str().
template <class BidirectionalIterator> +bool operator == (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const* rhs);+ +
+ Effects: returns lhs.str() == rhs.
template <class BidirectionalIterator> +bool operator != (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const* rhs);+ +
+ Effects: returns lhs.str() != rhs.
template <class BidirectionalIterator> +bool operator < (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const* rhs);+ +
+ Effects: returns lhs.str() < rhs.
template <class BidirectionalIterator> +bool operator > (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const* rhs);+ +
+ Effects: returns lhs.str() > rhs.
template <class BidirectionalIterator> +bool operator >= (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const* rhs);+ +
+ Effects: returns lhs.str() >= rhs.
template <class BidirectionalIterator> +bool operator <= (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const* rhs);+ +
+ Effects: returns lhs.str() <= rhs.
template <class BidirectionalIterator> +bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs == rhs.str().
template <class BidirectionalIterator> +bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs != rhs.str().
template <class BidirectionalIterator> +bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs < rhs.str().
template <class BidirectionalIterator> +bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs > rhs.str().
template <class BidirectionalIterator> +bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs >= rhs.str().
template <class BidirectionalIterator> +bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, + const sub_match<BidirectionalIterator>& rhs);+ +
+ Effects: returns lhs <= rhs.str().
template <class BidirectionalIterator> +bool operator == (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const& rhs);+ +
+ Effects: returns lhs.str() == rhs.
template <class BidirectionalIterator> +bool operator != (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const& rhs);+ +
+ Effects: returns lhs.str() != rhs.
template <class BidirectionalIterator> +bool operator < (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const& rhs);+ +
+ Effects: returns lhs.str() < rhs.
template <class BidirectionalIterator> +bool operator > (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const& rhs);+ +
+ Effects: returns lhs.str() > rhs.
template <class BidirectionalIterator> +bool operator >= (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const& rhs);+ +
+ Effects: returns lhs.str() >= rhs.
template <class BidirectionalIterator> +bool operator <= (const sub_match<BidirectionalIterator>& lhs, + typename iterator_traits<BidirectionalIterator>::value_type const& rhs);+ +
+ Effects: returns lhs.str() <= rhs.
template <class charT, class traits, class BidirectionalIterator> +basic_ostream<charT, traits>& + operator << (basic_ostream<charT, traits>& os, + const sub_match<BidirectionalIterator>& m);+ +
+ Effects: returns (os << m.str()).
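The comparison and stream operators listed above can be exercised with a short sketch. It assumes std::regex as a stand-in for boost::regex (the standard library provides the same sub_match operators); the pattern and the helper names first_group_is, second_group_is and stream_first_group are hypothetical and exist only for illustration.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Assumption: std::regex stands in for boost::regex; the helper names
// and the pattern "(\w+)-(\w)" are illustrative only.

// Compares sub-expression 1 with a null-terminated string, using the
// `value_type const*` overloads in both argument orders.
bool first_group_is(const std::string& text, const char* expected)
{
    static const std::regex re(R"((\w+)-(\w))");
    std::smatch m;
    if (!std::regex_match(text, m, re))
        return false;
    return m[1] == expected && expected == m[1];
}

// Compares sub-expression 2 with a single character, using the
// `value_type const&` overloads in both argument orders.
bool second_group_is(const std::string& text, char expected)
{
    static const std::regex re(R"((\w+)-(\w))");
    std::smatch m;
    if (!std::regex_match(text, m, re))
        return false;
    return m[2] == expected && expected == m[2];
}

// Streams sub-expression 1; operator<< writes the same characters as m[1].str().
std::string stream_first_group(const std::string& text)
{
    static const std::regex re(R"((\w+)-(\w))");
    std::smatch m;
    std::ostringstream os;
    if (std::regex_match(text, m, re))
        os << m[1];
    return os.str();
}
```

Each comparison behaves as if the sub_match had first been converted with str(), which is what the Effects clauses above specify.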
+
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + + diff --git a/doc/syntax.html b/doc/syntax.html new file mode 100644 index 00000000..f776cd3c --- /dev/null +++ b/doc/syntax.html @@ -0,0 +1,773 @@ + + + ++
|
+ |
+
+ Boost.Regex+Regular Expression Syntax+ |
+
+ |
+
This section covers the regular expression syntax used by this library. It is + a programmer's guide: the actual syntax presented to your program's users will + depend upon the flags used during expression compilation. +
+All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{", + "}", "[", "]", "^", "$" and "\". These characters are literals when preceded by + a "\". A literal is a character that matches itself, or matches the result of + traits_type::translate(), where traits_type is the traits template parameter to + class basic_regex.
+The dot character "." matches any single character, with two exceptions: when match_not_dot_null + is passed to the matching algorithms, the dot does not match a null character; + when match_not_dot_newline is passed to the matching algorithms, + the dot does not match a newline character. +
+A repeat is an expression that is repeated an arbitrary number of times. An + expression followed by "*" can be repeated any number of times including zero. + An expression followed by "+" can be repeated any number of times, but at least + once, if the expression is compiled with the flag regex_constants::bk_plus_qm + then "+" is an ordinary character and "\+" represents a repeat of once or more. + An expression followed by "?" may be repeated zero or one times only, if the + expression is compiled with the flag regex_constants::bk_plus_qm then "?" is an + ordinary character and "\?" represents the repeat zero or once operator. When + it is necessary to specify the minimum and maximum number of repeats + explicitly, the bounds operator "{}" may be used, thus "a{2}" is the letter "a" + repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2 + and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with + no upper limit. Note that there must be no white-space inside the {}, and there + is no upper limit on the values of the lower and upper bounds. When the + expression is compiled with the flag regex_constants::bk_braces then "{" and + "}" are ordinary characters and "\{" and "\}" are used to delimit bounds + instead. All repeat expressions refer to the shortest possible previous + sub-expression: a single character; a character set, or a sub-expression + grouped with "()" for example. +
+Examples: +
+"ba*" will match all of "b", "ba", "baaa" etc. +
+"ba+" will match "ba" or "baaaa" for example but not "b". +
+"ba?" will match "b" or "ba". +
+"ba{2,4}" will match "baa", "baaa" and "baaaa". +
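The repeat examples above can be checked with a minimal sketch. It assumes std::regex with its default grammar as a stand-in for boost::regex; the helper name matches_whole is hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex (default ECMAScript grammar) stands in for
// boost::regex; matches_whole is an illustrative helper, not library API.
// Returns true when the whole of `text` matches `pattern`.
bool matches_whole(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);
    return std::regex_match(text, re);
}
```

For instance, matches_whole("ba+", "b") is false because "+" requires at least one "a", while matches_whole("ba{2,4}", "baaa") is true because three repeats lie within the bounds.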
+Whenever the "extended" regular expression syntax is in use (the default) then + non-greedy repeats are possible by appending a '?' after the repeat; a + non-greedy repeat is one which will match the shortest possible string. +
+For example to match html tag pairs one could use something like: +
+"<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>" +
+In this case $1 will contain the text between the tag pairs, and will be the + shortest possible matching string. +
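The difference between the greedy and non-greedy forms of the tag expression can be sketched as follows. This assumes std::regex as a stand-in for boost::regex and substitutes the concrete tag name "b" for "tagname"; the helper name first_capture is hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex; first_capture is an
// illustrative helper. Returns the text of sub-expression 1 ($1) for the
// first match of `pattern` in `text`, or an empty string if nothing matches.
std::string first_capture(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re))
        return m[1].str();
    return std::string();
}
```

With the input "<b>one</b> and <b>two</b>", the non-greedy pattern with "(.*?)" stops at the first closing tag, while the greedy pattern with "(.*)" runs on to the last closing tag.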
+Parentheses serve two purposes, to group items together into a sub-expression, + and to mark what generated the match. For example the expression "(ab)*" would + match all of the string "ababab". The matching algorithms + regex_match and regex_search + each take an instance of match_results + that reports what caused the match, on exit from these functions the + match_results contains information both on what the whole expression + matched and on what each sub-expression matched. In the example above + match_results[1] would contain a pair of iterators denoting the final "ab" of + the matching string. It is permissible for sub-expressions to match null + strings. If a sub-expression takes no part in a match - for example if it is + part of an alternative that is not taken - then both of the iterators that are + returned for that sub-expression point to the end of the input string, and the matched + parameter for that sub-expression is false. Sub-expressions are indexed + from left to right starting from 1, sub-expression 0 is the whole expression. +
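A minimal sketch of how sub-expression results are read back, assuming std::regex as a stand-in for boost::regex. The struct capture_info and the helper capture_at are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex; capture_info and
// capture_at are illustrative names, not part of either library.
struct capture_info
{
    bool matched;          // did this sub-expression take part in the match?
    std::string text;      // what it matched (empty if it did not)
    std::ptrdiff_t offset; // offset of the match in the input, or -1
};

// Matches the whole of `text` against `pattern` and reports on the
// sub-expression with the given index (0 is the whole expression).
capture_info capture_at(const std::string& pattern, const std::string& text,
                        std::size_t index)
{
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_match(text, m, re) || index >= m.size())
        return capture_info{false, std::string(), -1};
    const bool matched = m[index].matched;
    return capture_info{matched, m[index].str(),
                        matched ? m.position(index) : -1};
}
```

For "(ab)*" against "ababab", sub-expression 1 reports the final "ab" at offset 4. For "(a)|(b)" against "b", sub-expression 1 takes no part in the match, so its matched member is false.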
+Sometimes you need to group sub-expressions with parentheses, but don't want + the parentheses to generate another marked sub-expression; in this case a + non-marking parenthesis (?:expression) can be used. For example the following + expression creates no marked sub-expressions: +
+"(?:abc)*"
+There are two forms of these; one for positive forward lookahead asserts, and + one for negative lookahead asserts:
+"(?=abc)" matches zero characters only if they are followed by the expression + "abc".
+"(?!abc)" matches zero characters only if they are not followed by the + expression "abc".
+"(?>expression)" matches "expression" as an independent atom (the algorithm + will not backtrack into it if a failure occurs later in the expression).
+Alternatives occur when the expression can match either one sub-expression or + another, each alternative is separated by a "|", or a "\|" if the flag + regex_constants::bk_vbar is set, or by a newline character if the flag + regex_constants::newline_alt is set. Each alternative is the largest possible + previous sub-expression; this is the opposite behavior from repetition + operators. +
+Examples: +
+"a(b|c)" could match "ab" or "ac". +
+"abc|def" could match "abc" or "def". +
+A set is a set of characters that can match any single character that is a + member of the set. Sets are delimited by "[" and "]" and can contain literals, + character ranges, character classes, collating elements and equivalence + classes. Set declarations that start with "^" contain the complement of the + elements that follow. +
+Examples: +
+Character literals: +
+"[abc]" will match either of "a", "b", or "c". +
+"[^abc]" will match any character other than "a", "b", or "c". +
+Character ranges: +
+"[a-z]" will match any character in the range "a" to "z". +
+"[^A-Z]" will match any character other than those in the range "A" to "Z". +
+Note that character ranges are highly locale dependent if the flag + regex_constants::collate is set: they match any character that collates between + the endpoints of the range, and will only behave according to ASCII rules + when the default "C" locale is in effect. For example if the library is + compiled with the Win32 localization model, then [a-z] will match the ASCII + characters a-z, and also 'A', 'B' etc, but not 'Z' which collates just after + 'z'. This locale specific behavior is disabled by default (in perl mode); + ranges then collate according to ASCII character code. +
+Character classes are denoted using the syntax "[:classname:]" within a set
+ declaration, for example "[[:space:]]" is the set of all whitespace characters.
+ Character classes are only available if the flag regex_constants::char_classes
+ is set. The available character classes are:
+
+
+
+
| + | alnum | +Any alpha numeric character. | ++ |
| + | alpha | +Any alphabetical character a-z and A-Z. Other + characters may also be included depending upon the locale. | ++ |
| + | blank | +Any blank character, either a space or a tab. | ++ |
| + | cntrl | +Any control character. | ++ |
| + | digit | +Any digit 0-9. | ++ |
| + | graph | +Any graphical character. | ++ |
| + | lower | +Any lower case character a-z. Other characters may + also be included depending upon the locale. | ++ |
| + | print | +Any printable character. | ++ |
| + | punct | +Any punctuation character. | ++ |
| + | space | +Any whitespace character. | ++ |
| + | upper | +Any upper case character A-Z. Other characters may + also be included depending upon the locale. | ++ |
| + | xdigit | +Any hexadecimal digit character, 0-9, a-f and A-F. | ++ |
| + | word | +Any word character - all alphanumeric characters plus + the underscore. | ++ |
| + | Unicode | +Any character whose code is greater than 255, this + applies to the wide character traits classes only. | ++ |
There are some shortcuts that can be used in place of the character classes, + provided the flag regex_constants::escape_in_lists is set then you can use: +
+\w in place of [:word:] +
+\s in place of [:space:] +
+\d in place of [:digit:] +
+\l in place of [:lower:] +
+\u in place of [:upper:] +
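Character classes and their shortcuts can be tried out with a small sketch. It assumes std::regex as a stand-in for boost::regex, so only the POSIX class names that the standard library also recognizes are used; the helper name every_char_in is hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex; every_char_in is an
// illustrative helper. Returns true when `text` is non-empty and every
// character in it is matched by the set (or shortcut) given in `set`.
bool every_char_in(const std::string& set, const std::string& text)
{
    const std::regex re(set + "+"); // one or more characters from the set
    return std::regex_match(text, re);
}
```

For example, every_char_in("[[:digit:]]", "12345") and every_char_in("\\d", "12345") both hold, since \d is the shortcut for the digit class.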
+Collating elements take the general form [.tagname.] inside a set declaration, + where tagname is either a single character, or a name of a collating + element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is + equivalent to [,]. The library supports all the standard POSIX collating + element names, and in addition the following digraphs: "ae", "ch", "ll", "ss", + "nj", "dz", "lj", each in lower, upper and title case variations. + Multi-character collating elements can result in the set matching more than one + character, for example [[.ae.]] would match two characters, but note that + [^[.ae.]] would only match one character. +
++ Equivalence classes take the general form [=tagname=] inside a set declaration, + where tagname is either a single character, or a name of a collating + element, and match any character that is a member of the same primary + equivalence class as the collating element [.tagname.]. An equivalence class is + a set of characters that collate the same; a primary equivalence class is a set + of characters whose primary sort keys are all the same (for example strings are + typically collated by character, then by accent, and then by case; the primary + sort key then relates to the character, the secondary to the accentation, and + the tertiary to the case). If there is no equivalence class corresponding to tagname, + then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no + locale independent method of obtaining the primary sort key for a character, + except under Win32. For other operating systems the library will "guess" the + primary sort key from the full sort key (obtained from strxfrm), so + equivalence classes are probably best considered broken under any operating + system other than Win32. +
+To include a literal "-" in a set declaration then: make it the first character + after the opening "[" or "[^", the endpoint of a range, a collating element, or + if the flag regex_constants::escape_in_lists is set then precede with an escape + character as in "[\-]". To include a literal "[" or "]" or "^" in a set then + make them the endpoint of a range, a collating element, or precede with an + escape character if the flag regex_constants::escape_in_lists is set. +
+An anchor is something that matches the null string at the start or end of a + line: "^" matches the null string at the start of a line, "$" matches the null + string at the end of a line. +
+A back reference is a reference to a previous sub-expression that has already + been matched, the reference is to what the sub-expression matched, not to the + expression itself. A back reference consists of the escape character "\" + followed by a digit "1" to "9", "\1" refers to the first sub-expression, "\2" + to the second etc. For example the expression "(.*)\1" matches any string that + is repeated about its mid-point for example "abcabc" or "xyzxyz". A back + reference to a sub-expression that did not participate in any match, matches + the null string: NB this is different to some other regular expression + matchers. Back references are only available if the expression is compiled with + the flag regex_constants::bk_refs set. +
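The "(.*)\1" example can be sketched as follows, assuming std::regex as a stand-in for boost::regex (back references are enabled in its default grammar). The helper name is_doubled is hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex; is_doubled is an
// illustrative helper. Returns true when `text` consists of some string
// followed by an exact copy of itself, i.e. it matches "(.*)\1".
bool is_doubled(const std::string& text)
{
    static const std::regex re(R"((.*)\1)");
    return std::regex_match(text, re);
}
```

Note that \1 refers to the characters the first sub-expression actually matched, so "abcabd" is rejected even though both halves would individually match ".*".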
+This is an extension to the algorithm that is not available in other libraries, + it consists of the escape character followed by the digit "0" followed by the + octal character code. For example "\023" represents the character whose octal + code is 23. Where ambiguity could occur use parentheses to break the expression + up: "\0103" represents the character whose code is 103, "(\010)3" represents the + character 10 followed by "3". To match characters by their hexadecimal code, + use \x followed by a string of hexadecimal digits, optionally enclosed inside + {}, for example \xf0 or \x{aff}, notice the latter example is a Unicode + character.
+The following operators are provided for compatibility with the GNU regular + expression library. +
+"\w" matches any single character that is a member of the "word" character + class, this is identical to the expression "[[:word:]]". +
+"\W" matches any single character that is not a member of the "word" character + class, this is identical to the expression "[^[:word:]]". +
+"\<" matches the null string at the start of a word. +
+"\>" matches the null string at the end of the word. +
+"\b" matches the null string at either the start or the end of a word. +
+"\B" matches a null string within a word. +
+The start of the sequence passed to the matching algorithms is considered to be + a potential start of a word unless the flag match_not_bow is set. The end of + the sequence passed to the matching algorithms is considered to be a potential + end of a word unless the flag match_not_eow is set. +
+The following operators are provided for compatibility with the GNU regular + expression library, and Perl regular expressions: +
+"\`" matches the start of a buffer. +
+"\A" matches the start of the buffer. +
+"\'" matches the end of a buffer. +
+"\z" matches the end of a buffer. +
+"\Z" matches the end of a buffer, or possibly one or more new line characters + followed by the end of the buffer. +
+A buffer is considered to consist of the whole sequence passed to the matching + algorithms, unless the flags match_not_bob or match_not_eob are set. +
+The escape character "\" has several meanings. +
+Inside a set declaration the escape character is a normal character unless the + flag regex_constants::escape_in_lists is set in which case whatever follows the + escape is a literal character regardless of its normal meaning. +
+The escape operator may introduce an operator for example: back references, or + a word operator. +
+The escape operator may make the following character normal, for example "\*" + represents a literal "*" rather than the repeat operator. +
+The following escape sequences are aliases for single characters:
+
+
+
+
| + | Escape sequence + | +Character code + | +Meaning + | ++ |
| + | \a + | +0x07 + | +Bell character. + | ++ |
| + | \f + | +0x0C + | +Form feed. + | ++ |
| + | \n + | +0x0A + | +Newline character. + | ++ |
| + | \r + | +0x0D + | +Carriage return. + | ++ |
| + | \t + | +0x09 + | +Tab character. + | ++ |
| + | \v + | +0x0B + | +Vertical tab. + | ++ |
| + | \e + | +0x1B + | +ASCII Escape character. + | ++ |
| + | \0dd + | +0dd + | +An octal character code, where dd is one or + more octal digits. + | ++ |
| + | \xXX + | +0xXX + | +A hexadecimal character code, where XX is one or more + hexadecimal digits. + | ++ |
| + | \x{XX} + | +0xXX + | +A hexadecimal character code, where XX is one or more + hexadecimal digits, optionally a Unicode character. + | ++ |
| + | \cZ + | +z-@ + | +An ASCII escape sequence control-Z, where Z is any + ASCII character greater than or equal to the character code for '@'. + | ++ |
The following are provided mostly for perl compatibility, but note that there
+ are some differences in the meanings of \l \L \u and \U:
+
+
+
+
| + | \w + | +Equivalent to [[:word:]]. + | ++ |
| + | \W + | +Equivalent to [^[:word:]]. + | ++ |
| + | \s + | +Equivalent to [[:space:]]. + | ++ |
| + | \S + | +Equivalent to [^[:space:]]. + | ++ |
| + | \d + | +Equivalent to [[:digit:]]. + | ++ |
| + | \D + | +Equivalent to [^[:digit:]]. + | ++ |
| + | \l + | +Equivalent to [[:lower:]]. + | ++ |
| + | \L + | +Equivalent to [^[:lower:]]. + | ++ |
| + | \u + | +Equivalent to [[:upper:]]. + | ++ |
| + | \U + | +Equivalent to [^[:upper:]]. + | ++ |
| + | \C + | +Any single character, equivalent to '.'. + | ++ |
| + | \X + | +Match any Unicode combining character sequence, for + example "a\x 0301" (a letter a with an acute). + | ++ |
| + | \Q + | +The begin quote operator, everything that follows is + treated as a literal character until a \E end quote operator is found. + | ++ |
| + | \E + | +The end quote operator, terminates a sequence begun + with \Q. + | ++ |
+ When the expression is compiled as a Perl-compatible regex then the matching + algorithms will perform a depth first search on the state machine and report + the first match found.
++ When the expression is compiled as a POSIX-compatible regex then the matching + algorithms will match the first possible matching string, if more than one + string starting at a given location can match then it matches the longest + possible string, unless the flag match_any is set, in which case the first + match encountered is returned. Use of the match_any option can reduce the time + taken to find the match - but is only useful if the user is less concerned + about what matched - for example it would not be suitable for search and + replace operations. In cases where there are multiple possible matches all + starting at the same location, and all of the same length, then the match + chosen is the one with the longest first sub-expression, if that is the same + for two or more matches, then the second sub-expression will be examined and so + on. +
+ The following examples illustrate the main differences between Perl and + POSIX regular expression matching rules: +
++
|
+ Expression + |
+
+ Text + |
+
+ POSIX leftmost longest match + |
+
+ ECMAScript depth first search match + |
+
|
+
|
+
+
|
+
+
|
+
+
|
+
|
+
|
+
+
|
+
+ $0 = " abc def xyz " |
+
+ $0 = " abc def xyz " |
+
|
+
|
+
+
|
+
+
|
+
+
|
+
These differences between Perl matching rules, and POSIX matching rules, mean + that these two regular expression syntaxes differ not only in the features + offered, but also in the form that the state machine takes and/or the + algorithms used to traverse the state machine.
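The two matching rules can be contrasted with a short sketch. It assumes std::regex as a stand-in for boost::regex, with the ECMAScript grammar playing the role of the Perl depth first search and the extended grammar playing the role of POSIX leftmost longest; standard library implementations may differ in how faithfully they follow the POSIX rule. The helper name first_match is hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex; first_match is an
// illustrative helper. Returns the text of the first match of `pattern`
// in `text` under the given grammar, or an empty string if none is found.
std::string first_match(const std::string& pattern, const std::string& text,
                        std::regex::flag_type grammar)
{
    const std::regex re(pattern, grammar);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

For the expression "a|ab" searched in "xab", the depth first rule takes the first alternative that succeeds and reports "a", whereas the leftmost longest rule considers both alternatives at that position and reports "ab".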
+Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + + diff --git a/doc/syntax_option_type.html b/doc/syntax_option_type.html new file mode 100644 index 00000000..532d6386 --- /dev/null +++ b/doc/syntax_option_type.html @@ -0,0 +1,332 @@ + + + ++
|
+ |
+
+ Boost.Regex+syntax_option_type+ |
+
+ |
+
Type syntax_option_type is an implementation defined bitmask type that controls + how a regular expression string is to be interpreted. For convenience, + note that all the constants listed here are also duplicated within the scope + of class template basic_regex.
+namespace std{ namespace regex_constants{
+
+typedef bitmask_type syntax_option_type;
+// these flags are standardized:
+static const syntax_option_type normal;
+static const syntax_option_type icase;
+static const syntax_option_type nosubs;
+static const syntax_option_type optimize;
+static const syntax_option_type collate;
+static const syntax_option_type ECMAScript = normal;
+static const syntax_option_type JavaScript = normal;
+static const syntax_option_type JScript = normal;
+static const syntax_option_type basic;
+static const syntax_option_type extended;
+static const syntax_option_type awk;
+static const syntax_option_type grep;
+static const syntax_option_type egrep;
+static const syntax_option_type sed = basic;
+static const syntax_option_type perl;
// these are boost.regex specific:
static const syntax_option_type escape_in_lists;
static const syntax_option_type char_classes;
static const syntax_option_type intervals;
static const syntax_option_type limited_ops;
static const syntax_option_type newline_alt;
static const syntax_option_type bk_plus_qm;
static const syntax_option_type bk_braces;
static const syntax_option_type bk_parens;
static const syntax_option_type bk_refs;
static const syntax_option_type bk_vbar;
static const syntax_option_type use_except;
static const syntax_option_type failbit;
static const syntax_option_type literal;
static const syntax_option_type nocollate;
static const syntax_option_type perlex;
static const syntax_option_type emacs;
+} // namespace regex_constants
+} // namespace std
+ The type syntax_option_type is an implementation defined bitmask
+ type (17.3.2.1.2). Setting its elements has the effects listed in the table
+ below, a valid value of type syntax_option_type will always have
+ exactly one of the elements normal, basic, extended, awk, grep, egrep, sed
+ or perl set.
Note that for convenience all the constants listed here are duplicated within + the scope of class template basic_regex, so you can use any of:
+boost::regex_constants::constant_name+
or
+boost::regex::constant_name+
or
+boost::wregex::constant_name+
in an interchangeable manner.
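A minimal sketch of passing syntax options when compiling an expression. It assumes std::regex as a stand-in for boost::regex, so the names std::regex_constants, std::regex and std::wregex take the place of their boost counterparts; the helper names matches_with and same_flag_everywhere are hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex; both helpers are
// illustrative names, not library API.

// Compiles `pattern` with the given syntax options and reports whether
// the whole of `text` matches.
bool matches_with(const std::string& pattern, const std::string& text,
                  std::regex::flag_type flags)
{
    const std::regex re(pattern, flags);
    return std::regex_match(text, re);
}

// The same constant is reachable through the namespace and through the
// regex and wregex class scopes; all three spellings name one value.
bool same_flag_everywhere()
{
    return std::regex::icase == std::regex_constants::icase
        && std::wregex::icase == std::regex_constants::icase;
}
```

A grammar flag such as ECMAScript or extended is combined with modifier flags such as icase using the bitwise or operator, for example std::regex::ECMAScript | std::regex::icase.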
++
|
+ Element + |
+
+ Effect if set + |
+
|
+ normal + |
+
+ Specifies that the grammar recognized by the regular expression engine uses its + normal semantics: that is the same as that given in the ECMA-262, ECMAScript + Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects + (FWD.1). +boost.regex also recognizes most perl-compatible extensions in this mode. + |
+
|
+ icase + |
+
+ Specifies that matching of regular expressions against a character container + sequence shall be performed without regard to case. + |
+
|
+ nosubs + |
+
+ Specifies that when a regular expression is matched against a character + container sequence, then no sub-expression matches are to be stored in the + supplied match_results structure. + |
+
|
+ optimize + |
+
+ Specifies that the regular expression engine should pay more attention to the + speed with which regular expressions are matched, and less to the speed with + which regular expression objects are constructed. Otherwise it has no + detectable effect on the program output. This currently has no effect for + boost.regex. + |
+
|
+ collate + |
+
+ Specifies that character ranges of the form "[a-b]" should be locale sensitive. + |
+
|
+ ECMAScript + |
+
+ The same as normal. + |
+
|
+ JavaScript + |
+
+ The same as normal. + |
+
|
+ JScript + |
+
+ The same as normal. + |
+
|
+ basic + |
+
+ Specifies that the grammar recognized by the regular expression engine is the + same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001, + Portable Operating System Interface (POSIX ), Base Definitions and Headers, + Section 9, Regular Expressions (FWD.1). + + |
+
|
+ extended + |
+
+ Specifies that the grammar recognized by the regular expression engine is the + same as that used by POSIX extended regular expressions in IEEE Std + 1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and + Headers, Section 9, Regular Expressions (FWD.1). + |
+
|
+ awk + |
+
+ Specifies that the grammar recognized by the regular expression engine is the + same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable + Operating System Interface (POSIX ), Shells and Utilities, Section 4, awk + (FWD.1). +That is to say: the same as POSIX extended syntax, but with escape sequences in + character classes permitted. + |
+
|
+ grep + |
+
+ Specifies that the grammar recognized by the regular expression engine is the + same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable + Operating System Interface (POSIX ), Shells and Utilities, Section 4, + Utilities, grep (FWD.1). +That is to say, the same as POSIX basic syntax, but with the newline character + acting as an alternation character in addition to "|". + |
+
|
+ egrep + |
+
+ Specifies that the grammar recognized by the regular expression engine is the + same as that used by POSIX utility grep when given the -E option in IEEE Std + 1003.1-2001, Portable Operating System Interface (POSIX ), Shells and + Utilities, Section 4, Utilities, grep (FWD.1). +That is to say, the same as POSIX extended syntax, but with the newline + character acting as an alternation character in addition to "|". + |
+
|
+ sed + |
+
+ The same as basic. + |
+
|
+ perl + |
+
+ The same as normal. + |
+
The following constants are specific to this particular regular expression + implementation and do not appear in the + regular expression standardization proposal:
++
| regbase::escape_in_lists | +Allows the use of the escape "\" character in sets of + characters, for example [\]] represents the set of characters containing only + "]". If this flag is not set then "\" is an ordinary character inside sets. | +
| regbase::char_classes | +When this bit is set, character classes [:classname:] + are allowed inside character set declarations, for example "[[:word:]]" + represents the set of all characters that belong to the character class "word". | +
| regbase:: intervals | +When this bit is set, repetition intervals are + allowed, for example "a{2,4}" represents a repeat of between 2 and 4 letter + a's. | +
| regbase:: limited_ops | +When this bit is set all of "+", "?" and "|" are + ordinary characters in all situations. | +
| regbase:: newline_alt | +When this bit is set, then the newline character "\n" + has the same effect as the alternation operator "|". | +
| regbase:: bk_plus_qm | +When this bit is set then "\+" represents the one or + more repetition operator and "\?" represents the zero or one repetition + operator. When this bit is not set then "+" and "?" are used instead. | +
| regbase:: bk_braces | +When this bit is set then "\{" and "\}" are used for + bounded repetitions and "{" and "}" are normal characters. This is the opposite + of default behavior. | +
| regbase:: bk_parens | +When this bit is set then "\(" and "\)" are used to + group sub-expressions and "(" and ")" are ordinary characters, this is the + opposite of default behavior. | +
| regbase:: bk_refs | +When this bit is set then back references are + allowed. | +
| regbase:: bk_vbar | +When this bit is set then "\|" represents the + alternation operator and "|" is an ordinary character. This is the opposite of + default behavior. | +
| regbase:: use_except | +When this bit is set then a bad_expression + exception will be thrown on error. Use of this flag is deprecated - + basic_regex will always throw on error. | +
| regbase:: failbit | +This bit is set on error; if regbase::use_except is + not set, then this bit should be checked to see if a regular expression is + valid before usage. | +
| regbase::literal | +All characters in the string are treated as literals, + there are no special characters or escape sequences. | +
| regbase::emacs | +Provides compatibility with the emacs + editor, equivalent to: bk_braces | bk_parens | bk_refs | bk_vbar. | +
Revised + + 17 May 2003 +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + + diff --git a/doc/thread_safety.html b/doc/thread_safety.html new file mode 100644 index 00000000..eeda681d --- /dev/null +++ b/doc/thread_safety.html @@ -0,0 +1,68 @@ + + + ++
|
+ |
+
+ Boost.Regex+Thread Safety+ |
+
+ |
+
Class basic_regex<> and its typedefs regex + and wregex are thread safe, in that compiled regular expressions can safely be + shared between threads. The matching algorithms regex_match, + regex_search, regex_grep, + regex_format and regex_merge + are all re-entrant and thread safe. Class match_results + is now thread safe, in that the results of a match can be safely copied from + one thread to another (for example one thread may find matches and push + match_results instances onto a queue, while another thread pops them off the + other end), otherwise use a separate instance of match_results + per thread. +
+The POSIX API functions are all re-entrant and + thread safe, regular expressions compiled with regcomp can also be + shared between threads. +
+The class RegEx is only thread safe if each thread + gets its own RegEx instance (apartment threading) - this is a consequence of + RegEx handling both compiling and matching regular expressions. +
+Finally note that changing the global locale invalidates all compiled regular + expressions, therefore calling set_locale from one thread while another + uses regular expressions will produce unpredictable results. +
++ There is also a requirement that there is only one thread executing prior to + the start of main().
+Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + diff --git a/doc/uarrow.gif b/doc/uarrow.gif new file mode 100644 index 0000000000000000000000000000000000000000..6afd20c3857127c21fc9bcd52ec347e32c21578c GIT binary patch literal 1666 zcmZ?wbhEHb)Me0S_|CwP5S3S3-SznXe?>*j_wT
The following tables provide comparisons between the following regular
+ expression libraries:
+ + +Henry Spencer's regular expression library + - this is provided for comparison as a typical non-backtracking implementation.
+Philip Hazel's PCRE library.
+Machine: Intel Pentium 4 2.8GHz PC.
+Compiler: Microsoft Visual C++ version 7.1.
+C++ Standard Library: Dinkumware standard library version 313.
+OS: Win32.
+Boost version: 1.31.0.
+PCRE version: 3.9.
+As ever, care should be taken in interpreting the results: only sensible regular
+ expressions (rather than pathological cases) are given, and most are taken from the
+ Boost regex examples or from the Library of
+ Regular Expressions. In addition, some variation in the relative
+ performance of these libraries can be expected on other machines, as memory
+ access and processor caching effects can be quite large for most finite state
+ machine algorithms. In each case the first figure given is the relative
+ time taken (so a value of 1.0 is as good as it gets), while the second figure
+ is the actual time taken.
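The relationship between the two figures in each cell can be sketched in a few lines. This is a minimal illustration, assuming the relative figure is the actual time divided by the fastest time in the same row, which is consistent with the published numbers; the function name `relative_scores` is hypothetical and not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Convert one table row of measured times (in seconds) into relative scores
// by dividing each time by the fastest time in that row, so the fastest
// library scores exactly 1.
std::vector<double> relative_scores(const std::vector<double>& seconds)
{
    assert(!seconds.empty());
    const double best = *std::min_element(seconds.begin(), seconds.end());
    std::vector<double> scores;
    scores.reserve(seconds.size());
    for (double t : seconds)
        scores.push_back(t / best);
    return scores;
}
```

For the `Twain` row of the long-text search, the measured times 0.541s, 2.35s, 0.0851s, 0.0851s, 3.6s and 0.0275s give scores of roughly 19.7, 85.5, 3.09, 3.09, 131 and 1, matching the first figures in that row.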
+The following are the average relative scores for all the tests. The perfect
+ regular expression library would score 1; in practice anything less than 2
+ is pretty good.
+| GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE |
+| 6.90669 | 23.751 | 1.62553 | 1.38213 | 110.973 | 1.69371 |
+
For each of the following regular expressions the time taken to find all
+ occurrences of the expression within a long English language text was measured
+ (mtent12.txt
+ from Project Gutenberg, 19 MB).
+| Expression | GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE |
+| Twain | 19.7 (0.541s) | 85.5 (2.35s) | 3.09 (0.0851s) | 3.09 (0.0851s) | 131 (3.6s) | 1 (0.0275s) |
+| Huck[[:alpha:]]+ | 11 (0.55s) | 93.4 (4.68s) | 3.4 (0.17s) | 3.35 (0.168s) | 124 (6.19s) | 1 (0.0501s) |
+| [[:alpha:]]+ing | 11.3 (6.82s) | 21.3 (12.8s) | 1.83 (1.1s) | 1 (0.601s) | 6.47 (3.89s) | 4.75 (2.85s) |
+| ^[^ ]*?Twain | 5.75 (1.15s) | 17.1 (3.43s) | 1 (0.2s) | 1.3 (0.26s) | NA | 3.8 (0.761s) |
+| Tom|Sawyer|Huckleberry|Finn | 28.5 (3.1s) | 77.2 (8.4s) | 2.3 (0.251s) | 1 (0.109s) | 191 (20.8s) | 1.77 (0.193s) |
+| (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn) | 16.2 (4.14s) | 49 (12.5s) | 1.65 (0.42s) | 1 (0.255s) | NA | 2.43 (0.62s) |
+
For each of the following regular expressions the time taken to find all + occurrences of the expression within a medium sized English language text was + measured (the first 50K from mtent12.txt).
+| Expression | GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE |
+| Twain | 9.49 (0.00274s) | 40.7 (0.0117s) | 1.54 (0.000445s) | 1.56 (0.00045s) | 13.5 (0.00391s) | 1 (0.000289s) |
+| Huck[[:alpha:]]+ | 14.3 (0.0027s) | 62.3 (0.0117s) | 2.26 (0.000425s) | 2.29 (0.000431s) | 1.27 (0.000239s) | 1 (0.000188s) |
+| [[:alpha:]]+ing | 7.34 (0.0178s) | 13.7 (0.0331s) | 1 (0.00243s) | 1.02 (0.00246s) | 7.36 (0.0178s) | 5.87 (0.0142s) |
+| ^[^ ]*?Twain | 8.34 (0.00579s) | 24.8 (0.0172s) | 1.52 (0.00105s) | 1 (0.000694s) | NA | 2.81 (0.00195s) |
+| Tom|Sawyer|Huckleberry|Finn | 12.9 (0.00781s) | 35.1 (0.0213s) | 1.67 (0.00102s) | 1 (0.000606s) | 81.5 (0.0494s) | 1.94 (0.00117s) |
+| (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn) | 15.6 (0.0106s) | 46.6 (0.0319s) | 2.72 (0.00186s) | 1 (0.000684s) | 311 (0.213s) | 1.72 (0.00117s) |
+
For each of the following regular expressions the time taken to find all + occurrences of the expression within the C++ source file + boost/crc.hpp was measured.
+| Expression | +GRETA | +GRETA + (non-recursive mode) |
+ Boost | +Boost + C++ locale | +POSIX | +PCRE | +
^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\<\w+\>([
+ ]*\([^)]*\))?[[:space:]]*)*(\<\w*\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\{|:[^;\{()]*\{) |
+ 8.88 + (0.000792s) |
+ 46.4 + (0.00414s) |
+ 1.19 + (0.000106s) |
+ 1 + (8.92e-005s) |
+ 688 + (0.0614s) |
+ 3.23 + (0.000288s) |
+
(^[
+ ]*#(?:[^\\\n]|\\[^\n_[:punct:][:alnum:]]*[\n[:punct:][:word:]])*)|(//[^\n]*|/\*.*?\*/)|\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\>|('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")|\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned|using|virtual|void|volatile|wchar_t|while)\> |
+ 1 + (0.00571s) |
+ 5.31 + (0.0303s) |
+ 2.47 + (0.0141s) |
+ 1.92 + (0.011s) |
+ NA | +3.29 + (0.0188s) |
+
^[ ]*#[ ]*include[ ]+("[^"]+"|<[^>]+>) |
+ 5.78 + (0.00172s) |
+ 26.3 + (0.00783s) |
+ 1.12 + (0.000333s) |
+ 1 + (0.000298s) |
+ 128 + (0.0382s) |
+ 1.74 + (0.000518s) |
+
^[ ]*#[ ]*include[ ]+("boost/[^"]+"|<boost/[^>]+>) |
+ 10.2 + (0.00305s) |
+ 28.4 + (0.00845s) |
+ 1.12 + (0.000333s) |
+ 1 + (0.000298s) |
+ 155 + (0.0463s) |
+ 1.74 + (0.000519s) |
+
For each of the following regular expressions the time taken to find all + occurrences of the expression within the html file libs/libraries.htm + was measured.
+| Expression | +GRETA | +GRETA + (non-recursive mode) |
+ Boost | +Boost + C++ locale | +POSIX | +PCRE | +
beman|john|dave |
+ 11 + (0.00297s) |
+ 34.3 + (0.00922s) |
+ 1.78 + (0.000479s) |
+ 1 + (0.000269s) |
+ 55.2 + (0.0149s) |
+ 1.85 + (0.000499s) |
+
<p>.*?</p> |
+ 5.38 + (0.00145s) |
+ 21.8 + (0.00587s) |
+ 1.02 + (0.000274s) |
+ 1 + (0.000269s) |
+ NA | +1.05 + (0.000283s) |
+
<a[^>]+href=("[^"]*"|[^[:space:]]+)[^>]*> |
+ 4.51 + (0.00207s) |
+ 12.6 + (0.00579s) |
+ 1.34 + (0.000616s) |
+ 1 + (0.000459s) |
+ 343 + (0.158s) |
+ 1.09 + (0.000499s) |
+
<h[12345678][^>]*>.*?</h[12345678]> |
+ 7.39 + (0.00143s) |
+ 29.6 + (0.00571s) |
+ 1.87 + (0.000362s) |
+ 1 + (0.000193s) |
+ NA | +1.27 + (0.000245s) |
+
<img[^>]+src=("[^"]*"|[^[:space:]]+)[^>]*> |
+ 6.73 + (0.00145s) |
+ 27.3 + (0.00587s) |
+ 1.2 + (0.000259s) |
+ 1.32 + (0.000283s) |
+ 148 + (0.0319s) |
+ 1 + (0.000215s) |
+
<font[^>]+face=("[^"]*"|[^[:space:]]+)[^>]*>.*?</font> |
+ 6.93 + (0.00153s) |
+ 27 + (0.00595s) |
+ 1.22 + (0.000269s) |
+ 1.31 + (0.000289s) |
+ NA | +1 + (0.00022s) |
+
For each of the following regular expressions the time taken to match against + the text indicated was measured.
+| Expression | +Text | +GRETA | +GRETA + (non-recursive mode) |
+ Boost | +Boost + C++ locale | +POSIX | +PCRE | +
abc |
+ abc | +1.31 + (2.2e-007s) |
+ 1.94 + (3.25e-007s) |
+ 1.26 + (2.1e-007s) |
+ 1.24 + (2.08e-007s) |
+ 3.03 + (5.06e-007s) |
+ 1 + (1.67e-007s) |
+
^([0-9]+)(\-| |$)(.*)$ |
+ 100- this is a line of ftp response which contains a message string | +1.52 + (6.88e-007s) |
+ 2.28 + (1.03e-006s) |
+ 1.5 + (6.78e-007s) |
+ 1.5 + (6.78e-007s) |
+ 329 + (0.000149s) |
+ 1 + (4.53e-007s) |
+
([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4} |
+ 1234-5678-1234-456 | +2.04 + (1.03e-006s) |
+ 2.83 + (1.43e-006s) |
+ 2.12 + (1.07e-006s) |
+ 2.04 + (1.03e-006s) |
+ 30.8 + (1.56e-005s) |
+ 1 + (5.05e-007s) |
+
^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
+ john_maddock@compuserve.com | +1.48 + (1.78e-006s) |
+ 2.1 + (2.52e-006s) |
+ 1.35 + (1.62e-006s) |
+ 1.32 + (1.59e-006s) |
+ 165 + (0.000198s) |
+ 1 + (1.2e-006s) |
+
^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
+ foo12@foo.edu | +1.28 + (1.41e-006s) |
+ 1.9 + (2.1e-006s) |
+ 1.42 + (1.57e-006s) |
+ 1.38 + (1.53e-006s) |
+ 107 + (0.000119s) |
+ 1 + (1.11e-006s) |
+
^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
+ bob.smith@foo.tv | +1.29 + (1.43e-006s) |
+ 1.9 + (2.1e-006s) |
+ 1.42 + (1.57e-006s) |
+ 1.38 + (1.53e-006s) |
+ 119 + (0.000132s) |
+ 1 + (1.11e-006s) |
+
^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
+ EH10 2QQ | +1.26 + (4.63e-007s) |
+ 1.77 + (6.49e-007s) |
+ 1.3 + (4.77e-007s) |
+ 1.2 + (4.4e-007s) |
+ 9.15 + (3.36e-006s) |
+ 1 + (3.68e-007s) |
+
^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
+ G1 1AA | +1.06 + (4.73e-007s) |
+ 1.59 + (7.07e-007s) |
+ 1.05 + (4.68e-007s) |
+ 1 + (4.44e-007s) |
+ 12.9 + (5.73e-006s) |
+ 1.63 + (7.26e-007s) |
+
^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
+ SW1 1ZZ | +1.26 + (9.17e-007s) |
+ 1.84 + (1.34e-006s) |
+ 1.28 + (9.26e-007s) |
+ 1.21 + (8.78e-007s) |
+ 8.42 + (6.11e-006s) |
+ 1 + (7.26e-007s) |
+
^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
+ 4/1/2001 | +1.57 + (9.73e-007s) |
+ 2.28 + (1.41e-006s) |
+ 1.25 + (7.73e-007s) |
+ 1.26 + (7.83e-007s) |
+ 11.2 + (6.95e-006s) |
+ 1 + (6.21e-007s) |
+
^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
+ 12/12/2001 | +1.52 + (9.56e-007s) |
+ 2.06 + (1.3e-006s) |
+ 1.29 + (8.12e-007s) |
+ 1.24 + (7.83e-007s) |
+ 12.4 + (7.8e-006s) |
+ 1 + (6.3e-007s) |
+
^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
+ 123 | +2.11 + (7.35e-007s) |
+ 3.18 + (1.11e-006s) |
+ 2.5 + (8.7e-007s) |
+ 2.44 + (8.5e-007s) |
+ 5.26 + (1.83e-006s) |
+ 1 + (3.49e-007s) |
+
^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
+ +3.14159 | +1.31 + (4.96e-007s) |
+ 1.92 + (7.26e-007s) |
+ 1.26 + (4.77e-007s) |
+ 1.2 + (4.53e-007s) |
+ 9.71 + (3.66e-006s) |
+ 1 + (3.77e-007s) |
+
^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
+ -3.14159 | +1.32 + (4.97e-007s) |
+ 1.92 + (7.26e-007s) |
+ 1.24 + (4.67e-007s) |
+ 1.2 + (4.53e-007s) |
+ 9.7 + (3.66e-006s) |
+ 1 + (3.78e-007s) |
+
Copyright John Maddock April 2003, all rights reserved.
+ + diff --git a/example/snippets/regex_iterator_example.cpp b/example/snippets/regex_iterator_example.cpp new file mode 100644 index 00000000..6ec3d85e --- /dev/null +++ b/example/snippets/regex_iterator_example.cpp @@ -0,0 +1,115 @@ +/* + * + * Copyright (c) 2003 + * Dr John Maddock + * + * Permission to use, copy, modify, distribute and sell this software + * and its documentation for any purpose is hereby granted without fee, + * provided that the above copyright notice appear in all copies and + * that both that copyright notice and this permission notice appear + * in supporting documentation. Dr John Maddock makes no representations + * about the suitability of this software for any purpose. + * It is provided "as is" without express or implied warranty. + * + */ + + /* + * LOCATION: see http://www.boost.org for most recent version. + * FILE regex_iterator_example_2.cpp + * VERSION see