Regular expression

From Conservapedia
This is an old revision of this page, as edited by JonG (Talk | contribs) at 21:53, September 3, 2011. It may differ significantly from current revision.

Jump to: navigation, search

A regular expression is used in computer software to define a sequence of characters.[1]

In general, any character will match itself, but there are a dozen special characters, including the escape character.

To match any of 2 or more characters, enclose them in square brackets. For example,

gr[ae]y

will match gray or grey.

A regular expression is matched from left to right and is processed one token at a time. Certain characters have special meanings within the description of a regular language:

  • '*' - the previous construction is matched 0 or more times
  • '+' - the previous construction is matched 1 or more times
  • '?' - the previous construction is matched 0 or 1 times
  • '(...)' - the contents between the parentheses is considered a single construction
  • '...|...' - the construction on the left of the '|' may be matched or the construction on the right of the '|' may be matched.
  • '[...]' - match one of the characters contained within the square braces.
  • '.' - match any single character
  • '^' - beginning of a string
  • '$' - end of a string

There also exists a wide range of special character classes distinguished with a backslash (a small list):

  • '\\' - a literal backslash
  • '\.' - a literal period


Regular expressions have also been extended by many languages, some of which extend them to the point where they are able to match a wider range of languages than is specified by a regular language.

Examples

  • /Mrs?\. Smith/ - matches 'Mr. Smith', 'Mrs. Smith'
  • /^a*b*$/ - matches 'a', 'ab', 'b', 'aaaab', 'abbbbb', 'aaabbb',
  • /[1-9][0-9]*(\.0|([1-9][0-9]*))+/ - matches version numbers: 1, 1.0, 2.3, 103.4, 42.5.6.8.9, 7.0.8.9

Formal definition and Limitations

A regular expression is particular instance of a non-deterministic finite state automaton. Regular expressions are a type-3 grammar in the Chomsky hierarchy of language.

A regular expression is not able to count. This is because there is a finite number of states. Consider the language that is specified by anban. Examples of this language include b, aba, aabaa, aaabaaa, etc... A regular expression - being a finite state automaton itself - has a finite number of states that it can be in. If there a point at which the state loops back on itself it is no longer able to match that language.

See also

Further reading

  1. "A regular expression, or regex for short, is a pattern describing a certain amount of text." Regular Expression Quick Start