Lesson 19: RegExp

Today we will be discussing RegExp, which is shorthand for regular expressions. A RegExp, in JavaScript terms, is an object that describes a pattern of characters.

1) Defining a RegExp

A regular expression is contained within slash marks, just as a string is normally contained within quote marks. So to create a variable that contains a regular expression that searches for the letter s, we use syntax similar to setting a variable to the letter s, only replacing the quotes with slashes:

    var myRegExp=/s/;

The power of regular expressions lies in their ability to use special characters and syntax instead of only individual characters or groups of characters. For instance, the dollar sign ($) is a special character that matches at the end of a string or line, so to check for s at the end of a string, we use:

    var myRegExp=/s$/;

There are many special characters in RegExps, and using them literally requires escaping them with a backslash, just as with special characters within JavaScript strings. Here is a partial list of RegExp literal characters:

    \f      form feed
    \n      new line
    \r      carriage return
    \t      tab
    \v      vertical tab
    \/      a literal "/"
    \\      a literal "\"
    \.      a literal "."
    \*      a literal "*"
    \+      a literal "+"
    \?      a literal "?"
    \|      a literal "|"
    \(      a literal "("
    \)      a literal ")"
    \[      a literal "["
    \]      a literal "]"
    \{      a literal "{"
    \}      a literal "}"
    \xxx    the ASCII character of octal xxx
    \xnn    the ASCII character of hexadecimal nn
    \cX     the control character ^X

2) RegExp Character Classes

You can combine individual characters into a character class by enclosing them in square brackets [ and ] in order to match any one character defined within them.
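Before looking at character classes in detail, here is a minimal sketch of the basics covered so far: a plain literal, the $ anchor, and escaping. The sample strings are made up for illustration, and test() is a standard RegExp method (not otherwise covered in this lesson) that returns true when the pattern matches.

```javascript
// /s/ matches an s anywhere in the string.
var anyS = /s/;
// /s$/ only matches an s at the very end of the string.
var finalS = /s$/;

// Hypothetical sample strings, chosen only for illustration.
console.log(anyS.test("castle"));   // true
console.log(finalS.test("castle")); // false
console.log(finalS.test("cats"));   // true
console.log(finalS.test("cat"));    // false

// Escaping turns the special $ into a literal dollar sign.
var literalDollar = /\$/;
console.log(literalDollar.test("cost: $5")); // true
console.log(literalDollar.test("cost: 5"));  // false
```

Note that the unescaped form /$/ would not look for a dollar sign at all; it only asserts the end of the string.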
So to find the letter s above we used:

    var myRegExp=/s/;

If we want to find matches of s, t, or u, we use:

    var myRegExp=/[stu]/;

If we want to match everything but certain characters, we can use the negation character ^. So to find everything but s, t, or u, we use:

    var myRegExp=/[^stu]/;

You can also search for a range of characters, so the search for s, t, or u could be restated as a search for s through u with this syntax:

    var myRegExp=/[s-u]/;

and the search for anything but s, t, or u is:

    var myRegExp=/[^s-u]/;

The people who created RegExps were smart enough to realize that certain patterns of characters would be searched for often, and included them in the core definition as special character classes with their own escape identifiers. For instance, to find any single digit, we could search for the range of number characters 0-9:

    var myRegExp=/[0-9]/;

or we could instead use the special character class for digits, \d, and simply use:

    var myRegExp=/\d/;

These special character classes can even be used inside square brackets to define more complex character classes, so to search for any one character that is either a digit or whitespace, we use \d and \s (the special character class for whitespace such as spaces, tabs, and returns):

    var myRegExp=/[\d\s]/;

Capitalizing one of these special character classes negates it, so to search for any non-digit character, we use:

    var myRegExp=/\D/;

Here is a partial list of special character classes:

    [...]   any one character between the brackets
    [^...]  any one character not in the brackets
    .       any character except newline, roughly the same as [^\n]
    \w      any word character, same as [a-zA-Z0-9_]
    \W      any non-word character, same as [^a-zA-Z0-9_]
    \s      any whitespace character, roughly the same as [ \t\n\r\f\v]
    \S      any non-whitespace character, roughly the same as [^ \t\n\r\f\v]
    \d      any digit character, same as [0-9]
    \D      any non-digit character, same as [^0-9]
    [\b]    a literal backspace (a special case, since \b outside brackets means a word boundary)

3) Character Repetition

With what we know so far, we can search for one digit with /\d/ and two digits with /\d\d/, etc., but this is inefficient, and doesn't allow us to search for a number of any size. The number of times an element of a RegExp may repeat can be given in curly brackets, as a single integer or a range, or with the special characters ?, + and * (which is why they appear in the escape list above). For instance, to search for a two-digit number, we use:

    var myRegExp=/\d{2}/;

Another common need is to allow for whitespace, however much of it there is. We can search for one or more occurrences with the special + character, so to search for one or more occurrences (>=1) of whitespace before and after any two digits, we use:

    var myRegExp=/\s+\d{2}\s+/;

And to search for one or no occurrence (<=1) we can use the special ? character, so to allow at most one whitespace character after a two-digit number, we use:

    var myRegExp=/\s+\d{2}\s?/;

Here is a partial list of special repetition characters:

    {n,m}   match from n times to m times
    {n,}    match n or more times
    {n}     match exactly n times
    ?       match zero or one times, same as {0,1}
    +       match one or more times, same as {1,}
    *       match zero or more times, same as {0,}

4) Grouping and Referencing Within RegExps

We can search for various patterns within a string by grouping and using alternatives within the RegExp itself. We use ( and ) to group and | to alternate.
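As a quick check of the character classes and repetition counts covered so far, the patterns can be tried against a sample string. This is only a sketch: the input text is hypothetical, and it relies on the standard match() string method and test() RegExp method to show what each pattern picks up.

```javascript
// Hypothetical input, used only for illustration.
var text = "Room 12 is on floor 3";

// [s-u] matches one character in the range s through u.
console.log(/[s-u]/.test("cat"));   // true, because of the t
// [^s-u] needs at least one character outside that range.
console.log(/[^s-u]/.test("tutu")); // false, every character is t or u

// \d{2} matches exactly two digits in a row.
var twoDigits = text.match(/\d{2}/);
console.log(twoDigits[0]); // "12"

// \s+\d{2}\s+ requires whitespace on both sides of the two digits.
var padded = text.match(/\s+\d{2}\s+/);
console.log(padded[0]); // " 12 "

// \s? makes the trailing whitespace optional, so a number at the
// end of the string still matches.
var optionalTail = /\s+\d{2}\s?/.test("Room 12");
var requiredTail = /\s+\d{2}\s+/.test("Room 12");
console.log(optionalTail); // true
console.log(requiredTail); // false
```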
So to check for either ab or cd or ef, we use:

    var myRegExp=/ab|cd|ef/;

We can group with parentheses to check for either ef or one or more occurrences of ab or cd:

    var myRegExp=/(ab|cd)+|ef/;

Each parenthesized subexpression can be referred to later in the same RegExp by number, counting opening parentheses from left to right: \1 for the first, \2 for the second, and so on. Such a reference matches the exact text the group matched, not just anything the group could match. For instance, in JavaScript, we wouldn't want to improperly nest single and double quotation marks. To search a string and ensure proper occurrences of these quote marks, we could use:

    var myRegExp=/(['"])[^'"]*\1/;

This searches for either type of quote, then any number of non-quote characters, then the same text that the first subexpression matched. That makes certain the closing quote is the same character as the opening one, and not just any of the characters in the character class.

5) Attributes i and g

There are two attributes that can be used when performing searches. They are placed immediately after the closing slash mark of the RegExp: i for case-insensitive searches, and g for global searches.

A) Normally, all matching in JavaScript is case sensitive, so A is not a. We can ignore case in a RegExp by using the i attribute, so to search for any instance of the word javascript regardless of whether it's spelt JavaScript or Javascript or javascript, etc., we use:

    var myRegExp=/javascript/i;

B) With a standard RegExp, a search stops at the first match in the searched string. If instead we want every match throughout the string, we can set the global search attribute g. So to search for all the occurrences of java in a string, regardless of capitalization or position in the string, we use:

    var myRegExp=/java/gi;

All the above rules and definitions can be used within the JavaScript match(), replace(), search(), and split() methods, allowing for dynamic reaction to user input, regardless of the form or syntax of the input.
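To make the i and g attributes and the four string methods concrete, here is a short sketch. The sample strings are made up for illustration; the methods themselves are the standard match(), search(), replace(), and split() named above.

```javascript
// Hypothetical input, used only for illustration.
var text = "Java, JavaScript and javascript";

// Without g, match() reports only the first match.
var first = text.match(/java/i);
console.log(first[0]); // "Java"

// With g, match() returns every match as an array.
var all = text.match(/java/gi);
console.log(all); // three matches: "Java", "Java", "java"

// search() returns the index of the first match, or -1 if none.
var found = text.search(/script/i);
var missing = text.search(/perl/i);
console.log(found);   // 10
console.log(missing); // -1

// replace() with g replaces every match, not just the first.
var replaced = text.replace(/javascript/gi, "JS");
console.log(replaced); // "Java, JS and JS"

// split() accepts a RegExp as the separator; here a comma with
// optional whitespace on either side.
var parts = "a, b ,c".split(/\s*,\s*/);
console.log(parts); // three items: "a", "b", "c"
```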
This syntax is modeled on the regular expressions of Perl, so most of what you learn here carries over to Perl and many other languages, and it can be used in many ways within complex JavaScript applications for truly powerful pattern matching and more!
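As a closing sketch, the grouping, alternation, and backreference patterns from the grouping section can be tried out directly. The input strings are hypothetical, and the quote pattern is written without the stray bracket so that it compiles as intended.

```javascript
// (ab|cd)+|ef matches a run of ab/cd pairs, or ef.
var grouped = /(ab|cd)+|ef/;
var run = "xxabcdabyy".match(grouped);
console.log(run[0]); // "abcdab"
var alt = "xxefyy".match(grouped);
console.log(alt[0]); // "ef"

// \1 must match the same quote character that group 1 matched.
var quoted = /(['"])[^'"]*\1/;
var hit = 'say "hello" now'.match(quoted);
console.log(hit[0]); // "hello" including its double quotes
console.log(hit[1]); // the double quote captured by group 1

// A string that opens with ' and closes with " is rejected.
var mismatched = quoted.test("'mixed\"");
var matched = quoted.test("'single'");
console.log(mismatched); // false
console.log(matched);    // true
```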