C# .NET Tutorial 9: Regular Expressions in C#
The last post was dedicated to regular expressions. In this tutorial of the C# Tutorial series I am going to discuss how you can use regular expressions in C#.
Using Regular Expressions for Pattern Matching
To help you understand, let’s create a console application that accepts two strings as arguments and determines whether the second string matches the first, which is a regular expression. The application uses the System.Text.RegularExpressions namespace and performs the check with the static Regex.IsMatch method. Here is the code.
using System;
using System.Text.RegularExpressions;

namespace TestRegExp
{
    class Class1
    {
        [STAThread]
        static void Main(string[] args)
        {
            // args[0] is the regular expression, args[1] is the input to test.
            if (Regex.IsMatch(args[1], args[0]))
                Console.WriteLine("Input matches regular expression.");
            else
                Console.WriteLine("Input does not match regular expression.");
        }
    }
}
To test it, pass the following arguments:
- ^\d{5}$ 1234 – should not match.
- ^\d{5}$ 12345 – should match.
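If you would rather not deal with command-line quoting, the same check can be tried directly in code. This is only a minimal sketch: the class ZipCheckDemo and the helper IsFiveDigits are names I made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZipCheckDemo
{
    // Hypothetical helper: true when the input consists of exactly five digits.
    public static bool IsFiveDigits(string input)
    {
        return Regex.IsMatch(input, @"^\d{5}$");
    }

    public static void Main()
    {
        Console.WriteLine(IsFiveDigits("1234"));   // False
        Console.WriteLine(IsFiveDigits("12345"));  // True
        Console.WriteLine(IsFiveDigits("123456")); // False
    }
}
```

The ^ and $ anchors are what force the whole input to be five digits; without them any string containing five consecutive digits would match.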
How to Specify Regular Expression Options
You can modify a regular expression pattern with options that affect matching behaviour. To set them, pass the options parameter to the Regex(pattern, options) constructor, where options is a bitwise OR combination of RegexOptions enumeration values. The following are members of the RegexOptions enumeration:
- None – no options are set.
- IgnoreCase – specifies case-insensitive matching.
- Multiline – specifies multiline mode. This changes the meaning of ^ and $ so that they match at the beginning and end of each line, not just at the beginning and end of the whole string.
- ExplicitCapture – specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>…). Unnamed parentheses then act as non-capturing groups.
- Compiled – specifies that the regular expression will be compiled to an assembly. It will generate Microsoft intermediate language code (MSIL) for the regular expression. This yields faster execution at the expense of startup time.
- Singleline – specifies single-line mode. This changes the meaning of the character (.) so that it matches every character (instead of every character except \n).
- IgnorePatternWhitespace – specifies that unescaped white space is excluded from the pattern, and enables comments following a number sign (#).
- RightToLeft – specifies that the search moves from right to left instead of from left to right. With this option, any starting position should be specified at the end of the string instead of the beginning.
- ECMAScript – specifies that ECMAScript-compliant behaviour is enabled for the expression. This option can only be combined with the IgnoreCase, Multiline and Compiled flags. Any other combination will throw an exception.
- CultureInvariant – specifies that cultural differences in language are ignored.
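To see a few of these options in action, here is a small sketch. The class OptionsDemo and the helper CountLineStarts are hypothetical names chosen for this example; only the Regex and RegexOptions members come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // Hypothetical helper: counts the lines that begin with a letter.
    public static int CountLineStarts(string text)
    {
        // Options are combined with the bitwise OR operator (|).
        Regex r = new Regex("^[a-z]+", RegexOptions.IgnoreCase | RegexOptions.Multiline);
        return r.Matches(text).Count;
    }

    public static void Main()
    {
        // Multiline lets ^ match after every \n; IgnoreCase lets [a-z] match capitals.
        Console.WriteLine(CountLineStarts("One\nTWO\nthree")); // 3

        // By default . does not match \n; Singleline changes that.
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b"));                          // False
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b", RegexOptions.Singleline)); // True
    }
}
```

Note that the static methods such as Regex.IsMatch and Regex.Matches also have overloads that take a RegexOptions value, so you do not always need to construct a Regex instance.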
How to Extract Matched Data
Besides determining whether a string matches a pattern, you can also extract information from a string. For example, if you have the URL www.markscerri.com and you only want the part between the www. and the .com, you can extract it with a regular expression. To do this you need to:
- Create a regular expression and enclose in parentheses the pattern to be matched.
- Create an instance of the System.Text.RegularExpressions.Match class using the static method Regex.Match.
- Retrieve the matched data by accessing the elements of the Match.Groups collection.
Here is an example:
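string input = "www.markscerri.com";
// The dots are escaped so that they match a literal "." and not any character.
Match m = Regex.Match(input, @"www\.(\w+)\.com");
Console.WriteLine(m.Groups[1]); // captured groups start from 1; Groups[0] is the whole match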
The following example searches an input string and prints out all the href=”…” values and their locations in the string.
void GetHrefs(string input)
{
    Regex r;
    Match m;
    // Both alternatives (quoted and unquoted value) capture into group 1.
    r = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    for (m = r.Match(input); m.Success; m = m.NextMatch())
    {
        Console.WriteLine("Found href " + m.Groups[1] + " at " + m.Groups[1].Index);
    }
}
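As an alternative to looping with NextMatch, you can fetch all matches at once with Regex.Matches. The sketch below is one possible way to do it. The class HrefDemo, the helper FindHrefs and the group name url are my own choices for illustration, and the sample HTML is invented.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefDemo
{
    // Hypothetical helper: returns every href value found in the input.
    public static List<string> FindHrefs(string input)
    {
        List<string> results = new List<string>();
        // Both alternatives share the group name "url" (an assumption of this sketch),
        // so quoted and unquoted values end up in the same group.
        Regex r = new Regex("href\\s*=\\s*(?:\"(?<url>[^\"]*)\"|(?<url>\\S+))",
                            RegexOptions.IgnoreCase);
        foreach (Match m in r.Matches(input))
        {
            results.Add(m.Groups["url"].Value);
        }
        return results;
    }

    public static void Main()
    {
        string html = "<a href=\"http://example.com/a\">A</a> <a HREF=b.html >B</a>";
        foreach (string href in FindHrefs(html))
        {
            Console.WriteLine(href);
        }
        // Prints:
        // http://example.com/a
        // b.html
    }
}
```

Keep in mind that the unquoted alternative (\S+) runs until the next white space character, so it only behaves well when the attribute value is followed by a space.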
You can also use the Match.Result method to reformat extracted substrings. For example, the following code extracts the protocol and port from a URL. That is, http://www.markscerri.com:8080/ will result in the string http:8080.
String Extractor(String url)
{
    // proto captures the scheme; port optionally captures ":" plus the port number.
    Regex r = new Regex(@"^(?<proto>\w+)://[^/]+?(?<port>:\d+)?/", RegexOptions.Compiled);
    return r.Match(url).Result("${proto}${port}");
}
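One thing to watch out for is that Match.Result throws an exception when it is called on a failed match, so it is worth checking Match.Success first. Here is a defensive sketch. The class ResultDemo and the helper ExtractProtoAndPort are hypothetical names, and returning null for a non-match is just one possible design choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ResultDemo
{
    // Hypothetical helper: returns protocol plus optional port, or null when the URL does not match.
    public static string ExtractProtoAndPort(string url)
    {
        Match m = Regex.Match(url, @"^(?<proto>\w+)://[^/]+?(?<port>:\d+)?/");
        // Only call Result on a successful match.
        return m.Success ? m.Result("${proto}${port}") : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractProtoAndPort("http://www.markscerri.com:8080/")); // http:8080
        Console.WriteLine(ExtractProtoAndPort("http://www.markscerri.com/"));      // http
        Console.WriteLine(ExtractProtoAndPort("not a url") == null);               // True
    }
}
```

When the optional port group does not take part in the match, ${port} is simply replaced by an empty string.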
How to Replace Substrings using Regular Expressions
You can use regular expressions to make replacements that are more complex than you can do with String.Replace. The following code uses the Regex.Replace static method to convert dates from mm/dd/yy format to dd-mm-yy format.
String ChangeFormat(String input)
{
    // The named groups month, day and year are reused in the replacement pattern.
    return Regex.Replace(input, "\\b(?<month>\\d{1,2})/(?<day>\\d{1,2})/(?<year>\\d{2,4})\\b",
        "${day}-${month}-${year}");
}
In this example the replacement pattern refers back to the named groups captured by the regular expression. The substitution ${day} inserts the substring captured by the group (?<day>…).
Character escapes and substitutions are the only special constructs recognized in a replacement pattern. For example, the replacement pattern a*${txt}b inserts the string a*, followed by the substring matched by the txt capturing group, followed by the string b. The * character is not recognized as a metacharacter within a replacement pattern.
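The a*${txt}b case can be tried out with a few lines of code. This is a sketch only: the class LiteralStarDemo, the helper Wrap and the choice of \d+ for the txt group are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralStarDemo
{
    // Hypothetical helper: wraps every run of digits in "a*" and "b".
    public static string Wrap(string input)
    {
        // In the replacement pattern the * is copied literally; only ${txt} is special.
        return Regex.Replace(input, @"(?<txt>\d+)", "a*${txt}b");
    }

    public static void Main()
    {
        Console.WriteLine(Wrap("x42y")); // xa*42by
    }
}
```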
The following list shows how to define named and numbered replacement patterns:
- $number – substitutes the last substring matched by the group number number.
- ${name} – substitutes the last substring matched by a (?<name>) group.
- $$ – substitutes a single $ literal.
- $& – substitutes a copy of the entire match itself.
- $` – substitutes all the text of the input string before the match.
- $' – substitutes all the text of the input string after the match.
- $+ – substitutes the last group captured.
- $_ – substitutes the entire input string.
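To make the substitutions listed above more concrete, the following sketch applies each one to the same sample string. The class SubstitutionDemo, the helper Apply, the sample input AA123BB and the pattern (\d)(\d+) are all invented for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    // Hypothetical helper: applies one replacement pattern to a fixed sample.
    // In "AA123BB" the pattern (\d)(\d+) matches "123": group 1 is "1", group 2 is "23".
    public static string Apply(string replacement)
    {
        return Regex.Replace("AA123BB", @"(\d)(\d+)", replacement);
    }

    public static void Main()
    {
        Console.WriteLine(Apply("$1"));   // AA1BB        (group 1)
        Console.WriteLine(Apply("$$"));   // AA$BB        (literal $)
        Console.WriteLine(Apply("[$&]")); // AA[123]BB    (the whole match)
        Console.WriteLine(Apply("$`"));   // AAAABB       (text before the match)
        Console.WriteLine(Apply("$'"));   // AABBBB       (text after the match)
        Console.WriteLine(Apply("$+"));   // AA23BB       (the last group)
        Console.WriteLine(Apply("$_"));   // AAAA123BBBB  (the entire input)
    }
}
```

In every case only the matched text 123 is replaced; the AA before it and the BB after it are copied to the result unchanged.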
Validating Names with Regular Expressions
Regular expressions are a convenient way to validate user input. However, certain input, such as names, can be very difficult to validate because so many different strings are valid names. The following regular expression can be used to validate names and surnames:
[a-zA-Z'`-À,Â\s]{1,40}
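When you use a pattern like this for validation, remember to anchor it with ^ and $, otherwise Regex.IsMatch succeeds as soon as any part of the input matches. The sketch below shows the idea with a simplified stand-in character class (letters, apostrophe, hyphen and white space). That character class is my assumption and not the exact pattern above, and the class NameValidator and the helper IsValidName are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameValidator
{
    // Hypothetical helper using a simplified, assumed character set:
    // letters, apostrophe, hyphen and white space, 1 to 40 characters in total.
    public static bool IsValidName(string name)
    {
        return name != null && Regex.IsMatch(name, @"^[a-zA-Z'\-\s]{1,40}$");
    }

    public static void Main()
    {
        Console.WriteLine(IsValidName("O'Brien"));          // True
        Console.WriteLine(IsValidName("Anne-Marie Smith")); // True
        Console.WriteLine(IsValidName("R2-D2"));            // False (digits are not allowed)
        Console.WriteLine(IsValidName(""));                 // False (at least one character is required)
    }
}
```

You would need to extend the character class if you want to accept accented or non-Latin letters.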
Well that’s it for today. In the next tutorial I will be discussing encoding and decoding of strings. Until then enjoy!!