C Sharp Find Multiple Words with One Search

From Regex Regular Expression Encyclopedia


You can use this recipe for finding one of a list of words in a line. This recipe assumes both words are whole words surrounded by whitespace and that the list is a short one containing the words moo and oink.

Code

using System;
using System.IO;
using System.Text.RegularExpressions;
public class Recipe
{
    private static Regex _Regex = new Regex( @"\s+(moo|oink)\s+" );
    public void Run(string fileName)
    {
        String line;
        int lineNbr = 0;
        using (StreamReader sr = new StreamReader(fileName))
        {
            while(null != (line = sr.ReadLine()))
            {
                lineNbr++;
                if (_Regex.IsMatch(line))
                {
                    Console.WriteLine("Found match '{0}' at line {1}",
                        line,
                        lineNbr);
                }
            }
        }
    }
    public static void Main( string[] args )
    {
        Recipe r = new Recipe();
        r.Run(args[0]);
    }
}

How It Works

Requiring whitespace (\s+) on both sides of the word lets you search for whole words without a whole bunch of extra work, so a search for something, for example, doesn't yield unexpected matches such as somethings. The trade-off is that a word at the very start or end of a line is not matched, because no whitespace precedes or follows it there. You can break the regular expression shown here into the following:

Regular Expression    Description
\s                    whitespace . . .
+                     found one or more times . . .
(...)                 followed by something . . .
\s                    followed by whitespace . . .
+                     that occurs one or more times.

The something here is another expression, moo|oink. This expression is as follows:

Regular Expression    Description
m                     an m, followed by . . .
o                     an o, then . . .
o                     an o . . .
|                     or . . .
o                     an o, followed by . . .
i                     an i, then . . .
n                     an n, followed by . . .
k                     a k.
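
To see how the pieces of the expression behave together, the following is a minimal sketch that runs the recipe's pattern against a few sample strings and reports which word the group captured. The class name PatternDemo, the helper FindWord, and the sample strings are illustrative assumptions and not part of the original recipe.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Same pattern as the recipe: whitespace, then moo or oink, then whitespace.
    private static readonly Regex Pattern = new Regex(@"\s+(moo|oink)\s+");

    // Hypothetical helper: returns the word captured by group 1,
    // or null when the line contains no match.
    public static string FindWord(string line)
    {
        Match m = Pattern.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Illustrative inputs: a word in mid-line, a bare word,
        // another word in mid-line, and a word that merely contains "moo".
        string[] samples = { "the cow says moo loudly", "moo", "an oink here", "smooth" };
        foreach (string s in samples)
        {
            Console.WriteLine("'{0}' -> {1}", s, FindWord(s) ?? "(no match)");
        }
    }
}
```

The bare string "moo" is not matched here because the pattern needs at least one whitespace character on each side of the word, and "smooth" is rejected for the same reason.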

Variations

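A useful variation of this recipe is to replace the \s+ combination, which matches specifically whitespace, with the word boundary anchor \b. Because \b is a zero-width assertion rather than a character class, it consumes no characters and also matches words at the start or end of a line or next to punctuation.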
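
As a rough sketch of this variation, the snippet below swaps \s+ for \b and checks a few sample strings. The class name BoundaryDemo, the helper ContainsWord, and the sample strings are assumptions made for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Variation: \b word boundaries replace the \s+ on each side of the group.
    private static readonly Regex Pattern = new Regex(@"\b(moo|oink)\b");

    // Hypothetical helper: true when the line contains moo or oink as a whole word.
    public static bool ContainsWord(string line)
    {
        return Pattern.IsMatch(line);
    }

    public static void Main()
    {
        // Illustrative inputs: a bare word, a word next to punctuation,
        // and two longer words that merely contain "moo".
        string[] samples = { "moo", "The pig said oink.", "smooth", "moos" };
        foreach (string s in samples)
        {
            Console.WriteLine("'{0}' -> {1}", s, ContainsWord(s));
        }
    }
}
```

With \b, the bare word and the word followed by a period are both found, while "smooth" and "moos" are still rejected because no word boundary falls directly on both sides of moo.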
