Visual Basic .NET Search for Repeated Words Across Multiple Lines
From Regex Regular Expression Encyclopedia
This recipe allows you to search for repeated words that occur on more than one line. For example:
word word
Code
    Imports System
    Imports System.IO
    Imports System.Text.RegularExpressions

    Public Class Recipe
        ' A word, then whitespace (optionally spanning a line end), then the same word again.
        Private Shared _Regex As Regex = New Regex("\b(\w+)(\s*$\s*|\s+)\1\b", _
            RegexOptions.IgnoreCase Or RegexOptions.Multiline)

        Public Sub Run(ByVal fileName As String)
            Dim line As String
            Dim sr As StreamReader = File.OpenText(fileName)
            ' Read the whole file into one string so a match can span lines.
            line = sr.ReadToEnd()
            If Not line Is Nothing Then
                If _Regex.IsMatch(line) Then
                    For Each myMatch As Match In _Regex.Matches(line)
                        Console.WriteLine("Found match '{0}'", myMatch.ToString())
                    Next
                End If
            End If
            sr.Close()
        End Sub

        Public Shared Sub Main(ByVal args As String())
            ' The first command-line argument is the file to search.
            Dim r As Recipe = New Recipe
            r.Run(args(0))
        End Sub
    End Class
How It Works
The “magic” part of this expression is the RegexOptions.Multiline option given to the constructor of the Regex class, which allows the $ anchor to match at the end of each line as well as at the end of the string. This matters because the ReadToEnd() method of the StreamReader loads the entire contents of the file into one string, even though the contents span multiple lines in the file. There can be some whitespace between the first word and the end of its line, and between the beginning of the next line and the repeated word. The part of the expression that matches this is \s*$\s*, which breaks down as follows:
| Regular Expression | Description |
|---|---|
| \s | whitespace . . . |
| * | that's optional . . . |
| $ | the end of a line . . . |
| \s | some more whitespace . . . |
| * | that's optional. |
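The effect of RegexOptions.Multiline on the $ anchor can be seen in isolation with a minimal sketch. The module name MultilineAnchorDemo, the helper CountLineEndWords, and the sample text are hypothetical and chosen only for illustration; the pattern \w+$ is a simplified stand-in, not the recipe's expression.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module MultilineAnchorDemo
    ' Hypothetical helper: counts the words that sit directly before a position where $ matches.
    Function CountLineEndWords(ByVal text As String, ByVal options As RegexOptions) As Integer
        Return Regex.Matches(text, "\w+$", options).Count
    End Function

    Sub Main()
        ' Two lines held in a single string, much as ReadToEnd() would return them.
        Dim text As String = "one two" & vbLf & "three four"
        ' Without Multiline, $ matches only at the end of the string: prints 1.
        Console.WriteLine(CountLineEndWords(text, RegexOptions.None))
        ' With Multiline, $ also matches before each line feed: prints 2.
        Console.WriteLine(CountLineEndWords(text, RegexOptions.Multiline))
    End Sub
End Module
```

The sample uses a bare line feed on purpose. In .NET, $ with Multiline matches before a line feed, not before a carriage return, so in a file with Windows line endings the leading \s* in the recipe's expression is what absorbs the carriage return.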
Since I wanted to match repeated words on one line as well as across two lines, the expression must also look for whitespace between the words. This is the same as in recipe 1-10, which uses \s+.
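To see both alternatives at work without a file, the following sketch runs the recipe's expression against an in-memory string. The module name RepeatedWordDemo, the helper FindRepeats, and the sample text are assumptions made for this illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RepeatedWordDemo
    ' Same expression and options as the recipe.
    Private ReadOnly RepeatedWord As New Regex("\b(\w+)(\s*$\s*|\s+)\1\b", _
        RegexOptions.IgnoreCase Or RegexOptions.Multiline)

    ' Hypothetical helper: returns every repeated-word match in the text.
    Function FindRepeats(ByVal text As String) As MatchCollection
        Return RepeatedWord.Matches(text)
    End Function

    Sub Main()
        ' One repeat on a single line ("is is") and one split across two lines ("the" / "the").
        Dim text As String = "this is is a test" & vbLf & "of the" & vbLf & "the pattern"
        For Each m As Match In FindRepeats(text)
            ' Show the line feed as a visible \n marker so each match prints on one line.
            Console.WriteLine("Found '{0}'", m.Value.Replace(vbLf, "\n"))
        Next
    End Sub
End Module
```

With this sample text the sketch should report two repeats: is is on the first line, and the followed by the across the line break. Note that \s also matches line feeds, so the \s+ alternative can span a line break as well; the \s*$\s* alternative spells out the end-of-line case explicitly.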
Another option is passed to the constructor of the Regex class: RegexOptions.IgnoreCase. When more than one option is specified, the options are combined with the | operator in C# and with the Or keyword in Visual Basic .NET. The option to ignore case is used in this expression so it will find matches such as This this or The the.
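The difference that RegexOptions.IgnoreCase makes can be checked with a small sketch that runs the same expression with and without the option. The module name IgnoreCaseDemo and the helper HasRepeat are hypothetical names used only here.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module IgnoreCaseDemo
    ' The recipe's expression, kept as a constant so only the options vary.
    Const Pattern As String = "\b(\w+)(\s*$\s*|\s+)\1\b"

    ' Hypothetical helper: reports whether the text contains a repeated word under the given options.
    Function HasRepeat(ByVal text As String, ByVal options As RegexOptions) As Boolean
        Return Regex.IsMatch(text, Pattern, options)
    End Function

    Sub Main()
        ' Case-sensitive: "This" and "this" are different words, so this prints False.
        Console.WriteLine(HasRepeat("This this", RegexOptions.Multiline))
        ' Options combined with Or: the backreference now ignores case, so this prints True.
        Console.WriteLine(HasRepeat("This this", RegexOptions.IgnoreCase Or RegexOptions.Multiline))
    End Sub
End Module
```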
