VBScript Search for Repeated Words

From Regex Regular Expression Encyclopedia


You can use this recipe to find a word that is repeated immediately after itself on a line, such as “the the.”

Code

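' Report every line of the input file that contains a doubled word.
Dim fso, s, re, line, lineNbr
Const ForReading = 1
Set fso = CreateObject("Scripting.FileSystemObject")
' Open the file named by the first command-line argument; do not create it if it is missing.
Set s = fso.OpenTextFile(WScript.Arguments.Item(0), ForReading, False)
Set re = New RegExp
' A word, one or more whitespace characters, then the same word again.
re.Pattern = "\b(\w+)\s+\1\b"
lineNbr = 0
Do While Not s.AtEndOfStream
    line = s.ReadLine()
    lineNbr = lineNbr + 1
    If re.Test(line) Then
        WScript.Echo "Found match: '" & line & "' at line " & lineNbr
    End If
Loop
s.Close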

How It Works

The most important aspect of this regular expression is the back reference, \1. A back reference is a way of saying “whatever you found in the first group.” The parentheses in the expression define the group. Here’s a breakdown of the expression:

Regular Expression    Description
\b                    is a word boundary, followed by . . .
(...)                 a group (explained next), then . . .
\s                    a whitespace character . . .
+                     one or more times, then . . .
\1                    whatever was found in the group, and lastly . . .
\b                    a word boundary.

The group is simply (\w+), which is as follows:

\w                    a word character . . .
+                     found one or more times.
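
To see what the group and the back reference capture, the following minimal sketch lists each doubled word in a sample string instead of only testing for a match. The sample text and the variable names are made up for illustration. The pattern assumes one or more whitespace characters between the two words, as in the breakdown above, and Global is turned on so that Execute returns every match rather than only the first.

```vbscript
Dim re, matches, m, sample
Set re = New RegExp
re.Pattern = "\b(\w+)\s+\1\b"
re.Global = True    ' return every match in the string, not just the first

' Hypothetical sample text containing two doubled words.
sample = "Paris in the the spring is is nice"

Set matches = re.Execute(sample)
For Each m In matches
    ' m.Value is the whole match; SubMatches(0) is the text captured by (\w+).
    WScript.Echo "Matched '" & m.Value & "', group 1 = '" & m.SubMatches(0) & "'"
Next
```

For this sample, the loop should report two matches, one where group 1 is the and one where it is is.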

Together, \w+ matches a whole word. The expression begins and ends with a word boundary anchor, which prevents it from matching part of a longer word, as in quarterback backrub. If the word boundary anchors are removed, the expression matches the back at the end of quarterback followed by the back at the start of backrub.
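
The effect of the anchors can be checked with a short sketch that runs the same string through the pattern with and without \b. The variable names here are illustrative, and the pattern again assumes one or more whitespace characters between the words.

```vbscript
Dim anchored, unanchored, sample
Set anchored = New RegExp
anchored.Pattern = "\b(\w+)\s+\1\b"     ' whole words only

Set unanchored = New RegExp
unanchored.Pattern = "(\w+)\s+\1"       ' same pattern without the anchors

sample = "quarterback backrub"

If anchored.Test(sample) Then
    WScript.Echo "With \b: match"
Else
    WScript.Echo "With \b: no match"
End If

If unanchored.Test(sample) Then
    WScript.Echo "Without \b: match"
Else
    WScript.Echo "Without \b: no match"
End If
```

The anchored pattern should report no match, while the unanchored one should match because it is free to start in the middle of quarterback.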