VBScript Search for Repeated Words
From Regex Regular Expression Encyclopedia
You can use this recipe to find words that are repeated back to back on a line, such as “the the.”
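```vbscript
' Usage (file names are placeholders): cscript //nologo repeated.vbs input.txt
Option Explicit

Const ForReading = 1
Dim fso, s, re, line, lineNbr

Set fso = CreateObject("Scripting.FileSystemObject")
' Open the file named on the command line for reading; do not create it if it is missing
Set s = fso.OpenTextFile(WScript.Arguments.Item(0), ForReading, False)

' A word, one or more whitespace characters, then the same word again
Set re = New RegExp
re.Pattern = "\b(\w+)\s+\1\b"

lineNbr = 0
Do While Not s.AtEndOfStream
    line = s.ReadLine()
    lineNbr = lineNbr + 1
    If re.Test(line) Then
        WScript.Echo "Found match: '" & line & "' at line " & lineNbr
    End If
Loop
s.Close
```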
How It Works
The most important part of this regular expression is the back reference, written \1 here as in the previous recipes. A back reference is just a way of saying “whatever the first group matched.” The parentheses in the expression define that group. Here’s a breakdown of the expression:
| Regular Expression | Description |
|---|---|
| \b | a word boundary, followed by . . . |
| (...) | a group (explained next), then . . . |
| \s | a whitespace character (a space, tab, and so on) . . . |
| + | one or more times, then . . . |
| \1 | whatever was found in the group, and lastly . . . |
| \b | a word boundary. |

The group is simply (\w+), which breaks down as follows:

| Regular Expression | Description |
|---|---|
| \w | a word character . . . |
| + | found one or more times. |
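To see the group and the back reference at work, the sketch below uses the RegExp object's Execute method instead of Test, so it can report which word was repeated. SubMatches(0) holds the text captured by the first group, which is exactly the text \1 has to match a second time. The sample sentence and the Global setting are illustrative assumptions, not part of the original recipe.

```vbscript
Option Explicit

Dim re, matches, m

Set re = New RegExp
re.Pattern = "\b(\w+)\s+\1\b"
re.Global = True   ' report every repeated pair on the line, not just the first

' Hypothetical sample line containing two doubled words
Set matches = re.Execute("Paris in the the spring is is lovely")

For Each m In matches
    ' m.Value is the whole match ("the the"); m.SubMatches(0) is what group 1 captured ("the")
    WScript.Echo "Repeated word '" & m.SubMatches(0) & "' at position " & m.FirstIndex
Next
```

FirstIndex is zero-based, so it can be used directly with Mid only after adding 1.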
The group therefore matches a single word. The expression begins and ends with a word boundary anchor to keep it from matching a string such as quarterback backrub. Without the anchors, the group can capture just the back at the end of quarterback, and \1 then matches the back at the start of backrub, so the expression matches fragments of words instead of whole repeated words.
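The effect of the word boundary anchors can be checked directly. This sketch runs the recipe's pattern next to a hypothetical variant with both \b anchors removed (the variable names are illustrative) against the quarterback backrub example.

```vbscript
Option Explicit

' Compare the recipe's pattern with a hypothetical variant that drops both \b anchors
Dim withBounds, noBounds, sample, found

sample = "quarterback backrub"

Set withBounds = New RegExp
withBounds.Pattern = "\b(\w+)\s+\1\b"

Set noBounds = New RegExp
noBounds.Pattern = "(\w+)\s+\1"

' With the anchors, no whole word is repeated, so Test returns False
If withBounds.Test(sample) Then
    WScript.Echo "With \b: matched"
Else
    WScript.Echo "With \b: no match"
End If

' Without them, the group captures "back" inside "quarterback"
' and \1 matches the start of "backrub"
Set found = noBounds.Execute(sample)
If found.Count > 0 Then
    WScript.Echo "Without \b: matched '" & found.Item(0).Value & "'"
Else
    WScript.Echo "Without \b: no match"
End If
```

Run under cscript, the anchored pattern finds nothing, while the unanchored variant reports the fragment back back.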
