ASP.NET Search for Repeated Words

From Regex Regular Expression Encyclopedia


You can use this recipe to find a word that appears twice in a row, such as “the the.”

Code

<%@ Page Language="vb" AutoEventWireup="false" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head><title></title>
</head>
<body>
    <form Id="Form1" RunAt="server">
    <asp:TextBox id="txtInput" runat="server"></asp:TextBox>
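    <%-- Passes only when the text contains an immediately repeated word
         such as "the the"; otherwise ErrorMessage is shown. --%>
    <asp:RegularExpressionValidator Id="revInput" RunAt="server"
        ControlToValidate="txtInput"
        ErrorMessage="Please enter a valid value"
        ValidationExpression=".*\b(\w+)\s+\1\b.*"></asp:RegularExpressionValidator>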
    <asp:Button Id="btnSubmit" RunAt="server" CausesValidation="True"
        Text="Submit"></asp:Button>
    </form>
</body>
</html>
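You can try the validator's behavior without running a web page by using the Python sketch below, which approximates it. This is a stand-in that rests on several assumptions, not ASP.NET code. passes_validator is a hypothetical name. Python's re module is assumed to treat \b, \w, \s and \1 the way the .NET engine does for ordinary text. The empty-input rule mirrors the documented behavior that RegularExpressionValidator does not validate an empty control. re.fullmatch stands in for the validator's requirement that the expression match the whole value.

```python
import re

# Expression from the recipe, using \s+ as in the "How It Works" breakdown.
VALIDATION_EXPRESSION = r".*\b(\w+)\s+\1\b.*"


def passes_validator(value: str) -> bool:
    """Hypothetical approximation of RegularExpressionValidator's check.

    An empty value is not validated (treated as passing); otherwise the
    expression must match the whole value, which re.fullmatch enforces.
    """
    if value == "":
        return True
    return re.fullmatch(VALIDATION_EXPRESSION, value) is not None


for sample in ["I saw the the cat", "I saw the cat", "the  the", ""]:
    print(repr(sample), passes_validator(sample))
```

With these samples, only "I saw the cat" fails. That is the case in which the page would display its error message.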

How It Works

The most important part of this regular expression is the back reference, \1. A back reference is simply a way of saying “whatever was matched by the first group.” The parentheses in the expression define that group. Here’s a breakdown of the expression:

Regular Expression    Description
\b                    a word boundary, followed by . . .
(...)                 a group (explained next), then . . .
\s                    a whitespace character (space, tab, and so on) . . .
+                     one or more times, then . . .
\1                    whatever was matched by the group, and lastly . . .
\b                    a word boundary.

The group is simply (\w+), which breaks down as follows:

\w                    a word character . . .
+                     found one or more times.
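The following Python sketch makes the breakdown concrete. It is an illustration and not part of the ASP.NET recipe. It assumes that Python's re module handles \b, \w, \s+ and \1 the same way .NET does for ordinary English text, and find_repeated_word is a hypothetical helper name. Verbose mode allows each piece of the pattern to carry a comment that corresponds to one row of the breakdown. group(1) shows the text that the back reference points back to.

```python
import re
from typing import Optional

# Core of the validation expression (without the outer .*), in verbose mode
# so that each piece lines up with one row of the breakdown.
REPEATED_WORD = re.compile(
    r"""
    \b        # a word boundary
    (\w+)     # group 1: one or more word characters (a word)
    \s+       # one or more whitespace characters
    \1        # back reference: exactly the text group 1 captured
    \b        # a word boundary
    """,
    re.VERBOSE,
)


def find_repeated_word(text: str) -> Optional[str]:
    """Return the first immediately repeated word in text, or None."""
    match = REPEATED_WORD.search(text)
    return match.group(1) if match else None


print(find_repeated_word("I saw the the cat"))  # the
print(find_repeated_word("I saw the cat"))      # None
print(find_repeated_word("The the cat"))        # None: \1 is case-sensitive here
```

The back reference repeats the captured text exactly, so “The the” is not matched unless the expression is made case-insensitive (in Python, with re.IGNORECASE).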

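The group (\w+) matches a single word. The expression begins and ends with a word boundary anchor, which prevents it from matching inside words in a string such as quarterback backrub. Without the anchors, (\w+) could capture the back at the end of quarterback, and \1 could then match the back at the start of backrub. The expression would then report back back as a repeated word.

The .* at the start and end of ValidationExpression are not part of the repeated-word logic. RegularExpressionValidator accepts a value only when the expression matches the entire input, so the .* on each side allows the repeated word to appear anywhere in the text box.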
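You can check the effect of the word boundary anchors on quarterback backrub directly. This Python sketch compares the core pattern with and without the anchors. It assumes that Python's re module treats \b and \w the way .NET does for plain ASCII text.

```python
import re

text = "quarterback backrub"

# Core pattern with the word boundary anchors.
with_boundaries = re.search(r"\b(\w+)\s+\1\b", text)

# Same pattern with both \b anchors removed.
without_boundaries = re.search(r"(\w+)\s+\1", text)

print(with_boundaries)              # None: no whole word is repeated
print(without_boundaries.group(0))  # back back
```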
