Using Regular Expressions to Match Twitter Users and Hashtags

If you want to find Twitter usernames and hashtags in tweets and do something with them, like turn them into links when you’re displaying them on your website, the most compact way of doing so is through regular expressions. However, most of the articles I looked through on the web mess up the regexp.

Usernames start with a “@”, while hashtags start with a “#”. Since usernames and hashtags only contain letters, numbers, or underscores, almost all of the examples on the web use a regexp like so:

@([A-Za-z0-9_]+)

There’s only one problem: if you have an email address in a tweet, it’ll match on that. Run that regular expression on “Email me at spammy@mailinator.com” and you’ll match “mailinator” as a username when it isn’t one. What you really need to do is make sure that there’s nothing in front of the “@” or “#” but whitespace or the beginning of the string. In regexp terms, that means putting (^|\s) in front of the pattern and carrying the captured whitespace over into the replacement.
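
To see the difference concretely, here’s a minimal Python sketch that runs both patterns over the same text. The names NAIVE, ANCHORED, and find_usernames are made up for this illustration, and the non-capturing group (?:...) is only there so that findall returns just the username.

```python
import re

# The pattern most examples use: matches "@name" anywhere, even inside an email.
NAIVE = re.compile(r'@([A-Za-z0-9_]+)')

# Same pattern, but "@" must sit at the start of the string or after whitespace.
ANCHORED = re.compile(r'(?:^|\s)@([A-Za-z0-9_]+)')


def find_usernames(pattern, text):
    """Return every username captured by `pattern` in `text`."""
    return pattern.findall(text)


text = "Email me at spammy@mailinator.com or ping @alice"
print(find_usernames(NAIVE, text))     # ['mailinator', 'alice']
print(find_usernames(ANCHORED, text))  # ['alice']
```

The naive pattern picks up the domain of the email address, while the anchored one only reports the real mention.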

For completeness, here’s example code to add links to both usernames and hashtags in a bunch of different languages.

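JavaScript

<script type="text/javascript">
    // Link @usernames and #hashtags that start the string or follow whitespace.
    // The replacement strings are single-quoted so the href's double quotes don't end them.
    String.prototype.linkify_tweet = function() {
        var tweet = this.replace(/(^|\s)@(\w+)/g, '$1@<a href="http://www.twitter.com/$2">$2</a>');
        return tweet.replace(/(^|\s)#(\w+)/g, '$1#<a href="http://search.twitter.com/search?q=%23$2">$2</a>');
    };
</script>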

PHP

function linkify_tweet($tweet) {
    $tweet = preg_replace('/(^|\s)@(\w+)/',
        '\1@<a href="http://www.twitter.com/\2">\2</a>',
        $tweet);
    return preg_replace('/(^|\s)#(\w+)/',
        '\1#<a href="http://search.twitter.com/search?q=%23\2">\2</a>',
        $tweet);
}

Python

import re

def linkify_tweet(tweet):
    tweet = re.sub(r'(\A|\s)@(\w+)', r'\1@<a href="http://www.twitter.com/\2">\2</a>', tweet)
    return re.sub(r'(\A|\s)#(\w+)', r'\1#<a href="http://search.twitter.com/search?q=%23\2">\2</a>', tweet)
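
As a quick sanity check, something like the following should show the function linking the username and the hashtag while leaving an email address alone. The function is repeated only so the snippet runs on its own, and the sample tweet is invented.

```python
import re


def linkify_tweet(tweet):
    # Same two substitutions as above: usernames first, then hashtags.
    tweet = re.sub(r'(\A|\s)@(\w+)', r'\1@<a href="http://www.twitter.com/\2">\2</a>', tweet)
    return re.sub(r'(\A|\s)#(\w+)', r'\1#<a href="http://search.twitter.com/search?q=%23\2">\2</a>', tweet)


sample = "@alice see #regex or mail bob@example.com"
print(linkify_tweet(sample))
```

In the printed result, “alice” and “regex” are wrapped in links, and “bob@example.com” comes through unchanged because its “@” follows a letter rather than whitespace.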

Perl

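# /g replaces every match, not just the first; \@ keeps Perl from treating @ as an array sigil.
$s =~ s{(\A|\s)\@(\w+)}{$1\@<a href="http://www.twitter.com/$2">$2</a>}g;
$s =~ s{(\A|\s)#(\w+)}{$1#<a href="http://search.twitter.com/search?q=%23$2">$2</a>}g;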

(JavaScript approach taken from Simon Whatley)
