Using Regular Expressions to Match Twitter Users and Hashtags

If you want to find Twitter usernames and hashtags in tweets and do something with them, like turn them into links when you’re displaying them on your website, the most compact way of doing so is through regular expressions. However, most of the articles I looked through on the web mess up the regexp.

Usernames start with a “@”, while hashtags start with a “#”. Since usernames and hashtags will only have letters, numbers, or underscores in them, almost all of the examples on the web use a regexp like this:

@([A-Za-z0-9_]+)

There’s only one problem: if you have an email address in a tweet, it’ll match on that too. Run that regular expression on “Email me at spammy@mailinator.com” and it will match “mailinator” as a username, even though it isn’t one. What you really need to do is make sure that nothing comes before the “@” or “#” except whitespace or the beginning of the string.
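
To make the difference concrete, here is a minimal Python sketch that runs the naive pattern and a whitespace-anchored pattern over the same text. The sample sentence and the username "stephen" are made up for illustration, and the names NAIVE and ANCHORED are just labels for this sketch.

```python
import re

# Naive pattern: any "@" followed by letters, numbers, or underscores.
NAIVE = re.compile(r'@([A-Za-z0-9_]+)')

# Anchored pattern: the "@" must sit at the start of the string
# or directly after a whitespace character.
ANCHORED = re.compile(r'(?:^|\s)@([A-Za-z0-9_]+)')

# Hypothetical sample tweet containing an email address and a username.
text = "Email me at spammy@mailinator.com, or ask @stephen"

naive_hits = NAIVE.findall(text)        # picks up the email domain too
anchored_hits = ANCHORED.findall(text)  # only the real username

print(naive_hits)
print(anchored_hits)
```

The naive pattern reports the email domain as if it were a username, while the anchored one skips it.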

For completeness, here’s example code to add links to both usernames and hashtags in a bunch of different languages.

JavaScript

<script type="text/javascript">
    // Add a method to String that links @usernames and #hashtags.
    String.prototype.linkify_tweet = function() {
        var tweet = this.replace(/(^|\s)@(\w+)/g, '$1@<a href="http://www.twitter.com/$2">$2</a>');
        return tweet.replace(/(^|\s)#(\w+)/g, '$1#<a href="http://search.twitter.com/search?q=%23$2">$2</a>');
    };
</script>

PHP

function linkify_tweet($tweet) {
    $tweet = preg_replace('/(^|\s)@(\w+)/',
        '\1@<a href="http://www.twitter.com/\2">\2</a>',
        $tweet);
    return preg_replace('/(^|\s)#(\w+)/',
        '\1#<a href="http://search.twitter.com/search?q=%23\2">\2</a>',
        $tweet);
}

Python

import re

def linkify_tweet(tweet):
    tweet = re.sub(r'(\A|\s)@(\w+)', r'\1@<a href="http://www.twitter.com/\2">\2</a>', tweet)
    return re.sub(r'(\A|\s)#(\w+)', r'\1#<a href="http://search.twitter.com/search?q=%23\2">\2</a>', tweet)

Perl

# The /g flag replaces every occurrence, not just the first one.
$s =~ s{(\A|\s)@(\w+)}{$1@<a href="http://www.twitter.com/$2">$2</a>}g;
$s =~ s{(\A|\s)#(\w+)}{$1#<a href="http://search.twitter.com/search?q=%23$2">$2</a>}g;

(JavaScript approach taken from Simon Whatley)

18 Comments

  1. on April 6, 2009 at 6:44 pm | Permalink

    What about other #punctuation/#symbols? (@sargent like this)

  2. on April 7, 2009 at 9:50 am | Permalink

    I have no interest in such things!

  3. on April 7, 2009 at 9:14 pm | Permalink

    Thanks, this is a very useful blog post. I am building hash tag support into my forum so that users can tag their posts with keyword information. However, I couldn’t figure out how to only detect hash characters at the start of a new line, or with a whitespace in front, and so pasted URLs were breaking!

  4. on April 8, 2009 at 8:29 am | Permalink

    I’m glad you found this useful!

  5. on April 21, 2009 at 11:36 am | Permalink

    Gee, I wonder why you’re wrangling this, Stephen … hehehehehehehe.

  6. Marcelino Dornas
    on May 6, 2009 at 4:16 pm | Permalink

    C#.NET – Using Regular Expressions to Match Twitter Users

    string b = Regex.Replace(a, @"(\A|\s)@(\w+)", @"@$2");

  7. on November 16, 2009 at 4:15 am | Permalink

    Note that this will destroy your HTML if you have something like this:

    <a href="http://www.example.com" title="Bla @test blubb">Don’t break!</a>

  8. on November 16, 2009 at 7:12 pm | Permalink

    Ah, good point. Sadly there’s no good way around it using regexp. Parsing is probably the true and correct way to go.

  9. on January 13, 2010 at 5:47 pm | Permalink

    Same method as above in ruby for those interested:

    def linkify_tweet(tweet)
      tweet.gsub!(/(^|\s)#(\w+)/, '\1#\2')
      tweet.gsub!(/(^|\s)@(\w+)/, '\1@\2')
    end

  10. Graphity
    on February 4, 2010 at 6:13 pm | Permalink

    Question: Why don’t you use “/#([^ ]+)/” for hashtags, so you also capture non-Ascii tags?

  11. on February 9, 2010 at 7:22 pm | Permalink

    Graphity: I tend to be cautious about using match-anything regexps, so deliberately limited it to letters, numbers, or an underscore. You certainly could use the match-anything regexp you posted if you wanted to be more liberal in what you match.

  12. Graphity
    on March 3, 2010 at 1:22 pm | Permalink

    @Stephen: Ah, I understand. I just saw problems using localized hashtags, like, for me, German umlauts.

  13. Temurid
    on July 4, 2010 at 6:05 am | Permalink

    Hello,
    I need to make a web page where I can show tweets of, say, two different categories. I found by searching that hashtags are a way to find tweets of different types, but I cannot find any help on how to use them in PHP, i.e. how to find hashtags using PHP. I will be thankful for any help.

  14. Nick
    on August 29, 2010 at 10:55 pm | Permalink

    in AS3 for those interested. Works great.

    public static function parseTweetUsersAndTagsToLinks( tweet:String ):String
    {
    tweet = tweet.replace(/(^|\s)@(\w+)/g, "$1@$2");
    return tweet.replace(/(^|\s)#(\w+)/g, "$1#$2");
    }

  15. on January 13, 2011 at 4:20 am | Permalink

    Cool post.

    Btw: As far as I know, the apostrophe ( ' ) can also be part of a hashtag.

  16. on March 3, 2011 at 5:53 pm | Permalink

    any ideas on Java?
    Just stumbled upon your blog looking for a solution to my problem – replace links, usernames and hashtags from tweets: my solution is
    testData.replaceAll("((?i)http:\\S*?\\s|(?i)http:\\S*?$|@\\S*?|@\\S*?$|\\#[:alnum:]*?|\\#[:alnum:]*?$)", "[replaced]")
    Problem is the hashtag gets removed, but not the groupName.
    Any ideas to solve that?

  17. on November 4, 2011 at 4:28 pm | Permalink

    Thank you very much – these regexps will help me enhance http://myretweetedtweets.appspot.com with auto-links for references and hashes!

  18. Felipe
    on June 22, 2012 at 12:41 pm | Permalink

    If you do not want people to type anything before the “@”, in other words.. it has to start with “@”, then it should be this way:

    ^@([A-Za-z0-9_]+)
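
Several of the comments above ask about punctuation in front of the sigil, as in "(@sargent like this)", and about non-ASCII tags such as German umlauts. One possible variant, sketched here in Python under my own assumptions rather than taken from the post, swaps the whitespace check for a negative lookbehind: a match is rejected only when the character before the sigil is a word character, which still skips email addresses. The names MENTION, HASHTAG, find_mentions and find_hashtags are hypothetical. In Python 3, \w on string patterns is Unicode-aware by default, so umlauts are matched without extra flags.

```python
import re

# Assumption: a sigil counts when it is NOT preceded by a word character,
# so "spammy@mailinator.com" is skipped but "(@sargent" is accepted.
MENTION = re.compile(r'(?<!\w)@(\w+)')

# Also reject "&" before "#" so HTML entities such as "&#39;" are left alone.
HASHTAG = re.compile(r'(?<![\w&])#(\w+)')


def find_mentions(text):
    """Return the usernames found in text, without the leading '@'."""
    return MENTION.findall(text)


def find_hashtags(text):
    """Return the hashtags found in text, without the leading '#'."""
    return HASHTAG.findall(text)


print(find_mentions("(@sargent like this)"))
print(find_mentions("Email me at spammy@mailinator.com"))
print(find_hashtags("What about other #punctuation/#symbols?"))
print(find_hashtags("#Grüße aus Berlin"))
```

The trade-off is that this is more liberal than the whitespace rule: a "#" that follows a slash is accepted, so a pasted URL ending in "/#top" would have its fragment treated as a hashtag.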

2 Trackbacks

  1. By Twitter PHP Badge with Caching « Kien Tran on May 22, 2009 at 1:27 am

    [...] with email addresses inside of the tweet, and does not create an twitter link for them. Thanks to Live Grenades writer Stephen for the regular [...]

  2. [...] you are displaying tweets gracefully on your website.  In the case of tweets you may want to use appropriate regular expressions as well to add links to Twitter usernames and [...]