{"id":2704,"date":"2009-04-06T18:08:34","date_gmt":"2009-04-06T23:08:34","guid":{"rendered":"http:\/\/granades.com\/?p=2704"},"modified":"2009-04-06T18:08:34","modified_gmt":"2009-04-06T23:08:34","slug":"using-regular-expressions-to-match-twitter-users-and-hashtags","status":"publish","type":"post","link":"https:\/\/granades.com\/?p=2704","title":{"rendered":"Using Regular Expressions to Match Twitter Users and Hashtags"},"content":{"rendered":"<p>If you want to find <a href=\"http:\/\/www.twitter.com\/\">Twitter<\/a> usernames and hashtags in tweets and do something with them, like turn them into links when you&#8217;re displaying them on your website, the most compact way of doing so is through regular expressions. However, most of the articles I looked through on the web mess up the regexp.<\/p>\n<p>Usernames start with a &#8220;@&#8221;, while hashtags start with a &#8220;#&#8221;. Since usernames and hashtags will only have letters, numbers, or underscores in them, most all of the examples on the web use a regexp like so:<\/p>\n<p><pre><code>\r\n@([A-Za-z0-9_]+)\r\n<\/code><\/pre>\n<\/p>\n<p>There&#8217;s only one problem: if you have an email address in a tweet, it&#8217;ll match on that. Run that regular expression on &#8220;Email me at spammy@mailinator.com&#8221; and you&#8217;ll match on &#8220;mailinator&#8221; as a username when it&#8217;s not. 
What you really need to do is make sure that there&#8217;s nothing in front of the &#8220;@&#8221; or &#8220;#&#8221; but whitespace or the beginning of the string. In regexp terms, that means starting the pattern with a group like <code>(^|\\s)<\/code> and putting whatever it captured back in front of the link.<\/p>\n<p>For completeness, here&#8217;s example code to add links to both usernames and hashtags in a bunch of different languages.<\/p>\n<p><b>JavaScript<\/b><\/p>\n<pre><code markup=\"none\">\r\n<script type=\"text\/javascript\">\r\n    String.prototype.linkify_tweet = function() {\r\n        var tweet = this.replace(\/(^|\\s)@(\\w+)\/g, \"$1@<a href=\\\"http:\/\/www.twitter.com\/$2\\\">$2<\/a>\");\r\n        return tweet.replace(\/(^|\\s)#(\\w+)\/g, \"$1#<a href=\\\"http:\/\/search.twitter.com\/search?q=%23$2\\\">$2<\/a>\");\r\n    };\r\n<\/script>\r\n<\/code><\/pre>\n<p><b>PHP<\/b><\/p>\n<pre><code markup=\"none\">\r\nfunction linkify_tweet($tweet) {\r\n    $tweet = preg_replace('\/(^|\\s)@(\\w+)\/',\r\n        '\\1@<a href=\"http:\/\/www.twitter.com\/\\2\">\\2<\/a>',\r\n        $tweet);\r\n    return preg_replace('\/(^|\\s)#(\\w+)\/',\r\n        '\\1#<a href=\"http:\/\/search.twitter.com\/search?q=%23\\2\">\\2<\/a>',\r\n        $tweet);\r\n}\r\n<\/code><\/pre>\n<p><b>Python<\/b><\/p>\n<pre><code markup=\"none\">\r\nimport re\r\n\r\ndef linkify_tweet(tweet):\r\n    tweet = re.sub(r'(\\A|\\s)@(\\w+)', r'\\1@<a href=\"http:\/\/www.twitter.com\/\\2\">\\2<\/a>', tweet)\r\n    return re.sub(r'(\\A|\\s)#(\\w+)', r'\\1#<a href=\"http:\/\/search.twitter.com\/search?q=%23\\2\">\\2<\/a>', tweet)\r\n<\/code><\/pre>\n<p><b>Perl<\/b><\/p>\n<pre><code markup=\"none\">\r\n# The g flag replaces every match in $s, not just the first one.\r\n$s =~ s{(\\A|\\s)@(\\w+)}{$1@<a href=\"http:\/\/www.twitter.com\/$2\">$2<\/a>}g;\r\n$s =~ s{(\\A|\\s)#(\\w+)}{$1#<a href=\"http:\/\/search.twitter.com\/search?q=%23$2\">$2<\/a>}g;\r\n<\/code><\/pre>\n<p>(JavaScript approach taken from <a href=\"http:\/\/www.simonwhatley.co.uk\/parsing-twitter-usernames-hashtags-and-urls-with-javascript\">Simon Whatley<\/a>)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you want to find Twitter usernames and hashtags 
in tweets and do something with them, like turn them into links when you&#8217;re displaying them on your website, the most compact way of doing so is through regular expressions. However, most of the articles I looked through on the web mess up the regexp. Usernames &hellip; <a href=\"https:\/\/granades.com\/?p=2704\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Using Regular Expressions to Match Twitter Users and Hashtags<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[],"tags":[],"class_list":["post-2704","post","type-post","status-publish","format-standard","hentry"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/granades.com\/index.php?rest_route=\/wp\/v2\/posts\/2704","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/granades.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/granades.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/granades.com\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/granades.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2704"}],"version-history":[{"count":11,"href":"https:\/\/granades.com\/index.php?rest_route=\/wp\/v2\/posts\/2704\/revisions"}],"predecessor-version":[{"id":2715,"href":"https:\/\/granades.com\/index.php?rest_route=\/wp\/v2\/posts\/2704\/revisions\/2715"}],"wp:attachment":[{"href":"https:\/\/granades.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2704"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/granades.com\/index.php?rest_route=%2Fwp%
2Fv2%2Fcategories&post=2704"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/granades.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2704"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}