
Dissecting PunyCode - Not All Characters Are Created Equal

Blog Post created by Rajas Save on May 3, 2017

PunyCode is an encoding that represents Unicode characters using ASCII, a much smaller and more restricted character set, and it is used to encode internationalized domain names (IDN) [1]. Host names containing Unicode characters are transcoded into the subset of ASCII allowed in Internet host names: letters, digits, and hyphen (the Letter-Digit-Hyphen, or LDH, subset) [2]. For example, "München" (the German name for the city of Munich) is encoded as "Mnchen-3ya". Inside a domain name the label is also case-folded and prefixed with "xn--", giving "xn--mnchen-3ya".
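The München example can be reproduced with Python 3's standard library, which is an assumption of this sketch and not part of the NetWitness tooling. The "punycode" codec applies only the raw RFC 3492 transform, while the "idna" codec implements IDNA 2003 (not the IDNA 2008 rules many browsers use) and adds case folding and the "xn--" prefix:

```python
# Python 3 ships two relevant codecs in the standard library:
#   "punycode": the raw RFC 3492 transform, which preserves case
#   "idna":     IDNA 2003 ToASCII/ToUnicode, which applies nameprep
#               (case folding) and adds the "xn--" prefix to encoded labels

raw = "München".encode("punycode")
print(raw)  # b'Mnchen-3ya'

ace = "München.de".encode("idna")
print(ace)  # b'xn--mnchen-3ya.de'

# Decoding reverses each transform.
print(raw.decode("punycode"))  # München
print(ace.decode("idna"))      # münchen.de
```

Note that the ASCII label "de" passes through untouched; only labels containing non-ASCII characters are PunyCode-encoded.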


Some non-Latin scripts contain code points (characters) whose glyphs look like Latin characters when displayed:

ASCII 0x61 -> a

Unicode U+0430 -> а (Cyrillic small letter "a"; 0x0430 in UTF-16, but 0xd0b0 in UTF-8)

A name made of characters that merely look like Latin characters is therefore a different name from one spelled with the actual Latin characters.
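A short Python sketch makes the difference concrete. The helper `describe` is hypothetical and only for illustration; it prints UTF-16 in big-endian order ("utf-16-be") so that no byte-order mark is included:

```python
import unicodedata


def describe(ch: str) -> str:
    """Show a character's code point, Unicode name, and UTF-16/UTF-8 bytes."""
    return (f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
            f"UTF-16={ch.encode('utf-16-be').hex()} UTF-8={ch.encode('utf-8').hex()}")


latin, cyrillic = "a", "\u0430"
print(describe(latin))     # U+0061 LATIN SMALL LETTER A: UTF-16=0061 UTF-8=61
print(describe(cyrillic))  # U+0430 CYRILLIC SMALL LETTER A: UTF-16=0430 UTF-8=d0b0

# Same glyphs on screen, different code points, so a different name.
print("apple" == "\u0430\u0440\u0440\u04cf\u0435")  # False
```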

xn--80ak6aa92e -> "аррӏе" (UTF-8 bytes 0xd0b0d180d180d38fd0b5)

0xd0b0 (UTF-8, U+0430) -> "а" (Cyrillic small letter "a")

0xd180 (UTF-8, U+0440) -> "р" (Cyrillic small letter "er")

0xd180 (UTF-8, U+0440) -> "р" (Cyrillic small letter "er")

0xd38f (UTF-8, U+04CF) -> "ӏ" (Cyrillic small letter "palochka")  ...and so on...
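The decoding above can be checked with Python's raw "punycode" codec. This is a sketch under the assumption that only the "xn--" prefix needs stripping before decoding; `decode_label` is a hypothetical helper, not part of any NetWitness parser:

```python
import unicodedata


def decode_label(label: str) -> str:
    """Decode one ACE ("xn--") label with the raw PunyCode codec.

    Labels without the prefix are returned unchanged.
    """
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


decoded = decode_label("xn--80ak6aa92e")
print(decoded.encode("utf-8").hex())  # d0b0d180d180d38fd0b5

# One line per character: UTF-8 bytes, code point, and Unicode name.
for ch in decoded:
    print(f"0x{ch.encode('utf-8').hex()} (U+{ord(ch):04X}) {unicodedata.name(ch)}")
```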


Byte sequences such as 0xd0b0 and 0xd180 cannot be used directly in host names, which are limited to the LDH subset. RFC 3492 defines a general algorithm called Bootstring that allows a string of basic code points to uniquely represent any string of code points drawn from a larger set. PunyCode is an instance of Bootstring that uses particular parameter values, specified in RFC 3492, that are appropriate for IDN.
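To show how Bootstring works with the PunyCode parameters, here is a minimal from-scratch decoder following the decoding procedure in RFC 3492 (section 6.2) and its bias adaptation function (section 6.1). It is a teaching sketch, not production code: it omits the RFC's overflow checks and does not validate that the literal prefix contains only basic code points. Its output can be cross-checked against Python's built-in "punycode" codec:

```python
# PunyCode's Bootstring parameter values (RFC 3492, section 5).
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def adapt(delta: int, numpoints: int, firsttime: bool) -> int:
    """Bias adaptation function (RFC 3492, section 6.1)."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def digit_value(c: str) -> int:
    """Map a basic code point to its digit value: a-z/A-Z -> 0-25, 0-9 -> 26-35."""
    if "a" <= c <= "z":
        return ord(c) - ord("a")
    if "A" <= c <= "Z":
        return ord(c) - ord("A")
    if "0" <= c <= "9":
        return ord(c) - ord("0") + 26
    raise ValueError(f"invalid PunyCode digit {c!r}")


def punycode_decode(encoded: str) -> str:
    """Decoding procedure (RFC 3492, section 6.2), minus overflow checks."""
    n, i, bias = INITIAL_N, 0, INITIAL_BIAS
    # Basic code points before the last delimiter are copied literally.
    pos = encoded.rfind("-")
    output = list(encoded[:pos]) if pos > 0 else []
    rest = encoded[pos + 1:] if pos > 0 else encoded
    idx = 0
    while idx < len(rest):
        oldi, w, k = i, 1, BASE
        # Read one generalized variable-length integer into i.
        while True:
            if idx >= len(rest):
                raise ValueError("truncated PunyCode input")
            digit = digit_value(rest[idx])
            idx += 1
            i += digit * w
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if digit < t:
                break
            w *= BASE - t
            k += BASE
        # i encodes both the next code point (n) and its insertion position.
        size = len(output) + 1
        bias = adapt(i - oldi, size, oldi == 0)
        n += i // size
        i %= size
        output.insert(i, chr(n))
        i += 1
    return "".join(output)


print(punycode_decode("Mnchen-3ya"))  # München
print(punycode_decode("80ak6aa92e") == b"80ak6aa92e".decode("punycode"))  # True
```

Each decoded integer packs two facts, the next code point and where to insert it, which is why a label such as "80ak6aa92e" contains no literal characters at all.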


This threat advisory discusses how to detect IDN homograph phishing attacks using RSA NetWitness Logs & Packets.


PunyCode Detection using IDN_homograph Parser

The IDN_homograph Lua parser detects PunyCode-encoded internationalized domain names that use non-Latin Unicode code points whose glyphs resemble those of Latin code points, and registers the decoded homograph as analysis.service meta.

  • service - the host that the homograph is masquerading as
  • ioc - indicator of compromise ("homograph detected")


The IDN_homograph Lua parser is now available on RSA Live.





Host aliases encoded with PunyCode:



Meta registered in RSA NetWitness Investigation:

  • host: www.xn--80ak6aa92e.com
  • ioc: homograph detected
  • service: www.apple.com
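The detection logic behind the meta listed above can be approximated in a few lines. This is a rough Python sketch, not the IDN_homograph Lua parser's actual code: `CONFUSABLES` is a hypothetical six-entry table (Unicode publishes a full confusables list as part of UTS #39), and `check_host` with its "service"/"ioc" keys is a hypothetical helper that only mirrors the meta names shown here:

```python
# Hypothetical, deliberately tiny confusables table: Cyrillic code points
# mapped to the Latin letters they resemble. A real detector would need a
# much larger table.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u04cf": "l",  # CYRILLIC SMALL LETTER PALOCHKA
}


def check_host(host: str) -> dict:
    """Return hypothetical 'service'/'ioc' meta for a PunyCode homograph host."""
    meta = {"host": host}
    masqueraded = []
    found = False
    for label in host.split("."):
        if not label.lower().startswith("xn--"):
            masqueraded.append(label)
            continue
        decoded = label[4:].encode("ascii").decode("punycode")
        latin = "".join(CONFUSABLES.get(ch, ch) for ch in decoded)
        # Flag the label only if every decoded character maps to ASCII.
        if latin.isascii():
            found = True
        masqueraded.append(latin)
    if found:
        meta["service"] = ".".join(masqueraded)
        meta["ioc"] = "homograph detected"
    return meta


print(check_host("www.xn--80ak6aa92e.com"))
```

A legitimate IDN such as "www.xn--mnchen-3ya.de" is not flagged, because "ü" has no Latin lookalike in the table.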


Below is a screenshot of the IDN_homograph parser detecting IDN homograph attacks:



Detection of homographs used in Phishing Emails


If an email contains: <a href="http://www.xn--80ak6aa92e.com">http://www.apple.com</a>

Then the Phishing_lua parser registers risk.warning meta ("href host doesn't match displayed host"), and the IDN_homograph parser registers the same IDN meta as above.

If an email contains: <a href="http://www.xn--80ak6aa92e.com">http://www.xn--80ak6aa92e.com</a>

Then there is no mismatch, but the Phishing_lua parser still registers the host, and the IDN_homograph parser performs the same IDN detection.
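The href-versus-displayed-host comparison in the two cases above can be sketched with Python's standard `html.parser` and `urllib.parse` modules. This is not the Phishing_lua parser's code; `LinkCollector` and `check_links` are hypothetical, and the rule that only displayed text containing "://" is treated as a URL is an assumption made to keep the example short:

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

MISMATCH = "href host doesn't match displayed host"


class LinkCollector(HTMLParser):
    """Collect (href, displayed text) pairs for each <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def check_links(html: str) -> list:
    """Return hypothetical per-link meta: the href host, plus a warning on mismatch."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    findings = []
    for href, text in parser.links:
        meta = {"host": urlsplit(href).hostname or ""}
        # Assumption: only compare when the displayed text itself looks like a URL.
        if "://" in text and urlsplit(text).hostname != meta["host"]:
            meta["risk.warning"] = MISMATCH
        findings.append(meta)
    return findings


print(check_links('<a href="http://www.xn--80ak6aa92e.com">http://www.apple.com</a>'))
```

In both cases the href host (www.xn--80ak6aa92e.com) is still reported, so a homograph check can run on it whether or not the displayed text differs.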




Event Stream Analysis for Detecting PunyCode Phishing Attempts


The Event Stream Analysis (ESA) rule identifies mail sessions that contain a PunyCode hostname and also show a mismatch between the hostname in a link (href) containing an IDN homograph and the displayed text of that same link. It then looks for this suspected phishing attempt to be followed by HTTP(S) traffic with the same hostname, either in the certificate or in the host.



The ESA rule alerts on the presence of PunyCode in emails, which is detected using the ioc and analysis_service meta generated by the IDN_homograph and mail protocol parsers. It also looks for sessions that use the same alias host over HTTP(S).
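The correlation the ESA rule performs can be illustrated in Python. This is only a sketch of the idea, not the ESA rule itself (ESA rules are written in EPL): the `Session` record, its field names, `correlate`, and the one-hour `WINDOW_SECONDS` are all hypothetical, and the real rule's meta keys and time window may differ. Here `hosts` stands in for both the alias host and the certificate hostname:

```python
from dataclasses import dataclass, field

MISMATCH = "href host doesn't match displayed host"
WINDOW_SECONDS = 3600  # hypothetical correlation window


@dataclass
class Session:
    """Hypothetical, simplified session record; field names are illustrative."""
    time: float                                     # seconds since epoch
    protocol: str                                   # "mail", "http" or "https"
    hosts: set = field(default_factory=set)         # alias host / certificate name
    ioc: set = field(default_factory=set)
    risk_warning: set = field(default_factory=set)


def correlate(sessions):
    """Yield (mail, web) pairs: a suspected homograph phish, then web traffic to its host."""
    suspects = []
    for s in sorted(sessions, key=lambda s: s.time):
        if (s.protocol == "mail" and "homograph detected" in s.ioc
                and MISMATCH in s.risk_warning):
            suspects.append(s)
        elif s.protocol in ("http", "https"):
            for mail in suspects:
                if 0 <= s.time - mail.time <= WINDOW_SECONDS and mail.hosts & s.hosts:
                    yield mail, s


mail = Session(0.0, "mail", {"www.xn--80ak6aa92e.com"}, {"homograph detected"}, {MISMATCH})
web = Session(120.0, "https", {"www.xn--80ak6aa92e.com"})
for phish, visit in correlate([mail, web]):
    print(f"alert: {visit.protocol} to {sorted(visit.hosts)} "
          f"{visit.time - phish.time:.0f}s after phishing mail")
```

Requiring both the mail-side indicators and a later web session to the same host keeps a single suspicious email from alerting on its own, at the cost of holding suspect sessions in memory for the window.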



The Event Stream Analysis rule for PunyCode phishing attempts is now available on RSA Live.





Thanks go to Sean Lim, William Motley, and Angela Stranahan for contributing to this threat advisory.



  1. IDN converter: https://www.punycoder.com/
  2. Punycode (Wikipedia): https://en.wikipedia.org/wiki/Punycode