Revised Homograph Attacks

December 29, 2019

Preface

Homograph attacks have been known to phish users through the use of lookalike sites, and other than a few deviations like plagiarism detection bypasses, most of the attacks have been focused on domain phishing.

We felt that the attack surface has a lot of untapped potential:

Not all browsers are made equal
Existing art only scratches the surface

We, therefore, set out to find additional attack vectors that attackers can use to phish users.

The research was inspired originally by the work of Xudong Zheng, and we wanted to take it a step further.

This blog post is the first in a series presenting a complete, end-to-end phishing framework POC that we developed to highlight the security vulnerabilities in applications and infrastructure we use on a daily basis.

This blog post will present a revised homograph attack vector, but will not delve into the specifics of the many definitions in this field. We’ll leave that to the 2nd post.

Introduction

So what exactly is a Homograph attack?

From Wikipedia:

The internationalized domain name (IDN) homograph attack is a way a malicious party may deceive computer users about what remote system they are communicating with, by exploiting the fact that many different characters look alike.

As of 2019, this is the best-known and most prevalent form of attack for this attack surface. Another example of homograph abuse is plagiarism, where the attacker replaces characters with seemingly identical ones that have a different Unicode value, thus fooling automated checkers and passing the work off as original.
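To make the "identical look, different Unicode value" point concrete, here is a minimal Python sketch. The sample strings and the helper name `differing_positions` are ours, chosen only for illustration.

```python
import unicodedata

latin = "wikipedia"        # every character is ASCII
spoof = "wikip\u0435dia"   # U+0435 CYRILLIC SMALL LETTER IE replaces the Latin "e"

def differing_positions(a: str, b: str) -> list:
    """Return (index, code point in a, code point in b) wherever the strings differ."""
    return [
        (i, f"U+{ord(x):04X}", f"U+{ord(y):04X}")
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]

# The strings render alike but are different sequences of code points.
print(latin == spoof)                                  # False
# Compatibility normalization does not map Cyrillic letters to Latin ones.
print(unicodedata.normalize("NFKC", spoof) == latin)   # False
print(differing_positions(latin, spoof))               # [(5, 'U+0065', 'U+0435')]
```

A checker that compares text byte by byte, or even after Unicode normalization, therefore treats the two strings as unrelated.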

Over time, several mitigations were developed, the best known of which is presenting the domain to the user in its Punycode form. In short, Punycode transforms a Unicode string into an ASCII string, so “wikipedia.org” with a Cyrillic “e” will be displayed as “xn--wikipdia-g8g.org”.
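The transformation can be tried with Python's built-in "idna" codec, which implements the older IDNA 2003 rules. This is only a sketch: browsers and mail clients may follow newer rules, and the helper names `to_ascii` and `to_unicode` are ours.

```python
def to_ascii(domain: str) -> str:
    """Encode a Unicode domain into its ASCII (Punycode) form, label by label."""
    return domain.encode("idna").decode("ascii")

def to_unicode(domain: str) -> str:
    """Decode an ASCII domain back into its Unicode form."""
    return domain.encode("ascii").decode("idna")

spoof = "wikip\u0435dia.org"   # Cyrillic "е" (U+0435) in place of the Latin "e"
encoded = to_ascii(spoof)

print(encoded)                       # an ASCII string whose first label starts with "xn--"
print(to_unicode(encoded) == spoof)  # the encoding round-trips
print(to_ascii("wikipedia.org"))     # a pure-ASCII domain is left unchanged
```

Displaying `encoded` instead of `spoof` is what makes the lookalike visible to the user.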

As mentioned, homograph attacks were discussed almost entirely from a URL spoofing perspective, since domain spoofing on the web is the prevalent attack vector, but there are other attack vectors to be considered.

Homograph Email Attacks

What about email-based homograph attacks? If we have a registered domain, we can sign the domain so it would look “Secure”, but we can also connect a mail service to it. Doing so enables us to send emails on behalf of that domain.

When setting up an email server, an attacker can set the name of the email contact to whatever he/she pleases. By setting up an email that looks exactly like a known contact and setting it with the name of the contact, the chance for a successful phishing attack increases dramatically.

But will this work?

Current Email Mitigations

First, we have to examine the mitigations that currently exist:

SPF – The Sender Policy Framework domain record tells the world which hosts or IPs can send emails on behalf of the domain. When email servers receive an email that appears to be from a certain domain, they can look up the SPF record and verify whether the sending server is part of it.

But we’re the owners of our spoofed domain! Of course we can send emails from our domain, which means SPF is transparent to this kind of attack.
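A heavily simplified Python sketch shows why the check passes. The record and the IP addresses are hypothetical, taken from documentation ranges. The function `spf_allows` is ours and handles only `ip4`/`ip6` mechanisms, ignoring `include`, `a`, `mx`, `redirect` and explicit qualifiers, all of which a real SPF evaluator must process.

```python
import ipaddress

def spf_allows(record: str, sender_ip: str) -> bool:
    """Check a sender IP against the ip4/ip6 mechanisms of an SPF record (simplified)."""
    ip = ipaddress.ip_address(sender_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return False
    for term in terms[1:]:
        mechanism, _, value = term.partition(":")
        if mechanism.lower() in ("ip4", "ip6"):
            # Membership is simply False when the IP versions differ.
            if ip in ipaddress.ip_network(value, strict=False):
                return True
    return False

# Hypothetical record that the owner of the lookalike domain publishes for it.
record = "v=spf1 ip4:203.0.113.0/24 -all"

print(spf_allows(record, "203.0.113.25"))   # True: the owner's own mail server
print(spf_allows(record, "198.51.100.7"))   # False: any other host
```

The receiving server only asks whether the sending host is authorized for the domain in the message. It never asks whether that domain resembles another one.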

DKIM - Domain Keys Identified Mail (DKIM) is a method of email authentication that cryptographically verifies that an email was sent by trusted servers and is untampered.

This will verify that our emails reached the victim intact, good for us!

DMARC - Domain-based Message Authentication, Reporting and Conformance - allows the domain owner to specify what should happen with failed emails as well as get feedback.

This mitigation has no adverse effects on our attack vector. In fact, we can ignore the configuration, or set it according to our needs.
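The sketch below, under simplifying assumptions, illustrates the point. It parses a hypothetical DMARC record and applies only strict alignment (an exact domain match), whereas real DMARC also supports relaxed alignment on the organizational domain. The function names and the example domains are ours.

```python
def parse_dmarc(record: str) -> dict:
    """Split a DMARC TXT record into its tag=value pairs."""
    tags = {}
    for part in record.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    return tags

def strictly_aligned(from_domain: str, authenticated_domain: str) -> bool:
    """Strict alignment: both domains must match exactly, ignoring case."""
    return from_domain.lower().rstrip(".") == authenticated_domain.lower().rstrip(".")

# Hypothetical lookalike of example.com: Cyrillic "а" (U+0430) replaces the Latin "a".
lookalike = "ex\u0430mple.com".encode("idna").decode("ascii")
# Hypothetical policy published by the owner of the lookalike domain.
policy = parse_dmarc("v=DMARC1; p=reject; rua=mailto:reports@example.net")

print(policy["p"])                                 # reject
print(strictly_aligned(lookalike, lookalike))      # True: the domain aligns with itself
print(strictly_aligned(lookalike, "example.com"))  # False, but nothing ever checks this
```

Alignment compares the "From" domain with the domain that passed SPF or DKIM. Since both belong to the same owner, even a strict `p=reject` policy is satisfied.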

S/MIME - S/MIME provides a digital signature to confirm that the sender’s email address was actually the address used to send the email. A verified email address indicates that the address was validated by a digital signature.

Again, as we’re the owners of our spoofed domain, the victim will be able to authenticate that we’re the ones who signed the email, so S/MIME is also transparent to this attack vector.

Upon registering the spoofed domain, the attacker can send forged emails that will seem to come from one of the victim’s contacts, or – in a business scenario – from a key authority within the organization. Moreover, unlike previous attacks, where one would try to forge the real email of the victim in the hope that SPF, S/MIME, etc. were missing, in this attack the victim’s reply is correctly routed to the attacker, which greatly increases the chance of success.

Now that we know this, let’s think of an example for an attack:

The attacker waits until the company’s CFO is on holiday or cannot be reached.
An employee in the accounting department gets an email just before the start of the month from the CFO.
In the email, the CFO indicates that one of their suppliers changed their bank account, and gives instructions that the next month’s fulfillment should be transferred to the “new” bank account.

This will never work, right? Who acts directly on emails they receive from a C-level?

Email Providers

The fact that we can bypass the mitigations does not mean the domain will be shown properly, if at all. Now we need to see which email clients will display our spoofed emails correctly, and under which conditions. We need to map the relevant functions and flows in which the user can notice the spoofing attempt.

A relevant flow might look like this:

The 1st hint that something is wrong is from the inbox display.
When opening the email, the way the “From” field is shown would be the 2nd hint.
A 3rd hint might come from expanding the “From” field or clicking on it. As a marginal portion of the users do that, especially when the e-mail seems to come from a reliable source, this hint usually won’t be visible, and an attacker will be able to implement a successful phishing attack even if the email client doesn’t display the proper data for this field.
Next, if the user replies to the email, the “To” field can give the user another hint that spoofing may have occurred.
Another hint is the automatic reply text that is added to the email upon replying. If the reply text contains the name or the email of the contact in Unicode, the victim will have a hard time trying to spot the difference. Also, if the application doesn’t display this field by default upon composing an email, the users probably won’t be able to use this hint to identify a phishing email. This behavior is adopted by some providers across all platforms, but it is most prevalent on mobile platforms due to screen size, so on these platforms, this hint will usually be missed.
Finally, if the user wishes to compose a new email to that contact, the email that appears in the “To” field may give another hint. However, it is most likely that composing an email will be irrelevant to the attack scenario, since users will just reply to the original email, instead of composing a new one.

Therefore, in our opinion, the most crucial parts of identifying an Email Homograph attack are the Inbox, the “From” field, the “To” field and the Reply Text (on desktop platforms) in the Reply.

We examined several email service providers for these functions, and present below the results for the biggest 3 email providers we think are vulnerable. Appendix A lists the other 3 providers, who are vulnerable in our opinion, as well as the comprehensive list of the email providers we tested across the various platforms.

All the information regarding the different providers/platforms we tested will be uploaded to our Github repo soon.

The vulnerable clients were:

Gmail/G-Suite
Outlook
iCloud
GMX
Yandex
Mail.ru

Following a responsible disclosure, the responses were as follows:

Google - Responded that they will not track this as a security bug as they are already aware of the issue, and there are long term plans to address it.
Microsoft - After a 2nd submission due to misclassification on Microsoft’s side, Microsoft responded that since the attack relies on social engineering, it does not meet the bar for security servicing.
Apple - did not respond.
GMX - Evaluated the report and found it useful.
Yandex - did not respond.
Mail.ru - specifically mentioned in their HackerOne program that they will not accept or even review any IDN Homograph attacks, so no report was sent.

The full list of email providers checked across the various platforms and functions can be found in Appendix B.

Following is a short legend in order to understand the charts of each email provider:

U is Unicode - It means the relevant feature displays the email In Unicode form.
N is Name - It means the relevant feature displays the name of the contact.
P is Punycode - It means the relevant feature displays the email in Punycode form.
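As a rough illustration of the three forms in the legend, the Python sketch below derives N, U and P from a single “From” header. It relies on the standard library's `email.utils.parseaddr` and the built-in "idna" codec (IDNA 2003). The function name `display_forms` and the sample local part “alerts” are ours.

```python
from email.utils import parseaddr

def display_forms(from_header: str) -> dict:
    """Return the Name (N), Unicode (U) and Punycode (P) renderings of a From header."""
    name, address = parseaddr(from_header)
    local, _, domain = address.rpartition("@")
    punycode = domain.encode("idna").decode("ascii")
    unicode_form = punycode.encode("ascii").decode("idna")
    return {
        "N": name,
        "U": f"{local}@{unicode_form}",
        "P": f"{local}@{punycode}",
    }

# Build the header with the domain in its ASCII form.
ascii_domain = "face\u00feookmail.com".encode("idna").decode("ascii")
header = f"Notifications <alerts@{ascii_domain}>"

forms = display_forms(header)
for key in ("N", "U", "P"):
    print(key, forms[key])
```

A client that shows only N or U gives the user nothing to compare against, while P exposes the `xn--` prefix.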

Note: On some platforms and email providers, the function of “On Email Click” doesn’t provide any more info than “From field expanded”. However, in order to provide a coherent table and to be able to compare one service to another, the field was duplicated in those email providers.

Gmail/G-Suite:

It is worth noting that Gmail implemented some security features that in some cases block an email from even reaching the inbox, or display error messages with various levels of warnings to the user (big red warning or yellow warning). We discovered that Gmail mainly implemented blocking of domains that look like Gmail, or (to a lesser extent) facebook/facebookmail. A more detailed explanation is in the next section.

Outlook:

**Outlook does not save contacts automatically. A contact will appear upon composing an email only if it was actively put in the contact list.

iCloud mail:

**NA means the contact is not available (unless the user added it in the first place)

Here’s how the different fields look across the various email providers:

Gmail

The inbox (similar to most email providers) shows the sender’s name. In this example, it’s “Notifications” with an empty character listed as the last name.

When a user opens the email, they will see the name of the contact and the sender’s email in Unicode.

If the user is suspicious, he/she might click on the sender’s email to get further information. In the window, we can see the sender’s name and email in Unicode, as well as the field “Signed by”. This field always shows the domain name at the prefix in its Punycode form.

In Gmail, this might be the only way for a user to differentiate between a regular email and a phishing scam. It is worth noting that Google magic declared this email as “Important”, as we can see in the bottom of this window, which might make this email look even more reliable.

Furthermore, if the user responds, the “To” field will display the sender’s email in Unicode form.

If the user clicks on the “..” in the email, the reply text will be shown, and Gmail will display it in its Unicode form. It is worth noting that since the reply text is part of the email’s text rather than metadata, it will remain in that form once written.

Finally, if the user wants to compose a new email, the email in the “To” field also appears in Unicode.

If the user already exists in the victim’s contact list, upon typing the name of the contact, more than one contact might appear, that looks exactly the same. That means there is no plausible way to differentiate between the two emails. The usual case will be to just click on either one, which might be the one used in the phishing attempt.
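One way a client could help here is to mark autocomplete entries whose domain is internationalized. The Python sketch below is not how any of the tested providers work. The contact names, addresses and function names are hypothetical, and the heuristic is deliberately crude: it flags every IDN domain, including legitimate ones.

```python
def is_idn_address(address: str) -> bool:
    """True if the domain part holds non-ASCII text or a Punycode ("xn--") label."""
    domain = address.rpartition("@")[2].lower()
    if not domain.isascii():
        return True
    return any(label.startswith("xn--") for label in domain.split("."))

def annotate_contacts(contacts: list) -> list:
    """Render autocomplete rows, showing IDN domains in ASCII form with a marker."""
    rows = []
    for name, address in contacts:
        local, _, domain = address.rpartition("@")
        if is_idn_address(address):
            ascii_domain = domain.encode("idna").decode("ascii")
            rows.append(f"{name} <{local}@{ascii_domain}> [internationalized domain]")
        else:
            rows.append(f"{name} <{address}>")
    return rows

# Two hypothetical contacts that render identically in Unicode form.
contacts = [
    ("Dana Cohen", "dana@example.com"),
    ("Dana Cohen", "dana@ex\u0430mple.com"),   # Cyrillic "а" (U+0430)
]

rows = annotate_contacts(contacts)
for row in rows:
    print(row)
```

With the marker and the ASCII form in place, the two entries no longer look the same in the suggestion list.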

On the bright side, Gmail implemented some features to alert its users against such an attack. When replying, Gmail might present the user with a yellow warning.

Gmail also implemented some heavy restrictions on Gmail lookalike domains, where it will block emails from these domains if detected and directed straight to a Gmail/G-Suite user.

It will display a big, red warning if the email is part of a thread that someone else replied to (even if the first email wasn’t received because of the block).

But as this is a partial fix, and mainly applied to a limited number of domains, it can be bypassed when attacking organizations that use the G-Suite platform. Moreover, the G-Suite/Gmail services can be used against it to enhance the credibility of the attacks.

For instance, the attacker might share some relevant information to increase their credibility. This could take the form of personal information, for private targets, or related documents, in a business setting.

The attacker might also send partial, old, or expired documents in order to get the user to return the newer version, or other documents, to the attacker. “Here’s the documents I have, could you please see if these are all of the documents you need? If not, please tell me what else to attach. Also, if you have other documents that belong to me, can you please send them? I just can’t find them anywhere…”

As we can see, the sharing of a photo album has been mailed by Google, and as such, is signed by it. Therefore, it’s hard for a user to distinguish between a real share and a phishing attempt.

There’s also the ability to use forms. This can also be used as a powerful phishing tool, as it can be used to get sensitive information from the targets.

Again, in this attack we can find out the mail is forged by examining the certificate, even though the email is presented in its Unicode form.

The last example shows the impact of a possible attack: we registered the domain faceþookmail.com, which looks very similar to one of the domains that Facebook uses for their emails, facebookmail.com (the difference being the ‘b’, which is actually þ, the “Latin Small Letter Thorn”, whose code point is U+00FE).
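The substituted character can be located programmatically. Here is a small Python sketch using the standard `unicodedata` module. The helper name `non_ascii_report` is ours.

```python
import unicodedata

def non_ascii_report(domain: str) -> list:
    """List (index, code point, Unicode name) for every non-ASCII character in a domain."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(domain)
        if ord(ch) > 127
    ]

print(non_ascii_report("face\u00feookmail.com"))
# [(4, 'U+00FE', 'LATIN SMALL LETTER THORN')]
print(non_ascii_report("facebookmail.com"))
# []
```

Note that the thorn belongs to the Latin script, so a check that only looks for mixed scripts (for example Latin mixed with Cyrillic) would not flag this domain.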

As we can see from the screenshot, the sender’s email looks very much like facebookmail.com, to the point where unless closely examined, it will easily pass as legitimate. The certificate, however, reveals that the domain is not the same, but only to those that click on the email AND know what they are looking for.

We discussed the web client of Gmail/G-Suite, but as we can see from the chart, the Android and iOS applications behave similarly. For more examples, please go to our Github repo (that will be updated in a short while).

Gmail/G-Suite is not the only vulnerable email provider - Microsoft Outlook is vulnerable, as well. Its most used client, Outlook for Windows, displays most of the relevant information in Unicode (with the exception of the compose functionality, that displays the recipient name in Punycode).

Outlook

First, Outlook’s inbox displays the email sender’s name.

Upon opening the email, Outlook displays the “From” field data in Unicode form.

Although there is an “Action Items” prompt, it is not clear what the user should do with it, and even if the user does click on it, no action or warning will be displayed:

Sometimes, Outlook will present a relevant message, but it is displayed in the same way as any other informational message, so it’s highly unlikely to gain the attention of users.

When replying, Outlook will display both the “To” field and the reply text in Unicode form.

Only when composing an email and typing the contact’s name will the destination email be displayed in Punycode form:

However, once selected, the “To” field will display the contact’s email in Unicode form.

iCloud mail

Here, the inbox as well as the “From” field display the name of the sender.

Upon clicking on the sender’s name in the “From” field, the full contact’s name will be displayed, as well as the email in Unicode form.

When replying, the “To” field displays the name of the contact, and the reply text shows the email in Unicode form.

In iCloud’s web client, the compose function needs the contact to be in the contact list in advance in order for it to appear in the “To” field while typing, so a user will have to manually copy it from a previous email in order to send another email.

Appendix A - The Remaining Vulnerable Email Providers

GMX

The inbox shows the sender’s name, and upon opening the email, the “From” field will show the name of the sender.

Upon clicking the name of the sender to get further information, the sender’s email will be shown in Punycode form.

Furthermore, if the user responds, the “To” field will show the sender’s name and the reply text will be shown in Punycode form. However, it is a bit masked inside the rest of the text, and can be easily missed.

When composing an email, the contacts will be shown in Punycode form.

Yandex

The inbox displays the sender’s name, and upon hovering the mouse over the sender’s name, the email will appear in Unicode form.

When opening the email, the “From” field will show the name of the sender, as well as the full email in Unicode form. Hovering on the name of the sender to get further information, the sender’s email will be shown in Unicode form.

Furthermore, if the user responds, the “To” field will show the sender’s name and email in Unicode form. The reply text will also be shown in Unicode form.

When composing an email, the name of the contact will be shown, as well as the contact’s email in Unicode form.

Mail.Ru

The inbox shows the sender’s name, and upon hovering the mouse over the sender’s name, the email will appear in Unicode form.

Upon opening the email, the “From” field will show the sender’s name.

Upon clicking the name of the sender to get further information, the sender’s name will be shown as well as the email in Unicode form.

Furthermore, if the user responds, the “To” field will show the sender’s name and the reply text will be shown in Unicode form. Upon hovering on this field, the email will be presented in Unicode form.

When composing an email, the contacts will be shown in Unicode form.

Appendix B - Charts By Function

We’ll now show the state of the email providers that we checked across the various platforms by functions.

Inbox function

The “From” field

From field expanded

Clicking or hovering on the “From” field

Note: not all providers implemented the same functionality on this one. In order to show this concisely, if the feature is not implemented (that is, if nothing happens when clicking or hovering), the value from the previous chart - “From field expanded” - was taken.