Infoblox Develops Patent-Pending Model for Detecting Homograph Attacks in DNS


Infoblox Inc. has developed a patent-pending model for detecting homograph attacks in DNS using Amazon SageMaker.

This article is co-authored by Alex Jensen, a data analytics expert at Infoblox. Just as it is easier to identify a person by name than by a government-issued ID number or phone number, the Domain Name System (DNS) provides a convenient way to identify and reach internet resources that are located by IP address. Because DNS is ubiquitous and critical to network connectivity, and because its traffic on UDP port 53 is frequently left unmonitored, it is an appealing target for malicious actors. Many well-known DNS-based threats rely on malware command-and-control communications, data exfiltration, fast flux, and domain generation algorithms, because attackers know these tactics often evade traditional security measures.

For over twenty years, Infoblox has been a prominent provider of technologies and services for managing and securing the core of networking, specifically DNS, DHCP, and IP address management (collectively known as DDI). More than 8,000 clients, including a significant portion of the Fortune 500, rely on Infoblox to effectively automate, manage, and secure their networks, whether on-premises, in the cloud, or hybrid.

In the last five years, Infoblox has leveraged AWS to create its SaaS offerings, assisting clients in transitioning their DDI services from physical appliances to the cloud. This article focuses on how Infoblox employed Amazon SageMaker and other AWS services to develop a DNS security analytics solution aimed at detecting misuse, defection, and impersonation of customer brands.

The Importance of Detecting Homograph Attacks

Identifying customer brands or domain names that are targeted by socially engineered attacks has become a critical requirement for the security analytics services Infoblox provides to clients. In the context of DNS, a homograph is a domain name that visually resembles another domain name, called the target. Cybercriminals create homographs that mimic high-value target domains and use them to distribute malware, phish user information, damage brand reputation, and more. Unsuspecting users often cannot tell homographs apart from legitimate domains, because the two can look identical at a glance.

A typical domain name consists of letters, digits, and hyphens drawn from the ASCII character encoding scheme, which has 128 code points (possible characters), or from Extended ASCII, which has 256 code points. Internationalized domain names (IDNs) allow Unicode characters, accommodating languages that use Latin letters with ligatures or diacritics (such as é or ü) or entirely different alphabets. IDNs greatly improve internet accessibility by enabling organizations to reach their audiences in their native languages. However, the same flexibility attracts fraudsters, who substitute characters with visually similar imitations to lure users to counterfeit domains. This tactic, known as a homograph attack, uses Unicode characters to forge domains that look indistinguishable from legitimate ones, such as pɑypal.com mimicking paypal.com (where ‘ɑ’ is the Latin Small Letter Alpha [U+0251]). The two may look the same at first glance, but a closer examination reveals the difference.
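To make the U+0251 example concrete, the following minimal Python sketch (standard library only) shows that the two names differ in exactly one code point, and that the lookalike travels through DNS in an ASCII-compatible `xn--` form. It relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules; current registries and browsers generally apply IDNA 2008, so treat this as an illustration rather than a reference encoder. The helper `differing_chars` is hypothetical and assumes both strings have the same length.

```python
import unicodedata

legit = "paypal.com"
spoof = "p\u0251ypal.com"  # 'ɑ' is LATIN SMALL LETTER ALPHA (U+0251)

def differing_chars(a: str, b: str) -> list[tuple[int, str, str]]:
    """List (position, code point, Unicode name) for each character of b that differs from a.

    Assumes a and b have the same length (one-for-one substitutions only).
    """
    diffs = []
    for pos, (x, y) in enumerate(zip(a, b)):
        if x != y:
            diffs.append((pos, f"U+{ord(y):04X}", unicodedata.name(y)))
    return diffs

print(differing_chars(legit, spoof))  # → [(1, 'U+0251', 'LATIN SMALL LETTER ALPHA')]
# The ASCII-compatible forms that would appear in DNS queries.
print(legit.encode("idna"))  # → b'paypal.com'
print(spoof.encode("idna"))  # an xn-- (Punycode) label followed by .com
```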

Common Methods for Constructing Homograph Domains

  • IDN homographs using Unicode characters (e.g., replacing “a” with “ɑ”)
  • Multi-letter homoglyphs (e.g., substituting “m” with “rn”)
  • Character substitutions (e.g., changing “I” to “l”)
  • Punycode spoofing (e.g., 㿝㿞㿙㿗[.]com encodes as xn--kindle[.]com, and 䕮䕵䕶䕱[.]com as xn--google[.]com)
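The Punycode example in the last bullet can be checked with Python's built-in `punycode` codec, which implements the RFC 3492 algorithm. Because the ASCII label `kindle` contains no `-` delimiter, the decoder treats every letter as an encoded delta rather than as a literal ASCII character, so the result contains only non-ASCII code points. The sketch below is a minimal illustration; `decode_ace_label` is a hypothetical helper that strips the `xn--` prefix from one label and decodes the rest.

```python
ACE_PREFIX = "xn--"

def decode_ace_label(label: str) -> str:
    """Decode one ASCII-compatible (xn--) DNS label back to Unicode."""
    if not label.lower().startswith(ACE_PREFIX):
        return label  # ordinary letters-digits-hyphen label, nothing to decode
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

decoded = decode_ace_label("xn--kindle")
# Print code points rather than glyphs so the output is terminal-safe.
print([f"U+{ord(c):04X}" for c in decoded])
# Re-encoding the decoded label yields the ASCII text "kindle" again.
print(decoded.encode("punycode"))
```

A visitor sees four CJK characters, while the name actually sent in DNS queries is the innocent-looking `xn--kindle`.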

Homograph attacks also extend beyond DNS: they are used to obfuscate process names in operating systems and to evade plagiarism detection and phishing detection systems. Because many Infoblox clients had expressed concern about these attacks, the team set out to build a machine learning (ML)-based solution using Amazon SageMaker.

From a business standpoint, defending against homograph attacks can divert crucial resources from an organization. A common strategy against domain name impersonation is to pre-register many potential homographs of a brand. Unfortunately, this deters only a limited number of attackers, because a much larger pool of plausible-looking homographs remains available for registration. Using its IDN homograph detector, Infoblox has identified homographs of 43 of Alexa's top 50 domain names, as well as of domains in the financial services and cryptocurrency sectors.
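A back-of-the-envelope calculation shows why pre-registration does not scale. The sketch below uses a small, hypothetical confusables table (the `CONFUSABLES` entries are illustrative, not Infoblox's mapping or the full Unicode confusables data) and counts the distinct lookalike spellings of a label when each position can independently be kept or swapped for one of its lookalikes.

```python
from math import prod

# Hypothetical, deliberately tiny table of single-character lookalikes.
CONFUSABLES = {
    "a": ["\u0251", "\u0430"],  # LATIN SMALL LETTER ALPHA, CYRILLIC SMALL LETTER A
    "p": ["\u0440"],            # CYRILLIC SMALL LETTER ER
    "y": ["\u0443"],            # CYRILLIC SMALL LETTER U
    "l": ["1", "\u04cf"],       # DIGIT ONE, CYRILLIC SMALL LETTER PALOCHKA
}

def count_homographs(label: str) -> int:
    """Count spellings of `label` that differ from it in at least one position."""
    options = [1 + len(CONFUSABLES.get(ch, [])) for ch in label]  # keep or swap
    return prod(options) - 1  # exclude the original spelling itself

print(count_homographs("paypal"))  # → 215
```

Even this toy table yields 215 variants of a six-letter label. A realistic table has far more lookalikes per letter, and multi-letter substitutions and alternative TLDs multiply the count further.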

Solution

Traditional approaches to the homograph problem typically rely on string distance calculations; the deep learning approaches that have emerged mostly classify entire domain names. Infoblox instead approached the problem at the level of individual characters. Each character is processed with image recognition techniques, which lets Infoblox use the visual appearance of Unicode characters rather than their code points, which are merely the numeric values that identify characters in an encoding.
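The weakness of a pure string-distance approach is easy to demonstrate. The sketch below implements a textbook Levenshtein (edit) distance as a generic baseline; it is not Infoblox's method, and the domain names are illustrative. It assigns the same score to a visually identical substitution (`ɑ` for `a`) as to an ordinary, visibly different word, so any threshold loose enough to catch the first also flags the second.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

target = "bank.com"
print(levenshtein("b\u0251nk.com", target))  # → 1 (visual lookalike)
print(levenshtein("band.com", target))       # → 1 (unrelated word)
```

Scoring characters by how they look, as the per-character approach does, is what separates these two cases.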

With this approach, Infoblox's classifier, which detects Unicode characters that resemble ASCII characters, achieved 96.9% accuracy. Detection requires only a single offline prediction, whereas existing deep learning methods require repeated online predictions. The method also produces fewer false positives than traditional string distance techniques.
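The benefit of a single offline prediction is that the expensive step runs once over the Unicode repertoire and produces a lookup table. The sketch below illustrates only that shape. `classify_glyph` is a hypothetical stand-in for the trained CNN: instead of looking at rendered images, it uses Unicode compatibility decomposition (NFKD) and drops combining marks. That is a purely textual heuristic swapped in for illustration; it recognizes accented and compatibility forms but misses lookalikes such as `ɑ` or Cyrillic `а`, which is exactly the gap the visual classifier is meant to close.

```python
import sys
import unicodedata
from typing import Optional

# Characters permitted in a traditional (letters-digits-hyphen) domain label.
ALLOWED_ASCII = set("abcdefghijklmnopqrstuvwxyz0123456789-")

def classify_glyph(ch: str) -> Optional[str]:
    """Hypothetical stand-in for the CNN: return the ASCII label ch resembles, or None."""
    decomposed = unicodedata.normalize("NFKD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c)).lower()
    return base if len(base) == 1 and base in ALLOWED_ASCII else None

def build_mapping(max_code_point: int = sys.maxunicode) -> dict[str, str]:
    """Offline step: classify every non-ASCII code point once and keep the matches."""
    mapping = {}
    for cp in range(0x80, max_code_point + 1):
        ch = chr(cp)
        if unicodedata.category(ch) in ("Cs", "Cn"):  # skip surrogates and unassigned
            continue
        label = classify_glyph(ch)
        if label is not None:
            mapping[ch] = label
    return mapping

# Run once (e.g., per Unicode release); the result is a plain dictionary.
latin_mapping = build_mapping(0x24F)  # Latin-1 Supplement through Latin Extended-B
print(len(latin_mapping), latin_mapping["\u00e9"])
```

In a real system the body of `classify_glyph` would be replaced by the model's prediction on a rendered image of the character; the surrounding loop and the resulting table stay the same.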

Infoblox utilized Amazon SageMaker to develop two components:

  1. An offline identification system for Unicode character homographs based on a CNN classifier. This model processes images and labels of relevant ASCII characters (such as those used in domain names) and generates a Unicode mapping, which is updated with each new release of the Unicode standard.
  2. An online detection mechanism for identifying domain name homographs, which takes a target domain list and an input DNS stream to produce homograph detections.
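Once the mapping exists, the online component can be reduced to one dictionary lookup per character. The following minimal sketch assumes a small, hand-written `HOMOGLYPH_MAP` standing in for the classifier's output, plus a target list; all names and data are hypothetical, not Infoblox's implementation. Query names are assumed to arrive in ASCII wire format, so `xn--` labels are first decoded with Python's `idna` codec (IDNA 2003 rules). A name is reported when its ASCII "skeleton" equals a target but the name itself does not.

```python
# Hypothetical excerpt of the offline Unicode-to-ASCII homoglyph mapping.
HOMOGLYPH_MAP = {
    "\u0251": "a",  # LATIN SMALL LETTER ALPHA
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
}

def to_skeleton(domain: str) -> str:
    """Replace each mapped character with its ASCII lookalike."""
    return "".join(HOMOGLYPH_MAP.get(ch, ch) for ch in domain.lower())

def detect(queries: list[str], targets: set[str]) -> list[tuple[str, str]]:
    """Return (query, impersonated target) pairs from a stream of ASCII query names."""
    hits = []
    for q in queries:
        name = q.rstrip(".")  # drop the root dot if present
        # Decode xn-- labels to Unicode; plain ASCII names pass through unchanged.
        unicode_name = name.encode("ascii").decode("idna").lower()
        skeleton = to_skeleton(unicode_name)
        if skeleton in targets and unicode_name != skeleton:
            hits.append((q, skeleton))
    return hits
```

This sketch handles only one-for-one character substitutions; multi-letter homoglyphs such as "rn" for "m" need a separate rule.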

The accompanying diagram illustrates how the overall detection process integrates these two components.

In this diagram, each character is represented as a 28 x 28 pixel image, and each character in the training and test sets is labeled with the closest-looking ASCII character.

The rest of this article will delve deeper into the solution, exploring topics including:

  • Constructing the training data for the classifier
  • The CNN architecture of the classifier
  • Evaluating the model
  • The online detection model
