When it comes to firewalls or filtering techniques, you hear different terms like static filtering, stateful firewall, deep packet inspection, next-generation firewall (NGFW), and so on.
While the usage of some of these terms has become fairly standard (e.g. static filtering simply checks IP and maybe TCP headers to make filtering decisions), others are marketing buzzwords whose meanings are still evolving (e.g. what is an NGFW? Is it just another name for Unified Threat Management?).
In this article, we will ignore the fluff and buzzwords and take a look at Deep Packet Inspection (DPI) for what it really is, how it does what it does, why organizations use it, some of the challenges it faces, and some of the tools that can be used to perform DPI.
We will also briefly consider how DPI is different from packet capture/protocol analysis.
Deep Packet Inspection (DPI)
As mentioned in the introduction above, the state of firewall/filtering techniques has evolved over the years. Initially, we had Static/Stateless filtering where traffic is checked against rules that match on source/destination IP addresses/ports.
In this technique, every packet (irrespective of whether that packet is standalone or part of a traffic flow) is checked against the filtering rules. While this technique still has its place in today’s networks (e.g. Access Control Lists), it doesn’t scale well to current security needs.
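To make per-packet (stateless) filtering concrete, here is a minimal Python sketch of an ACL-style matcher. The `Packet` and `Rule` classes, the first-match-wins order, and the implicit deny at the end are assumptions chosen for illustration; many real ACLs behave this way, but this is not any vendor's syntax.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    protocol: str   # "tcp" or "udp"
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class Rule:
    action: str                     # "permit" or "deny"
    protocol: Optional[str] = None  # None acts as a wildcard
    src_net: Optional[str] = None   # CIDR, e.g. "10.0.0.0/8"
    dst_net: Optional[str] = None
    dst_port: Optional[int] = None

    def matches(self, pkt: Packet) -> bool:
        # Only header fields are consulted; the payload is never examined.
        if self.protocol is not None and self.protocol != pkt.protocol:
            return False
        if self.src_net is not None and ipaddress.ip_address(pkt.src_ip) not in ipaddress.ip_network(self.src_net):
            return False
        if self.dst_net is not None and ipaddress.ip_address(pkt.dst_ip) not in ipaddress.ip_network(self.dst_net):
            return False
        if self.dst_port is not None and self.dst_port != pkt.dst_port:
            return False
        return True


def filter_packet(rules: list[Rule], pkt: Packet) -> str:
    # Each packet is judged on its own: the first matching rule wins,
    # and anything unmatched falls through to an implicit deny.
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return "deny"


rules = [
    Rule("permit", protocol="tcp", dst_net="192.0.2.0/24", dst_port=80),
    Rule("deny", protocol="udp"),
]
print(filter_packet(rules, Packet("198.51.100.7", "192.0.2.10", "tcp", 51000, 80)))  # permit
print(filter_packet(rules, Packet("198.51.100.7", "192.0.2.10", "tcp", 51000, 22)))  # deny
```

Each call to `filter_packet` sees one packet in isolation; nothing connects a packet to the ones that came before it.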
We then moved on to Stateful filtering (or Stateful Packet Inspection) which basically keeps track of the state of connections. For example, when a stateful firewall sees a SYN packet, it keeps track of that TCP connection expecting to see the corresponding SYN+ACK and ACK packets.
If there is any funny business in the exchange (e.g. a SYN+ACK is seen without an initial SYN packet), the firewall can flag it as a likely attack.
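The handshake tracking described above can be sketched as a small state table. This is a toy model under stated assumptions: TCP flags arrive as a set of strings, the side that sends the first SYN is treated as the client, and connection teardown (FIN/RST) and timeouts are ignored. It is not how any particular firewall stores its state.

```python
from enum import Enum, auto


class State(Enum):
    SYN_SENT = auto()      # client SYN seen
    SYN_RECEIVED = auto()  # server SYN+ACK seen
    ESTABLISHED = auto()   # client's final ACK seen


class ConnectionTracker:
    """Toy TCP state table keyed by (client_ip, client_port, server_ip, server_port)."""

    def __init__(self) -> None:
        self.table: dict[tuple, State] = {}

    def process(self, src: str, sport: int, dst: str, dport: int, flags: set[str]) -> str:
        fwd = (src, sport, dst, dport)
        rev = (dst, dport, src, sport)
        if flags == {"SYN"}:
            # A new connection attempt from the client side.
            self.table[fwd] = State.SYN_SENT
            return "allow"
        if flags == {"SYN", "ACK"}:
            # Must answer a SYN previously seen in the opposite direction.
            if self.table.get(rev) == State.SYN_SENT:
                self.table[rev] = State.SYN_RECEIVED
                return "allow"
            return "alert"  # SYN+ACK without an initial SYN
        if "ACK" in flags:
            if self.table.get(fwd) == State.SYN_RECEIVED:
                self.table[fwd] = State.ESTABLISHED  # handshake complete
                return "allow"
            if State.ESTABLISHED in (self.table.get(fwd), self.table.get(rev)):
                return "allow"
            return "alert"  # ACK for a connection that was never opened
        return "drop"


tracker = ConnectionTracker()
client, server = ("10.0.0.5", 50000), ("192.0.2.10", 443)
print(tracker.process(*client, *server, {"SYN"}))         # allow
print(tracker.process(*server, *client, {"SYN", "ACK"}))  # allow
print(tracker.process(*client, *server, {"ACK"}))         # allow
print(tracker.process("203.0.113.9", 443, "10.0.0.5", 50001, {"SYN", "ACK"}))  # alert
```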
But as firewall techniques got better, so did hackers. How about if everything is fine at the IP and Transport layers (referencing the OSI model), but the real threat is contained in the data portion of the packet?
For example, many applications (e.g. Skype, P2P torrent applications) run on standard HTTP and HTTPS ports. A firewall using static and/or stateful filtering will allow traffic from those applications thinking it is normal web traffic.
This brings us to the third type of firewall technology: Deep Packet Inspection. Basically, DPI is able to inspect not just the header information carried by a packet but also the contents of the packet itself. So DPI is able to say, “while the port being used is HTTP, the application carried is actually Skype”. This opens up a whole new world of opportunities and challenges.
Note: We are using the term “packet” loosely here to mean any application layer data/payload that has been encapsulated with lower-level protocols (e.g. TCP, IP, Ethernet). Basically, “Packet” in this article means Headers+Payload.
Therefore, we can say that static and stateful filtering look at the headers while DPI looks deep into the payload.
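To show where the line between “headers” and “payload” falls, here is a Python sketch that splits a raw IPv4+TCP packet. It assumes a well-formed, unfragmented IPv4 packet and ignores checksums; the sample packet is synthetic and built only for the demonstration.

```python
import ipaddress
import struct


def split_headers_and_payload(packet: bytes) -> tuple[dict, bytes]:
    """Split a raw IPv4+TCP packet into parsed header fields and the payload."""
    ihl = (packet[0] & 0x0F) * 4                     # IPv4 header length in bytes
    total_length = struct.unpack_from("!H", packet, 2)[0]
    if packet[9] != 6:                               # IP protocol number 6 = TCP
        raise ValueError("not a TCP packet")
    tcp = packet[ihl:total_length]
    src_port, dst_port = struct.unpack_from("!HH", tcp, 0)
    data_offset = (tcp[12] >> 4) * 4                 # TCP header length in bytes
    headers = {
        "src_ip": str(ipaddress.IPv4Address(packet[12:16])),
        "dst_ip": str(ipaddress.IPv4Address(packet[16:20])),
        "src_port": src_port,
        "dst_port": dst_port,
    }
    # Static/stateful filters stop at `headers`; DPI goes on to inspect the rest.
    return headers, tcp[data_offset:]


# Synthetic packet: 20-byte IPv4 header + 20-byte TCP header + HTTP payload.
payload = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
tcp_header = struct.pack("!HHIIBBHHH", 50000, 80, 0, 0, 0x50, 0x18, 65535, 0, 0)
ip_header = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp_header) + len(payload), 0, 0, 64, 6, 0,
    ipaddress.IPv4Address("10.0.0.5").packed, ipaddress.IPv4Address("192.0.2.10").packed,
)
headers, body = split_headers_and_payload(ip_header + tcp_header + payload)
print(headers)  # {'src_ip': '10.0.0.5', 'dst_ip': '192.0.2.10', 'src_port': 50000, 'dst_port': 80}
print(body)
```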
How DPI works
There are several techniques used to perform DPI on traffic. Let’s look at some of them here.
Port-Based Detection
The simplest method is to match applications/protocols to their most common/standard ports. For example, BitTorrent uses TCP ports 6881-6889 by default.
One of the challenges with this method is that many applications can change the ports they use, either by design or to evade detection. For example, FTP uses one port for its control connection and then opens another, often dynamically chosen, port to transfer data. Also, applications like BitTorrent may use their standard ports when those are open but can move to other ports if filtering is being done.
Another challenge with port-based detection is that some applications can ride on standard ports, fooling detection systems. For example, Skype used to fall back to TCP port 80 (the standard HTTP port) when its normal ports were blocked. More recently, it mostly uses TCP port 443 (the standard HTTPS port).
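Port-based detection amounts to a lookup table. The Python sketch below uses a hypothetical table holding a few well-known defaults (FTP control on 21, HTTP on 80, HTTPS on 443, BitTorrent on 6881-6889) and reproduces the weakness just described: traffic moved to port 443 is simply labelled HTTPS.

```python
# Hypothetical port table: a few well-known defaults only.
PORT_TABLE = {("tcp", 21): "FTP", ("tcp", 80): "HTTP", ("tcp", 443): "HTTPS"}
PORT_TABLE.update({("tcp", port): "BitTorrent" for port in range(6881, 6890)})


def classify_by_port(transport: str, src_port: int, dst_port: int) -> str:
    # Check the destination (server) port first, then the source port.
    for port in (dst_port, src_port):
        app = PORT_TABLE.get((transport, port))
        if app is not None:
            return app
    return "unknown"


print(classify_by_port("tcp", 51413, 6881))  # BitTorrent
# BitTorrent moved to port 443 is indistinguishable from HTTPS here.
print(classify_by_port("tcp", 51413, 443))   # HTTPS
```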
Signature/Pattern Matching
Even when applications/protocols change their ports, there are often strings or byte patterns in their traffic that remain recognizable. For example, this paper discusses a signature matching algorithm for an old version of Skype, and one of the things it checks for is a packet with a hexadecimal pattern that begins with “80 46 01 03 …”.
One of the issues with this technique is that applications are constantly changing and being updated, which means that new signatures have to be developed for them. This results in a cat-and-mouse game.
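In its simplest form, signature matching checks a payload for known byte patterns. In this Python sketch the signature list is hypothetical: the BitTorrent entry follows the peer handshake (a length byte of 19 followed by “BitTorrent protocol”), the SSH entry is the “SSH-” version string, and the Skype entry uses only the four bytes quoted above, which on their own would be weak evidence. Practical engines typically go further (matching at offsets, across several packets, or with regular expressions).

```python
# Hypothetical signature set: (application, byte prefix expected at the start of the payload).
SIGNATURES = [
    ("BitTorrent", b"\x13BitTorrent protocol"),     # peer handshake: length byte 19 + protocol name
    ("SSH", b"SSH-"),                                # SSH version exchange string
    ("Skype (legacy)", bytes.fromhex("80460103")),   # only the prefix quoted in the text
]


def classify_by_signature(payload: bytes) -> str:
    # The first signature whose prefix matches wins; the port is irrelevant here.
    for app, prefix in SIGNATURES:
        if payload.startswith(prefix):
            return app
    return "unknown"


print(classify_by_signature(b"\x13BitTorrent protocol" + bytes(8)))  # BitTorrent
print(classify_by_signature(b"SSH-2.0-OpenSSH_9.6\r\n"))             # SSH
print(classify_by_signature(b"hello"))                                # unknown
```

Because the check looks only at payload bytes, it gives the same answer whether the traffic uses port 6881, 80, or 443, which is exactly what port-based detection cannot do. The flip side is the maintenance burden described above: every client update can require a new entry in `SIGNATURES`.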
Heuristic and Behavior Analysis
By studying an application/protocol, you can understand its behavior. This can be done by measuring packet sizes, the timing between packets, and so on. Even if the application/protocol changes its signature, its behavior is likely to remain (fairly) the same, which allows for accurate detection. For example, Voice over IP (VoIP) traffic usually starts with a session setup exchange, after which a steady stream of small UDP packets carries the call traffic (the voice itself).
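The behavior-based approach can be sketched as a few flow statistics compared against thresholds. In this Python sketch, the thresholds (average packet size of at most 300 bytes, inter-arrival jitter of at most 10 ms, at least 50 packets) are illustrative assumptions rather than tuned values, and both sample flows are synthetic.

```python
from statistics import mean, pstdev


def looks_like_voip(packets, max_avg_size=300, max_jitter_s=0.01, min_packets=50):
    """packets: list of (timestamp_s, size_bytes, transport) tuples for one flow.

    Thresholds are illustrative assumptions, not tuned values.
    """
    if len(packets) < min_packets:
        return False  # not enough evidence yet
    if any(transport != "udp" for _, _, transport in packets):
        return False
    sizes = [size for _, size, _ in packets]
    times = [ts for ts, _, _ in packets]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    # Small packets sent at a steady pace: no payload inspection needed.
    return mean(sizes) <= max_avg_size and pstdev(gaps) <= max_jitter_s


call = [(i * 0.020, 172, "udp") for i in range(100)]      # steady 20 ms cadence
download = [(i * 0.001, 1400, "tcp") for i in range(100)]
print(looks_like_voip(call), looks_like_voip(download))   # True False
```

Since only sizes and timings are used, the same check works whether or not the payload is encrypted.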
While the three techniques discussed above are the most common ones in DPI implementations, newer forms of detection are being developed, especially ones that rely on Machine Learning (ML) and Artificial Intelligence (AI).
Finally, keep in mind that most DPI tools will implement a mixture of these techniques in a bid to improve detection and increase accuracy.
Uses of DPI
The whole point of any type of inspection is the ability to do something about traffic that fits a profile. This could be as simple as generating an alert, or it could trigger an action such as dropping the packet or limiting the bandwidth available to that traffic. In this subsection, we will look at some of the ways DPI is being used across the industry.
Network and Endpoint Security
By looking inside the contents of a packet/traffic flow, firewalls and intrusion detection systems can identify malicious traffic and prevent attacks that would otherwise be carried out by viruses, worms, ransomware, and so on. This is similar to how antivirus programs work on end devices. The difference is that detection can now happen in the network, before the malicious traffic even reaches end users.
DPI can also be used for Data Loss Prevention (DLP) purposes in a bid to prevent sensitive information from leaving a company’s network.
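As a small illustration of DLP-style payload scanning, here is a Python sketch that looks for payment card numbers in outbound payloads. Card numbers are our chosen example of “sensitive information”; the regular expression (13 to 19 digits, optionally separated by single spaces or hyphens) is a simplifying assumption, and the Luhn checksum is used to weed out most random digit runs. A production rule set would need considerably more than this.

```python
import re

# 13-19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(rb"\d(?:[ -]?\d){12,18}")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(payload: bytes) -> list[str]:
    hits = []
    for match in CARD_CANDIDATE.finditer(payload):
        digits = re.sub(rb"[ -]", b"", match.group()).decode("ascii")
        if luhn_valid(digits):
            hits.append(digits)
    return hits


print(find_card_numbers(b"name=A+Smith&card=4111 1111 1111 1111&exp=12/29"))  # ['4111111111111111']
```

A DLP policy could then block or alert on any outbound flow where `find_card_numbers` returns a non-empty list.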
ISP Traffic Management
One of the major uses of DPI is in the ISP environment. Using DPI, ISPs are able to “snoop” into the contents of the traffic flowing through their network. They can then use this information to improve the user experience on their network. For example, a user who is downloading large files using torrents may negatively impact the experience of other users who simply need to browse websites. In such a case, an ISP may perform traffic shaping such that the traffic of the user downloading large files is rate limited.
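Rate limiting of this kind is commonly built on a token bucket. Here is a minimal Python sketch of one, paired with a hypothetical `admit` policy that limits only flows a DPI engine has labelled BitTorrent. The rates are made-up values. The sketch also simplifies: it drops excess traffic (policing) instead of queuing it, which is what a true shaper would do, and it takes the current time as an argument so the example is deterministic.

```python
class TokenBucket:
    """Token-bucket rate limiter; `now` is passed in so the example is deterministic."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float) -> None:
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def allow(self, size: int, now: float) -> bool:
        # Refill for the elapsed time, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # a policer drops here; a shaper would queue and retry


def admit(app: str, size: int, now: float, bucket: TokenBucket) -> bool:
    # Hypothetical policy: only flows the DPI engine labelled BitTorrent are limited.
    return app != "BitTorrent" or bucket.allow(size, now)


torrent_limit = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
print(admit("BitTorrent", 1500, 0.0, torrent_limit))  # True  (burst allowance)
print(admit("BitTorrent", 1500, 0.5, torrent_limit))  # False (only 500 tokens refilled)
print(admit("HTTP", 1500, 0.5, torrent_limit))        # True  (not rate limited)
```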
Targeted Advertising
The ability to look inside the contents of network traffic makes it easier to tailor adverts to users. For example, my repeated visits to booking.com may mean that I have a trip coming up, so I may get served adverts for airline tickets or car rentals.
Having discussed some of the uses and benefits of DPI, let us now turn our attention to the challenges faced by DPI.
DPI and Performance
DPI is processor-intensive because it looks not only into individual packets but also into traffic flows (a flow is a collection of related packets). On top of that, inspection needs to happen in real time, so the added latency must be kept to a manageable level. Also, since most firewalls already do so much (stateful packet inspection, NAT, VPN, etc.), adding DPI increases the complexity of the entire system, which can lead to a larger attack surface.
The good news is that processing power keeps increasing over time, usually at a lower cost than in previous years (Moore’s Law).
DPI and Privacy
DPI raises a lot of privacy issues: should anyone (apart from the end user and destination service) know the contents of a user’s traffic? While we have seen the benefits like being able to thwart security attacks, where do we draw the line? This issue also borders on the topic of Net Neutrality, which aims for ISPs to treat all traffic equally without discrimination.
DPI and Encryption
Encryption has been a particular challenge for DPI: if you can’t look into the contents of the packet, how can you make effective decisions? This issue is not going away, as more and more applications and websites enable encryption. For example, all Skype traffic is fully encrypted. About 93% of all Google traffic (including Gmail, Google Maps, Google Drive, YouTube, etc.) is secured using HTTPS. Finally, it is estimated that about 73% of all Internet traffic is now encrypted.
To tackle this “problem” of encryption (even though encryption is supposed to be a good thing?), companies use various techniques. One such technique is SSL inspection, where the traffic is decrypted, scanned (possibly with DPI), re-encrypted, and sent on to its original destination. Apart from the privacy issues, SSL inspection can also do harm when it is not implemented correctly.
Another DPI technique used to look into encrypted traffic is based on the heuristic analysis we discussed above. One such DPI implementation that is able to inspect encrypted traffic without decrypting it is Cisco’s Encrypted Traffic Analytics (ETA). By looking at the initial handshake packets used to set up the encrypted session (these packets are sent unencrypted), you can get some insight into the traffic (e.g. the server’s certificate name). Also, once encryption has kicked in, you can still measure things like packet sizes, their distribution, and so on.
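As a minimal sketch of extracting insight from the unencrypted handshake, the Python code below pulls the Server Name Indication (SNI) hostname out of a TLS ClientHello. Note the swap: the text mentions the server’s certificate name, while SNI is the hostname the client requests in its very first handshake message; both reveal which service is being contacted. This is not how Cisco ETA works internally. The parser assumes the whole ClientHello fits in a single record, and `build_client_hello` is a hypothetical helper that only produces synthetic test input.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI hostname from a plaintext TLS ClientHello record, or None."""
    try:
        return _parse_client_hello(record)
    except (IndexError, struct.error):
        return None  # truncated or malformed record


def _parse_client_hello(rec: bytes) -> Optional[str]:
    if rec[0] != 0x16 or rec[5] != 0x01:          # handshake record, ClientHello message
        return None
    pos = 5 + 4 + 2 + 32                            # record hdr, handshake hdr, version, random
    pos += 1 + rec[pos]                             # skip session ID
    pos += 2 + struct.unpack_from("!H", rec, pos)[0]  # skip cipher suites
    pos += 1 + rec[pos]                             # skip compression methods
    ext_end = pos + 2 + struct.unpack_from("!H", rec, pos)[0]
    pos += 2
    while pos + 4 <= ext_end:
        ext_type, ext_len = struct.unpack_from("!HH", rec, pos)
        pos += 4
        if ext_type == 0x0000:                      # server_name extension
            # Skip the 2-byte list length; read the first entry's type and length.
            name_type, name_len = struct.unpack_from("!BH", rec, pos + 2)
            start = pos + 5
            if name_type != 0 or start + name_len > len(rec):  # 0 = host_name
                return None
            return rec[start:start + name_len].decode("ascii")
        pos += ext_len
    return None


def build_client_hello(hostname: str) -> bytes:
    """Hypothetical helper: a minimal synthetic ClientHello for testing only."""
    name = hostname.encode("ascii")
    entry = struct.pack("!BH", 0, len(name)) + name
    sni = struct.pack("!H", len(entry)) + entry
    extensions = struct.pack("!HH", 0x0000, len(sni)) + sni
    body = (b"\x03\x03" + bytes(32) + b"\x00"      # version, random, empty session ID
            + b"\x00\x02\x13\x01" + b"\x01\x00"    # one cipher suite, null compression
            + struct.pack("!H", len(extensions)) + extensions)
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


print(extract_sni(build_client_hello("www.example.com")))  # www.example.com
```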
How about when the user encrypts all of their traffic end to end, as in the case of VPNs? This presents more challenges for DPI (which is why torrent sites usually advise you to use a VPN *grin*). In this kind of situation, heuristic analysis will probably still be effective, but likely with lower accuracy.
Deep Packet Inspection Software and Tools
In this subsection, we will look at different Software and Tools that have DPI capability. While there are standalone DPI tools, most of the DPI implementations are usually used inside another device/application. Moreover, in most cases, DPI only provides the analysis – another tool acts on it (e.g. drop traffic).
Some DPI tools include:
- Protocol and Application Classification Engine (PACE) by Rohde & Schwarz is a software library that provides DPI functionality using various techniques such as pattern matching and heuristic analysis. This tool is proprietary (you cannot download it) and is used inside other products such as Lancope StealthWatch (which is now part of Cisco).
- nDPI, an open source DPI tool based on the now-defunct OpenDPI library. While the source code for this tool is available standalone, it is usually installed as part of the ntop and nProbe tools. It supports the classification of many protocols, including Skype, WhatsApp, and BitTorrent. It can even classify various websites such as Facebook and Google. Finally, it can identify encrypted protocols and applications relatively well. You can download nDPI (as part of ntop) for free here, and it is supported on Windows, Mac, and Linux operating systems.
- Cisco Network-Based Application Recognition (NBAR), which is available on many Cisco ISR and ASR devices. It can classify over 1,000 applications and sub-applications and can match based on individual applications (e.g. Skype) or groups of applications (e.g. Email). When used in conjunction with Cisco IOS QoS, various policies, like reclassification and dropping, can be applied to matched traffic.
Other examples include Qosmos ixEngine, Netify Agent (Netifyd), NetFort LANGuardian, SolarWinds Network Performance Monitor (has deep packet inspection and analysis), ManageEngine NetFlow Analyzer (has deep packet inspection capability), Cisco ETA (mentioned above), and so on.
Case Study: nDPI
Let’s now look at one of the DPI tools, nDPI, and see what kind of information it can provide. You can download the installer from here and install it on your system (you may need to restart your computer).
Once the application is installed, you can log in at 127.0.0.1:3000.
Note: To see useful information, you may have to select the relevant network interface e.g. your Wi-Fi interface.
Mine has been running for a while, so nDPI has captured a lot of information. One chart that I like to view from the Traffic Dashboard is the Top Application Protocols:
Notice that while it shows me standard protocols like HTTP, it also shows me “protocols” like GoogleDocs (which I’m using to write this article).
I found it weird that Amazon shows up in my top applications since I haven’t accessed Amazon today. With nDPI, we can drill down into the flows by clicking on any of the categories listed. For example, if I click on “Amazon”, I can see which flows are currently listed:
Aha! It’s not actually Amazon (as in amazon.com) but probably AWS, meaning those sites (Upwork and Grammarly) are using AWS for their hosting.
Another thing to notice is that even though these sites are encrypted (HTTPS), nDPI was able to classify them as individual applications separate from generic SSL.
We can view the protocols on a particular interface by navigating to Interfaces > [Interface name] > Protocols:
One thing I’d like to test is whether nDPI can really detect Skype traffic (which is encrypted). To test this, I will start a Skype call and leave it running for some time, checking the live flows in nDPI all the while.
Once the call started, notice that the Live Flows Count (above) was updated to include Skype which means that the traffic was identified. Let’s click on the Skype flow to find out more:
Not only was the Skype traffic detected, but nDPI also identified it as a Skype call! That’s really impressive considering that the traffic is encrypted with SSL.
You can try similar tests for Facebook, Dropbox, and BitTorrent:
DPI vs. Packet Capture/Protocol Analyzer
While researching this topic, one question that came to mind was, “How is DPI different from Protocol Analysis done by tools like Wireshark?” or, more specifically, “Can a tool such as Wireshark be said to perform DPI?”
While Protocol Analyzers like Wireshark can be used to analyze packets at a very low level, most of the analysis needs to be done by the administrator: a protocol analyzer decodes each packet but does not draw the application-level conclusions that DPI provides. Moreover, Wireshark cannot show the contents of encrypted packets unless you have the RSA private key or the session secrets. DPI, on the other hand, can make an informed analysis of traffic, encrypted or not.
To make the distinction clearer, here is a Wireshark capture of a Skype call similar to the one made for the nDPI test:
From this capture, it is not immediately clear what is happening; a deeper analysis is needed. For example, if I had name resolution enabled in Wireshark (or if I did a web search on the 18.104.22.168 IP address), I would discover that it belongs to skype.com. From that, I could infer that the small UDP packets signify that a call is going on (even though the contents of the UDP packets are encrypted).
Another example is shown below. In this case, BitTorrent was using HTTP port 80 to update trackers for a particular torrent file. Wireshark sees the packet as a normal HTTP packet (because it is just a GET request), but a DPI tool would identify it as BitTorrent.
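What might such a DPI rule look like? One plausible approach, sketched below in Python, is to check whether an HTTP GET request’s query string carries the `info_hash` and `peer_id` parameters that BitTorrent tracker requests use. This is an assumed rule for illustration, not nDPI’s actual logic; it expects the request line to arrive in a single TCP segment, and the tracker host and peer ID in the sample are made up.

```python
from urllib.parse import parse_qs, urlsplit


def is_bittorrent_announce(payload: bytes) -> bool:
    """Flag an HTTP GET that looks like a BitTorrent tracker announce."""
    request_line = payload.split(b"\r\n", 1)[0].decode("latin-1")
    parts = request_line.split(" ")
    if len(parts) != 3 or parts[0] != "GET" or not parts[2].startswith("HTTP/"):
        return False  # not an HTTP GET request line
    params = parse_qs(urlsplit(parts[1]).query)
    # Tracker requests carry the torrent's info_hash and the client's peer_id.
    return "info_hash" in params and "peer_id" in params


announce = (b"GET /announce?info_hash=%01%02%03%04%05%06%07%08%09%0A%0B%0C%0D%0E%0F"
            b"%10%11%12%13%14&peer_id=-XX0001-123456789012&port=6881 HTTP/1.1\r\n"
            b"Host: tracker.example\r\n\r\n")
page = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
print(is_bittorrent_announce(announce), is_bittorrent_announce(page))  # True False
```

Wireshark would decode both requests as ordinary HTTP; the extra step of interpreting the query parameters is what turns the first one into “BitTorrent”.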
In summary, DPI and Protocol Analyzers may be similar (packet capturing and analysis), but they have different purposes.
This brings us to the end of this article, where we have looked at Deep Packet Inspection and how it differs from other firewall/filtering techniques such as static filtering and stateful packet inspection: it looks not just at the headers in a packet but also at its contents.
While encryption is a challenge for DPI, we have seen how some tools (such as Cisco ETA and nDPI) are able to accurately inspect encrypted traffic by relying on heuristic and behavior analysis.
As the battle for privacy keeps raging on and encryption techniques are made stronger, perhaps DPI will become obsolete. However, one thing is certain: something else will take its place (Machine Learning? AI?), even if the name changes.