Packet Loss: Understanding, Diagnosing & Fixing in Networks

Have you seen the movie “Thor: Ragnarok”?

There is a part in the movie where Thor and Loki are trying to escape from their crazy sister, Hela, who had been imprisoned for a long time. To get to Asgard (the home of Thor and Loki) from Earth, they use a very high-speed travel portal (the Bifrost bridge).

As Thor and Loki try to escape from their sister through this bridge, a fight breaks out, and Thor and Loki are knocked off the bridge and fall onto a waste planet (Sakaar).

This is a good introduction to this article, where we will discuss what packet loss is, its causes and effects, and how to solve it, or at least reduce its likelihood. We will also use GNS3 to simulate a network where packet loss exists.

Packet Loss and Its Causes

The short story of Thor and his evil sister is much like how packets get lost. Simply put, packet loss is when packets traveling through a network medium get “knocked off” before reaching their destination. There are several reasons why packet loss happens, and we will look at some of them in this section.

Note: Every network will encounter issues like packet loss, from time to time. This is expected. However, these issues should not have too much of a negative impact on the performance of the network.

Link Congestion

One of the major causes of packet loss is link congestion. A simple analogy is rush hour traffic when there are more cars on the road than the road can sufficiently handle. Another analogy is a 4-lane road merging into 2 lanes. What happens is that there are more packets arriving on a link than that link is designed to handle.

In some cases, even if the link can technically handle the amount of traffic reaching it, it has been configured to drop packets after a certain limit. An example of this is an organization that purchases 2 Mbps from its ISP.

Even if the link can technically support up to 100 Mbps (e.g. Metro Ethernet), the ISP will configure its devices to ensure that the organization can only push 2 Mbps worth of traffic. Anything more will usually be dropped (depending on the burst allowance the organization has agreed with the ISP).
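One common way to enforce a purchased rate is a token bucket policer. The following Python sketch is only an illustration of that idea, not the configuration any particular ISP uses; the class name, the 100,000-bit burst size, and the 1,500-byte packet size are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TokenBucketPolicer:
    """Drops packets that exceed a committed rate plus a burst allowance."""
    rate_bps: float      # committed rate in bits per second (e.g. 2 Mbps)
    burst_bits: float    # bucket depth: how far traffic may burst above the rate
    tokens: float = 0.0
    last_time: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.burst_bits  # start with a full bucket

    def allow(self, now: float, packet_bits: int) -> bool:
        """Return True if the packet conforms, False if it would be dropped."""
        elapsed = now - self.last_time
        self.last_time = now
        # Refill tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False


def offered_load_loss(offered_bps: float, seconds: float = 10.0) -> float:
    """Send 1500-byte packets at a constant rate; return the fraction dropped."""
    # Assumed values: a 2 Mbps committed rate and a 100,000-bit burst allowance.
    policer = TokenBucketPolicer(rate_bps=2_000_000, burst_bits=100_000)
    packet_bits = 1500 * 8
    interval = packet_bits / offered_bps
    total = int(seconds / interval)
    dropped = sum(
        0 if policer.allow(i * interval, packet_bits) else 1 for i in range(total)
    )
    return dropped / total
```

Under these assumptions, offering 1 Mbps through the 2 Mbps policer loses nothing, while offering 4 Mbps loses roughly half of the packets once the initial burst allowance is used up.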

Another example of network congestion is when service providers intentionally oversubscribe a link. The rationale is that not all subscribers of that service will be using the link simultaneously. However, during peak periods, when demand exceeds the link's capacity, there will likely be packet loss resulting from congestion.

Overutilized devices

Another cause of packet loss, similar to network congestion, is over-utilized devices. This means that a device is operating beyond the capacity it was designed for. In such a network, packets may arrive faster than they can be processed and sent out.

To handle this type of situation, many devices have buffers where they hold packets temporarily until they can be processed and sent out. However, in the case of an over-utilized device, the buffer will probably fill up quickly, resulting in excess packets being dropped.
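The buffering behaviour described above can be sketched as a small simulation. This is a simplified model, not how any specific device is implemented: time is divided into ticks, the arrival and service rates are constant, and the function and parameter names are made up for the example.

```python
from collections import deque


def simulate_tail_drop(arrivals_per_tick: int, service_per_tick: int,
                       buffer_size: int, ticks: int) -> dict:
    """Simulate a FIFO buffer that drops arrivals once it is full (tail drop)."""
    queue = deque()
    received = forwarded = dropped = 0
    for tick in range(ticks):
        # Packets arrive; any that do not fit in the buffer are dropped.
        for _ in range(arrivals_per_tick):
            received += 1
            if len(queue) < buffer_size:
                queue.append(tick)
            else:
                dropped += 1
        # The device forwards as many packets as it can process this tick.
        for _ in range(min(service_per_tick, len(queue))):
            queue.popleft()
            forwarded += 1
    return {
        "received": received,
        "forwarded": forwarded,
        "dropped": dropped,
        "loss_percent": 100.0 * dropped / received,
    }


if __name__ == "__main__":
    # Below capacity: the buffer absorbs everything.
    print(simulate_tail_drop(8, 10, buffer_size=20, ticks=100))
    # Above capacity: the buffer fills within a few ticks, then drops begin.
    print(simulate_tail_drop(12, 10, buffer_size=20, ticks=100))
```

The point of the sketch is that a buffer only delays the onset of loss: when arrivals stay above the service rate, the buffer fills and every excess packet after that is dropped.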

For example, a Cisco ASA 5506-X is designed to handle up to 750 Mbps of throughput. If you use such a device at the network edge of an organization pushing more than that maximum throughput, you will definitely have an issue.

What happens in many instances is that the device performs well enough during normal (off-peak) operating times, but during peak periods there is a noticeable drop in performance, usually evident in the high CPU utilization of the device.

Faulty Hardware and/or Software

Another cause of packet loss is faulty hardware. This could be a component of a device or the whole device itself.

For example, I once worked on a project where the ISP was providing the organization with 100 Mbps but the organization was still struggling to get good Internet access, especially during peak periods. What we discovered was that the interface on the edge router connecting the organization to its ISP was only passing 30 Mbps out of the 100 Mbps! The interface had failed (for whatever reason) and once we moved the link to another interface, the performance improved immediately.

Closely related to faulty hardware is buggy software running on the network device. As with any other software, it is usually impossible for the development team to catch every bug in the software running on network devices, and one such bug may result in packet loss.

Wireless versus Wired networks

The type of network medium can also be a cause of packet loss. Generally speaking, wireless networks suffer more setbacks than their wired counterparts. For example, radio frequency interference can be a major issue on wireless networks, resulting in packet loss.

Other challenges on a wireless network that can result in packet loss include weak signal, distance limitations, and (improperly configured) roaming. Many of these can be addressed with a combination of a Wi-Fi analyzer and Wi-Fi heat maps.

In the case of wired networks, faulty cables can result in packet loss. This could result from the fact that the cable is not properly terminated or that the cable is damaged, causing issues for the electrical signal meant to flow through the cable.

Attack

I was once called to troubleshoot a problem in a data center. The network team had suddenly noticed a major degradation in network performance, so much so that even accessing the network devices for management was very slow.

We had identified the devices that were affected – two edge routers acting in active/standby mode. Thinking it was a hardware problem on the active device (due to the high CPU utilization), we did a manual failover to the standby device and we started seeing the same problem on the standby device.

This made us focus on the traffic being received by these devices. Upon further investigation, we noticed that a particular IP address was performing a network attack by flooding traffic to the devices, incapacitating them. Blocking that IP address stopped that attack and brought the network back to its normal operating condition.

The attack described above is an example of a Denial of Service (DoS) attack and can result in legitimate packets being dropped because a device is overwhelmed with attack traffic.

Faulty configuration

The last cause of packet loss we will consider in this article is faulty configuration. A typical example is a speed or duplex mismatch between two devices on a link. If one device is configured for half-duplex while the other is configured for full-duplex, the full-duplex side will transmit whenever it has data, the half-duplex side will treat the overlapping transmissions as collisions, and frames will be lost on the link.

Effects of Packet Loss

The effects of packet loss vary depending on the protocol/application concerned.

TCP is generally designed to handle packet loss because of the acknowledgment and re-transmission of packets – if a packet gets lost (i.e. no acknowledgment is received for that packet), it will usually be re-transmitted.

UDP, on the other hand, does not have inbuilt re-transmission capability and may not handle packet loss as well. However, irrespective of the protocol/application, too much loss of packets is definitely a problem.

How Do Different Protocols Handle Packet Loss?

Different protocols can handle packet loss in several ways, for instance:

Transmission Control Protocol (TCP): TCP is a transport-layer protocol, sitting between the application and network layers, that is used for creating reliable connections and delivering data between devices. To achieve this, each segment is given a sequence number, which makes it easy to confirm what has been received. The recipient acknowledges the data it receives with acknowledgment (ACK) messages. If a packet is lost, the sender does not get the acknowledgment it expects (or gets repeated acknowledgments for earlier data) and resends the missing packet.
User Datagram Protocol (UDP): UDP is another popular transport-layer protocol used for delivering data between devices, but it is connectionless and, unlike TCP, does not offer the same level of reliability. It sends the packet to the destination and hopes it will be received. It does not acknowledge packet receipt, so in the event of packet loss there is no notification to the sender and no attempt at retransmission.
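The difference between the two approaches can be illustrated with a toy simulation. This is a greatly simplified sketch, not real TCP or UDP: there are no windows, timers, or congestion control, loss is assumed to be random and independent in each direction, and the function names are made up for the example.

```python
import random


def send_unreliable(num_packets: int, loss_rate: float, rng: random.Random) -> int:
    """UDP-like: send each packet once; return how many were delivered."""
    return sum(1 for _ in range(num_packets) if rng.random() >= loss_rate)


def send_with_retransmit(num_packets: int, loss_rate: float, rng: random.Random,
                         max_tries: int = 10) -> tuple[int, int]:
    """TCP-like (greatly simplified): resend each packet until it is acknowledged.

    Returns (packets acknowledged, total transmissions). A send only counts as
    acknowledged if both the packet and its ACK survive the lossy link.
    """
    acknowledged = transmissions = 0
    for _ in range(num_packets):
        for _ in range(max_tries):
            transmissions += 1
            packet_arrived = rng.random() >= loss_rate
            ack_arrived = packet_arrived and rng.random() >= loss_rate
            if ack_arrived:
                acknowledged += 1
                break
    return acknowledged, transmissions


if __name__ == "__main__":
    rng = random.Random(1)
    print("UDP-like delivered:", send_unreliable(1000, 0.15, rng), "of 1000")
    acked, sent = send_with_retransmit(1000, 0.15, rng)
    print("TCP-like acknowledged:", acked, "of 1000 using", sent, "transmissions")
```

The retransmitting sender gets (almost) everything through, but it pays for that with extra transmissions and the time spent waiting for acknowledgments, which is why loss shows up as slowness for TCP applications and as missing data for UDP applications.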

Depending on the protocol being used and the type of data being delivered, dropped packets might result in a variety of issues. The following are some instances where issues may arise due to dropped packets:

Voice over Internet Protocol (VoIP): Through the use of VoIP technology, users can make phone calls online. VoIP call quality issues, such as choppy audio, garbled speech, and dropped calls, can be caused by packet loss.
Video streaming: Videos may stutter, buffer, or even fail to load as a result of packet loss.
File transfer: The transmission of a file may take longer to complete or may fail completely if packets are dropped during the file transfer process.

Examples of applications that do not handle packet loss well are Voice over IP (VoIP) and some types of video. Degraded VoIP quality can also result in lost call detail records (CDRs) and, at times, lost VoIP connectivity.

You have probably been on calls (e.g. Skype, WhatsApp) where there is a noticeable performance issue, like “robotic speech” or completely missed audio. This is usually the result of packet loss (along with other factors like bandwidth, delay, and network jitter).

According to Cisco recommendations, packet loss should be kept below 1% for VoIP traffic and between 0.05% and 5% for video, depending on the type of video.

How Much Packet Loss Is Normal?

Packet loss can lead to delays and disruptions in network communication, so it is best to keep it to a minimum. That said, a small amount of packet loss is expected in any network. The acceptable level depends on the type of network and the applications running on it. A packet loss rate below 1%, for instance, might be necessary for real-time voice and video applications on a high-speed local area network (LAN).

A slower wide area network (WAN), on the other hand, might be able to handle a higher packet loss rate if it has fewer time-sensitive applications.

What counts as normal packet loss therefore depends on the particular network and its applications. Reducing packet loss helps keep your network performing well and dependably.

Lab: Packet Loss in GNS3

Let us investigate the effects of packet loss using a simple lab in GNS3.

To make this as realistic as possible, we will introduce the NETem appliance which emulates a link and is able to introduce various factors like bandwidth, delay, and packet loss on a link. This functionality is actually built into the Linux kernel – the NETem appliance just makes it easier to configure.

Our lab setup is simple: PC1 connects to the router R1 (10.0.0.1) through the NETem appliance.

The NETem appliance is transparent on the network so PC1 and R1 are actually on the same 10.0.0.0/24 network, thinking they have a direct connection.

The easiest test we can do on the network is a ping test. Let us ping from PC1 to R1:

PC1> ping 10.0.0.1
10.0.0.1 icmp_seq=1 timeout
84 bytes from 10.0.0.1 icmp_seq=2 ttl=255 time=23.168 ms
84 bytes from 10.0.0.1 icmp_seq=3 ttl=255 time=6.965 ms
84 bytes from 10.0.0.1 icmp_seq=4 ttl=255 time=14.084 ms
84 bytes from 10.0.0.1 icmp_seq=5 ttl=255 time=13.407 ms

PC1> ping 10.0.0.1
84 bytes from 10.0.0.1 icmp_seq=1 ttl=255 time=6.999 ms
84 bytes from 10.0.0.1 icmp_seq=2 ttl=255 time=12.637 ms
84 bytes from 10.0.0.1 icmp_seq=3 ttl=255 time=12.203 ms
84 bytes from 10.0.0.1 icmp_seq=4 ttl=255 time=11.711 ms
84 bytes from 10.0.0.1 icmp_seq=5 ttl=255 time=6.818 ms

As you can see from the output above, we received a reply to almost all the ping echo packets.

Note: The first ping packet timed out due to ARP. After that initial ping, pings should not time out as long as the ARP cache still contains the MAC address of the other host.

Now, we will configure the NETem appliance to introduce loss on the network. We do this from the console (telnet) connection to the appliance.

What I want to do is apply a 15% loss in a symmetric manner, i.e. in both directions.
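Since the underlying functionality lives in the Linux kernel, the same symmetric loss can also be configured on a plain Linux bridge or router with the `tc` utility and its `netem` queueing discipline. The Python sketch below only builds the commands and does not run them (they require root privileges); the interface names `eth0` and `eth1` are assumptions standing in for the two sides of the emulated link.

```python
import shlex


def netem_loss_commands(interfaces: list[str], loss_percent: float) -> list[str]:
    """Build `tc` commands that add random packet loss on each interface's egress."""
    if not 0 <= loss_percent <= 100:
        raise ValueError("loss_percent must be between 0 and 100")
    return [
        shlex.join(["tc", "qdisc", "add", "dev", iface, "root",
                    "netem", "loss", f"{loss_percent:g}%"])
        for iface in interfaces
    ]


if __name__ == "__main__":
    # eth0/eth1 are assumed names for the two sides of the emulated link.
    # netem acts on outgoing traffic, so one rule per interface covers both directions.
    for command in netem_loss_commands(["eth0", "eth1"], 15):
        print(command)
```

Applying the rule on the egress of both interfaces is what makes the loss symmetric: traffic from PC1 to R1 leaves through one interface and the replies leave through the other.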

Now when we test with ping again, we see that some ping packets are lost:

PC1> ping 10.0.0.1
10.0.0.1 icmp_seq=1 timeout
84 bytes from 10.0.0.1 icmp_seq=2 ttl=255 time=10.554 ms
10.0.0.1 icmp_seq=3 timeout
84 bytes from 10.0.0.1 icmp_seq=4 ttl=255 time=5.864 ms
84 bytes from 10.0.0.1 icmp_seq=5 ttl=255 time=5.807 ms

PC1> ping 10.0.0.1
10.0.0.1 icmp_seq=1 timeout
84 bytes from 10.0.0.1 icmp_seq=2 ttl=255 time=6.068 ms
84 bytes from 10.0.0.1 icmp_seq=3 ttl=255 time=3.839 ms
84 bytes from 10.0.0.1 icmp_seq=4 ttl=255 time=3.460 ms
84 bytes from 10.0.0.1 icmp_seq=5 ttl=255 time=6.079 ms

If you replicate this lab, try the ping over and over again and you will notice that the packets lost each time will differ slightly. Also, reduce/increase the packet loss and see what effect it has on the network.
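It helps to know what to expect before repeating the test. A ping only succeeds if both the echo request and the echo reply survive, so with 15% loss in each direction the chance of a failed ping is higher than 15%. The short Python sketch below works this out, assuming each direction drops packets independently and at random (an assumption about the emulated link, not something measured in the lab).

```python
def round_trip_loss(one_way_loss: float) -> float:
    """Probability that a ping fails when each direction drops independently."""
    survive_one_way = 1.0 - one_way_loss
    return 1.0 - survive_one_way ** 2


def chance_of_any_timeout(one_way_loss: float, pings: int = 5) -> float:
    """Probability that at least one of `pings` echo requests times out."""
    return 1.0 - (1.0 - round_trip_loss(one_way_loss)) ** pings


if __name__ == "__main__":
    print(f"Ping failure rate: {round_trip_loss(0.15):.2%}")
    print(f"Chance of a timeout in 5 pings: {chance_of_any_timeout(0.15):.2%}")
```

With a one-way loss of 15%, the round-trip success rate is 0.85 × 0.85 = 0.7225, so about 27.75% of pings are expected to fail over a long run. Five pings is a very small sample, which is why individual runs vary so much.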

Side Note: Something very interesting to try is to replace PC1 with a router or any device that can be used to open a telnet/ssh connection (VPCS doesn’t support this). Next, configure R1 to accept remote connections and then try to manage R1 remotely (telnet/ssh) from the other device you just added.

What you will notice is that at 10% packet loss, the remote connection will be relatively smooth. However, at 30%, you will notice typing delays. You can experiment with lower/higher values.

Diagnosing Packet Loss

While there is no strict approach to detecting packet loss on a network, there are a couple of steps and tools you can use. You will usually start from a place of user experience, that is, users are complaining about poor network performance or they are experiencing some of the effects of packet loss that we have discussed above. From that point, you will want to start troubleshooting to either confirm that packet loss exists or rule it out as the cause (the culprit may, for example, be an application problem).

One of the most evident signs that packet loss may be occurring on a network is a device with high CPU utilization. As we already discussed, this can have several causes, such as an over-utilized device, faulty hardware/software, or even an attack.

If you find such a device on the network (e.g. through your network management system), you will want to troubleshoot why it has high CPU utilization. Cisco has a good guide for troubleshooting high CPU utilization on its devices.

R1#show processes cpu
CPU utilization for five seconds: 5%/0%; one minute: 3%; five minutes: 1%
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
   1           8        67        119  0.00%  0.00%  0.00%   0 Chunk Manager
   2           4        39        102  0.00%  0.01%  0.00%   0 Load Meter
   3           0         1          0  0.00%  0.00%  0.00%   0 chkpt message ha
   4           0         1          0  0.00%  0.00%  0.00%   0 EDDRI_MAIN
   5        1200        70      17142  0.00%  0.57%  0.29%   0 Check heaps

Assuming there are no easy-to-detect causes of packet loss on the network such as high CPU utilization, then you can continue your troubleshooting using tools like ping and traceroute. By consistently sending ping packets (of various sizes), you may be able to determine that there is loss on the network.

Once this has been identified, you can then use traceroute to try to determine which hop in the path from sender to receiver is causing the packet loss. MTR, a tool that combines the functionality of ping and traceroute in one, can also be used to continuously monitor the performance of a particular path, and report packet loss if any.

Note: Keep in mind that some devices filter ping/traceroute packets. As such, you may not always get accurate results using these tools.
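When you collect ping results over a longer period, it is easier to summarize them with a script than by eye. The following Python sketch parses ping output in the format printed by VPCS in the lab above; the regular expressions are written for that format only and would need adjusting for other ping implementations.

```python
import re

# Patterns for VPCS-style ping lines (an assumption based on the lab output).
REPLY = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")
TIMEOUT = re.compile(r"icmp_seq=(\d+) timeout")


def summarize_ping(output: str) -> dict:
    """Count replies and timeouts in VPCS-style ping output."""
    rtts = [float(match.group(2)) for match in REPLY.finditer(output)]
    timeouts = len(TIMEOUT.findall(output))
    sent = len(rtts) + timeouts
    return {
        "sent": sent,
        "lost": timeouts,
        "loss_percent": 100.0 * timeouts / sent if sent else 0.0,
        "avg_rtt_ms": sum(rtts) / len(rtts) if rtts else None,
    }


SAMPLE = """\
10.0.0.1 icmp_seq=1 timeout
84 bytes from 10.0.0.1 icmp_seq=2 ttl=255 time=10.554 ms
10.0.0.1 icmp_seq=3 timeout
84 bytes from 10.0.0.1 icmp_seq=4 ttl=255 time=5.864 ms
84 bytes from 10.0.0.1 icmp_seq=5 ttl=255 time=5.807 ms
"""

if __name__ == "__main__":
    print(summarize_ping(SAMPLE))
```

For the sample above, two of the five echo requests timed out, which is a 40% loss for that run. A sample this small says little on its own; the summary becomes meaningful over hundreds of pings.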

When troubleshooting packet loss on a device, it is worth taking a look at the interface statistics. Many vendors provide command-line or GUI tools for viewing the statistics on network interfaces. These reveal information such as the number of packets that have gone in and out of the interface, the number of errors, the size of the input and output queues, and whether there have been any drops, e.g. due to a full buffer.

R1#show interfaces FastEthernet 0/0
FastEthernet0/0 is up, line protocol is up
Hardware is Gt96k FE, address is c201.6bcd.0000 (bia c201.6bcd.0000)
Internet address is 10.0.0.1/24
MTU 1500 bytes, BW 10000 Kbit/sec, DLY 1000 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Half-duplex, 10Mb/s, 100BaseTX/FX
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:05, output 00:00:00, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
38 packets input, 3652 bytes
Received 1 broadcasts, 0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog
0 input packets with dribble condition detected
73 packets output, 7754 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier
0 output buffer failures, 0 output buffers swapped out
R1#
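If you need to check these counters on many interfaces, you can extract the loss-related ones with a script. The Python sketch below works on text in the format of the output above; the counter names in the dictionary and the choice of which counters to extract are my own, and other platforms or software versions may word these lines differently.

```python
import re

# Regular expressions for loss-related counters in the output shown above.
COUNTERS = {
    "input_queue_drops": r"Input queue: \d+/\d+/(\d+)/\d+",
    "output_drops": r"Total output drops: (\d+)",
    "input_errors": r"(\d+) input errors",
    "crc_errors": r"(\d+) CRC",
    "output_errors": r"(\d+) output errors",
    "collisions": r"(\d+) collisions",
    "late_collisions": r"(\d+) late collision",
}


def parse_counters(show_interface_output: str) -> dict[str, int]:
    """Extract the counters listed in COUNTERS from `show interfaces` text."""
    counters = {}
    for name, pattern in COUNTERS.items():
        match = re.search(pattern, show_interface_output)
        if match:
            counters[name] = int(match.group(1))
    return counters


def nonzero_counters(counters: dict[str, int]) -> dict[str, int]:
    """Return only the counters that are above zero and worth investigating."""
    return {name: value for name, value in counters.items() if value > 0}


if __name__ == "__main__":
    # A made-up excerpt with some errors, for illustration only.
    excerpt = (
        "Input queue: 0/75/12/0 (size/max/drops/flushes); Total output drops: 340\n"
        "0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored\n"
        "0 output errors, 57 collisions, 1 interface resets\n"
        "0 babbles, 9 late collision, 0 deferred\n"
    )
    print(nonzero_counters(parse_counters(excerpt)))
```

On the healthy interface shown above, every one of these counters is zero. Rising drop counters point towards congestion or an over-utilized device, while CRC errors and late collisions are more typical of cabling problems or a duplex mismatch.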

Finally, in this section on troubleshooting packet loss, you want to consider packet capturing using a traffic diagnostic tool like Wireshark. These tools are typically able to capture and analyse traffic based on several performance characteristics, including detecting packet loss.

Fixing Packet Loss

An automated monitoring tool will give you the ability to track a variety of network traffic conditions simultaneously. When examining networks for packet loss, you will probably also need to look into related issues, such as jitter. Trying to manually detect multiple issues can quickly become too complicated, so a tool like Paessler PRTG, with its QoS sensor, can come in very handy.

Interactive applications, such as video streaming and VoIP, cannot afford to wait for retransmissions and so are very sensitive to issues such as packet loss. Retrospective manual diagnosis won’t buy you enough time to fix problems before they damage the experience of users, so it is better to use an automated tool that will spot developing issues and raise an alert, letting you fix issues before they become noticeable.

Solving the issue of packet loss on a network is usually as simple as identifying the cause, and finding a fix for that cause.

If a link is congested, perhaps you should consider getting a “fatter” pipe so that you can push more traffic through that link. You can also consider applying Quality of Service (QoS) features such that certain types of traffic (e.g. VoIP) are given priority over other traffic that is not so sensitive to loss or critical to operations.
For devices that are utilized beyond their capacity, the only solution may be to upgrade to a higher-performance device. In some cases, it may be a component of the device that needs to be upgraded. For example, you should not use a Fast Ethernet interface for a 100 Mbps link because, even though the theoretical limit of Fast Ethernet is 100 Mbps, in practice you will probably not be able to hit that limit. Use a Gigabit Ethernet interface instead.
Swap out faulty hardware/cables and upgrade software as soon as new releases are available (upon adequate testing).
Depending on your environment, you may opt for a physical network cable (wired) connecting your device to the network instead of using a wireless connection. For wireless networks, you should work on reducing interference as much as possible. One way is to move to a less crowded channel. If distance is not a limitation and your devices support it, you can move to the 5 GHz band, which suffers less interference and has more non-overlapping channels, resulting in less congestion and contention. Using a Wi-Fi analyzer can further assist you in finding issues and spotty areas in your Wi-Fi network.
If under attack, try to mitigate that attack as fast as you can. This can be as simple as using an ACL to block the IP address of the attacker (if static and known). In more complex cases, you can use features like Remotely Triggered Black Hole Routing or a DDoS-prevention cloud service like Cloudflare.
Finally, check that your configuration is not causing packet loss. Ensure that duplex settings match on both ends of a link (or just leave both on Auto). If you have configured QoS, ensure that your buffers are large enough.
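To see why QoS helps on a congested link, consider what happens to each class of traffic when more packets arrive than the link can carry. The Python sketch below is a deliberately simple model, not how any vendor implements QoS: it covers a single interval, has no buffering, and assumes that without QoS each class loses packets in proportion to its share of the traffic.

```python
def forward(voice: int, data: int, capacity: int, prioritize_voice: bool) -> dict:
    """Split one interval's link capacity between voice and data packets.

    Packets that do not fit are counted as dropped (no buffering is modeled).
    """
    if prioritize_voice:
        # Strict priority: voice is served first, data gets what is left.
        voice_sent = min(voice, capacity)
        data_sent = min(data, capacity - voice_sent)
    else:
        # No QoS: assume each class loses packets in proportion to its share.
        offered = voice + data
        share = min(1.0, capacity / offered) if offered else 1.0
        voice_sent = int(voice * share)
        data_sent = int(data * share)
    return {
        "voice_dropped": voice - voice_sent,
        "data_dropped": data - data_sent,
    }


if __name__ == "__main__":
    # 20 voice and 100 data packets offered to a link that can carry 90.
    print("Without QoS:", forward(20, 100, 90, prioritize_voice=False))
    print("With QoS:   ", forward(20, 100, 90, prioritize_voice=True))
```

In both cases 30 packets are dropped, because QoS does not add capacity. What changes is which traffic takes the loss: with prioritization, the drops fall entirely on the data traffic, which can usually recover through retransmission.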

How to Reduce High Packet Loss?

A high level of packet loss can disrupt network communications and slow down performance, which in turn affects the reliability of your network. Network congestion, poor network design, interference, and faulty hardware or software are a few common factors that can lead to high packet loss.

Here are a few steps that will help reduce high packet loss:

Discover the reason behind packet loss: It is essential to identify the cause of high packet loss before addressing it. Ping, traceroute, and other network diagnostic tools can help you locate the source of the problem.
Utilize QoS settings: One way to lower packet loss for high-priority applications or services is to prioritize specific kinds of network traffic with the help of QoS settings.
Increase bandwidth: Increasing the available bandwidth might help lower packet loss if network congestion is the cause of the loss. You can achieve this by moving to a faster Internet connection or upgrading your network hardware.
Run analysis and identify faulty hardware or software: Defective network hardware, like switches or routers, or software problems, including out-of-date drivers or improper configuration, could be the source of high packet loss.
Optimize the network design: A poor network design may also lead to high packet loss. To improve connectivity and lower packet loss, you might need to redesign the network or install more hardware.

Conclusion

The effects of packet loss, such as inaudible audio calls and grainy videos, can be very annoying. As we have seen in this article, packet loss can be caused by a variety of things like congestion, security attacks, and even the network medium being used.

To combat this issue, identify the cause using tools like ping, MTR, show commands, and packet captures, and then fix the underlying problem.

Packet loss FAQs

What causes packet loss?

If a packet fails to reach its destination, the cause lies with one of the devices responsible for passing it along its route or with the network interface of the receiving device. The most common cause is that a device receives more traffic than it can process. If the device is busy, arriving packets go into a buffer, or queue. When the queue is full, there is nowhere for arriving packets to go, so they are simply dropped.

Is jitter a packet loss?

No. Jitter is irregularity in the arrival timing of packets. If a packet in a stream gets lost, that causes a gap in the arrivals, which can show up as jitter. In other words, packet loss is one of the things that can contribute to jitter.

What is an acceptable level of jitter?

The tolerance to jitter is different for different applications. Interactive applications, such as video streaming or VoIP, don’t operate well with jitter. As a rule of thumb, jitter should be less than 10 percent of the typical round-trip time on the connection.
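This rule of thumb is easy to check against a set of ping results. The Python sketch below uses a simple definition of jitter, the average absolute difference between consecutive round-trip times, which is an assumption made for the example; monitoring tools and RTP implementations may calculate jitter differently.

```python
from statistics import mean


def mean_jitter(rtts_ms: list[float]) -> float:
    """Average absolute difference between consecutive round-trip times."""
    if len(rtts_ms) < 2:
        raise ValueError("need at least two RTT samples")
    return mean([abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])])


def within_rule_of_thumb(rtts_ms: list[float], fraction: float = 0.10) -> bool:
    """True if jitter is at most `fraction` of the average round-trip time."""
    return mean_jitter(rtts_ms) <= fraction * mean(rtts_ms)


if __name__ == "__main__":
    # Round-trip times (ms) taken from the second ping run in the lab.
    lab_rtts = [6.999, 12.637, 12.203, 11.711, 6.818]
    print(f"jitter: {mean_jitter(lab_rtts):.3f} ms")
    print(f"average RTT: {mean(lab_rtts):.3f} ms")
    print("within 10% rule:", within_rule_of_thumb(lab_rtts))
```

The emulated lab link fails this check, which is not surprising for a virtual environment with only five samples. On a production path you would want to run the same check over a much longer series of measurements.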

How does packet loss affect network performance?

Packet loss reduces network performance by slowing down data transmission and causing data to be resent, leading to increased latency and reduced throughput.

How can packet loss be detected?

Packet loss can be detected by using network monitoring tools that measure the number of packets sent and received, and calculate the percentage of lost packets.

How can packet loss be prevented?

Packet loss can be prevented by reducing network congestion, improving network infrastructure, and ensuring proper network configuration and maintenance.

Can packet loss be repaired?

No, not by the network itself: once a packet has been dropped, it cannot be recovered. However, protocols with retransmission, such as TCP, can recover the lost data by sending it again.

Is it normal to have some packet loss in a network?

A small amount of packet loss is normal in any network and is usually not a cause for concern. However, significant or persistent packet loss can indicate a problem that needs to be addressed.

How can the impact of packet loss be reduced?

The impact of packet loss can be reduced by using technologies such as Quality of Service (QoS) to prioritize important network traffic and reduce network congestion, and by implementing error correction and retransmission protocols to recover lost data.