Network Congestion – 5 Causes & How to Alleviate Issues with your Network being Congested!

Marc Wilson

When you hear the word “Congestion”, what comes to mind?

My guess is that you thought about traffic congestion on the road. When cars enter a particular stretch of road faster than they can exit it, we have a traffic jam, or congestion.

Traffic congestion can also be caused by other factors like accidents, bad roads, narrow roads, and so on. The point is that congestion results in a restricted flow of traffic.

The same is true of network congestion, as we will discuss in this article, looking at its causes, its effects, troubleshooting tips and tools, and how to fix it.

Network Congestion

Just like road congestion, network congestion occurs when a network is not able to adequately handle the traffic flowing through it. While network congestion is usually a temporary state of a network rather than a permanent feature, there are cases where a network is always congested, signifying that a larger issue is at hand.

In this section, we will discuss five (5) common causes of network congestion:

  • Over-subscription
  • Poor network design/mis-configuration
  • Over-utilized devices
  • Faulty devices
  • Security attack

Over-Subscription

Have you ever experienced a case where your web browsing experience is consistently faster at certain times of the day than others? For example, there is a high probability that you will have a better browsing experience at night than during the day.

This is because there are more users on the network during the day (peak period) than at night (off-peak period). This is similar to getting on the train during rush hour versus when everyone is at work.

Cases like this are usually the result of over-subscription, where a system (e.g. a network) is handling more traffic at a given time than it was designed to handle.

It is important to note that over-subscription is usually done on purpose as it may result in cost savings.

For example, let’s consider a scenario where an organization has 100 users and it has been determined that a 100Mbps Internet link will be suitable for all these users.

Now imagine that most of the staff of this organization work from home. In this case, it will be more cost efficient to go for a lower link capacity, say 50Mbps, since only a handful of employees will be using the link at any one time. But what happens when there is a company-wide meeting and all employees come into the office? You guessed right: network congestion.
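The arithmetic behind this scenario can be sketched in a few lines of Python. This is only an illustration: it assumes the link is shared evenly among active users, and the figure of 20 staff in the office on a normal day is a made-up number, not something from the scenario above.

```python
def per_user_mbps(link_mbps: float, active_users: int) -> float:
    """Average share of the link per active user, assuming an even split."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return link_mbps / active_users


def oversubscription_ratio(planned_per_user_mbps: float, total_users: int,
                           link_mbps: float) -> float:
    """Planned demand divided by purchased capacity (2.0 means 2:1)."""
    return planned_per_user_mbps * total_users / link_mbps


# 100 users were planned at 1 Mbps each (100 Mbps), but only 50 Mbps was bought.
ratio = oversubscription_ratio(1.0, 100, 50)

# Hypothetical normal day: 20 people in the office.
normal_day = per_user_mbps(50, 20)

# Company-wide meeting: all 100 people on the same link.
meeting_day = per_user_mbps(50, 100)

print(f"ratio {ratio}:1, normal day {normal_day} Mbps, meeting {meeting_day} Mbps")
```

On the normal day each user gets more than the planned 1 Mbps; during the meeting each user gets half of it, which is where the congestion comes from.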

Poor Network Design/Mis-Configuration

A more serious cause of network congestion is poor design or device misconfiguration. Take for example a broadcast storm, where a large volume of broadcast and/or multicast traffic floods the network within a short time, resulting in severe performance degradation.

Since broadcasts are contained within subnets, the larger the subnet the more serious the effect of a broadcast storm. Therefore, a network that has been designed with large subnets without giving proper consideration to broadcast storms can result in network congestion.

Another cause of broadcast storms is a Layer 2 loop. In a Layer 2 segment, switches flood broadcast frames (such as the ARP requests used to discover unknown MAC addresses) out of every port except the one they arrived on. If there is a loop on the network, the same broadcast frame can circulate between the switches indefinitely, because Ethernet frames carry no time-to-live field, resulting in a broadcast storm and possible network congestion.
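To see why a loop is so damaging, here is a toy simulation of broadcast flooding. It is a simplified model rather than a faithful switch implementation: it assumes no loop prevention is running, at most one link between any pair of switches, and that every hop takes one time step. The switch names are made up.

```python
def frames_in_flight(links, source, steps):
    """Count the copies of one broadcast frame on the wire at each time step.

    links  -- list of (switch_a, switch_b) pairs, one link per pair
    source -- switch that originates the broadcast
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    # Each in-flight copy is recorded as (sending_switch, receiving_switch).
    in_flight = [(source, n) for n in neighbors.get(source, [])]
    history = []
    for _ in range(steps):
        history.append(len(in_flight))
        next_hop = []
        for sender, receiver in in_flight:
            # Flood out of every link except the one the frame arrived on.
            for n in neighbors[receiver]:
                if n != sender:
                    next_hop.append((receiver, n))
        in_flight = next_hop
    return history


chain = [("A", "B"), ("B", "C")]                # no loop
triangle = [("A", "B"), ("B", "C"), ("C", "A")]  # one loop
mesh = [("A", "B"), ("A", "C"), ("A", "D"),
        ("B", "C"), ("B", "D"), ("C", "D")]      # several loops

print(frames_in_flight(chain, "A", 4))
print(frames_in_flight(triangle, "A", 4))
print(frames_in_flight(mesh, "A", 4))
```

In the loop-free chain the broadcast dies out after two hops. In the triangle it never dies out, and in the full mesh the number of copies doubles at every step, all from a single original frame.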

Over-Utilized Devices

Devices such as routers, switches, and firewalls have been designed to handle certain network throughput. For example, the Juniper MX5 has a capacity of 20Gbps. Apart from the fact that this is a theoretical value (the capacity in the production environment will be slightly lower), this is also the maximum capacity.

Therefore, constantly pushing ~20Gbps of traffic through that device means that the device will be over-utilized and will likely result in high CPU utilization and packet drops, leading to congestion on the network.
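One common way to spot an over-utilized interface is to sample its byte counter twice and convert the difference into a fraction of the link speed. The sketch below assumes you already have the two counter readings (for example from SNMP or the device CLI); the function name and the sample numbers are illustrative only.

```python
def link_utilization(bytes_start: int, bytes_end: int, interval_s: float,
                     link_bps: float, counter_bits: int = 64) -> float:
    """Fraction of link capacity used between two byte-counter samples.

    The modulo handles a counter that wrapped around once between samples,
    which matters mostly for 32-bit counters on fast links.
    """
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    delta_bytes = (bytes_end - bytes_start) % (2 ** counter_bits)
    return delta_bytes * 8 / (interval_s * link_bps)


# Hypothetical readings taken 10 seconds apart on a 1 Gbps interface.
utilization = link_utilization(0, 1_125_000_000, 10, 1_000_000_000)
print(f"{utilization:.0%} utilized")
```

An interface that sits near 100% for long periods has no headroom for bursts, so queues fill up and packets start to drop.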

Another issue related to over-utilized devices that can cause network congestion is bottlenecks. In most hierarchical designs, multiple devices feed into a higher-level device, so care must be taken to ensure that the higher-level device is capable of handling all the traffic from the lower-level devices.

If this is not the case, then the higher-level device can become a bottleneck, causing congestion on the network. Think about a 4-lane highway merging into a 2-lane road.

Faulty Devices

I once performed a network performance assessment for an organization. They were buying 100Mbps link capacity from their ISP but the users on the network were struggling to connect to the Internet effectively.

They complained that the network was always “slow” (user speak for network congestion) even when few people were on the network. Upon investigation, we discovered that while their ISP was truly giving the agreed upon 100Mbps, the edge device was only providing 30Mbps to the network!

Apart from the fact that this organization had wrongly terminated the link on a FastEthernet interface (which gives a theoretical speed of 100Mbps but much lower practical speed), that interface was also faulty. By moving the ISP link to another interface (we used a GigabitEthernet interface instead), the performance problem was solved.

Security Attack

In another organization I consulted for, a network of about 10 users had poor browsing experience even with the 4Mbps link they were getting from their ISP.

Ideally, this capacity should have been enough because the users were not doing anything heavy on the Internet – just emails, web searches, and normal user activities.

Upon investigation, we discovered that one of their servers had been compromised, and it seemed the attacker was using this server to host illicit content, resulting in a huge amount of traffic being sent to and from this server. Once this server was cleaned up, the congested network was once again “free” for normal user traffic.

Other security attacks that can result in network congestion include viruses, worms, and Denial of Service (DoS) attacks.

Effects of Network Congestion

Everyone on a network generally “feels” the effects of network congestion. They may not be able to explain it in technical terms but will say things like “The connection is so slow”, “I can’t open web pages”, “The network is really bad, I can’t hear you”.

From a technical perspective, the effects of a congested network include:

  • Delay:
    Also known as latency, delay is the time it takes for a destination to receive a packet sent by the sender. For example, the time it takes for a webpage to load depends in part on how long it takes for the packets from the web server to get to the client. Another sign of delay is the buffering you experience when watching a video, say on YouTube.
  • Packet Loss:
    While packets may take a while to get to their destination (delay), packet loss is an even more negative effect of network congestion. This is especially troubling for applications like Voice over IP (VoIP) that do not deal well with delay and packet loss, resulting in dropped calls, lag, robotic voices, and so on.
  • Timeouts:
    Network congestion can also result in timeouts in various applications. Since most connections will not stay up indefinitely waiting for packets to arrive, this can result in lost connections.
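These effects can be put into numbers. The sketch below takes a list of round-trip times from a series of probes, with `None` standing in for a probe that was lost or timed out, and reports average delay, loss percentage, and a simple jitter figure. The jitter here is just the mean difference between consecutive replies, which is a simplification and not the estimator used by RTP. The sample values are invented.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe results; None marks a probe that never got a reply."""
    if not rtts_ms:
        raise ValueError("need at least one probe")
    received = [r for r in rtts_ms if r is not None]
    loss_percent = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    avg_delay = mean(received) if received else None
    # Simplified jitter: mean absolute change between consecutive replies.
    changes = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(changes) if changes else None
    return {"loss_percent": loss_percent,
            "avg_delay_ms": avg_delay,
            "jitter_ms": jitter}


# Hypothetical probe series: the second probe was lost.
print(summarize_probes([20.0, None, 30.0, 50.0]))
```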

Troubleshooting Network Congestion

Feeling the effects of network congestion is one thing but actually confirming that a network is congested is another. In this section, we will look at some activities that can be performed to confirm the congestion of a network.

1. Ping

One of the fastest ways to check if a network is congested is to use Ping: not only can it detect packet loss, it can also reveal delay in a network through the round-trip time (RTT). A tool like MTR (which combines ping and traceroute) can also reveal the parts of the network where congestion is occurring.
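If you want to track these numbers over time, the summary lines at the end of a ping run can be parsed with a short script. The sketch below assumes the summary format printed by the Linux iputils `ping`; other operating systems word it differently, so the patterns would need adjusting. The sample text is hand-written for illustration and is not a real measurement.

```python
import re


def parse_ping_summary(output: str) -> dict:
    """Extract packet loss and RTT statistics from Linux-style ping output."""
    loss = re.search(r"(\d+(?:\.\d+)?)% packet loss", output)
    rtt = re.search(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms", output)
    result = {"loss_percent": float(loss.group(1)) if loss else None}
    if rtt:
        keys = ("rtt_min_ms", "rtt_avg_ms", "rtt_max_ms", "rtt_mdev_ms")
        result.update(zip(keys, map(float, rtt.groups())))
    return result


# Hand-written sample in the iputils format (illustrative values).
SAMPLE = """\
4 packets transmitted, 3 received, 25% packet loss, time 3005ms
rtt min/avg/max/mdev = 11.2/15.7/21.0/3.9 ms
"""

print(parse_ping_summary(SAMPLE))
```

Rising loss or a growing gap between the minimum and maximum RTT during busy hours is a reasonable hint that the path is congested.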

2. LAN Performance Tests

A tool like iPerf can be very useful in determining performance issues on a network, measuring statistics like bandwidth, delay, jitter, and packet loss. This can help reveal bottlenecks on the network and also identify any faulty devices/interfaces.
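If you use iperf3 specifically, it can write its results as JSON (the `-J` option), which makes them easy to compare between runs. The sketch below reads the overall sent and received rates from the `end` section of a TCP test report. The sample report is a heavily trimmed, hand-written stand-in for a real one, and the numbers in it are made up.

```python
import json


def throughput_mbps(report: dict) -> dict:
    """Pull the overall sent/received rates (in Mbps) from an iperf3 TCP report."""
    end = report["end"]
    return {
        "sent_mbps": end["sum_sent"]["bits_per_second"] / 1e6,
        "received_mbps": end["sum_received"]["bits_per_second"] / 1e6,
    }


# Trimmed, hand-written sample; a real report contains many more fields.
sample_report = json.loads("""
{
  "end": {
    "sum_sent": {"bits_per_second": 94200000.0},
    "sum_received": {"bits_per_second": 30100000.0}
  }
}
""")

print(throughput_mbps(sample_report))
```

A large gap between the two figures, or a received rate far below the link speed, is the kind of result that points toward a bottleneck or a faulty interface along the path.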

3. Bandwidth Monitoring

During the investigation of the compromised server I mentioned above, we used a tool called ntopng to discover “Top Talkers” which revealed that the server was using up all the bandwidth on the network. In the same way, tools that monitor bandwidth can reveal network congestion especially during a security attack or if a particular host is using up all the bandwidth.
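The idea behind a “Top Talkers” view is simple enough to sketch: add up the bytes each host sent or received and sort the totals. The code below is not how ntopng works internally; it just assumes you have flow records as (source, destination, bytes) tuples from some export. The addresses come from the ranges reserved for documentation and the byte counts are invented.

```python
from collections import Counter


def top_talkers(flows, n=3):
    """Rank hosts by total bytes sent plus received.

    flows -- iterable of (src_ip, dst_ip, byte_count) tuples
    """
    totals = Counter()
    for src, dst, byte_count in flows:
        totals[src] += byte_count
        totals[dst] += byte_count
    return totals.most_common(n)


# Hypothetical flow records.
flows = [
    ("192.0.2.10", "198.51.100.7", 9_000_000),
    ("192.0.2.21", "198.51.100.9", 120_000),
    ("198.51.100.8", "192.0.2.10", 500_000),
    ("192.0.2.22", "198.51.100.9", 80_000),
]

for host, total in top_talkers(flows, n=2):
    print(host, total)
```

When one host accounts for almost all of the traffic, as 192.0.2.10 does in this made-up data, it is the first place to look.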

You can read this article for more information about performing a network performance assessment.

Decongesting a Network

The fix for a congested network will depend on the cause:

  • For oversubscribed links, you may need to purchase more bandwidth from your service provider. Some service providers also allow you to temporarily boost your bandwidth for a small fee. You may also want to implement Quality of Service (QoS) features which will ensure that even in the event of congestion, critical applications can still function.
  • Layer 2 loops can be prevented by using loop prevention protocols such as Spanning Tree Protocol (STP). A poor network design can be more difficult to fix since the network is probably in use. For such cases, incremental changes can be made to improve the network and remove congestion.
  • Over-utilized devices may need to be swapped out. Alternatively, the capacity of the system can be increased by implementing high-availability features such as clustering and stacking.
  • Faulty devices definitely need to be replaced. In some cases (like the example I gave above about the 100Mbps link reduced to 30Mbps), only a part of the device (e.g. an interface) needs to be replaced.
  • Security attacks need to be combated as soon as they are discovered. In the case of the compromised server, the first thing we did was to remove that server from the network completely. Since this is not always a feasible solution (e.g. the compromised device is a critical server), other temporary measures such as applying access control lists to deny the offending traffic may need to be implemented.
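To illustrate the QoS point in the first bullet, here is a minimal sketch of strict-priority queuing, one of several scheduling approaches a QoS policy might use. It is a teaching model, not a device configuration: the class name, the priority numbers, and the packet labels are all made up.

```python
import heapq
import itertools


class StrictPriorityQueue:
    """Always send the most urgent packet first; FIFO within a class.

    Lower numbers mean higher priority (0 is the most urgent).
    """

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps arrival order

    def enqueue(self, priority: int, packet: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), packet))

    def dequeue(self):
        """Return the next packet to transmit, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = StrictPriorityQueue()
queue.enqueue(2, "backup-1")
queue.enqueue(0, "voip-1")
queue.enqueue(1, "web-1")
queue.enqueue(0, "voip-2")

sent = [queue.dequeue() for _ in range(4)]
print(sent)
```

Even though the backup packet arrived first, the voice packets leave first, which is how critical applications can keep working on a congested link. The trade-off with strict priority is that low-priority traffic can be starved if the high-priority classes never go quiet.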

Conclusion

In this article, we have discussed network congestion and how it affects user experience. We have seen how causes such as over-subscription, faulty devices, and security attacks can result in network congestion.

We have also discussed the effects of network congestion, including generally poor user experience, packet loss, and timed-out connections. Finally, we have discussed how to troubleshoot congestion in a network and highlighted some things that can be done to fix these issues.