Cross-site Scripting Attacks: Explained with Examples

Cross-site scripting attacks involve exploiting vulnerabilities in websites in order to steal data from their visitors. Often referred to by their acronym, XSS, these attacks can be a little difficult to understand without the right background knowledge.

The ultimate aim of these attacks is to steal data, gain access to accounts and commit a range of other cybercrimes.

The important things to note are that:

XSS attacks provide a way for hackers to circumvent the same-origin policy. Ordinarily, this policy prevents one web page from being able to access data on another page, unless the two pages have the same origin. This helps to keep your data safe from malicious websites.
Websites that aren’t secured properly may have input validation vulnerabilities. These essentially allow attackers to inject their own malicious scripts into the code of the website.
By compromising a website that people trust with a malicious script, hackers can circumvent the same-origin policy. The browsers of site visitors will trust the malicious script, because they think it is coming from a website that they trust. Hackers use XSS attacks as footholds to steal data and access the accounts of the site visitors.

Lost already? Don’t worry, we will discuss each of these elements in depth, so that you can really understand the how and why of these attacks. If you already have decent knowledge of these concepts, feel free to jump ahead to the What is cross-site scripting? section.

For everyone else, we will be addressing the following topics first:

What is a script?
What is client-side scripting?
The security issues of client-side scripting
What is code injection?
What is the same-origin policy?

Once we get through each of the important background topics, we will finally be ready to cover the different types of cross-site scripting attacks.

What is a script?

Let’s go right back to the beginning and make sure that you have a clear understanding of scripts. A script is a program written in a scripting language. Scripting languages are essentially a special subtype of programming languages, which are formal systems that we use to write computer code.

You’ve probably seen code many times before and recognize the weird combinations of letters and symbols that somehow make your computer do the things you want it to do. Take a look at the following example, which simply prints the words Cross-site scripting attacks on the screen:

#include <stdio.h>

int main(void) {
    // printf() displays the string inside the quotation marks
    printf("Cross-site scripting attacks\n");
    return 0;
}

The above example is source code, which humans can read and write fairly easily (if they are programmers). The problem with source code is that a computer's processor can't execute it directly. Instead, it runs machine code, which expresses the same instructions as the zeros and ones that machines can understand. The above example would look something along the lines of this in machine code:

0101001010101100101010010101010010101101010010101001010111001010100101001010110010101010010010110101001010100100101001010010110010010100101010010100101100101001010100…

For humans, the above is almost impossible to read. The obvious conclusion is that in order to get computers to do what we want, we need to translate source code into machine code. One way to do this is with a program called a compiler.

If you copy the source code we wrote out above (the first example, not the ones and zeros), paste it into an online C compiler and click Run, the compiler translates it into machine code under the hood. The computer can then run that machine code, which displays our message, just like we told it to:

Cross-site scripting attacks

Scripting languages are languages that typically aren’t compiled into machine code ahead of time. Instead, their code is run by an interpreter. The most important distinction is that compilers generate standalone machine-code programs, while interpreters perform the actions that the code describes.

Compilers generate code in the desired output language, then save it. Interpreters, on the other hand, execute the code straight away. While compiled programs can run independently, scripts run inside other software, such as an interpreter or a web browser.

An interpreter also differs from a compiler in that it:

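Translates and runs a program one statement at a time, rather than translating the whole program at once. It also reports an error when it reaches the statement that causes it, rather than reporting all errors before the program runs.
Usually executes more slowly.
Tends to be more memory efficient, because it doesn’t generate a separate compiled program.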

Common languages that use a compiler include C and C++, while scripting languages include JavaScript, Python, Ruby, Perl, Bash, PowerShell and MATLAB. One of the main advantages of these scripting languages is that they make the development process much faster. However, this comes at the cost of their slower execution speed.

Scripting languages like PowerShell or Bash are often used to automate command-line processes. Websites and web apps are also full of scripts, where they perform roles ranging from data storage to building dynamic website functionality. The most common one you will run into online is JavaScript, the standard in client-side scripting that we will discuss in the following section.

Over 97 percent of websites use JavaScript for client-side functionality, often through third-party libraries. Its dominance, alongside the security challenges posed by third-party libraries (we will discuss these in the section on the security issues of client-side scripting), makes it one of the most commonly used languages in XSS attacks.

When you hear a phrase like “running a script”, it simply means executing some code written in a scripting language.

What is client-side scripting?

Before we can get to client-side scripting, you need to understand the two main types of website: static and dynamic.

Different types of websites: static vs. dynamic

Dynamic websites are those that change when the user interacts with them. They have become increasingly common over time, because the dynamic approach offers a far wider range of functionality. Some examples of dynamic websites include:

Amazon: When you enter a search term on the home page, you will be met with a list of products that match it. If you search for “boots”, you will see a list of boots on the results page. These lists may be individually tailored to your location, or to other details that Amazon knows about you.
Facebook: Everyone’s Facebook News Feed is different, based on their past activities and interactions with others. The dynamic nature of the website allows people to see their own individualized content.
Google: When you enter keywords, Google delivers a page of dynamically generated results. This is based on its search algorithm, which will return different pages based on what it knows about you, your search history, and your location. Again, this is only possible because the website is dynamic.

Contrast the above examples with more traditional static websites, such as many simple blogs. The defining feature of static websites is that each page is stored in the same form in which it’s delivered to the user’s browser.

On a static website, a user’s interaction doesn’t affect the content that the site displays for them. All they can do is navigate through the pages. They cannot search for products, log in to their accounts, or have content tailored specifically to them.

Client-side and server-side scripting

Dynamic websites have far more functionality, and they accomplish this by running both server-side and client-side scripts. As we discussed, this means executing code written in scripting languages on both the client side and the server side. The server side is known as the back end, while the client side is referred to as the front end.

When you visit the PCWDLD website, your browser, such as Chrome, Safari, Edge or Firefox, acts as the client. The PCWDLD website is hosted on a server. As you can probably gather, client-side scripts run in the client, your web browser. In this case, server-side scripts are those that run on PCWDLD’s server.

The main difference between the two is where the processing gets done, either within your browser or on the server. Your browser makes a request to the server, which runs server-side scripts to return what you asked for.

Once it sends you the page, there may be client-side scripts that you can interact with, allowing you to alter the page without having to contact the server once again. Other actions that require new information will require you to send another request to the server, which will run server-side scripts to complete the task, before sending you the resulting data.

Client-side scripting is done mainly in JavaScript, which works alongside the HTML and CSS that structure and style the page. Server-side scripts are written in languages like Python, Java, PHP and Ruby.

Client-side scripting is everywhere, but a good example of it sometimes appears when you sign up for a new account on a website. When you set up an account, you are often prompted to enter your password twice.

If you ever mistype one of the passwords, you will often see a prompt that tells you that the two passwords do not match. You can tell that this prompt must have come from a client-side script, because you haven’t hit the send button yet. If the server hasn’t received the data by this stage, the action can’t have been caused by a server-side script.

You can see an example of a server-side script when you use the search bar on Amazon’s website. The query goes from your browser to the server, which runs a script to return the relevant products to you. The script is server-side, because you don’t have Amazon’s entire database stored in your browser.

Client-side and server-side scripting each have their own uses, as well as positives and negatives. Major websites will generally handle things like logging in and managing personal information on the server-side, while client-side scripting is often responsible for making pages more interactive.

The differences between client-side and server-side scripts

The advantages of client-side scripts include that they:

Can improve a website’s usability for site visitors with browsers that support the scripts.
Can be faster at certain tasks, because they don’t require a request and a response from a server.
Allow a greater degree of interactivity, because they immediately respond to user inputs.
Give developers more control of their widgets.

The downsides of client-side scripts include:

Different browsers, and even different versions of the same browser, may treat various client-side scripts differently. This means that developers have to do more testing to make sure that the majority of users have a seamless experience.
Some browsers don’t support scripts, while some users block scripts with add-ons like NoScript. This means that a website may not work for them, or that it will have reduced functionality when they visit.

The benefits of server-side scripts include:

Users can only see the HTML output, not the code from the server-side script. This enhances security.
They allow for content management systems, which can simplify many things for the developer.
Can be fast in certain scenarios, because servers are generally much more powerful than people’s phones or computers.
Users don’t need plugins like Flash or Java for content to function properly.

The disadvantages of server-side scripts include:

The server needs to have scripting software installed on it.
Databases are needed so that content management systems and scripts can store dynamic data.

The differences between client-side and server-side scripts mean that developers have to carefully choose a combination that suits the unique needs of their project. The right mix of the two helps to offer us the speed and functionality that we are so accustomed to on many of our favorite websites.

Now that you understand what client and server-side scripts are, as well as how the prominence of client-side scripts has added to the functionality of many of our most popular websites, we can examine one of the downsides of client-side scripts.

The security issues of client-side scripting

Websites and web applications have gotten extremely complex over time. Many are built from a mix of open-source libraries, third-party JavaScript libraries and in-house code. While this mixture has made it easier and cheaper to develop complex websites, it has also created additional security risks on the client side.

Open source libraries are incredibly valuable, but it’s often impractical for developers to audit the code they use from them, and to keep up with any changes. A 2020 study from Veracode looked at 85,000 applications and found that 70 percent of them had a security flaw in an open source library.

The most common vulnerability was cross-site scripting, which was found in 30 percent of libraries. Insecure deserialization was found in over 23 percent, while just over 20 percent had broken access control vulnerabilities.

Another issue is that third-party vendors provide developers with useful services, but they may not give the developers visibility into code updates or changes. The quality of oversight and evaluation procedures surrounding in-house code will vary from developer to developer, but at least developers have insight into and control over their own code.

This complicated mix of code from open-source libraries, third-party vendors, and in-house developers ultimately results in a complex environment, where scripts from many entities are running on a website, often without adequate code evaluation.

Data is also being shared with an increasing number of parties, which brings in even more chances for things to go awry. Ultimately, most developers don’t end up having as much control over their code as we would like to think.

These outside libraries are loaded statically into pages by inline snippets. However, they change their content dynamically in the background, which can introduce security vulnerabilities. The website developer has no control over these changes.

These scripts from both third-party vendors and open-source libraries can cause changes to JavaScript behavior on the page. This can affect:

Storage properties such as session storage and cookies.
The Document Object Model (DOM), an interface that represents HTML and XML documents as a tree structure. The changed JavaScript behavior can mutate the elements of this tree, impacting user experience.
Network destinations and network protocols.

Now, let’s say an attacker takes advantage of this complex system and breaches one of these third-party libraries, with the intent of targeting those that use the library. The hacker could then alter the library so that websites that use the library will subtly leak user data, and send it back to the hacker.

Websites that use the third-party library could end up sending user data to the attacker, all without their knowledge. Even if the website’s developers have good security practices in place, they may have no way of seeing the data breach through the third-party library, because it would be occurring on the client-side.

The prevalence of scripts, the large attack surface of contemporary websites, and the fact that simple, unseen errors can have huge security ramifications are all factors that make it challenging to develop a relatively safe site. Ultimately, this complex state of affairs leaves us in a situation where many websites are vulnerable to cross-site scripting attacks.

Attackers generally use JavaScript to take advantage of these vulnerabilities, although other technologies can be used as well.

What is code injection?

Now that we’ve covered scripts and the environment that makes client-side scripts vulnerable to attack, it’s time to discuss another key component of cross-site scripting attacks, code injection. When attackers find cross-site scripting vulnerabilities, they use code injection to exploit them.

At its essence, code injection involves attackers exploiting vulnerabilities in code. These vulnerabilities are generally caused by poor sanitization or validation processes that allow attackers to send untrusted inputs to an interpreter (we discussed interpreters in the What is a script? section). The attacker can then inject code that the interpreter executes. In cross-site scripting attacks, that interpreter is the victim’s browser.

Sanitization and validation

Let’s examine the concepts of sanitization and validation by analyzing what would happen without them. Many websites have a variety of input fields so that their users can enter their own data to achieve a specific aim. Think of things like search bars, forms, and all of the input fields you see when you sign up for a new account.

Under normal circumstances, a user just enters their search terms or other information, and the website then delivers the search results or other desired outputs. When this happens, everything works as expected, and it’s not particularly interesting.

So what happens when someone tries to enter code into one of these input fields?

Well, if the website is poorly secured and there is no validation or sanitization of user inputs, then the code may execute. If a hacker inputs malicious code, it can run on the website, causing whatever mayhem it was programmed to cause.

Obviously, this is bad. Very bad. Thankfully we have processes like sanitization and validation, which can prevent code that has been entered through input fields from executing. Validation essentially ensures that whatever the user has entered matches what the website requests from them.

For example, if you ask for a numeric date of birth, you would program your website so that inputs containing letters or other characters fail validation. Similarly, if you want users to enter a postcode, a 15-digit number would fail validation, because postcodes are not 15 digits long.
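As a rough illustration, here is how those two validation rules might look in C, the language of the earlier example. The function names are hypothetical, and so are the formats. We assume a date of birth typed as eight digits (DDMMYYYY) and a five-digit US ZIP code, because real postcode formats vary from country to country.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true only if s consists of exactly `len` ASCII digits. */
static bool is_digits_of_length(const char *s, size_t len) {
    if (strlen(s) != len) return false;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9') return false;
    }
    return true;
}

/* Hypothetical rule: a US-style ZIP code is exactly 5 digits. */
bool validate_zip(const char *s) {
    return is_digits_of_length(s, 5);
}

/* Hypothetical rule: a date of birth entered as DDMMYYYY (8 digits),
   with a day of 1-31 and a month of 1-12. This is a coarse check that
   does not reject impossible dates such as 31 February. */
bool validate_dob(const char *s) {
    if (!is_digits_of_length(s, 8)) return false;
    int day = (s[0] - '0') * 10 + (s[1] - '0');
    int month = (s[2] - '0') * 10 + (s[3] - '0');
    return day >= 1 && day <= 31 && month >= 1 && month <= 12;
}
```

Because validate_dob() only accepts digits, an input such as <script> is rejected before it reaches the rest of the site, and validate_zip() turns away a 15-digit number. A production form would use stricter date checks than this sketch.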

Sanitization ensures that the inputs conform to the requirements of the subsystem. It’s crucial for security, and involves removing or altering unwanted inputs like code. It uses techniques like filtering and escaping to prevent code from executing, helping to keep websites safe.
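Filtering, one of the sanitization techniques mentioned above, can be sketched as an allowlist: keep the characters a field legitimately needs and drop everything else. The function name and the allowed set are assumptions made for this example. For a search box, we assume letters, digits, spaces and hyphens.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Filtering sketch (hypothetical policy): returns a newly allocated copy
   of `in` that keeps only letters, digits, spaces and hyphens. Everything
   else, including < > " ' and /, is dropped. The caller must free() it. */
char *filter_search_term(const char *in) {
    size_t n = strlen(in);
    char *out = malloc(n + 1);
    if (out == NULL) return NULL;
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)in[i];
        if (isalnum(c) || c == ' ' || c == '-') {
            out[j++] = (char)c;   /* character is on the allowlist */
        }
    }
    out[j] = '\0';
    return out;
}
```

Filtering is blunt. This version turns <script>bigattack</script> into the harmless text scriptbigattackscript, but it would also turn a legitimate search for O'Brien into OBrien. That is one reason escaping is often preferred when input only needs to be displayed safely.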

Code injection vulnerabilities

If hackers discover errors in the sanitization and validation processes, they may take advantage of the opportunity and use it to inject code. They will enter their malicious code into the input field and it can then execute. This code injection causes the target to act according to the code that was entered.

Code injection can be devastating, because it can give hackers the foothold they need to escalate their attacks. Ultimately they may use it as part of an attempt to corrupt data, deny access, take over the host, spread viruses and more.

There are several different types of code injection, and each represents its own threat. Our main focus, however, is on cross-site scripting.

Code injection in action

Let’s say a developer creates a simple, yet poorly designed program for a restaurant’s website, with the aim of obtaining customer contact information in exchange for free fries. They probably expect people to fill out their information in the input fields like so:

Name: John Smith

Email: john@johnsmith.com

Phone number: (322) 555-3347

Once a site visitor enters their details, the program is then supposed to email a coupon for fries to that address. However, this all goes awry when a crafty hacker figures out that the program’s data and its execution instructions are both stored in the same memory. Not content with a single coupon for free fries, they try to game the system.

Instead of entering their own name into the input field, they enter some malicious code. A well-designed system would notice that the code wasn’t a name, and the validation and sanitization processes would prevent it from being able to run. An insecure system might execute the code.

The hacker’s malicious code could cause a range of different issues, but let’s say they just really like fries. While the software may have initially been programmed to send a single coupon to each email address that was provided, the hacker’s code might alter it to send 100 emails instead.

As you can see, the poor validation and sanitization processes allowed the attacker to inject code which then subverted the intended functionality of the program. Instead, it allowed the attacker to get what they wanted, 100 coupons for free fries. While 100 bags of free fries may be a lot for a small restaurant, the harm from code injection can get much worse.
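The story is deliberately loose about how the hacker’s code gets executed. The toy program below is our own illustration of the general pattern from the What is code injection? section, untrusted input reaching an interpreter. It is not a reconstruction of the restaurant’s program. Its interpreter, run_script(), is hypothetical. It runs commands separated by semicolons, and the only command it knows, coupon <email>, counts one coupon. The vulnerable handler pastes the email field straight into a script, so a crafted “email” can smuggle in extra commands.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical toy interpreter: runs commands separated by ';'.
   The only command it understands is "coupon <email>", and each one
   counts as one coupon sent. Returns the number of coupons sent. */
int run_script(const char *script) {
    int coupons = 0;
    const char *p = script;
    while (*p != '\0') {
        const char *end = strchr(p, ';');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len > 7 && strncmp(p, "coupon ", 7) == 0) {
            coupons++;  /* a real program would email a coupon here */
        }
        p += len;            /* move to the ';' or the end of the script */
        if (*p == ';') p++;  /* skip the separator */
    }
    return coupons;
}

/* Vulnerable handler: splices the email field straight into the script,
   so user data and program instructions share one channel. */
int handle_signup_unsafe(const char *email) {
    char script[512];
    snprintf(script, sizeof script, "coupon %s", email);
    return run_script(script);
}

/* Safer handler: a validation rule rejects any email containing ';',
   so the data can never be read as an extra command. */
int handle_signup_safe(const char *email) {
    if (strchr(email, ';') != NULL) return 0;
    return handle_signup_unsafe(email);
}
```

Calling handle_signup_unsafe("x@evil.com;coupon x@evil.com;coupon x@evil.com") sends three coupons instead of one, because the semicolons in the data are read as instructions. handle_signup_safe() applies a simple validation rule and sends none.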

What is the same-origin policy?

Now that we have discussed everything from scripts to client-side security issues and code injection, we have almost laid all of the groundwork we need for a solid understanding of cross-site scripting attacks.

The final piece of the puzzle is the same-origin policy, a security mechanism that cross-site scripting attacks can circumvent. It’s important to understand it, because without the same-origin policy, hackers wouldn’t bother going to all of the effort of launching cross-site scripting attacks. Their work would be much easier.

A world without the same-origin policy

We all want to be safe when we are online. We don’t want people to be able to access our social media accounts, read our messages, or help themselves to our bank balances. The overall security system that keeps these things private and safe is a complicated mess that somehow seems to work most of the time.

The same-origin policy is one important component of it that helps to keep us secure. The best way to explain the same-origin policy is by imagining the online world without it.

In this world without the same-origin policy, let’s say you have two different tabs open in your browser. One is logged into your online banking, while another tab is on a very questionable website you discovered in one of the dark corners of the internet. The website has a game on it, which seems kind of fun. What you don’t realize is that it’s also draining your account balance in the other tab.

How could this happen?

Well, as we said, in this world there is no same-origin policy. This means that there’s nothing stopping the scripts from the dodgy gaming website from accessing information in the other tab, your online banking session.

Most major websites use scripts for a whole host of features, so scripts are a normal part of the online experience. But the malicious scripts on the gaming website pose a major threat if there is no same-origin policy in place.

In this world without the same-origin policy, the script from the malicious gaming website can access resources from your banking website, and vice-versa. These scripts can access things like cookies, security tokens, and other sensitive data.

They do this through a page’s Document Object Model (DOM), the interface that lets scripts read and change the content and structure of a page. Scripts can manipulate another page’s DOM, and even send data to arbitrary servers. This means that a malicious script controlled by an attacker could steal sensitive data and then send it to their own server for further use.

While your bank probably isn’t going to bother doing anything too destructive to the gaming website, we can’t say the same in the opposite direction.

In this scenario without the same-origin policy, the scripts from the malicious gaming website have access and they can basically do anything you can normally do on your bank website. They can look up your personal information, change your password, review your statements, and transfer all of your funds into an attacker’s account.

Obviously, a world where the scripts from a website in one tab can access the resources of any others is a terrible idea.

That’s why we have the same-origin policy in place to protect us. The same-origin policy is enforced by your web browser and essentially mandates that a script from one page cannot access the data from another page, unless they have the same origin.

What do we mean by the same origin?

Two web pages are considered to have the same origin if they have:

The same hostname
The same port number
The same protocol

If one page is:

https://www.pcwdld.com/about/

Let’s see how Wikipedia matches up to it:

https://en.wikipedia.org/

The two websites have different hostnames (www.pcwdld.com vs en.wikipedia.org), so they are not deemed to have the same origin. This means that under the same-origin policy, scripts from https://www.pcwdld.com/about can’t access data from https://en.wikipedia.org, and vice-versa.

Now, let’s compare the PCWDLD About Us page against:

http://pcwdld.com

If you aren’t looking closely, you may think that these two pages have the same origin. However, if you check out the start of each URL, you will notice that the first is HTTPS, while the second is HTTP. These are different protocols, with HTTPS being the secure version of HTTP.

Because these two URLs have different protocols, the same-origin policy deems them to have separate origins, and they cannot access each other’s resources.

There are actually two more counts on which this comparison fails, although they are even less obvious. HTTP uses port 80 by default, while HTTPS uses port 443, so these two URLs have different ports as well. Their hostnames also differ: pcwdld.com and www.pcwdld.com count as different hosts under the same-origin policy.

(Note that if you actually try to visit the HTTP version of the PCWDLD page, you will get redirected to the HTTPS page. We just used it as an example to show the protocol aspect of the same-origin policy.)

Now let’s compare it against our homepage:

https://www.pcwdld.com/

Note that the URLs for the homepage and the About Us page are different, but that doesn’t matter. They still have the same:

Hostname – www.pcwdld.com
Port number – 443
Protocol – HTTPS

This means that scripts on the https://www.pcwdld.com/ page can access resources from the https://www.pcwdld.com/about page and vice-versa.
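The comparison rule can be expressed as a short function. This is a simplified sketch in C, not how browsers actually implement the check. It only understands URLs of the form scheme://host[:port]/..., and it fills in the default ports 80 and 443 for HTTP and HTTPS. It ignores details that a real URL parser handles, such as user names in URLs, IPv6 addresses and internationalized domain names. The names struct origin, parse_origin and same_origin are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified origin: protocol (scheme), hostname and port. */
struct origin {
    char scheme[16];
    char host[256];
    long port;
};

/* Copies n bytes of src into dst (capacity cap), lowercased and
   NUL-terminated. Returns false if they do not fit. */
static bool copy_lower(char *dst, size_t cap, const char *src, size_t n) {
    if (n >= cap) return false;
    for (size_t i = 0; i < n; i++)
        dst[i] = (char)tolower((unsigned char)src[i]);
    dst[n] = '\0';
    return true;
}

/* Parses "scheme://host[:port][/...]". Returns false on anything it
   does not understand. */
static bool parse_origin(const char *url, struct origin *o) {
    const char *sep = strstr(url, "://");
    if (sep == NULL ||
        !copy_lower(o->scheme, sizeof o->scheme, url, (size_t)(sep - url)))
        return false;
    const char *host = sep + 3;
    size_t host_len = strcspn(host, ":/?#");   /* host ends at these */
    if (host_len == 0 || !copy_lower(o->host, sizeof o->host, host, host_len))
        return false;
    const char *after = host + host_len;
    if (*after == ':') {                        /* explicit port */
        char *end;
        if (!isdigit((unsigned char)after[1])) return false;
        o->port = strtol(after + 1, &end, 10);
        if (*end != '\0' && strchr("/?#", *end) == NULL) return false;
    } else if (strcmp(o->scheme, "http") == 0) {
        o->port = 80;    /* default HTTP port */
    } else if (strcmp(o->scheme, "https") == 0) {
        o->port = 443;   /* default HTTPS port */
    } else {
        return false;    /* unknown scheme and no explicit port */
    }
    return true;
}

/* Same origin means the same protocol, the same host and the same port.
   The path after the host is deliberately ignored. */
bool same_origin(const char *a, const char *b) {
    struct origin x, y;
    if (!parse_origin(a, &x) || !parse_origin(b, &y)) return false;
    return strcmp(x.scheme, y.scheme) == 0 &&
           strcmp(x.host, y.host) == 0 &&
           x.port == y.port;
}
```

With this sketch, same_origin("https://www.pcwdld.com/about/", "https://www.pcwdld.com/") returns true. Comparing the About Us page with https://en.wikipedia.org/ or http://pcwdld.com returns false. An explicit :443 on an HTTPS URL counts as the same port as the default.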

The same-origin policy kind of makes intuitive sense, because at PCWDLD, we trust our other web pages, so it’s not a problem if scripts from one page can access another. However, we can’t trust web pages that are outside of our control, so the same-origin policy keeps us safe from potential threats that may come from them.

What does the same-origin policy mean for hackers?

If you were a hacker who lived in a world without the same-origin policy, your life would be much simpler. Your scripts could easily steal data from other websites, and you would get to spend far more time with your feet up.

The same-origin policy simply makes things harder. If you are a hacker who wants to access data through a third-party website, you have to come up with a much more complicated plan. Luckily for you, the same-origin policy has exceptions. If you’re crafty, you just might be able to worm your attacks through them.

Exceptions to the same-origin policy

The same-origin policy does not forbid most cross-origin writes. It generally allows things like form submissions, redirects and links. It also typically allows embedding things such as:

Documents embedded with <iframe> tags.
Media with <video> and <audio> tags.
Images that are displayed with <img> tags.
External resources that are embedded with <embed> or <object> tags.
CSS that is applied with <link rel="stylesheet" href="…">.
Fonts applied with @font-face.
JavaScript applied with <script src="…"></script>.

These exceptions grant us much of the functionality we enjoy on the internet. However, they also open the door to cross-site scripting attacks. HTML tags like <script> and <img> can be a hacker’s gateway to attacking poorly coded websites. Since the same-origin policy doesn’t block them, attackers can use them to pull their own code into a vulnerable page, or to send stolen data out to a server they control.

What is cross-site scripting?

It’s taken a while to get here, but we’ve finally laid the groundwork to give ourselves a more complete understanding of cross-site scripting attacks. Let’s gather some of the points we have discussed so far:

Under normal circumstances, the same-origin policy stops scripts from one web page being able to access the resources of another, unless they have the same origin. However, there are exceptions for certain types of embedded code, which provide the gateway for cross-site scripting attacks.
Some websites do not validate and sanitize their input fields appropriately, causing security vulnerabilities. The complexity of the client-side ecosystem makes these issues harder to combat.
When attackers discover these vulnerabilities, they can attempt to take advantage of them by entering code into the input fields.
If this code injection executes, the attacker’s malicious code can run on the website.

At this point, we can see that when attackers come across a vulnerable website, they can inject their own code into it. The significance of this is that it allows them to circumvent the same-origin policy.

Under the same-origin policy, your browser only allows pages with the same origins to access the resources of one another. If you log in to example.com, you would expect that none of the other sites you are visiting are able to access this data.

But if example.com has a cross-site scripting vulnerability, and an attacker discovers and exploits it, things are very different. The attacker can inject their own malicious script into the site. As we mentioned, the same-origin policy has exceptions for embedded code like <script> and <img>.

Because the injected script arrives as part of example.com, your browser treats it as belonging to example.com’s origin, so the same-origin policy doesn’t stand in its way. The script can read data from your session on example.com, then use those exceptions to send that data to a website the attacker controls, or to load malware from it. This allows the attacker to steal sensitive data or infect you with malware.

This is the basis of cross-site scripting attacks. They are a way for attackers to access resources from another page, despite the barrier normally imposed by the same-origin policy. Ultimately, an attacker can use cross-site scripting to commit a range of other attacks. Common ones include:

Stealing security tokens so that they can access your account.
Copying sensitive data.
Installing malware on your device for a range of other attacks.

Types of cross-site scripting attack

There are two main types of cross-site scripting attack: reflected and stored. There are also attacks based on the Document Object Model (DOM-based XSS), but we won’t be investigating them in depth.

Reflected cross-site scripting

Reflected cross-site scripting attacks are also known as non-persistent attacks. They occur when a server-side script takes data from a request, such as a search term, and includes it in its response without sanitizing it appropriately.

HTML sanitization involves going through an HTML document and producing a new one that only contains tags that are presumed to be safe. When it isn’t done properly, it can allow reflected cross-site scripting attacks to slip through.
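Here is a minimal allowlist sanitizer, sketched in C under some loud assumptions. The allowed tags (b, i, em, strong, p) are our own choice. Tags are only kept when written exactly as <tag> or </tag> with no attributes, and everything else is escaped so that it displays as text. Real sanitizers parse the document properly, so production code should rely on a well-tested sanitization library rather than hand-rolled string scanning like this. The names ALLOWED_TAGS, tag_allowed and sanitize_html are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumed allowlist: simple formatting tags, no attributes permitted. */
static const char *const ALLOWED_TAGS[] = {"b", "i", "em", "strong", "p"};

/* True if the tag name (len bytes at name) is on the allowlist. */
static bool tag_allowed(const char *name, size_t len) {
    for (size_t k = 0; k < sizeof ALLOWED_TAGS / sizeof ALLOWED_TAGS[0]; k++) {
        if (strlen(ALLOWED_TAGS[k]) == len && strncmp(ALLOWED_TAGS[k], name, len) == 0)
            return true;
    }
    return false;
}

/* Returns a newly allocated sanitized copy of html (caller frees).
   Allowed tags written exactly as <tag> or </tag> are copied through.
   Every other '<', and the characters > & " ', are escaped. */
char *sanitize_html(const char *html) {
    size_t n = strlen(html);
    char *out = malloc(6 * n + 1);   /* worst case: each byte -> "&quot;" */
    if (out == NULL) return NULL;
    size_t o = 0, i = 0;
    while (i < n) {
        char c = html[i];
        if (c == '<') {
            size_t j = i + 1;
            if (html[j] == '/') j++;                 /* closing tag */
            size_t start = j;
            while (isalpha((unsigned char)html[j])) j++;
            if (j > start && html[j] == '>' && tag_allowed(html + start, j - start)) {
                memcpy(out + o, html + i, j - i + 1); /* keep the whole tag */
                o += j - i + 1;
                i = j + 1;
                continue;
            }
            memcpy(out + o, "&lt;", 4); o += 4;       /* not allowed: escape */
        } else if (c == '>')  { memcpy(out + o, "&gt;", 4);   o += 4; }
        else if (c == '&')    { memcpy(out + o, "&amp;", 5);  o += 5; }
        else if (c == '"')    { memcpy(out + o, "&quot;", 6); o += 6; }
        else if (c == '\'')   { memcpy(out + o, "&#39;", 5);  o += 5; }
        else                  { out[o++] = c; }
        i++;
    }
    out[o] = '\0';
    return out;
}
```

For example, sanitize_html("<b>hi</b><script>alert(1)</script>") keeps the bold tags but returns the script as inert text: <b>hi</b>&lt;script&gt;alert(1)&lt;/script&gt;.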

This threat exists because HTML documents are structured to mix content alongside formatting and control statements. If user inputs aren’t sanitized correctly, attackers can inject their own HTML code into the website’s HTML content.

In reflected cross-site scripting attacks, hackers find these vulnerabilities and inject their malicious code. They then find ways to trick potential victims into clicking on a link that includes their malicious script. If someone clicks on the link, the malicious script will execute in their browser, launching the attack.

Reflected cross-site scripting attack example

Let’s make up a website as an example, an online shopping platform like Amazon or Alibaba, called badsite.com. Badsite.com is poorly coded, and its search function doesn’t sanitize user inputs appropriately.

On a secure website, when a user inputs their search terms, we would expect these terms to be displayed on the results page, alongside a list of links to relevant pages. If there were no matching results, we would expect something along the lines of “No results match your search terms” to be displayed.

When someone enters a normal search query, for something like “flowers”, we would expect the URL for the results to be something along the lines of badsite.com/search?q=flowers.

However, our example search function is poorly coded, and the data it sends back to the user’s web browser isn’t properly escaped (escaping is a type of sanitization, which we discussed in the earlier Sanitization and validation section). If the developer hasn’t set up the results page to escape user input, the search terms are inserted into the page as-is, rather than being treated purely as data. In that case, any code entered into the search field can end up being executed by the visitor’s browser.

Put another way, when the website receives input containing tags like <img> or <script> in the search field, those tags should be escaped, meaning their characters are converted so that the browser reads them as data rather than code. When a website doesn’t escape correctly, the <img>, <script> and other tags may be interpreted as markup, and any script they contain could run. If an attacker does this with a malicious script, it can compromise the website's security.

Let’s say an attacker comes across badsite.com and notices that it’s vulnerable. Instead of searching for something benign, like “flowers”, the attacker could enter something like <script>bigattack</script>. Let’s say that <script>bigattack</script> is a script that leads victims to attackersite.com. On the attacker’s website, the attacker has a program that steals authentication cookies.

While badsite.com should escape <script>bigattack</script> so that it can’t run as code, it does not. Instead, the script executes.
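Here is a minimal Python sketch of what badsite.com’s broken search handler might look like on the server. The function name results_page and the page wording are hypothetical, and bigattack is still this article’s placeholder rather than real code. The important part is that the search terms are copied into the HTML unchanged, so the browser receives genuine <script> tags.

```python
from urllib.parse import urlparse, parse_qs

def results_page(url: str) -> str:
    """Hypothetical badsite.com search handler that reflects input unescaped."""
    # parse_qs percent-decodes the query string and returns a list of values
    # for each parameter; we take the first value of "q".
    query = parse_qs(urlparse(url).query).get("q", [""])[0]
    # BUG: the search terms are pasted into the HTML as-is, so any tags they
    # contain become part of the page's markup instead of plain text.
    return f"<p>No results match your search for: {query}</p>"

page = results_page("https://badsite.com/search?q=flowers<script>bigattack</script>")
print(page)
# → <p>No results match your search for: flowers<script>bigattack</script></p>
```

Because the browser can’t tell the injected tags apart from the site’s own markup, it runs the script as if badsite.com had written it.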

So the attacker has:

Found a vulnerable website.
Prepared a script that links to their own website, attackersite.com.
Set up attackersite.com to host software that can steal cookies.

The next step of a reflected attack is to lure victims in. There are a few different ways to do this, such as by tricking them into submitting a malicious form. One of the most common techniques is to send out phishing emails.

The attacker crafts an email, hoping to entice unsuspecting victims into clicking. It might look a little bit like this:

Check out these flowers, they’re the most beautiful ones I’ve ever seen: badsite.com/search?q=flowers<script>bigattack</script>

Our example URL is pretty obviously a scam, but it could be more thoroughly disguised with percent encoding. The more time and effort an attacker puts into the message, the more believable they can make it seem. They then send out the email en masse, in the hope that some people will click the link.
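Percent encoding replaces characters with % codes. A short sketch using Python’s standard urllib.parse module shows the article’s placeholder payload in encoded form. Because the server decodes the query string before using it, the disguise only hides the script from the reader; it changes nothing about what gets reflected.

```python
from urllib.parse import quote, unquote

payload = "<script>bigattack</script>"  # placeholder payload from this article

# quote() percent-encodes characters that aren't safe in a URL;
# by default "/" is left as-is.
encoded = quote(payload)
print("badsite.com/search?q=flowers" + encoded)
# → badsite.com/search?q=flowers%3Cscript%3Ebigattack%3C/script%3E

# Decoding on the server side recovers the original script tags exactly.
assert unquote(encoded) == payload
```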

The victim takes the bait

Let’s say that Victim 1 has an account with badsite.com. He isn’t the most internet-savvy person and he absolutely loves flowers. He receives the email from the attacker and clicks on the link straight away, eager to see such beautiful flowers. Clicking on the link opens Victim 1’s web browser and goes to the badsite.com/search?q=flowers<script>bigattack</script> URL. As it does so, the script secretly runs in the background.

The script reaches out to attackersite.com, which then loads the cookie-stealing program and runs it in Victim 1’s browser. Because badsite.com was so poorly coded, the cookie-stealing program appears to be coming from badsite.com, and Victim 1’s browser allows it to run. The same-origin policy does not stop it.

The cookie-stealing program copies Victim 1’s authentication cookie, then sends it to attackersite.com’s server. The attacker retrieves Victim 1’s authentication cookie from their server, then places it in their own browser.

Authentication cookies are used by websites to keep people logged in as they move across the site. When you log in to a site, the website’s server places a cookie with a unique identifier in your browser. Under normal circumstances, this allows you to securely go to other pages on the website, without having to log in each time. Because you have the unique cookie in your browser, the server knows that it is you, and that you have been authorized to access the site.

However, in our example, because of badsite.com’s terrible security, the attacker has managed to steal Victim 1’s cookie and plant it in their own browser. This means that the attacker can now access Victim 1’s account as if they were Victim 1, without needing his password. From this point, the attacker can do anything that Victim 1 can normally do, because the website’s server thinks that the attacker is Victim 1.
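Why a copied cookie is enough can be sketched in a few lines of Python. This assumes a simple, hypothetical in-memory session store (the names sessions, log_in and current_user are made up for illustration). Real sites keep sessions in a database or cache, but the principle is the same: the server recognizes the token, not the person holding it.

```python
import secrets
from typing import Optional

# Hypothetical in-memory session store mapping token -> username.
sessions: dict[str, str] = {}

def log_in(username: str) -> str:
    """Issue a random session token; a real server would send it in a Set-Cookie header."""
    token = secrets.token_urlsafe(32)
    sessions[token] = username
    return token

def current_user(cookie_token: str) -> Optional[str]:
    """Identify the visitor purely from the token their browser presents."""
    return sessions.get(cookie_token)

victim_token = log_in("victim1")

# An attacker who copies the token presents exactly the same value,
# so the server cannot tell them apart from the real user.
stolen_copy = victim_token
print(current_user(stolen_copy))      # → victim1
print(current_user("made-up-token"))  # → None
```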

The attacker could:

Run up a whole bunch of purchases on Victim 1’s account.
Steal his credit card details and use them elsewhere.
Steal his personal information and use it to commit fraud.
Try his username and password on other platforms, to see if they can gain access to other accounts.
Change his password.

With this level of access, there are many other ways that the attacker could make Victim 1’s life miserable. The attacker could even use Victim 1’s account to spread the attack. They could use the platform’s chat function to send messages that also direct the recipients to the badsite.com/search?q=flowers<script>bigattack</script> URL.

Stored cross-site scripting attacks

In a stored cross-site scripting attack, the hacker injects a malicious script that is stored on the website’s servers, as opposed to the malicious links that are sent in reflected attacks. Hackers can get these scripts into a website’s databases through things like comment fields, visitor logs, user profiles and forum messages.

Attackers can inject code for stored XSS attacks in a variety of ways, depending on the input vulnerabilities they find. One example is that they could insert a malicious script as the bio on their profile.

If the website doesn’t sanitize inputs appropriately, the script would then be stored on the website’s server. If other website users then visit the attacker’s profile page, the website’s server will send them the page, including the malicious script.

When their browsers receive the content, they will interpret the script and execute it. This happens because the browsers think that it is legitimate code from the website. From this point, hackers can steer their victims toward their desired attack.

Stored cross-site scripting attacks only require an initial code injection from the hacker, which can then lead to successful attacks against large numbers of victims if done well. This is because the script renders automatically, and can target anyone who comes across it.
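A minimal Python sketch of the profile-bio scenario described above, using a hypothetical dictionary in place of the website’s database (save_bio and render_profile are illustrative names). It shows why one injection is enough: the script is saved once, then served to every visitor who loads the page.

```python
# Hypothetical in-memory "database" of user profile bios.
profiles: dict[str, str] = {}

def save_bio(username: str, bio: str) -> None:
    # Vulnerable: the bio is stored exactly as submitted, with no sanitization.
    profiles[username] = bio

def render_profile(username: str) -> str:
    # Vulnerable: the stored bio is inserted into the page without escaping.
    return f"<h1>{username}</h1><div class='bio'>{profiles[username]}</div>"

# The attacker injects the placeholder script once...
save_bio("attacker", "I love flowers! <script>bigattack</script>")

# ...and every later visitor to the profile receives it as part of the page.
pages = [render_profile("attacker") for _visitor in range(5)]
print(pages[0])
```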

Stored cross-site scripting attacks can be especially dangerous, because they don’t require the additional effort of trying to target victims through email and other means, like reflected attacks do.

Stored XSS attacks were common on forums, because many of them allowed users to post HTML formatted messages. This made it easy for attackers to store their malicious code on the site and have it execute against victims that came across it.

Stored cross-site scripting attacks are also known as persistent XSS attacks. XSS attacks can also be divided by type number; however, there seems to be no consensus over which is Type 1 and which is Type 2. Due to this confusion, it’s probably best to stick to referring to them as persistent or stored XSS attacks, and reflected or non-persistent XSS attacks.

Stored cross-site scripting attack example

Let’s say that in this example, badsite.com has a stored cross-site scripting vulnerability in its forum. Because the site has been poorly coded, it doesn’t validate user inputs properly when they post messages there.

In normal forum posts, we might expect friendly conversation, some trolling, or perhaps two users spending weeks arguing over pointless semantics. However, due to badsite.com’s poor input validation, if posts include HTML tags, they are added to the page’s HTML as-is.

An attacker notices this vulnerability and decides to take advantage of it. Under a thread about flowers, the attacker writes out the following post, and presses submit to send it to the server:

These flowers are amazing! <script>bigattack</script>

The script links to attackersite.com, where once again the attacker hosts malware that steals authentication cookies.

A well-designed website should sanitize the posts and strip out the script tags, or at least ensure that they don’t work. However, badsite.com is a bad site after all, so the script runs whenever any forum visitors load the page that features the attacker’s post.

Let’s say that Victims 1, 2, 3, 4 and 5 load the page before the site administrators have a chance to take it down. When each of the five victims loads the page, the script in the attacker’s comment runs.

It circumvents the same-origin policy and reaches out to attackersite.com, which loads the cookie-stealing program and runs it in each of the victims’ browsers. The cookie-stealing program copies the authentication cookies from each of the five victims, then sends them to the attackersite.com server.

Once the authentication cookies are on attackersite.com’s server, the attacker can use them to log in as each of the five victims. They can then launch any of the attacks that we discussed in the Reflected cross-site scripting attack example section against them.

How to defend against cross-site scripting (XSS) attacks

Cross-site scripting attacks are one of the most pervasive online threats. However, there are a range of relatively simple steps that website developers can take to protect their users. Site visitors can also do things like block scripts to protect themselves.

HTML escaping

One of the most important steps for preventing XSS attacks is to secure any user inputs. A key part of this is HTML escaping. If your website escapes its output, problematic characters like “<” and “>” will be converted to the HTML entities “&lt;” and “&gt;”, respectively.

If an attacker enters code like <script>bigattack</script> into the input field, it will be sent back to the browser as “&lt;script&gt;bigattack&lt;/script&gt;” instead, which the browser displays as plain text rather than executing as code.
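Python’s standard library includes html.escape, which performs this kind of conversion. A quick sketch with the article’s placeholder payload, plus one input aimed at an HTML attribute, where escaping quotation marks matters as well:

```python
from html import escape, unescape

user_input = "<script>bigattack</script>"

# html.escape replaces &, < and > (and, by default, " and ') with HTML entities.
escaped = escape(user_input)
print(escaped)  # → &lt;script&gt;bigattack&lt;/script&gt;

# Quotes matter when input ends up inside an HTML attribute value.
print(escape('" onmouseover="bigattack'))  # → &quot; onmouseover=&quot;bigattack

# Nothing is lost: the entities map back to the original characters,
# so visitors still see exactly what was typed, just as text.
assert unescape(escaped) == user_input
```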

Validation

Validation involves filtering user inputs to remove any potentially malicious aspects. With validation, you can allow some of the useful and more harmless HTML elements like <strong> and <em>, while filtering out the more dangerous ones like <script>.

The validation process begins with categorization, where inputs are compared against either a whitelist or a blacklist. While blacklisting may seem like an obvious choice for filtering out potentially harmful inputs, it can be incredibly complex and requires frequent updates to stay on top of the latest threats.

As a result, whitelisting is the preferred method. When new vulnerabilities emerge, they are less of a threat because they won’t already be on the whitelist and therefore won’t be allowed. In contrast, the absence of these new threats on a blacklist would allow them to get through.
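To illustrate the difference, here is a toy Python comparison. The lists are deliberately tiny and hypothetical; they only show how each approach treats a tag it has never been told about.

```python
# Deliberately tiny, hypothetical lists for illustration only.
BLACKLIST = {"script"}        # tags known to be dangerous
WHITELIST = {"strong", "em"}  # tags known to be harmless

def blacklist_allows(tag: str) -> bool:
    # Anything not explicitly listed as bad gets through.
    return tag.lower() not in BLACKLIST

def whitelist_allows(tag: str) -> bool:
    # Anything not explicitly listed as good is refused.
    return tag.lower() in WHITELIST

# <img> and <svg> can also run script (for example through onerror or onload
# attributes), but this blacklist doesn't mention them, so only the whitelist stops them.
for tag in ["em", "script", "img", "svg"]:
    print(tag, blacklist_allows(tag), whitelist_allows(tag))
```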

After the categorization step, the input will be processed. The input may be allowed, rejected, or sanitized, depending on how it compares to either the whitelist or the blacklist. Sanitization can leave the input mostly as is, but it can strip out problematic characters like < and >, and sometimes other symbols like hyphens.
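The allow, reject or sanitize step can be sketched with Python’s built-in html.parser module. This is a simplified illustration that assumes a whitelist of just <strong> and <em>: whitelisted tags are kept without their attributes, the contents of <script> and <style> are discarded, all other tags are dropped, and remaining text is escaped. The class and function names are hypothetical, and a maintained sanitization library is a safer choice than a hand-rolled parser for real sites.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"strong", "em"}     # illustrative whitelist
DROP_CONTENT = {"script", "style"}  # tags whose contents are discarded entirely

class WhitelistSanitizer(HTMLParser):
    """Keeps whitelisted tags (minus attributes), drops everything else, escapes text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes such as onclick are discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))  # text is re-escaped before output

def sanitize(fragment: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)

print(sanitize('<strong>Lovely</strong> <em onclick="bigattack()">roses</em><script>bigattack</script>'))
# → <strong>Lovely</strong> <em>roses</em>
```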

Content security policy (CSP)

Content security policies reduce the threat from cross-site scripting attacks because they specify which domains the browser should consider as safe sources to run executable scripts from. Content security policies ensure that browsers will only execute scripts if they are loaded in source files that have been received from allowed domains. All other scripts are ignored.

Content security policies also allow servers to specify which protocols are allowed. As an example, they can require all content to be loaded over HTTPS, which is encrypted, instead of HTTP.

Content security policies are configured by adding the Content-Security-Policy HTTP header to a web page’s response, then assigning it directives that control which resources can be loaded for the page. As an example, a policy could restrict form submissions to a specific endpoint, but allow images to be loaded from anywhere.
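A Content-Security-Policy header value is a semicolon-separated list of directives, each followed by its allowed sources. The Python sketch below assembles a policy along the lines of the examples above. The endpoint https://example.com/search is hypothetical, and build_csp is plain string joining, not a library API. With script-src 'self', a browser would refuse both an inline <script>bigattack</script> and a script loaded from attackersite.com.

```python
def build_csp(directives: dict[str, list[str]]) -> str:
    """Join directives and their sources into a Content-Security-Policy value."""
    return "; ".join(" ".join([name, *sources]) for name, sources in directives.items())

policy = build_csp({
    "default-src": ["'self'"],                      # fallback: only this site's own origin
    "script-src": ["'self'"],                       # scripts only from this origin; inline scripts blocked
    "img-src": ["*"],                               # images may be loaded from anywhere
    "form-action": ["https://example.com/search"],  # hypothetical endpoint forms may submit to
    "upgrade-insecure-requests": [],                # load HTTP resources over HTTPS instead
})
print("Content-Security-Policy: " + policy)
```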

Securing cookies

Cross-site scripting attacks often aim to steal authentication cookies. This means that securing cookies can help to reduce the harm from these attacks. One method for mitigating the threat is to tie authentication cookies to each user’s IP address, and only allow that IP address to use the cookie. While this technique can be effective at stopping stolen cookies from being used in most scenarios, it’s not foolproof.
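A minimal sketch of IP binding, assuming a hypothetical in-memory session store and documentation-range IP addresses. It also hints at why the technique isn’t foolproof: an attacker who shares the victim’s public IP address (for example, on the same network behind NAT) would pass the check, while legitimate users whose IP changes mid-session would be logged out.

```python
import secrets
from typing import Optional

# Hypothetical session store: token -> (username, IP address it was issued to).
sessions: dict[str, tuple[str, str]] = {}

def log_in(username: str, client_ip: str) -> str:
    token = secrets.token_urlsafe(32)
    sessions[token] = (username, client_ip)
    return token

def current_user(token: str, client_ip: str) -> Optional[str]:
    entry = sessions.get(token)
    if entry is None:
        return None
    username, bound_ip = entry
    # Reject the cookie if it arrives from a different IP than the one it was issued to.
    return username if client_ip == bound_ip else None

token = log_in("victim1", "203.0.113.10")
print(current_user(token, "203.0.113.10"))  # → victim1
print(current_user(token, "198.51.100.7"))  # → None (stolen cookie, different IP)
```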

Another option is to set the cookie’s SameSite attribute to Strict, which stops the browser from sending the cookie with requests that originate from other sites.
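In Python, the standard library’s http.cookies module can produce a Set-Cookie header with these attributes (the samesite attribute needs Python 3.8 or later). The sketch also sets HttpOnly, a related attribute that stops page scripts, including injected ones, from reading the cookie through document.cookie, and Secure, which limits the cookie to HTTPS connections. The token value is a placeholder.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session_id"] = "abc123"              # placeholder session token
cookie["session_id"]["samesite"] = "Strict"  # not sent with requests started by other sites
cookie["session_id"]["httponly"] = True      # page scripts can't read it via document.cookie
cookie["session_id"]["secure"] = True        # only sent over HTTPS
cookie["session_id"]["path"] = "/"

header = cookie.output()  # a full "Set-Cookie: ..." header line
print(header)
```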

Individual protection: blocking scripts

Individuals can protect themselves from cross-site scripting attacks by disabling scripts in their browsers. This means that if they visit a website that hosts malicious scripts, they would not be susceptible to cross-site scripting attacks.

Scripts can be disabled by blocking all scripts in the browser, or by blocking scripts on a site-by-site basis with an add-on like NoScript. However, blocking scripts will hamper the functionality of many websites, and make some sites completely unusable.

Staying secure from cross-site scripting attacks

Cross-site scripting vulnerabilities are one of the most prominent security vulnerabilities online. They can have significant ramifications for website visitors, so it’s critical that developers take the time to mitigate these threats.

Despite the complex client-side environment, there are a range of simple steps that developers can take to keep their websites and their visitors secure, such as those listed above.

If developers neglect the security threats of cross-site scripting attacks, they could cause serious harm to their users, who may have their accounts taken over, their sensitive data stolen, and worse. Ultimately, this can damage a website’s reputation, and may have significant ramifications for its long-term success.

Cross-site Scripting Attacks FAQs

How does a cross-site scripting attack work?

Cross-site scripting attacks work by exploiting vulnerabilities in web applications that allow untrusted user input to be executed by other users. By injecting malicious code, an attacker can gain access to user data or credentials, modify web content, or perform other unauthorized actions.

What are some common types of cross-site scripting attacks?

Reflected XSS: where the attacker injects malicious code into a web page that is then reflected back to the user's browser, typically through a search or form field
Stored XSS: where the attacker injects malicious code into a web page that is then stored on the server and executed when other users view the page
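DOM-based XSS: where the page's own client-side JavaScript takes attacker-controlled data, such as part of the URL, and writes it into the browser's Document Object Model (DOM) in an unsafe way, so the malicious code can execute even if the server never inserts it into the page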

What are some best practices for preventing cross-site scripting attacks?

Input validation and sanitization: all user input should be validated and sanitized to prevent injection of malicious code
Output encoding: all output should be encoded to prevent injection of untrusted content into web pages
Content security policy (CSP): a policy that limits the types of content that can be executed by a web page, preventing execution of malicious code
Regular updates and patches: web applications should be kept up to date with the latest security patches and updates to prevent exploitation of known vulnerabilities
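Output encoding depends on where the untrusted data ends up in the page. A brief Python sketch of two common contexts, using a hypothetical input that tries to break out of an HTML attribute. JavaScript and CSS contexts need their own encoding rules and aren’t covered here.

```python
from html import escape
from urllib.parse import quote

untrusted = 'flowers" onmouseover="bigattack'  # hypothetical hostile input

# HTML text or a quoted attribute value: escape &, <, >, " and '.
html_attr = f'<input value="{escape(untrusted)}">'

# A URL query parameter: percent-encode everything outside the unreserved set.
link = f'<a href="/search?q={quote(untrusted, safe="")}">search again</a>'

print(html_attr)  # → <input value="flowers&quot; onmouseover=&quot;bigattack">
print(link)       # → <a href="/search?q=flowers%22%20onmouseover%3D%22bigattack">search again</a>
```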

How can I test for cross-site scripting vulnerabilities?

There are several tools available for testing web applications for cross-site scripting vulnerabilities, including OWASP ZAP, Burp Suite, and Netsparker. These tools can scan web pages for vulnerabilities and provide recommendations for remediation.
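Part of what scanners like these do is submit distinctive input and check whether it comes back unescaped. A stripped-down Python version of that idea, run against two hypothetical local handlers instead of a live site, looks like this:

```python
import secrets
from html import escape
from typing import Callable

def reflects_unescaped(render: Callable[[str], str]) -> bool:
    """Submit a unique, harmless marker containing < and > and check how it comes back."""
    marker = f"<xss{secrets.token_hex(4)}>"
    response = render(marker)
    # If the raw marker appears verbatim, the handler echoes input without escaping it.
    return marker in response

# Two hypothetical handlers to probe.
def vulnerable(q: str) -> str:
    return f"<p>Results for {q}</p>"

def safe(q: str) -> str:
    return f"<p>Results for {escape(q)}</p>"

print(reflects_unescaped(vulnerable))  # → True
print(reflects_unescaped(safe))        # → False
```

Real tools send many payload variants over HTTP and look at where in the page each one lands, but the underlying check is the same.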

What should I do if my web application is vulnerable to cross-site scripting attacks?

If your web application is vulnerable to cross-site scripting attacks, you should take immediate action to remediate the vulnerability. This may include implementing input validation and encoding, deploying a content security policy, or patching known vulnerabilities. You should also monitor your web application for signs of malicious activity and be prepared to respond quickly to any incidents.