Handling Relative URLs for Redirects / Forwards
Introduction
The purpose of this post is to explore an approach to handling relative URLs safely for redirects and forwards. Many web security vulnerabilities that originate from unvalidated redirects and forwards are remediated by restricting URLs, usually through an allow-list of known good absolute URLs. See OWASP Validating URLs or Google Open Redirect for examples of this. Unfortunately, not all applications can adopt a default allow-listing approach because the absolute URL may not be known ahead of time. This causes friction, as a one-size-fits-all approach does not always work.
Notably, all of the code examples in this post can be found in the supporting url-parsing GitHub repo.
Background
Objectively, URL parsing is difficult. A URL is made up of many individual components, and how those components interact with one another can be confusing. For example, authority delegation in a URL goes well beyond what the average user would be expected to know. Orange Tsai presented A New Era of SSRF at Black Hat USA 2017, highlighting some of the problems that can arise. 10/10 research, would recommend reading.
Basic URL Structure
The syntax and semantics of a URI are intentionally broad to create an extensible means for identifying resources. This introduces ambiguity, as there are inconsistencies between URL parsers and the RFC2396 / RFC3986 specifications. WHATWG defined a contemporary standard based on these specifications. The following diagram shows the components of URL Strings and URL Objects in JavaScript. Properties above the example URL are those returned by Node's legacy url.parse(); those below it belong to the WHATWG URL object.
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│                                            href                                             │
├──────────┬──┬─────────────────────┬─────────────────────┬───────────────────────────┬───────┤
│ protocol │  │        auth         │        host         │           path            │ hash  │
│          │  │                     ├──────────────┬──────┼──────────┬────────────────┤       │
│          │  │                     │   hostname   │ port │ pathname │     search     │       │
│          │  │                     │              │      │          ├─┬──────────────┤       │
│          │  │                     │              │      │          │ │    query     │       │
"  https:   //    user   :   pass   @ sub.host.com : 8080   /p/a/t/h  ?  query=string   #hash "
│          │  │          │          │   hostname   │ port │          │                │       │
│          │  │          │          ├──────────────┴──────┤          │                │       │
│ protocol │  │ username │ password │        host         │          │                │       │
├──────────┴──┼──────────┴──────────┼─────────────────────┤          │                │       │
│   origin    │                     │       origin        │ pathname │     search     │ hash  │
├─────────────┴─────────────────────┴─────────────────────┴──────────┴────────────────┴───────┤
│                                            href                                             │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
To demonstrate this at a code level, a URL can be parsed and accessed through a convenient object, as seen below:
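// The WHATWG URL class; it is also available as a global in recent Node.js versions.
const { URL } = require('url');

// Parse an absolute URL string into its individual components.
const url = 'https://user:pass@sub.host.com:8080/p/a/t/h?query=string#has';
const newUrl = new URL(url);
console.log(newUrl);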
Printing the URL object as above gives you an easy structure for accessing the individual parsed URL components. Fortunately, this handles almost all of the heavy lifting for you.
URL {
  href: 'https://user:pass@sub.host.com:8080/p/a/t/h?query=string#has',
  origin: 'https://sub.host.com:8080',
  protocol: 'https:',
  username: 'user',
  password: 'pass',
  host: 'sub.host.com:8080',
  hostname: 'sub.host.com',
  port: '8080',
  pathname: '/p/a/t/h',
  search: '?query=string',
  searchParams: URLSearchParams { 'query' => 'string' },
  hash: '#has'
}
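As a small illustration of reading individual components, the sketch below pulls a few fields out of the same example URL. It assumes the global URL class available in recent Node.js versions, and the variable names are illustrative only.

```javascript
// URL is available as a global in recent Node.js versions and in browsers.
const parsed = new URL('https://user:pass@sub.host.com:8080/p/a/t/h?query=string#has');

// Each component is exposed as a plain string property.
const hostname = parsed.hostname; // 'sub.host.com'
const pathname = parsed.pathname; // '/p/a/t/h'

// searchParams gives decoded access to individual query parameters.
const query = parsed.searchParams.get('query'); // 'string'

console.log({ hostname, pathname, query });
```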
Relative URL Structure
Now that we have had a brief refresher on generic URL structure, let's drill into the relative portion of the URL. RFC3986 - URI Generic Syntax defines a Relative Reference as:
A relative reference takes advantage of the hierarchical syntax to
express a URI reference relative to the name space of another
hierarchical URI.
relative-ref  = relative-part [ "?" query ] [ "#" fragment ]
relative-part = "//" authority path-abempty
              / path-absolute
              / path-noscheme
              / path-empty
The URI referred to by a relative reference, also known as the target
URI, is obtained by applying the reference resolution algorithm of
Section 5.
A relative reference that begins with two slash characters is termed
a network-path reference; such references are rarely used. A
relative reference that begins with a single slash character is
termed an absolute-path reference. A relative reference that does
not begin with a slash character is termed a relative-path reference.
For clarification, the “authority” part above refers to RFC3986 - Authority. As seen below, the authority is defined as:
Many URI schemes include a hierarchical element for a naming
authority so that governance of the name space defined by the
remainder of the URI is delegated to that authority (which may, in
turn, delegate it further). The generic syntax provides a common
means for distinguishing an authority based on a registered name or
server address, along with optional port and user information.
The authority component is preceded by a double slash ("//") and is
terminated by the next slash ("/"), question mark ("?"), or number
sign ("#") character, or by the end of the URI.
authority = [ userinfo "@" ] host [ ":" port ]
URI producers and normalizers should omit the ":" delimiter that
separates host from port if the port component is empty. Some
schemes do not allow the userinfo and/or port subcomponents.
If a URI contains an authority component, then the path component
must either be empty or begin with a slash ("/") character. Non-
validating parsers (those that merely separate a URI reference into
its major components) will often ignore the subcomponent structure of
authority, treating it as an opaque string from the double-slash to
the first terminating delimiter, until such time as the URI is
dereferenced.
The following examples from Mozilla illustrate these concepts well.
Full URL
https://developer.mozilla.org/en-US/docs/Learn
Implicit protocol
//developer.mozilla.org/en-US/docs/Learn
In this case, the browser will call that URL with the same protocol as the
one used to load the document hosting that URL.
Implicit domain name
/en-US/docs/Learn
This is the most common use case for an absolute URL within an HTML document.
The browser will use the same protocol and the same domain name as the one
used to load the document hosting that URL. Note: it isn't possible to omit
the domain name without omitting the protocol as well.
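To make the three kinds of relative reference concrete, the sketch below resolves each one against a base URL using the two-argument URL constructor. The base page and the example.org host are assumptions chosen for illustration.

```javascript
// Hypothetical page that the references are resolved against.
const basePage = 'https://developer.mozilla.org/en-US/docs/Web';

// Resolve a reference against the base with the two-argument URL constructor.
const resolve = (ref) => new URL(ref, basePage).href;

const networkPath = resolve('//example.org/en-US/docs/Learn'); // new host, protocol inherited
const absolutePath = resolve('/en-US/docs/Learn');             // host inherited, path replaced
const relativePath = resolve('Learn');                         // resolved against the base path

console.log(networkPath);  // https://example.org/en-US/docs/Learn
console.log(absolutePath); // https://developer.mozilla.org/en-US/docs/Learn
console.log(relativePath); // https://developer.mozilla.org/en-US/docs/Learn
```

The network-path form is the one that matters for redirects: it looks relative but changes the host.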
Attacking Implementations
If this problem wasn’t complicated enough, browsers add further complexity. Modern browsers automatically convert backslashes (\) into forward slashes (/), despite this being against RFC3986 - URI Generic Syntax. In addition, the @ character can be used to define a target host, redirecting the victim to a new domain; this type of attack is described under Semantic Attacks.
The dangerous characters and encoded versions can be seen below:
127.0.0.1:3000?nextUrl=//nikola.dev
127.0.0.1:3000?nextUrl=/%2Fnikola.dev
127.0.0.1:3000?nextUrl=%2F%2Fnikola.dev
127.0.0.1:3000?nextUrl=\\nikola.dev
127.0.0.1:3000?nextUrl=\%5Cnikola.dev
127.0.0.1:3000?nextUrl=%5C%5Cnikola.dev
Interestingly, the \ and / characters (and their URL-encoded equivalents) can be repeated and are interchangeable. The following is a valid payload:
http://127.0.0.1:3000/?nextUrl=/%5C/%5C/\%2F\/\%2F\/\%2F\/nikola.dev
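A rough way to see why these payloads are dangerous is to resolve their decoded forms against the application's own origin with a WHATWG parser, which treats \ like / for http(s) URLs. The sketch below assumes the global URL class in Node.js; leavesOrigin and appBase are hypothetical names introduced for illustration.

```javascript
// Hypothetical origin of the application performing the redirect.
const appBase = 'http://127.0.0.1:3000/';

// Returns true when a decoded nextUrl value would leave the application's origin.
function leavesOrigin(nextUrl) {
  return new URL(nextUrl, appBase).origin !== new URL(appBase).origin;
}

// Decoded forms of the kinds of payload listed above.
const payloads = ['//nikola.dev', '\\\\nikola.dev', '/\\nikola.dev', '/\\/\\/\\/nikola.dev'];
const results = payloads.map((p) => leavesOrigin(p));

console.log(results);                     // [ true, true, true, true ]
console.log(leavesOrigin('/nikola.dev')); // false: a single leading slash stays on the same origin
```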
Depending on the underlying logic, attackers can also use this to bypass filters. For example, if the nextUrl parameter is required to contain example.com, the check can be bypassed:
127.0.0.1:3000?nextUrl=//example.com%40nikola.dev
127.0.0.1:3000?nextUrl=//example.com@nikola.dev
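The sketch below illustrates the bypass with a deliberately naive filter. The naiveCheck function and the application origin are assumptions made up for this example, not code from the supporting repo.

```javascript
// A deliberately naive filter (hypothetical): accept anything that mentions example.com.
const naiveCheck = (nextUrl) => nextUrl.includes('example.com');

// Decoded payload from above, resolved against a hypothetical application origin.
const payload = '//example.com@nikola.dev';
const target = new URL(payload, 'http://127.0.0.1:3000/');

console.log(naiveCheck(payload)); // true: the filter is satisfied
console.log(target.username);     // example.com (parsed as userinfo, not as the host)
console.log(target.hostname);     // nikola.dev (the real destination)
```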
Fortunately, a parser that follows the WHATWG URL standard percent-decodes query parameter values when they are read. This means we do not need to handle the percent-encoded variants separately, as they are canonicalised by the library on our behalf and can be validated like their plain equivalents. An example of this can be seen below:
node app.js
Server running at http://127.0.0.1:3000/
URL Requested
Raw url: /?nextUrl=/nikola.dev
Parsed nextUrl parameter: /nikola.dev
URL Requested
Raw url: /?nextUrl=%2Fnikola.dev
Parsed nextUrl parameter: /nikola.dev
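The output above could be produced by logic along the following lines. This is a minimal sketch rather than the actual app.js: readNextUrl is a hypothetical helper, and the base URL is only there so the request path can be parsed.

```javascript
// Hypothetical base; only the query string matters for this illustration.
const requestBase = 'http://127.0.0.1:3000';

// Read the nextUrl parameter the way a request handler might.
function readNextUrl(rawUrl) {
  return new URL(rawUrl, requestBase).searchParams.get('nextUrl');
}

const plain = readNextUrl('/?nextUrl=/nikola.dev');
const encoded = readNextUrl('/?nextUrl=%2Fnikola.dev');

console.log(plain);   // /nikola.dev
console.log(encoded); // /nikola.dev
```

Note that the value is decoded once only, so a double-encoded %252F comes back as the literal text %2F and still has to be rejected by the validation step.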
Recommended Approach
Much like any untrusted user input, relative URLs should be canonicalised, sanitised, and then validated, in that order. Canonicalisation and sanitisation should be done through established URL parsing libraries that follow the WHATWG standard, such as the URL Node package. The output of these operations should then be validated against a strict pattern that only allows the required characters. Dangerous characters such as @, # and repeated / characters should not be on the allow-list. An example of such validation can be seen below:
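// Allow only relative paths made of alphanumeric segments, e.g. /account/settings.
// Note: an empty string also passes, because the group may repeat zero times.
const RELATIVE_PATH = /^(\/[a-zA-Z0-9]+)*$/;

exports.isUrlValid = function (relUrl) {
  console.log('Handling: ' + relUrl);
  // test() already returns a boolean, so no if/else is needed.
  return RELATIVE_PATH.test(relUrl);
};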
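Putting the steps together, a redirect handler might canonicalise the parameter with the parser and fall back to a safe default whenever validation fails. The sketch below is one possible arrangement under those assumptions; safeRedirectTarget, the fallback path, and the base URL are hypothetical names introduced here.

```javascript
// Same strict pattern as the validation example above.
const SAFE_RELATIVE_PATH = /^(\/[a-zA-Z0-9]+)*$/;

// Hypothetical helper: returns a safe redirect target, or a fallback when validation fails.
function safeRedirectTarget(rawUrl, fallback = '/') {
  // Canonicalise: let the parser extract and percent-decode the query parameter.
  const nextUrl = new URL(rawUrl, 'http://127.0.0.1:3000').searchParams.get('nextUrl');
  // Validate: a missing or empty value, or anything outside the pattern, falls back.
  if (!nextUrl || !SAFE_RELATIVE_PATH.test(nextUrl)) {
    return fallback;
  }
  return nextUrl;
}

console.log(safeRedirectTarget('/?nextUrl=/account/settings'));        // /account/settings
console.log(safeRedirectTarget('/?nextUrl=%2F%2Fnikola.dev'));         // /
console.log(safeRedirectTarget('/?nextUrl=//example.com@nikola.dev')); // /
console.log(safeRedirectTarget('/?nextUrl=%5C%5Cnikola.dev'));         // /
```

Falling back to a fixed path, rather than trying to repair a rejected value, keeps the handler simple and avoids second-guessing what the parser will do with unusual input.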
Where possible, handle absolute URLs to avoid introducing unnecessary complexity. OWASP Validating URLs is a great resource on such solutions.
Conclusion
This post presented a novel approach to handling relative URLs for redirects and forwards. The examples presented are by no means comprehensive and only scratch the surface of the problem. However, they do give insight into the approach attackers take when examining forward / redirect logic. All of the code examples in this post can be found in the supporting url-parsing GitHub repo.