URL encoding dilemma

Tags: C#, .NET, Encoding

Categories: .NET, C#

Ever needed to encode URL parameters from a console or library project rather than a web project?

In these cases the HttpUtility class from the System.Web namespace can help, although you need to add the System.Web reference (.NET 4.0). It contains the UrlEncode method, which you can use, but the problem is that it encodes the space as +.

Let’s test with an example. I will use the following variables throughout the examples:

string urlDomain = "http://www.myweb.com?text=";
string urlQuery = "some text";

Using UrlEncode as follows:

Console.WriteLine("{0}{1}", urlDomain, HttpUtility.UrlEncode(urlQuery));

gives this output:

http://www.myweb.com?text=some+text

But I want the space encoded as %20, so what I’m expecting is http://www.myweb.com?text=some%20text.

To get this right I’m going to use another method:

HttpUtility.UrlPathEncode

Console.WriteLine("{0}{1}", urlDomain, HttpUtility.UrlPathEncode(urlQuery));

And the result is
http://www.myweb.com?text=some%20text
Right, the space is now encoded the way I wanted.

But how about other characters? Let’s change urlQuery:

urlQuery = "from first>second";
Console.WriteLine("{0}{1}", urlDomain, HttpUtility.UrlPathEncode(urlQuery));

The result is:
http://www.myweb.com?text=from%20first>second

The > character is not escaped. Other characters, such as %, = and /, are not escaped either.

UrlPathEncode does not escape reserved characters. I checked the source code: it encodes only spaces (and non-ASCII characters), and only in the scheme, domain and path. It leaves the parameters alone, that is, everything that comes after the ‘?’ symbol.

So it basically does what the name says: UrlPathEncode encodes only the URL path, which is handy when you have, for instance, this kind of URL:

http://www.myweb.com/menu page/
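To see the path/query split in action, here is a minimal sketch, assuming a console project that references System.Web. The URL with a query string appended is my own illustrative input.

```csharp
using System;
using System.Web; // requires a reference to System.Web

class PathEncodeDemo
{
    static void Main()
    {
        // Illustrative URL: one space in the path, another in the query string.
        string url = "http://www.myweb.com/menu page/?text=some text";

        // Only the part before '?' is encoded; the query string passes through as-is.
        Console.WriteLine(HttpUtility.UrlPathEncode(url));
    }
}
```

Going by the behaviour described above, the path should come out as menu%20page while text=some text keeps its raw space.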

But how about the query string?

The first thing that comes to mind is to use both methods: first UrlPathEncode to encode the string, then UrlEncode on the result. But this encodes the % twice and doesn’t look elegant at all. You would also have to decode twice.
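A small sketch of why chaining the two methods goes wrong, again assuming a reference to System.Web; the class name is just for illustration.

```csharp
using System;
using System.Web; // requires a reference to System.Web

class DoubleEncodeDemo
{
    static void Main()
    {
        string once = HttpUtility.UrlPathEncode("some text"); // some%20text
        string twice = HttpUtility.UrlEncode(once);           // the '%' itself becomes %25

        Console.WriteLine(twice); // some%2520text

        // A single decode pass only undoes the outer layer.
        Console.WriteLine(HttpUtility.UrlDecode(twice));                        // some%20text
        Console.WriteLine(HttpUtility.UrlDecode(HttpUtility.UrlDecode(twice))); // some text
    }
}
```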

There is a better way of doing this: Uri.EscapeDataString.

The Uri class lives in the System namespace, so you don’t even need to add the System.Web reference to your Windows Forms or class library project.

Console.WriteLine("{0}{1}", urlDomain, Uri.EscapeDataString(urlQuery));

Displays:

http://www.myweb.com?text=from%20first%3Esecond
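With more than one parameter, each name and value needs to be escaped on its own so that & and = inside the data do not clash with the separators. A minimal sketch, where BuildQuery and the filter parameter are hypothetical names of mine:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class QueryDemo
{
    // Hypothetical helper: escapes every name and value separately,
    // then joins the pairs with '&'.
    public static string BuildQuery(IEnumerable<KeyValuePair<string, string>> parameters)
    {
        return string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
    }

    static void Main()
    {
        var parameters = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("text", "from first>second"),
            new KeyValuePair<string, string>("filter", "a&b=c")
        };

        Console.WriteLine("http://www.myweb.com?" + BuildQuery(parameters));
        // http://www.myweb.com?text=from%20first%3Esecond&filter=a%26b%3Dc
    }
}
```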

By default, EscapeDataString follows RFC 2396, which means that it encodes all characters except the unreserved ones. Those are the alphanumeric characters plus the following marks:

mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

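If you need to escape according to RFC 3986 instead, you need to enable support for International Resource Identifiers (IRI, RFC 3987), as described on the MSDN web site. From .NET Framework 4.5 onwards, EscapeDataString should already escape this way by default.

Or you can escape the remaining characters manually. Do this on the result of EscapeDataString, not on the raw string, otherwise the % signs you add would be escaped again:

// Escape first, then replace the marks RFC 2396 leaves untouched.
string escapedQuery = Uri.EscapeDataString(urlQuery)
                .Replace("(", "%28")
                .Replace(")", "%29")
                .Replace("!", "%21")
                .Replace("*", "%2A")
                .Replace("'", "%27");

And now the escaping matches RFC 3986. Only these characters will not be escaped:

Alpha-numeric, "-", ".",  "_",  "~"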
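One way to check that list is to push every printable ASCII character through the escaping and see which ones come back unchanged. This is only a sketch; UnreservedCheck and Escape are illustrative names of mine, not part of the framework.

```csharp
using System;
using System.Linq;

static class UnreservedCheck
{
    // EscapeDataString plus the manual replacements for ( ) ! * '
    public static string Escape(string value)
    {
        return Uri.EscapeDataString(value)
                  .Replace("(", "%28")
                  .Replace(")", "%29")
                  .Replace("!", "%21")
                  .Replace("*", "%2A")
                  .Replace("'", "%27");
    }

    // Returns the printable ASCII characters (0x20 to 0x7E) that survive unescaped.
    public static string Untouched()
    {
        var printable = Enumerable.Range(0x20, 0x7F - 0x20)
                                  .Select(i => ((char)i).ToString());
        return string.Concat(printable.Where(c => Escape(c) == c));
    }

    static void Main()
    {
        Console.WriteLine(Untouched());
        // -.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
    }
}
```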

 

Conclusion

It is easy to get confused by all the encoding methods in .NET, especially since the documentation does not spell out the differences between them.

But it is easier than it looks. Use HttpUtility.UrlPathEncode for the scheme/domain/path and Uri.EscapeDataString for the names and values in the query string.
