URLs can use a wide variety of characters, and you need to recognize all of them to properly identify and isolate a URL from surrounding text. Here is a simple guide for programmers (based on RFC 1738):
In general, URLs are written as follows:

    <scheme>:<scheme-specific-part>

A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme. Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").
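As a rough illustration of the scheme rule quoted above, here is a minimal Python sketch that splits a string at the first colon into a scheme and a scheme-specific part. The names `split_url` and `_SCHEME_RE` are made up for this example, and the pattern only checks the scheme characters listed above; it does not validate the rest of the URL.

```python
import re

# Scheme characters per the rule above: letters, digits, "+", ".", "-".
# Upper case letters are accepted too and normalized afterwards.
_SCHEME_RE = re.compile(r"([A-Za-z0-9+.\-]+):(.*)", re.DOTALL)


def split_url(url):
    """Return (scheme, scheme_specific_part), or None if there is no valid scheme."""
    match = _SCHEME_RE.fullmatch(url)
    if match is None:
        return None
    scheme, rest = match.groups()
    # Treat "HTTP" the same as "http".
    return scheme.lower(), rest


if __name__ == "__main__":
    print(split_url("HTTP://example.com/index.html"))
    print(split_url("mailto:someone@example.com"))
    print(split_url("not a url"))
```

Because the scheme character class excludes the colon, the split always happens at the first colon, so a port number such as the one in "http://example.com:8080/" stays in the scheme-specific part.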