
How to Recognize Valid URLs in Text

Twitter and a host of Twitter clients, as well as tons of other programs, have to recognize valid URLs (whose syntax is defined in RFC 1738) in plain text and hyperlink them. Unfortunately, due to laziness (or lack of knowledge) on the part of the programmers, such URL detection schemes are often hare-brained and fail to recognize valid URLs properly.

A URL can use a wide variety of characters, and you need to recognize all of them to properly identify and isolate a URL from the surrounding text. Here is a simple guide for programmers (based on RFC 1738, obviously):
In general, URLs are written as follows:

       <scheme>:<scheme-specific-part>

   A URL contains the name of the scheme being used (<scheme>) followed
   by a colon and then a string (the <scheme-specific-part>) whose
   interpretation depends on the scheme.

   Scheme names consist of a sequence of characters. The lower case
   letters "a"--"z", digits, and the characters plus ("+"), period
   ("."), and hyphen ("-") are allowed. For resiliency, programs
   interpreting URLs should treat upper case letters as equivalent to
   lower case in scheme names (e.g., allow "HTTP" as well as "http").
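
As a rough illustration of the scheme rule quoted above, here is a minimal Python sketch. The function name `split_scheme` and the sample URLs are my own, not anything from the RFC; the code only checks the scheme name and does nothing to validate the part after the colon.

```python
import re

# Characters RFC 1738 allows in a scheme name: letters, digits, "+", "." and "-".
# re.IGNORECASE follows the advice to treat "HTTP" the same as "http".
SCHEME_RE = re.compile(r"[a-z0-9+.-]+", re.IGNORECASE)


def split_scheme(candidate):
    """Split a candidate URL at the first colon (hypothetical helper).

    Returns (scheme, scheme_specific_part) with the scheme lower-cased,
    or None when there is no colon or the scheme has illegal characters.
    """
    scheme, colon, rest = candidate.partition(":")
    if not colon or not SCHEME_RE.fullmatch(scheme):
        return None
    return scheme.lower(), rest


print(split_scheme("HTTP://example.com/a"))  # ('http', '//example.com/a')
print(split_scheme("ht tp://example.com"))   # None (space in the scheme)
```

Note that this accepts anything that merely looks like a scheme, so a string such as "note: hello" also passes. A real detector would still have to compare the scheme against the ones it knows and then check the scheme-specific part.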