To clarify, I’m looking for a decent regular expression to validate URLs that were entered as user input. I have no interest in parsing a list of URLs from a given string of text (even though some of the regexes on this page are capable of doing that).
/(((http|ftp|https):\/{2})+(([0-9a-z_-]+\.)+(aero|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cu|cv|cx|cy|cz|cz|de|dj|dk|dm|do|dz|ec|ee|eg|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mn|mn|mo|mp|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|nom|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ra|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw|arpa)(:[0-9]+)?
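For validating a single user-entered value, anchors matter: the pattern above starts without a `^` anchor, and without anchors `test()` succeeds whenever a URL-like substring appears anywhere inside the input. A minimal sketch of the difference, assuming a shortened TLD list (`com|org|net|edu`) in place of the full alternation and a loose path/query tail (`[/?]\S*`); both are simplifications, and `isValidUserUrl` is a hypothetical helper name:

```javascript
// Assumption: a shortened TLD list stands in for the full alternation above.
const TLDS = "com|org|net|edu";

// Unanchored: finds a URL-like substring anywhere in the input (extraction).
const unanchored = new RegExp(
  "(https?|ftp):\\/{2}([0-9a-z_-]+\\.)+(" + TLDS + ")(:[0-9]+)?",
  "i"
);

// Anchored: the whole input must be the URL (validation).
// Assumption: path and query are accepted loosely as [/?] plus non-whitespace.
const anchored = new RegExp(
  "^(https?|ftp):\\/{2}([0-9a-z_-]+\\.)+(" + TLDS + ")(:[0-9]+)?([/?]\\S*)?$",
  "i"
);

// Hypothetical helper: trims surrounding whitespace, then tests the whole value.
function isValidUserUrl(input) {
  return anchored.test(input.trim());
}

const text = "see http://example.com for details";
console.log(unanchored.test(text)); // true: a substring matched
console.log(isValidUserUrl(text)); // false: the whole input is not a URL
console.log(isValidUserUrl("https://www.example.org:8080/path?q=1")); // true
```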
I have added simple network range validation. The rules I used are:
- valid range 1.0.0.0 - 223.255.255.255; network addresses at or above 224.0.0.0 are reserved
- the first and last IP address of each class are excluded, since they are used as network and broadcast addresses

Since I don't think this is worth implementing completely in a regular expression, a following pass should exclude the intranet address space:
- 10.0.0.0 - 10.255.255.255
- 172.16.0.0 - 172.31.255.255
- 192.168.0.0 - 192.168.255.255

and the loopback and automatic configuration address space:
- 127.0.0.0 - 127.255.255.255
- 169.254.0.0 - 169.254.255.255

while the local, multicast and reserved address spaces:
- 0.0.0.0 - 0.255.255.255 (SPECIAL-IPV4-LOCAL-ID-IANA-RESERVED)
- 224.0.0.0 - 239.255.255.255 (MCAST-NET)
- 240.0.0.0 - 255.255.255.255 (SPECIAL-IPV4-FUTURE-USE-IANA-RESERVED)

should already be excluded by the regular expression above.

Negative lookahead is used instead.

NOTE: that package does fuzzy search, not strict validation.

If they copy it out and back into a browser, they may not know what's wrong with it upon visual inspection. I also don’t want to allow every possible technically valid URL; quite the opposite. See the URL Standard if you’re looking to parse URLs in the same way that browsers do.

I did make one change: the "-*" in both domain and host was (incorrectly) succeeding against "…", so I changed it to "-?".
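The intranet, loopback and automatic-configuration ranges are left to "a following pass" rather than the regex. A sketch of such a pass, assuming the host has already matched a dotted-quad pattern; the function name `isOutsideExcludedRanges` and the integer-comparison approach are illustrative choices, not taken from the original:

```javascript
// Ranges the regex is not expected to exclude, checked in a second pass.
const EXCLUDED_RANGES = [
  ["10.0.0.0", "10.255.255.255"], // intranet
  ["172.16.0.0", "172.31.255.255"], // intranet
  ["192.168.0.0", "192.168.255.255"], // intranet
  ["127.0.0.0", "127.255.255.255"], // loopback
  ["169.254.0.0", "169.254.255.255"], // automatic configuration
];

// Convert "a.b.c.d" to an unsigned 32-bit value so ranges compare numerically.
// Multiplication (not bit shifts) avoids negative results for high addresses.
function ipv4ToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Hypothetical helper: assumes `host` already passed the dotted-quad regex.
function isOutsideExcludedRanges(host) {
  const value = ipv4ToInt(host);
  return !EXCLUDED_RANGES.some(
    ([start, end]) => value >= ipv4ToInt(start) && value <= ipv4ToInt(end)
  );
}

console.log(isOutsideExcludedRanges("172.20.1.1")); // false: inside 172.16.0.0 - 172.31.255.255
console.log(isOutsideExcludedRanges("8.8.8.8")); // true
```

Comparing integers keeps each range check to two comparisons, which is easier to audit than spelling every boundary out as regex alternations.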
However, it would be very easy to add 'localhost' as an acceptable exception.
:\.\d))"
'IP address dotted notation octets
'excludes loopback network 0.0.0.0
'excludes reserved space >= 224.0.0.0
'excludes network & broadcast addresses
'(first & last IP address of each class)
rxs = rxs "(?
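The octet rules described in those comments (first octet 1-223, which drops 0.x.x.x and everything from 224.0.0.0 up; middle octets 0-255; last octet 1-254) can be written as a JavaScript pattern roughly like this. It is a sketch, not a translation of the truncated VBScript, and it approximates "first & last IP address of each class" by forbidding a last octet of 0 or 255:

```javascript
// 1-223: excludes 0.x.x.x and the reserved space >= 224.0.0.0
const FIRST_OCTET = "(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])";
// 0-255
const MIDDLE_OCTET = "(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])";
// 1-254: excludes .0 (network) and .255 (broadcast) in the last position
const LAST_OCTET = "(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4])";

// Anchored so the whole host string must be a dotted quad.
const DOTTED_QUAD = new RegExp(
  "^" + FIRST_OCTET + "(?:\\." + MIDDLE_OCTET + "){2}\\." + LAST_OCTET + "$"
);

console.log(DOTTED_QUAD.test("223.255.255.254")); // true
console.log(DOTTED_QUAD.test("224.0.0.1")); // false: reserved space
console.log(DOTTED_QUAD.test("10.0.0.255")); // false: broadcast-style last octet
```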
@dperini, I'm using the assert library to write a simple test for a JS object in my Rails app. Some of the URI formats tested in @ixti's spec above fail to return false.

As a quick suggestion, you can try something like: (? It's a start, anyway.

Oire, yes, I believe it would be a good idea to move this to a Git repo.
:"
'IP address exclusion
'private & local networks
rxs = rxs "(?!
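Private and local networks can be excluded with negative lookaheads placed in front of the host pattern, an approach the thread mentions. A sketch in JavaScript, assuming deliberately loose `\d{1,3}` octets (range checks are left to a separate pattern) and a hypothetical constant name `NON_PRIVATE_IP`:

```javascript
// Each lookahead fails the match if the host starts with an excluded prefix.
const PRIVATE_LOCAL_EXCLUSION =
  "(?!(?:10|127)(?:\\.\\d{1,3}){3})" + // 10.0.0.0/8 and 127.0.0.0/8
  "(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})" + // 169.254.0.0/16 and 192.168.0.0/16
  "(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})"; // 172.16.0.0/12

// Assumption: loose octets; a stricter octet pattern would replace \d{1,3}.
const NON_PRIVATE_IP = new RegExp(
  "^" + PRIVATE_LOCAL_EXCLUSION + "\\d{1,3}(?:\\.\\d{1,3}){3}$"
);

console.log(NON_PRIVATE_IP.test("192.168.0.10")); // false: private network
console.log(NON_PRIVATE_IP.test("172.32.0.1")); // true: just outside 172.16.0.0/12
```

Because each lookahead requires the full remaining dotted structure, prefixes such as `100.` or `172.160.` are not mistaken for `10.` or `172.16.`.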
This has been written to validate URLs typed by users and/or found in log files.
:\x22|\x5b\x2f|\x3c\x2f)
I haven't tried it, so I'm not sure it does exactly what you asked for.
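In JavaScript regexes, `\x22` is `"`, `\x5b\x2f` is `[/` and `\x3c\x2f` is `</`. Assuming the fragment above is an alternation of delimiters meant to end a URL found in a log line or markup (the surrounding pattern is not shown, so this is a guess), a lookahead-based extraction could look like this; `extractUrls` and the sample log line are made up for illustration:

```javascript
// Lazily consume non-whitespace until whitespace, end of input, or one of the
// hex-escaped delimiters: \x22 (") , \x5b\x2f ([/) , \x3c\x2f (</).
const URL_IN_LOG = /https?:\/\/\S+?(?=\s|$|\x22|\x5b\x2f|\x3c\x2f)/g;

// Hypothetical helper: returns every match, or an empty array if none.
function extractUrls(line) {
  return line.match(URL_IN_LOG) || [];
}

const line =
  '127.0.0.1 - - "GET http://example.com/a?b=1" <a href="x">http://example.org/</a>';
console.log(extractUrls(line));
// [ 'http://example.com/a?b=1', 'http://example.org/' ]
```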
@adamrofer, it seems the URL you are testing against is actually valid. Just test it: it exists and resolves correctly to a Georgia State page.
I'm not sure why that's in the gist above; I'd expect it to fail a JS unit test too.
// Regular expression for URLs
// Based on
// Improved to only pick up links beginning with http, https, ftp, ftps, mailto and www
$regex = "_(?

The above is also true for decimal notations, various forms of IPv6 URLs and other "non-human" URLs.
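Browsers and Node.js parse URLs with the WHATWG URL Standard, whose IPv4 parser also accepts "non-human" host forms such as a single decimal or hexadecimal number. A short sketch using the global `URL` class shows these normalizing to an ordinary dotted quad; regexes that require four dotted octets or an alphabetic TLD will typically reject such inputs even though a browser resolves them:

```javascript
// The global URL class implements the WHATWG URL Standard host parser.
const samples = [
  "http://3232235777/", // single decimal number: 192*2^24 + 168*2^16 + 1*2^8 + 1
  "http://0xC0A80101/", // single hexadecimal number: C0.A8.01.01
  "http://192.168.1.1/", // ordinary dotted quad
];

for (const sample of samples) {
  // Each hostname is serialized as 192.168.1.1
  console.log(sample, "->", new URL(sample).hostname);
}
```

If matching browser behaviour matters more than strictness, one option is to parse with `URL` first and run the regex checks against the normalized `hostname`.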