Jafo232 You Can't Spell Democrat Without Rat. Premium Member join:2002-10-17 Boonville, NY |
Jafo232
Premium Member
2008-Dec-17 11:13 am
PHP: Get TLD from URL
Regex has never been my expertise, so I am going to ask here. I have a form where a URL is submitted. My task is to parse the URL, extract just the top level domain (e.g. dslreports.com), and save that value. I cannot seem to find any decent existing code that does this in a bulletproof way. Anyone have any ideas? |
|
|
JAAulde Web Developer MVM join:2001-05-09 Frederick, MD ARRIS SB6141 Ubiquiti EdgeRouter Lite Ubiquiti UniFi AP
1 edit |
Code and output: » codepad.org/LSQ1VyyL
Edit: This could be done with a regex, but URLs can vary so widely that I'd rather take a few steps and use built-in functions where possible. I think this is more reliable in this case. |
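The paste itself is not reproduced in the thread. A minimal sketch of the built-in-function approach described above might look like the following. The function name `get_tld` and the way it handles input without a scheme are assumptions for illustration, not JAAulde's actual code.

```php
<?php
// Hypothetical helper; name and behaviour are assumptions, not the codepad paste.
function get_tld(string $url): string
{
    // parse_url() only recognises the host when a scheme is present,
    // so prepend one for input such as "example.com/path".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return ''; // no usable host
    }

    $host = strtolower(trim($host, '.'));
    if (strpos($host, '.') === false) {
        return ''; // single-label host such as "localhost"
    }

    // The TLD is the last dot-separated label of the host.
    $labels = explode('.', $host);
    return end($labels);
}

echo get_tld('http://www.dslreports.com/some/path/to/a/file.ext?name=value'), "\n"; // com
```

Note that this sketch does not special-case IP addresses, so a numeric host would return its last octet.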
|
Jafo232
Premium Member
2008-Dec-17 11:32 am
said by JAAulde:
Code and output: » codepad.org/LSQ1VyyL
Edit: this could be done with a regex, but URLs can vary so widely that I'd rather take a few steps and use built in functions where possible. I think this is more reliable in this case.

I tried that code and the output was: "com" |
|
JAAulde
1 recommendation |
said by Jafo232:
I tried that code and the output was: "com"

Which would be the TLD. |
|
1 edit |
Jafo232
Premium Member
2008-Dec-17 11:35 am
Actually your code is right; what I asked for is wrong. I need the domain name with the TLD. For example, http://www.dslreports.com/some/path/to/a/file.ext?name=value
should return just: dslreports.com
The same goes for any URL, such as: http://somesubdomain.domain.com/path/to/somewhere
|
|
JAAulde
1 edit
1 recommendation |
Ahh, I see. Very well then: » codepad.org/UXGS161q
If both the TLD and the "top level host" are found, you should get them back. Otherwise you get an empty string. |
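As a rough illustration of the behaviour described above (both parts found, otherwise an empty string), here is a sketch that keeps the last two labels of the host. The function name `get_domain_with_tld` is hypothetical and the code is an assumption about the approach, not the contents of the codepad paste.

```php
<?php
// Hypothetical helper; a sketch of the described behaviour, not the codepad paste.
function get_domain_with_tld(string $url): string
{
    // Give parse_url() a scheme so it can recognise the host in "example.com/path".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return '';
    }

    $labels = explode('.', strtolower(trim($host, '.')));
    if (count($labels) < 2) {
        return ''; // need both a TLD and a name in front of it
    }

    // Keep only the last two labels, e.g. "www.dslreports.com" -> "dslreports.com".
    return implode('.', array_slice($labels, -2));
}

echo get_domain_with_tld('http://www.dslreports.com/some/path/to/a/file.ext?name=value'), "\n"; // dslreports.com
echo get_domain_with_tld('http://somesubdomain.domain.com/path/to/somewhere'), "\n";            // domain.com
```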
|
Jafo232
Premium Member
2008-Dec-17 11:44 am
Thanks JAAulde! I will try this out. Hopefully it will work better than some of the other code I have tinkered with. |
|
Jafo232 |
to JAAulde
JAAulde, shot you a private message. |
|
1 edit |
to Jafo232
OK, so I was looking for a straight-out answer to your question, then figured it out myself, and figured I might as well post it. So, without further ado:
(?:.+://)?(?:[^/]*\.)?([^/]+\.[^/]+)(?:/?.*)?
replaced by $1 will strip anything other than the bla.bla before the first / of the path in any URL, reliable as hell. Oh, since you're using PHP, don't forget the pattern delimiters at the start and end, like so:
preg_replace('@(?:.+://)?(?:[^/]*\.)?([^/]+\.[^/]+)(?:/?.*)?@', '$1', $url);
(PS: if you do happen to find some way to beat it, please let me know.) |
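To see the pattern from this post in action, the snippet below runs it over the two URLs from earlier in the thread plus a bare host name. The third input is an added assumption to show the case with no scheme or path.

```php
<?php
// The pattern from the post, with "@" as the delimiter so "/" needs no escaping.
$pattern = '@(?:.+://)?(?:[^/]*\.)?([^/]+\.[^/]+)(?:/?.*)?@';

$urls = [
    'http://www.dslreports.com/some/path/to/a/file.ext?name=value',
    'http://somesubdomain.domain.com/path/to/somewhere',
    'dslreports.com', // assumed extra case: no scheme, no path
];

foreach ($urls as $url) {
    // The whole URL is matched and replaced by the first capture group.
    echo preg_replace($pattern, '$1', $url), "\n";
}
// dslreports.com
// domain.com
// dslreports.com
```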
|
PetePuma How many lumps do you want MVM join:2002-06-13 Arlington, VA |
This might not work for .co.uk URLs, where you want the last *3* parts of the host name. The country domains are a tricky bunch and are not consistent. |
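The problem is easy to reproduce. The sketch below feeds a made-up .co.uk URL (an assumption for illustration) through the pattern posted above, then shows the three-label result that is actually wanted.

```php
<?php
// Same pattern as in the post being replied to.
$pattern = '@(?:.+://)?(?:[^/]*\.)?([^/]+\.[^/]+)(?:/?.*)?@';

// Hypothetical example URL under a two-part country suffix.
$url = 'http://www.somewhere.co.uk/page';

// The pattern keeps only the last two labels, which here is the suffix itself.
echo preg_replace($pattern, '$1', $url), "\n"; // co.uk

// What is wanted in this case is the last three labels of the host.
$labels = explode('.', parse_url($url, PHP_URL_HOST));
echo implode('.', array_slice($labels, -3)), "\n"; // somewhere.co.uk
```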
|
PToN Premium Member join:2001-10-04 Houston, TX |
to Jafo232
|
|
|
to PetePuma
Hmm, forgot about that one, thanks. Luckily I needed the subdomain as well (for which (?:.+://)?([^/]+)(?:/?.*)? does the trick), but in this case it seems one does need to stick with JAAulde's answer. |
|
jayco4376 Premium Member join:2001-08-11 Lincoln, NE 2 edits |
to Jafo232
Not quite the PHP expert myself, but wouldn't something like this work just fine: » us.php.net/function.parse-url
Edit: Never mind, just saw the actual code sample that was posted, and it used parse_url. My bad. I was looking at the question and the replies using regex, and since I know Python and Ruby both have tools to manage URLs and URIs, I figured PHP should have one too. |
|
JAAulde |
to Jafo232
Since this topic keeps popping back up, I've modified my previous code to allow for a second parameter which limits the output to a particular number of nodes. » codepad.org/4lyu1XMh |
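One way such a second parameter could work is sketched below. The function name `get_host_nodes`, the parameter name `$maxNodes`, and the convention that 0 means "no limit" are all assumptions for illustration; the actual paste may behave differently.

```php
<?php
// Hypothetical signature; the parameter name and the "0 means no limit"
// convention are assumptions, not taken from the codepad paste.
function get_host_nodes(string $url, int $maxNodes = 2): string
{
    // Give parse_url() a scheme so it can recognise the host in "example.com/path".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return '';
    }

    $labels = explode('.', strtolower(trim($host, '.')));
    if ($maxNodes > 0) {
        // Keep at most $maxNodes labels, counted from the right.
        $labels = array_slice($labels, -$maxNodes);
    }

    return implode('.', $labels);
}

echo get_host_nodes('http://www.somewhere.co.uk/page', 2), "\n"; // co.uk
echo get_host_nodes('http://www.somewhere.co.uk/page', 3), "\n"; // somewhere.co.uk
echo get_host_nodes('http://www.dslreports.com/', 1), "\n";      // com
```

With this shape the caller still has to know how many nodes a given suffix needs, which is the limitation raised elsewhere in the thread.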
|
Jafo232
Premium Member
2009-Jan-26 2:37 pm
Seems to cause issues when dealing with domains like:
somewhere.co.uk
Any idea of a workaround? |
|
PetePuma |
I can't think of a one-size-fits-all solution that doesn't incorporate a list of all possible TLDs and adjust accordingly. And even then, it's probably not precise. |
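A list-based approach could be sketched roughly as follows. The suffix list here is a tiny hand-picked sample for illustration only, and the function name `get_registrable_domain` is hypothetical. A real implementation would need a far longer, maintained list (the Public Suffix List is one such resource), and as noted above would still not be exact in every case.

```php
<?php
// A tiny, hand-picked sample of multi-label suffixes, for illustration only.
// A real list would be far longer and would need to be kept up to date.
const MULTI_LABEL_SUFFIXES = ['co.uk', 'org.uk', 'com.au', 'co.nz'];

// Hypothetical helper: returns the suffix plus one label in front of it.
function get_registrable_domain(string $url): string
{
    // Give parse_url() a scheme so it can recognise the host in "example.com/path".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return '';
    }

    $labels = explode('.', strtolower(trim($host, '.')));

    // Use three labels when the last two form a known suffix, otherwise two.
    $lastTwo = implode('.', array_slice($labels, -2));
    $keep = in_array($lastTwo, MULTI_LABEL_SUFFIXES, true) ? 3 : 2;

    if (count($labels) < $keep) {
        return ''; // e.g. "localhost", or a bare "co.uk"
    }

    return implode('.', array_slice($labels, -$keep));
}

echo get_registrable_domain('http://www.somewhere.co.uk/page'), "\n"; // somewhere.co.uk
echo get_registrable_domain('http://www.dslreports.com/forum'), "\n"; // dslreports.com
```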
|
JAAulde
|
to Jafo232
I forgot an ELSE... look at » codepad.org/NGlABcAC
You can add more logic if you don't want ANY invalid domains being returned, even when the max node count is greater than 0. But like PetePuma said, there is no catch-all for this. Some of it is going to have to be implementation specific. |
|