Jafo232
You Can't Spell Democrat Without Rat.
Premium Member
join:2002-10-17
Boonville, NY

Jafo232

Premium Member

PHP: Get TLD from URL

Regex has never been my expertise, so I am going to ask here.

I have a form where a URL is being submitted. My task is to parse the URL and get just the top level domain (e.g. dslreports.com) and save that value. I cannot seem to find any decent existing code that does this in a bulletproof way. Anyone have any ideas?

JAAulde
Web Developer
MVM
join:2001-05-09
Frederick, MD
ARRIS SB6141
Ubiquiti EdgeRouter Lite
Ubiquiti UniFi AP

1 edit

JAAulde

MVM

Code and output: »codepad.org/LSQ1VyyL

Edit: this could be done with a regex, but URLs can vary so widely that I'd rather take a few steps and use built in functions where possible. I think this is more reliable in this case.
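
For readers who cannot open the codepad link, here is a minimal sketch of the built-in-function approach described above. It is not the linked code, which is not reproduced in this thread; the function name sketch_get_tld and the choice to prepend "http://" when no scheme is present are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not from the thread): returns the last
// dot-separated label of the URL's host, or '' if none is found.
function sketch_get_tld(string $url): string
{
    // parse_url() only recognises a host when a scheme (or leading //)
    // is present, so assume http:// for bare input like "example.com/x".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return '';
    }

    // Drop a trailing root dot ("example.com.") before splitting.
    $labels = explode('.', rtrim($host, '.'));

    // A single label such as "localhost" has no TLD to report.
    return count($labels) > 1 ? strtolower(end($labels)) : '';
}
```

Called on http://www.dslreports.com/some/path/to/a/file.ext?name=value, this sketch returns "com".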

Jafo232
You Can't Spell Democrat Without Rat.
Premium Member
join:2002-10-17
Boonville, NY

Jafo232

Premium Member

said by JAAulde:

Code and output: »codepad.org/LSQ1VyyL

Edit: this could be done with a regex, but URLs can vary so widely that I'd rather take a few steps and use built in functions where possible. I think this is more reliable in this case.
I tried that code and the output was:

"com"

JAAulde
Web Developer
MVM
join:2001-05-09
Frederick, MD
ARRIS SB6141
Ubiquiti EdgeRouter Lite
Ubiquiti UniFi AP

1 recommendation

JAAulde

MVM

said by Jafo232:

said by JAAulde:

Code and output: »codepad.org/LSQ1VyyL

Edit: this could be done with a regex, but URLs can vary so widely that I'd rather take a few steps and use built in functions where possible. I think this is more reliable in this case.
I tried that code and the output was:

"com"
Which would be the TLD

Jafo232
You Can't Spell Democrat Without Rat.
Premium Member
join:2002-10-17
Boonville, NY

1 edit

Jafo232

Premium Member

Actually your code is right, what I asked for is wrong. I need the domain name with the TLD, example:

http://www.dslreports.com/some/path/to/a/file.ext?name=value
 

To return just:

dslreports.com

Same with any URL:

http://somesubdomain.domain.com/path/to/somewhere
 

JAAulde
Web Developer
MVM
join:2001-05-09
Frederick, MD

1 edit

1 recommendation

JAAulde

MVM

Ahh, I see. Very well then:

»codepad.org/UXGS161q

If both TLD and "top level host" are found, you should get them back. Otherwise you get an empty string.
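
A rough sketch of the behaviour described above (domain plus TLD, or an empty string), again assuming parse_url() does the heavy lifting. The linked code is not reproduced here, so the function name sketch_get_domain and the details below are illustrative assumptions only.

```php
<?php
// Hypothetical helper (not from the thread): returns the last two
// host labels, e.g. "dslreports.com", or '' if fewer than two exist.
function sketch_get_domain(string $url): string
{
    // Assume http:// when no scheme is given so parse_url() sees a host.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return '';
    }

    $labels = explode('.', strtolower(rtrim($host, '.')));
    if (count($labels) < 2) {
        return ''; // e.g. "localhost": no TLD, so nothing to return
    }

    // Keep only the last two labels. Note this treats IP addresses and
    // multi-part suffixes like .co.uk naively.
    return implode('.', array_slice($labels, -2));
}
```

With the two example URLs from the post above, this returns "dslreports.com" and "domain.com" respectively.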

Jafo232
You Can't Spell Democrat Without Rat.
Premium Member
join:2002-10-17
Boonville, NY

Jafo232

Premium Member

Thanks JAAulde! I will try this out. Hopefully it will work better than some of the other code I have tinkered with.
Jafo232

Jafo232 to JAAulde

Premium Member

to JAAulde
JAAulde, shot you a private message.
vollie
join:2009-01-15

1 edit

vollie to Jafo232

Member

to Jafo232
ok, so i was looking for a straight-out answer to your question, then figured it out myself, and figured i might as well post it...
So, without further ado:

(?:.+://)?(?:[^/]*\.)?([^/]+\.[^/]+)(?:/?.*)?
replaced by $1

will strip anything other than the bla.bla before the / in any url, reliable as hell.

Oh, since you're using php; don't forget the starting and ending delimiters, like so:
preg_replace('@(?:.+://)?(?:[^/]*\.)?([^/]+\.[^/]+)(?:/?.*)?@', '$1', $url);

(PS. if you do happen to find some way to beat it, please inform me.)

PetePuma
How many lumps do you want
MVM
join:2002-06-13
Arlington, VA

PetePuma

MVM

This might not work for .co.uk URLs, where you want the last *3* parts of the URL. The country domains are a tricky bunch, and are not consistent.
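
To make the .co.uk concern concrete, here is a small check that runs the preg_replace pattern posted earlier in the thread against one .com URL and one .co.uk URL. The example.co.uk address and the variable names are made up for illustration.

```php
<?php
// Pattern copied from the earlier post; '@' is the regex delimiter.
$pattern = '@(?:.+://)?(?:[^/]*\.)?([^/]+\.[^/]+)(?:/?.*)?@';

// A plain .com host: the capture group ends up with the last two labels.
$plainResult = preg_replace($pattern, '$1', 'http://www.dslreports.com/some/path');

// A .co.uk host (hypothetical URL): the last two labels are only the
// suffix, so the registered name is lost.
$coUkResult = preg_replace($pattern, '$1', 'http://www.example.co.uk/some/path');

echo $plainResult, "\n"; // dslreports.com
echo $coUkResult, "\n";  // co.uk
```

The pattern always keeps exactly two labels, which is why country-code suffixes with two parts come back without the name in front of them.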

PToN
Premium Member
join:2001-10-04
Houston, TX

PToN to Jafo232

Premium Member

to Jafo232
»regexlib.com
vollie
join:2009-01-15

vollie to PetePuma

Member

to PetePuma
Mh, forgot about that one, thanks. Luckily I needed the subdomain as well (whereby (?:.+://)?([^/]+)(?:/?.*)? does the trick), but in this case it seems one needs to stick with JAAulde's answer indeed.

jayco4376
Premium Member
join:2001-08-11
Lincoln, NE

2 edits

jayco4376 to Jafo232

Premium Member

to Jafo232
Not quite the PHP expert myself, but wouldn't something like this work just fine: »us.php.net/function.parse-url

Edit: Never mind, just saw the actual code sample that was posted, and it used parse_url. My bad. I was looking at the question and the replies using regex, and since I know Python and Ruby both have tools to manage URLs and URIs, I figured PHP should have one too.

JAAulde
Web Developer
MVM
join:2001-05-09
Frederick, MD

JAAulde to Jafo232

MVM

to Jafo232
Since this topic keeps popping back up, I've modified my previous code to allow for a second parameter which limits the output to a particular number of nodes.
»codepad.org/4lyu1XMh
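
As a sketch of what a "maximum number of nodes" parameter could look like (the linked code is not reproduced here, so this is an independent illustration, not JAAulde's implementation). The function name, the default of 2, and treating 0 as "no limit" are all assumptions.

```php
<?php
// Hypothetical helper (not from the thread): returns at most $maxNodes
// trailing labels of the host. $maxNodes <= 0 is assumed to mean
// "return the whole host".
function sketch_get_host_nodes(string $url, int $maxNodes = 2): string
{
    // Assume http:// when no scheme is given so parse_url() sees a host.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return '';
    }

    $labels = explode('.', strtolower(rtrim($host, '.')));
    if ($maxNodes <= 0) {
        return implode('.', $labels);
    }

    // A negative offset counts from the end; if the host has fewer
    // labels than requested, array_slice() returns all of them.
    return implode('.', array_slice($labels, -$maxNodes));
}
```

The caller still has to know how many nodes to ask for: 2 suits www.dslreports.com, while a host ending in .co.uk needs 3.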

Jafo232
You Can't Spell Democrat Without Rat.
Premium Member
join:2002-10-17
Boonville, NY

Jafo232

Premium Member

Seems to cause issues when dealing with domains like:

somewhere.co.uk..

Any idea of a work around?

PetePuma
How many lumps do you want
MVM
join:2002-06-13
Arlington, VA

PetePuma

MVM

I can't think of a one-size-fits-all solution that doesn't incorporate a list of all possible TLDs and adjusts accordingly. And even then, it's probably not precise.
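
A minimal sketch of the list-based idea, assuming a tiny hand-written list of two-part suffixes. The four entries and the function name are placeholders chosen for illustration; a real list would be far longer (the Public Suffix List maintained by Mozilla is the usual source), and as noted above even that may not be precise for every case.

```php
<?php
// Hypothetical helper (not from the thread). $multiSuffixes is a small
// illustrative sample, NOT a complete list of multi-part suffixes.
function sketch_registrable_domain(
    string $url,
    array $multiSuffixes = ['co.uk', 'org.uk', 'com.au', 'co.jp']
): string {
    // Assume http:// when no scheme is given so parse_url() sees a host.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return '';
    }
    $host = strtolower(rtrim($host, '.'));

    $keep = 2; // default: name + single-label TLD
    foreach ($multiSuffixes as $suffix) {
        if ($host === $suffix) {
            return ''; // the host is only a suffix, no name in front
        }
        $needle = '.' . $suffix;
        if (substr($host, -strlen($needle)) === $needle) {
            // Keep one extra label in front of the multi-part suffix.
            $keep = substr_count($suffix, '.') + 2;
            break;
        }
    }

    $labels = explode('.', $host);
    if (count($labels) < $keep) {
        return '';
    }
    return implode('.', array_slice($labels, -$keep));
}
```

With this sample list, http://www.somewhere.co.uk/x yields "somewhere.co.uk" and http://www.dslreports.com/ yields "dslreports.com"; any suffix missing from the list falls back to the two-label guess.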

JAAulde
Web Developer
MVM
join:2001-05-09
Frederick, MD
ARRIS SB6141
Ubiquiti EdgeRouter Lite
Ubiquiti UniFi AP

JAAulde to Jafo232

MVM

to Jafo232
I forgot an ELSE... look at »codepad.org/NGlABcAC
You can add more logic if you don't want ANY invalid domains being returned, even when max node count is greater than 0.

But like PetePuma said, there is no catch-all for this. Some of it is going to have to be implementation specific.