Collecting links from pages with PHP

DarkWolfXP

Power Member
Hi, I'd like to know whether it is possible to collect links from pages using PHP.
I wanted to write a small script that collects the links from each page I enter.
I already have a script that does this, but it is very long...
Code:
    function get_links($file, $url, $can_leave_domain, $base) {
    	$chunklist = array ();
        // The base URL comes from either the meta tag or the current URL.
        if (!empty($base)) {
            $url = $base;
        }
        
    	$links = array ();
    	$regs = Array ();
    	$checked_urls = Array();
        
    	$file = preg_replace("@<!--.*?-->@si", " ",$file);
    	preg_match_all("/href\s*=\s*[\'\"]?([+:%\/\?~=&;\\\(\),._a-zA-Z0-9-]*)(#[.a-zA-Z0-9-]*)?[\'\" ]?(\s*rel\s*=\s*[\'\"]?(nofollow)[\'\"]?)?/i", $file, $regs, PREG_SET_ORDER);
    	foreach ($regs as $val) {
    		// isset() avoids an undefined-index notice the first time a URL is seen
    		if (!isset($checked_urls[$val[1]]) && !isset($val[4])) { // skip duplicates and rel="nofollow" links
    			if (($a = url_purify($val[1], $url, $can_leave_domain)) != '') {
    				$links[] = $a;
    			}
    			$checked_urls[$val[1]] = 1;
    		}
    	}
    	preg_match_all("/(frame[^>]*src[[:blank:]]*)=[[:blank:]]*[\'\"]?(([[a-z]{3,5}:\/\/(([.a-zA-Z0-9-])+(:[0-9]+)*))*([+:%\/?=&;\\\(\),._ a-zA-Z0-9-]*))(#[.a-zA-Z0-9-]*)?[\'\" ]?/i", $file, $regs, PREG_SET_ORDER);
    	foreach ($regs as $val) {
    		// In this pattern the URL is capture group 2 (group 1 is only the "frame ... src" prefix) and there is no nofollow group
    		if (!isset($checked_urls[$val[2]])) {
    			if (($a = url_purify($val[2], $url, $can_leave_domain)) != '') {
    				$links[] = $a;
    			}
    			$checked_urls[$val[2]] = 1;
    		}
    	}
    	preg_match_all("/(window[.]location)[[:blank:]]*=[[:blank:]]*[\'\"]?(([[a-z]{3,5}:\/\/(([.a-zA-Z0-9-])+(:[0-9]+)*))*([+:%\/?=&;\\\(\),._ a-zA-Z0-9-]*))(#[.a-zA-Z0-9-]*)?[\'\" ]?/i", $file, $regs, PREG_SET_ORDER);
    	foreach ($regs as $val) {
    		// In this pattern the URL is capture group 2 (group 1 is only the "window.location" prefix) and there is no nofollow group
    		if (!isset($checked_urls[$val[2]])) {
    			if (($a = url_purify($val[2], $url, $can_leave_domain)) != '') {
    				$links[] = $a;
    			}
    			$checked_urls[$val[2]] = 1;
    		}
    	}
    	preg_match_all("/(http-equiv=['\"]refresh['\"] *content=['\"][0-9]+;url)[[:blank:]]*=[[:blank:]]*[\'\"]?(([[a-z]{3,5}:\/\/(([.a-zA-Z0-9-])+(:[0-9]+)*))*([+:%\/?=&;\\\(\),._ a-zA-Z0-9-]*))(#[.a-zA-Z0-9-]*)?[\'\" ]?/i", $file, $regs, PREG_SET_ORDER);
    	foreach ($regs as $val) {
    		// In this pattern the URL is capture group 2 (group 1 is only the meta-refresh prefix) and there is no nofollow group
    		if (!isset($checked_urls[$val[2]])) {
    			if (($a = url_purify($val[2], $url, $can_leave_domain)) != '') {
    				$links[] = $a;
    			}
    			$checked_urls[$val[2]] = 1;
    		}
    	}

    	preg_match_all("/(window[.]open[[:blank:]]*[(])[[:blank:]]*[\'\"]?(([[a-z]{3,5}:\/\/(([.a-zA-Z0-9-])+(:[0-9]+)*))*([+:%\/?=&;\\\(\),._ a-zA-Z0-9-]*))(#[.a-zA-Z0-9-]*)?[\'\" ]?/i", $file, $regs, PREG_SET_ORDER);
    	foreach ($regs as $val) {
    		// In this pattern the URL is capture group 2 (group 1 is only the "window.open(" prefix) and there is no nofollow group
    		if (!isset($checked_urls[$val[2]])) {
    			if (($a = url_purify($val[2], $url, $can_leave_domain)) != '') {
    				$links[] = $a;
    			}
    			$checked_urls[$val[2]] = 1;
    		}
    	}
        unset ($chunklist, $regs, $checked_urls);
    	return $links;
    }

I'd like to know whether there is a simpler way to collect the links from a page and, if so, how?

Regards
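
For comparison, one shorter approach that might fit here is to let PHP's bundled DOM extension parse the page instead of matching it with regular expressions. The sketch below is only an illustration under assumptions: the function name get_links_dom is hypothetical (it is not part of the script above), it only looks at <a href> attributes, and it returns the href values as written in the page.

```php
<?php
// Hypothetical helper, not part of the original script: returns the unique
// href values of all <a> elements that are not marked rel="nofollow".
function get_links_dom(string $html): array {
    if (trim($html) === '') {
        return array(); // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // rel may hold several space-separated tokens, e.g. "nofollow noopener"
        $rel = preg_split('/\s+/', strtolower($anchor->getAttribute('rel')));
        if ($href === '' || in_array('nofollow', $rel, true)) {
            continue;
        }
        $links[] = $href;
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($links));
}
```

Relative URLs would still have to be resolved against the base URL, for example by passing each result through the url_purify function used above. Frame sources, meta refresh targets and JavaScript redirects are not covered by this sketch.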
 