17th May 09

PHP DOM Document Class

The DOMDocument class in PHP allows you to manipulate HTML and XML files by getting elements, creating elements, validating pages, saving pages and more.

It’s a very useful class and, when it comes to searching for elements in a web page, a more reliable tool than regular expressions. If you need to do something programmatically such as extracting anchors, the DOMDocument class copes better than regular expressions because HTML files (in particular) can be formed and structured in various ways: attribute order, quoting, letter case and line breaks all vary from page to page.
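To illustrate the point about varied markup, here is a minimal sketch. The sample HTML is made up for this illustration: it writes four links in four different styles (double quotes, uppercase with single quotes, an attribute on a second line, and no quotes at all). A single regular expression would need to anticipate each form, whereas the parser normalises them.

```php
<?php
// Hypothetical sample markup: four links written four different ways.
$html = <<<EOT
<html><body>
<a href="/one">One</a>
<A HREF='/two'>Two</A>
<a class="nav" title="x"
   href="/three">Three</a>
<a href=/four>Four</a>
</body></html>
EOT;

$dom = new DOMDocument();

// Collect parser warnings internally so they don't end up in the output
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Gather the href of every anchor, however it was written
$hrefs = array();
foreach ($dom->getElementsByTagName("a") as $anchor) {
	$hrefs[] = $anchor->getAttribute("href");
}

print_r($hrefs);
```

All four links should be found with the same getElementsByTagName call, with no special handling for any of the styles.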

Quick Example

In this quick example we’ll use the DOMDocument class to extract links from a remote page, something I’ve had to do recently.

For the sake of simplicity, and to keep the code short, we’ll use the file_get_contents function to read the remote file (this requires the allow_url_fopen setting to be enabled). In a real scenario you’ll most likely want to fetch the remote content another way, for example with the cURL library.
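For reference, fetching the page with cURL might look roughly like the sketch below. The function name fetch_html and the ten second default timeout are my own choices for illustration, not anything standard, and it assumes the cURL extension is installed.

```php
<?php
// Hypothetical helper (name and defaults are assumptions for this sketch):
// fetch a URL with cURL and return the body, or null on any failure.
function fetch_html($url, $timeout = 10)
{
	$ch = curl_init($url);
	if ($ch === false) {
		return null;
	}

	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow redirects
	curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // give up if we can't connect in time
	curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // cap the whole transfer

	$body = curl_exec($ch);
	if ($body === false) {
		return null; // network-level failure (DNS, refused connection, timeout...)
	}

	// Treat anything outside the 2xx range as a failure
	$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
	return ($status >= 200 && $status < 300) ? $body : null;
}
```

The result of fetch_html("http://www.bbc.co.uk") could then be used in place of the file_get_contents return value, checking for null before handing it to the parser. The main gain over file_get_contents is control over timeouts and redirects.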

<?php
// Get remote file
$htmldata = file_get_contents("http://www.bbc.co.uk") or die ("Couldn't get data");

// Instantiate the class
$dom = new DOMDocument();

// Collect warnings about malformed HTML internally rather than hiding them with @
libxml_use_internal_errors(true);

// Load the HTML we fetched
if (!$dom->loadHTML($htmldata)) die ("Couldn't load file?");
libxml_clear_errors();

// Get the hyperlinks
$anchors = $dom->getElementsByTagName("a");

// Cycle through and output the links, escaped for safe display
foreach ($anchors as $anchor)
{
	echo htmlspecialchars($anchor->getAttribute("href")), "<br />";
}
?>
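
The class can also create elements and save the result, as mentioned at the top. Here is a minimal sketch of that side of it; the link list is made up for the example, and real link text containing characters such as & would need escaping before being passed to createElement.

```php
<?php
$dom = new DOMDocument();

// Hypothetical data for this sketch: URL => link text
$links = array(
	"http://www.php.net/" => "PHP",
	"http://www.w3.org/"  => "W3C",
);

// Build <ul><li><a href="...">...</a></li>...</ul>
$list = $dom->createElement("ul");
$dom->appendChild($list);

foreach ($links as $href => $text) {
	$anchor = $dom->createElement("a", $text);
	$anchor->setAttribute("href", $href);

	$item = $dom->createElement("li");
	$item->appendChild($anchor);
	$list->appendChild($item);
}

// Serialise the document back to an HTML string
$markup = $dom->saveHTML();
echo $markup;
```

Building markup this way is more verbose than string concatenation, but the output is always well formed because the tree takes care of nesting and closing tags.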