
17th May 09
PHP DOM Document Class
The PHP DOMDocument class lets you manipulate HTML and XML documents: getting elements, creating elements, validating documents, saving them and more.
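As a rough illustration of the "creating elements" and "saving" side of the class, here is a minimal sketch that builds a small list of links from scratch. The labels and paths are made up for the example and are not from any real site.

```php
<?php
// Build a small <ul> of links from scratch (labels and paths are invented).
$dom = new DOMDocument('1.0', 'UTF-8');

$list = $dom->createElement('ul');
$dom->appendChild($list);

foreach (['Home' => '/', 'About' => '/about'] as $label => $href) {
    $item = $dom->createElement('li');
    $link = $dom->createElement('a');
    $link->setAttribute('href', $href);
    // createTextNode() escapes special characters in the label for us
    $link->appendChild($dom->createTextNode($label));
    $item->appendChild($link);
    $list->appendChild($item);
}

// Passing a node to saveHTML() serialises just that subtree
$markup = $dom->saveHTML($list);
echo $markup, "\n";
```

Building the markup through the DOM rather than by string concatenation means attribute values and text are escaped for you.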
It’s a very useful class and largely removes the need for regular expressions when it comes to searching for elements in a web page. If you need to do something programmatically, such as extracting anchors, the DOMDocument class is a better fit than regular expressions, because HTML files in particular can be formed and structured in many different ways.
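To make the point about differently formed HTML concrete, here is a small sketch comparing a deliberately naive regular expression with DOMDocument on the same markup. The HTML snippet and the pattern are invented for the illustration; a more careful pattern would catch more cases, but it gets complicated quickly.

```php
<?php
// Four anchors, each written in a slightly different (but valid) style.
$html = <<<'HTML'
<html><body>
<a href="one.html">One</a>
<A HREF='two.html'>Two</A>
<a class="nav" href="three.html">Three</a>
<a
   href="four.html">Four</a>
</body></html>
HTML;

// Naive pattern: expects lowercase tags, double quotes and href as the first attribute
preg_match_all('/<a href="([^"]*)"/', $html, $matches);
$regexLinks = $matches[1];

// DOMDocument parses the markup, so the formatting differences stop mattering
libxml_use_internal_errors(true); // keep any parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);

$domLinks = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $domLinks[] = $anchor->getAttribute('href');
}

echo count($regexLinks), " link(s) via the regex, ", count($domLinks), " via DOMDocument\n";
```

The regex only sees the first anchor, while the parser finds all four regardless of case, quoting, attribute order or line breaks.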
Quick Example
In this quick example we’ll use the DOMDocument class to extract links from a remote page, something I’ve had to do recently.
For the sake of simplicity, and to keep the code short, we’ll use the file_get_contents function to read the remote file. In a real scenario you’ll most likely want to fetch the remote content another way, for example with the cURL library, which gives you control over timeouts, redirects and error handling.
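For reference, a cURL-based fetch might look something like the sketch below. The fetchHtml() helper, its timeout default and the redirect limit are my own assumptions for the illustration, and it presumes the cURL extension is installed.

```php
<?php
// Hypothetical helper (not part of the example in this post): fetch a page
// with cURL and return its body, or null if the request failed.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    $handle = curl_init();
    curl_setopt_array($handle, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_URL            => $url,
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects...
        CURLOPT_MAXREDIRS      => 5,               // ...but not forever
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);

    $body   = curl_exec($handle);
    $status = curl_getinfo($handle, CURLINFO_HTTP_CODE);

    // Treat transport errors and HTTP error statuses as failures
    if ($body === false || $status >= 400) {
        return null;
    }

    return $body;
}
```

You would then call something like $htmldata = fetchHtml("http://www.bbc.co.uk"); and check for null before handing the result to DOMDocument.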
<?php
// Get remote file
$htmldata = file_get_contents("http://www.bbc.co.uk") or die ("Couldn't get data");

// Instantiate the class
$dom = new DOMDocument();

// Load the remote file
if (!@$dom->loadHTML($htmldata)) die ("Couldn't load file?");

// Get the hyperlinks
$anchors = $dom->getElementsByTagName("a");

// Cycle through and output the links
foreach($anchors as $anchor) {
    echo $anchor->getAttribute("href"), "<br />";
}
?>
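The example above silences parser warnings with the @ operator. If you would rather not suppress errors that way, one alternative is libxml's internal error handling. The sketch below wraps the extraction in a hypothetical extractLinks() function (my own name, not a built-in), and also skips anchors that have no href attribute.

```php
<?php
// Hypothetical helper: return every href found in the given HTML string.
function extractLinks(string $html): array
{
    // loadHTML() does not accept an empty string, so bail out early
    if (trim($html) === '') {
        return [];
    }

    // Have libxml collect parse warnings internally instead of emitting them
    $previous = libxml_use_internal_errors(true);

    $dom    = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return [];
    }

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        // Named anchors such as <a name="top"> have no href to report
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }

    return $links;
}

print_r(extractLinks('<p>Unclosed <a href="/a">A</a><a name="top">no href</a><a href="/b">B'));
```

If you want to inspect the warnings rather than throw them away, libxml_get_errors() returns them as an array before libxml_clear_errors() is called.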






