Retrieving via POST and parsing webpage contents in PHP
Today I set myself the task of retrieving a page's contents via POST (submitting some form parameters) and then extracting a few pieces of information from the HTML so I could return them from an XML web service.
Surprisingly, thanks to the pecl_http extension and PHP's DOM classes, this takes only a short, straightforward script. If you are looking to do something similar, the script below should give you a head start, and I think it also serves as a readable example:
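```php
<?php
// Get page contents (http_post_fields() is part of pecl_http 1.x;
// pecl_http 2.x replaced it with the http\Client classes)
$fields = array(
    'param1' => 'foo',
    'param2' => 2,
);
$html = http_post_fields('http://www.example.com', $fields);
if ($html === false) {
    exit('Could not retrieve the page');
}

// Create DOM object; collect libxml's warnings about malformed HTML
// instead of hiding them with @
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Retrieve the TRs in question (libxml's HTML parser does not add an
// implicit <tbody>, so /tr matches rows written directly in the <table>)
$xpath = new DOMXPath($doc);
$rows = $xpath->query("//table[@class = 'important_class']/tr");

// Create XML
$xml = '<xml>';
foreach ($rows as $row) {
    // Select the <td> elements explicitly: firstChild and nextSibling
    // can return whitespace text nodes instead of cells
    $cells = $xpath->query('td', $row);
    if ($cells->length < 2) {
        continue;
    }
    // Escape the scraped text so characters such as & or < keep the XML well-formed
    $param1 = htmlspecialchars($cells->item(0)->textContent, ENT_QUOTES, 'UTF-8');
    $param2 = htmlspecialchars($cells->item(1)->textContent, ENT_QUOTES, 'UTF-8');
    $xml .= '
<el>
    <param1>' . $param1 . '</param1>
    <param2>' . $param2 . '</param2>
</el>';
}
$xml .= '</xml>';

header('Content-Type: text/xml; charset=utf-8');
echo $xml;
```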
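To try the parsing and XML-building steps without a live form to post to, they can be run against a small inline page. The following is a minimal sketch, assuming PHP 7.1 or later with the DOM extension enabled; the sample markup and the helper names extract_rows and rows_to_xml are hypothetical and not part of the original script. It also builds the response with DOMDocument instead of string concatenation, which escapes characters such as & automatically.

```php
<?php
// Hypothetical sample markup standing in for the fetched page; the class name
// 'important_class' matches the script above, the cell values are invented.
$html = <<<'HTML'
<html><body>
<table class="important_class">
  <tr><td>Widget &amp; Co</td><td>42</td></tr>
  <tr><td>Gadget</td><td>7</td></tr>
</table>
</body></html>
HTML;

/**
 * Hypothetical helper: return the text of the first two <td> cells
 * of every row in the matching table.
 */
function extract_rows(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query("//table[@class='important_class']/tr") as $tr) {
        // Querying for td elements skips whitespace text nodes between cells
        $cells = $xpath->query('td', $tr);
        if ($cells->length < 2) {
            continue;
        }
        // trim() drops stray whitespace around the cell text
        $rows[] = [trim($cells->item(0)->textContent), trim($cells->item(1)->textContent)];
    }
    return $rows;
}

/**
 * Hypothetical helper: build the same <xml><el>... structure as the
 * script above, letting DOMDocument handle escaping.
 */
function rows_to_xml(array $rows): string
{
    $out = new DOMDocument('1.0', 'UTF-8');
    $root = $out->appendChild($out->createElement('xml'));
    foreach ($rows as [$first, $second]) {
        $el = $root->appendChild($out->createElement('el'));
        // Text nodes are escaped (& becomes &amp;) when the document is serialized
        $el->appendChild($out->createElement('param1'))->appendChild($out->createTextNode($first));
        $el->appendChild($out->createElement('param2'))->appendChild($out->createTextNode($second));
    }
    return $out->saveXML();
}

echo rows_to_xml(extract_rows($html));
```

On a server without pecl_http 1.x, the page could be fetched another way (for example with PHP's cURL functions) and the resulting string passed to extract_rows unchanged.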