How To: Parse XML with PHP5

One of the most common tasks web coders run into is parsing some type of XML file. Many web services return API results in XML format, so it's handy to know how to parse them quickly. With PHP4 you usually have to rely on a large parsing library or deal with overly complicated PHP functions, but PHP 5 has a great extension called SimpleXML.

When I say parsing XML, I'm talking about navigating through XML markup to return data of interest. For example, let's take a look at the Yahoo! geocoding API. With the geocoding API you can call a specially crafted request URL with parameters, such as city and state, to receive latitude and longitude coordinates which come in handy when creating mapping mashups.

Here is an example call to the geocoding service to get the latitude and longitude of Atlanta, GA. http://api.local.yahoo.com/MapsService/V1/geocode?appid=demo&location=atlanta+ga

This is the XML output for that call:

[Image: Yahoo! Geocoding XML]

Typical SimpleXML Usage

If we only want the latitude and longitude from the XML result, we can get them quickly with SimpleXML. First we need to load the XML file, which in this case is the special Yahoo! URL.

$request_url = "http://api.local.yahoo.com/MapsService/V1/geocode?appid=demo&location=atlanta+ga";
$xml = simplexml_load_file($request_url) or die("feed not loading");

The function simplexml_load_file() loads the external XML file. If the file cannot be reached or parsed, the function returns false, and die() then stops the script and prints the error message. At this point, you can check that the file has been loaded by running:

var_dump($xml);

This displays the SimpleXMLElement object currently loaded from the XML file into the $xml variable. If you want to view it in a more orderly fashion, wrap that var_dump line with the pre tag:

echo "<pre>"; var_dump($xml); echo "</pre>";
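Going back to the loading step for a moment: if you would rather inspect what went wrong than stop with die(), something like the following could work. It is only a sketch, and it parses a deliberately broken string with simplexml_load_string() so it runs without a network connection. The $broken and $messages names are my own, not part of any API.

```php
<?php
// Collect libxml parse errors instead of letting them surface as PHP warnings.
libxml_use_internal_errors(true);

// Deliberately malformed XML (the closing tag does not match). Illustrative only.
$broken = "<ResultSet><Result></ResultSet>";

$xml = simplexml_load_string($broken);
$messages = array();

if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError object with a human-readable message.
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
}

echo count($messages) . " parse error(s)\n";
```

simplexml_load_file() reports failure the same way (it returns false), so the same check should carry over to the URL version.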

Now we can traverse the XML markup and pull out the latitude and longitude. This particular XML has a simple structure, with each result residing inside a Result tag, so we can access its child elements like this:

$latitude = $xml->Result->Latitude;
$longitude = $xml->Result->Longitude;

From here you can do whatever you want with the data, most likely display it with echo.
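Here is a small self-contained sketch of that traversal. The XML string is a hypothetical stand-in for the geocoding response: the element names follow the ones used above, but the values are made up and the real response may contain more than this.

```php
<?php
// Hypothetical sample response; the values are invented for illustration.
$sample = <<<XML
<?xml version="1.0"?>
<ResultSet>
  <Result precision="city">
    <Latitude>33.75</Latitude>
    <Longitude>-84.39</Longitude>
    <City>ATLANTA</City>
    <State>GA</State>
  </Result>
</ResultSet>
XML;

$xml = simplexml_load_string($sample);
if ($xml === false) {
    die("sample XML did not parse");
}

// Child elements come back as SimpleXMLElement objects, so cast them
// before doing arithmetic or strict comparisons.
$latitude  = (float) $xml->Result->Latitude;
$longitude = (float) $xml->Result->Longitude;
$city      = (string) $xml->Result->City;

echo "$city: $latitude, $longitude\n";
```

The casts matter mostly when you compare or calculate with the values. A plain echo converts the element to a string for you.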

Alternate SimpleXML Method

It came to my attention while working on a group computer science project that some servers, such as those at Dreamhost, don't allow PHP functions that require URL file access (the allow_url_fopen setting), which is what our previous method relied on with simplexml_load_file. In that case we can resort to cURL, a library for transferring files with URL syntax that PHP exposes as an extension and that is frequently used when scraping pages and for similar tasks.

First, we will grab the XML file's content via a cURL transfer and store it in a variable ($data in this case). This time, I'll be using a different request URL for a different API, Yahoo! weather, which takes a zip code or location ID.

$request_url = "http://weather.yahooapis.com/forecastrss?p=USGA0028";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $request_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);

So now that we have $data filled with the Yahoo! weather XML result, we need to feed it to SimpleXML somehow. This should get the job done:

$xml = new SimpleXMLElement($data);
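One thing to be aware of is that the SimpleXMLElement constructor throws an exception when the string is not well-formed XML. A minimal sketch of guarding against that might look like this. parse_xml_string() is a hypothetical helper name of mine, and the $data string is just a stand-in for whatever cURL returned.

```php
<?php
libxml_use_internal_errors(true); // keep parse warnings out of the output

// Stand-in for the string cURL would return; hypothetical content.
$data = "<rss><channel><title>Weather</title></channel></rss>";

// Hypothetical helper: returns a SimpleXMLElement, or null if parsing fails.
function parse_xml_string($data)
{
    try {
        return new SimpleXMLElement($data);
    } catch (Exception $e) {
        // The constructor throws when the string is not well-formed XML.
        return null;
    }
}

$xml = parse_xml_string($data);           // parses fine
$bad = parse_xml_string("<rss><channel>"); // unclosed tags, so null
```

Returning null is just one option. You could also log the exception message or rethrow it, depending on how your script should react.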

However, the XML markup from the Yahoo! weather API is considerably more complex. Load the request URL in a browser and view source to see the XML. Let's say I'm looking for the temperature, which is stored in the temp attribute of the yweather:condition XML tag. This time, I will use another method for traversing the structure: XPath.

$temp_f = $xml->xpath('//yweather:condition/@temp');
$temp_f = $temp_f[0];

XPath is a query language that uses path expressions to select nodes and node-sets. The initial double forward slashes select the yweather:condition node without having to specify exactly where it is (nested within the channel node), then I use a forward slash once more in addition to an @ sign to grab the temp attribute of yweather:condition. Since xpath() returns an array, here with one element in it, I need the second PHP line to select that element within the array, hence the [0].
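To try the XPath query without hitting the API, here is a sketch against a trimmed-down, hypothetical feed. The namespace URI and the attribute values are assumptions for illustration. I also register the prefix explicitly (as yw, a name I picked) so the query does not depend on the prefix the document itself uses.

```php
<?php
// Hypothetical, trimmed-down feed; namespace URI and values are illustrative.
$data = <<<XML
<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
  <channel>
    <item>
      <yweather:condition text="Fair" temp="72" />
    </item>
  </channel>
</rss>
XML;

$xml = new SimpleXMLElement($data);

// Bind our own prefix to the namespace URI before querying.
$xml->registerXPathNamespace('yw', 'http://xml.weather.yahoo.com/ns/rss/1.0');

$matches = $xml->xpath('//yw:condition/@temp');

// xpath() returns an array of matches; guard against an empty result
// before reading the first element.
$temp_f = !empty($matches) ? (int) $matches[0] : null;
```

Checking for an empty result first avoids an undefined-index notice when the feed does not contain the node you expected.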

XPath (XML Path Language) is a powerful way to query XML - "xpath is the future" as Dustin told me.

Dealing with Intricate XML

So far, I have only dealt with relatively simple XML structures that pretty much only have one level of data. Not every API returns XML that is quite so easy to use. Take this snippet from the WordPress.com XML file that powers their public stats charts as an example: http://wordpress.com/public-charts/common.php?d=posts.

<chart>
  <chart_data>
    <row>
      <string>2006-12-28</string>
      <string>2006-12-29</string>
    </row>
    <row>
      <number>51824</number>
      <number>56577</number>
    </row>
  </chart_data>
</chart>

Within the chart_data tag there are two row elements, the first for the date and the second for the number of posts on WordPress.com. If you wanted to access the second one and assuming you utilized the same SimpleXML methods above, you could do the following:

$posts = $xml->chart_data->row[1];

For this we had to use the array notation of brackets to specify which row we wanted to access. Alternatively, if you wanted to access the date row, you would do the same but put a 0 in place of the 1 in the brackets. Whenever a node contains several child elements with the same name, use brackets and a number to specify which one you want; without an index, SimpleXML gives you the first match.
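As a quick sketch of the bracket notation, the snippet above can be embedded as a string and indexed directly. The variable names here are my own.

```php
<?php
// The chart snippet, embedded as a string so the example is self-contained.
$xml = simplexml_load_string(
    '<chart><chart_data>'
    . '<row><string>2006-12-28</string><string>2006-12-29</string></row>'
    . '<row><number>51824</number><number>56577</number></row>'
    . '</chart_data></chart>'
);

$rows = $xml->chart_data->row;

$row_count   = count($rows);                 // number of sibling <row> elements
$first_date  = (string) $rows[0]->string[0]; // first child of the first row
$second_post = (int) $rows[1]->number[1];    // second child of the second row

// Leaving the index off behaves like asking for the first match.
$implicit_first = (string) $xml->chart_data->row->string;
```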

If you find yourself wanting to read each element within a certain node (e.g., if there were hundreds of items inside the first date row), you can use a foreach loop.

foreach ($xml->chart_data->row[0] as $item) {
    echo $item . "<br/>";
}

Going a bit further, if you wanted to go through each row you could put an incrementing numeric variable in place of the number in brackets.

for ($i = 0; $i < sizeof($xml->chart_data->row); $i++) {
    foreach ($xml->chart_data->row[$i] as $item) {
        echo $item . "<br/>";
    }
}

You can also do more involved things like set up each row item in an array corresponding to the other row item. Since this XML file deals with a date that is related to a post number, it makes sense to create an array structure linking the two values. However, that's a bit out of the scope of this article.
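For the curious, here is one rough way the date-to-posts pairing could be done with the snippet above. This is just a sketch, it assumes both rows always have the same number of items, and the variable names are mine.

```php
<?php
// The chart snippet again, embedded so the example runs on its own.
$xml = simplexml_load_string(
    '<chart><chart_data>'
    . '<row><string>2006-12-28</string><string>2006-12-29</string></row>'
    . '<row><number>51824</number><number>56577</number></row>'
    . '</chart_data></chart>'
);

// Collect the dates from the first row as plain strings.
$dates = array();
foreach ($xml->chart_data->row[0]->string as $date) {
    $dates[] = (string) $date;
}

// Collect the post counts from the second row as integers.
$counts = array();
foreach ($xml->chart_data->row[1]->number as $number) {
    $counts[] = (int) $number;
}

// array_combine() needs two arrays of equal length, so check first.
$posts_by_date = (count($dates) === count($counts))
    ? array_combine($dates, $counts)
    : array();
```

After this, $posts_by_date['2006-12-29'] would hold the post count for that day.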

One last thing I need to cover is how you can access attributes within an XML tag itself. For this I will be using this feed as an example: http://deli.ckoma.net/stats/export_posts_daily (Sorry, it appears this website no longer exists: 11-5-09). It's a privately maintained XML file that contains the estimated number of links saved to Yahoo!'s del.icio.us bookmarking service per day. Here's what the XML looks like for one day:

<stats>     <stat date="2005-08-01" estimated_posts="34454" std_deviation="10034" tolerance_upper="50960" tolerance_lower="17948" recorded_posts="4290" tag_distribution="1681 939 649 382 234 173 94 54 21 17 8 5 5 0 2 4 10 1 1 8 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"/> </stats>

To grab the date I would simply do:

$date = $xml->stat['date'];

But what if I wanted the date for the 5th stat element? Just add on a bracket and specify the element.

$date = $xml->stat[4]['date'];

SimpleXML indexes, like PHP arrays, start at zero, so to get the 5th element I used the number 4. Overall, to grab an attribute (such as the date I just showed) you use the same bracket notation, but instead of a number within the brackets you type in the name of the attribute, wrapped in quotes. (You can also do that XPath stuff with @, but note that XPath positions start at 1.)
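Since the original feed is gone, here is a sketch using a hypothetical two-day sample in the same shape. Most attributes are left out, and the second day's values are made up purely for illustration.

```php
<?php
// Hypothetical sample in the shape of the feed; second entry is invented.
$xml = simplexml_load_string(
    '<stats>'
    . '<stat date="2005-08-01" estimated_posts="34454" recorded_posts="4290"/>'
    . '<stat date="2005-08-02" estimated_posts="35000" recorded_posts="4300"/>'
    . '</stats>'
);

// Attribute of the first <stat>, then of the second one (index 1).
$first_date  = (string) $xml->stat['date'];
$second_date = (string) $xml->stat[1]['date'];

// attributes() lets you loop over every attribute of an element.
$first = array();
foreach ($xml->stat[0]->attributes() as $name => $value) {
    $first[$name] = (string) $value;
}

// XPath version of the second lookup; positions start at 1, not 0.
$via_xpath  = $xml->xpath('/stats/stat[2]/@date');
$xpath_date = (string) $via_xpath[0];
```

Note how stat[1] in SimpleXML and stat[2] in XPath point at the same element.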

Wrapup

Hopefully this gives you a good look into the world of XML parsing with PHP5's SimpleXML extension. With the powerful ability to parse any XML file, you can start tinkering away at various APIs and mashups. I wrote this all while watching TV so let me know if you see any errors.