How To: Parse XML with PHP5

April 17, 2007 · 46 comments

One of the most common things web coders run into is the need to parse some type of XML file. Many web services return API results in XML, so it’s handy to know how to parse them quickly. With PHP4 you usually have to rely on a large parsing library or deal with overly complicated PHP functions to get the job done, but PHP 5 has a great extension called SimpleXML.

When I say parsing XML, I’m talking about navigating through XML markup to return data of interest. For example, let’s take a look at the Yahoo! geocoding API. With the geocoding API you can call a specially crafted request URL with parameters, such as city and state, to receive latitude and longitude coordinates which come in handy when creating mapping mashups.

Here is an example call to the geocoding service to get the latitude and longitude of Atlanta, GA.
http://api.local.yahoo.com/MapsService/V1/geocode?appid=demo&location=atlanta+ga

This is the XML output for that call:

Yahoo! Geocoding XML

Typical SimpleXML Usage

If we only want to receive the latitude and longitude from the XML result, we can quickly do that with SimpleXML. First we need to load the XML file, which in this case is the special Yahoo! URL.


$request_url = "http://api.local.yahoo.com/MapsService/V1/geocode?appid=demo&location=atlanta+ga";
$xml = simplexml_load_file($request_url) or die("feed not loading");

The function simplexml_load_file() loads the external XML file and returns false if the file cannot be reached or parsed, in which case die() stops the script and prints the error message. At this point, you can see if the file has been loaded by running:

var_dump($xml);

This displays the SimpleXMLElement object built from the XML file and stored in the $xml variable. If you want to view it in a more orderly fashion, wrap that var_dump line in pre tags:

echo "<pre>";
var_dump($xml);
echo "</pre>";

Now we can traverse the XML markup and pull out the latitude and longitude. This particular XML has a simple structure, with each result residing inside the Result tag, so we can access those child elements like this:

$latitude = $xml->Result->Latitude;
$longitude = $xml->Result->Longitude;

From here you can do whatever you want with the data, most likely display it with echo.
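As a self-contained sketch of the steps above, the snippet below parses an inline sample instead of calling the live service. The sample document is an assumption: it only mimics the Result/Latitude/Longitude shape that the code above relies on, and the root element name and coordinate values are made up for illustration. It also shows two things that tend to help in practice: collecting parse errors rather than letting them print as warnings, and casting each value to a plain PHP type.

```php
<?php
// Hypothetical sample that mimics the shape used above; not real API output.
$sample = <<<XML
<ResultSet>
  <Result>
    <Latitude>33.75</Latitude>
    <Longitude>-84.39</Longitude>
  </Result>
</ResultSet>
XML;

// Collect parse errors instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);
$xml = simplexml_load_string($sample);
if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
    exit("feed not loading\n");
}

// Each child is a SimpleXMLElement object; cast it to get a plain PHP value.
$latitude  = (float) $xml->Result->Latitude;
$longitude = (float) $xml->Result->Longitude;

echo "Latitude: $latitude, Longitude: $longitude\n";
```

Swapping simplexml_load_string($sample) for simplexml_load_file($request_url) should give the same flow against a real URL, assuming the server allows it.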

Alternate SimpleXML Method

It came to my attention while working on a group computer science project that some servers, such as those at Dreamhost, don’t allow PHP functions to open URLs as files (the allow_url_fopen setting is turned off), which is exactly what our previous method did with simplexml_load_file. For this, we can resort to cURL, a library for transferring data with URL syntax that PHP exposes through its curl extension and that is frequently used when scraping pages and for other similar tasks.

First, we will grab the XML file’s content via a cURL transfer and store it as a variable ($data in this case). This time, I’ll be using a different request URL for a different API, Yahoo! weather which takes in a zip code or location id.

$request_url = "http://weather.yahooapis.com/forecastrss?p=USGA0028";
$timeout = 5; // seconds to wait while connecting
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $request_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch); // the XML as a string, or false on failure
curl_close($ch);

So now that we have $data filled with the Yahoo! weather XML result, we need to feed it to SimpleXML somehow. This should get the job done:

$xml = new SimpleXMLElement($data);
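Neither step above checks for failure: curl_exec() returns false when the transfer fails, and the SimpleXMLElement constructor throws an Exception when the string is not well-formed XML. Here is a minimal sketch of one way to guard both cases. The helper name parse_feed() and the sample strings are hypothetical, chosen only for illustration.

```php
<?php
// parse_feed() is a hypothetical helper name, not part of PHP or SimpleXML.
// $data is whatever curl_exec() returned: a string, or false on failure.
function parse_feed($data)
{
    if ($data === false || trim($data) === '') {
        return null; // the transfer failed or came back empty
    }
    libxml_use_internal_errors(true); // keep parse warnings out of the output
    try {
        return new SimpleXMLElement($data);
    } catch (Exception $e) {
        return null; // the response was not well-formed XML
    }
}

// Well-formed input yields an object we can traverse as usual.
$xml = parse_feed('<rss><channel><title>Sample</title></channel></rss>');
echo $xml === null ? "feed not loading\n" : $xml->channel->title . "\n";

// Malformed input and a failed transfer both yield null.
var_dump(parse_feed('<rss><channel>') === null);
var_dump(parse_feed(false) === null);
```

Returning null keeps the calling code to a single check; you could just as well let the exception bubble up if you prefer to handle it elsewhere.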

However, the XML markup from the Yahoo! weather API is considerably more complex. Load the request URL in a browser and view source to see the XML. Let’s say I’m looking for the temperature, which is stored in the temp attribute of the yweather:condition XML tag. This time, I will use another method for traversing the structure: XPath.

$temp_f = $xml->xpath('//yweather:condition/@temp');
$temp_f = $temp_f[0];

XPath is a query language that uses path expressions to select nodes and node-sets. The initial double forward slashes select the yweather:condition node wherever it sits in the document, so I don’t have to spell out its exact location (within the channel node). Then I use one more forward slash followed by an @ sign to grab the temp attribute of yweather:condition. Since xpath() returns an array, here with one element in it, I need the second PHP line to select that element, hence the [0].
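To try the XPath query without hitting the weather service, here is a small sketch against an inline document. The sample feed is a hypothetical, heavily trimmed stand-in: its layout, the namespace URI, and the values are assumptions made for illustration, not a copy of real API output.

```php
<?php
// Hypothetical stand-in for the weather feed; structure and values are made up.
$data = <<<XML
<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
  <channel>
    <item>
      <yweather:condition text="Fair" temp="72"/>
    </item>
  </channel>
</rss>
XML;

$xml = new SimpleXMLElement($data);

// The prefix is already declared on the root element, so this call is a
// precaution; it makes the prefix-to-URI mapping explicit for the query.
$xml->registerXPathNamespace('yweather', 'http://xml.weather.yahoo.com/ns/rss/1.0');

$matches = $xml->xpath('//yweather:condition/@temp');

// xpath() returns an array of matches; guard against an empty result
// before reaching for element [0].
$temp_f = empty($matches) ? null : (int) $matches[0];

echo $temp_f === null ? "no temperature found\n" : "Temperature: $temp_f F\n";
```

The empty() check matters because indexing [0] on a query that matched nothing would otherwise raise a notice and leave you with null anyway.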

XPath (XML Path Language) is a powerful way to query XML. “XPath is the future,” as Dustin told me.

Dealing with Intricate XML

So far, I have only dealt with relatively simple XML structures that pretty much only have one level of data. Not every API returns XML that is quite so easy to use. Take this snippet from the WordPress.com XML file that powers their public stats charts as an example: http://wordpress.com/public-charts/common.php?d=posts.

<chart>
 <chart_data>
  <row>
    <string>2006-12-28</string>
    <string>2006-12-29</string>
  </row>
  <row>
    <number>51824</number>
    <number>56577</number>
  </row>
 </chart_data>
</chart>

Within the chart_data tag there are two row elements, the first for the date and the second for the number of posts on WordPress.com. If you wanted to access the second one and assuming you utilized the same SimpleXML methods above, you could do the following:

$posts = $xml->chart_data->row[1];

For this we had to use the array notation of brackets to specify which row we wanted to access. Alternatively, if you wanted to access the date row, you would do the same but put a 0 in place of the 1 in the brackets. Whenever a node contains several elements with the same name, use brackets and a number to specify which one you want; if you leave the brackets off, SimpleXML gives you the first one.
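The bracket notation can be tried out on the snippet above loaded from a string. This is only a sketch: it assumes the two-row structure shown in the snippet and nothing more about the real feed.

```php
<?php
// The chart snippet from above, loaded from a string so no request is needed.
$xml = simplexml_load_string(
    '<chart><chart_data>'
    . '<row><string>2006-12-28</string><string>2006-12-29</string></row>'
    . '<row><number>51824</number><number>56577</number></row>'
    . '</chart_data></chart>'
);

$rows = $xml->chart_data->row;

echo count($rows), "\n";                  // how many <row> siblings there are
echo $rows[0]->string[1], "\n";           // second date in the first row
echo $rows[1]->number[0], "\n";           // first post count in the second row
echo $xml->chart_data->row->string, "\n"; // no brackets: first row, first <string>
```

The indexes chain, so you can pick a row and then an item within that row in a single expression.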

If you find yourself wanting to read each element within a certain node (e.g., if there were hundreds of items inside the first date row), you can use a foreach loop.


foreach($xml->chart_data->row[0] as $item){
   echo $item."<br/>";
}

Going a bit further, if you wanted to go through each row you could put an incrementing numeric variable in place of the number in brackets.


for($i=0;$i<count($xml->chart_data->row);$i++){
   foreach($xml->chart_data->row[$i] as $item){
      echo $item."<br/>";
   }
}

You can also do more involved things like set up each row item in an array corresponding to the other row item. Since this XML file deals with a date that is related to a post number, it makes sense to create an array structure linking the two values. However, that’s a bit out of the scope of this article.
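For the curious, linking the two rows could look roughly like the sketch below. It assumes that both rows have the same number of items and that the item at a given position in the date row belongs with the item at the same position in the number row, which is how the snippet above appears to be laid out. The variable name $posts_by_date is just an illustrative choice.

```php
<?php
// The chart snippet from above, loaded from a string so no request is needed.
$xml = simplexml_load_string(
    '<chart><chart_data>'
    . '<row><string>2006-12-28</string><string>2006-12-29</string></row>'
    . '<row><number>51824</number><number>56577</number></row>'
    . '</chart_data></chart>'
);

$dates  = $xml->chart_data->row[0]->string;
$counts = $xml->chart_data->row[1]->number;

// Build date => post count pairs. The casts turn SimpleXMLElement objects
// into a plain string key and a plain integer value.
$posts_by_date = array();
$total = count($dates);
for ($i = 0; $i < $total; $i++) {
    $posts_by_date[(string) $dates[$i]] = (int) $counts[$i];
}

print_r($posts_by_date);
```

The (string) cast on the key is not optional: an object cannot be used as an array key, so PHP would complain without it.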

One last thing I need to cover is how you can access attributes within an XML tag itself. For this I will be using this feed as an example: http://deli.ckoma.net/stats/export_posts_daily (Sorry it appears this website no longer exists: 11-5-09). It’s a privately maintained XML file that contains the estimated number of links saved to Yahoo!’s del.icio.us bookmarking service per day. Here’s what the XML looks like for one day:

<stats>
    <stat date="2005-08-01" estimated_posts="34454" std_deviation="10034" tolerance_upper="50960" tolerance_lower="17948" recorded_posts="4290" tag_distribution="1681 939 649 382 234 173 94 54 21 17 8 5 5 0 2 4 10 1 1 8 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"/>
</stats>

To grab the date I would simply do:

$date = $xml->stat['date'];

But what if I wanted the date for the 5th stat element? Just add on a bracket and specify the element.

$date = $xml->stat[4]['date'];

With PHP arrays and SimpleXML’s bracket notation, counting starts at zero, so to get the 5th element I used the number 4. Overall, to grab an attribute (such as the date I just showed) you simply use the bracket notation, but instead of a number you put the name of the attribute, wrapped in quotes, inside the brackets. (You can also do that XPath stuff with @.)
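Here is a short sketch that pulls these attribute techniques together. The sample is hypothetical: it keeps a few attribute names from the feed above, but the second stat entry and its values are made up so there is more than one element to work with.

```php
<?php
// Hypothetical two-day sample shaped like the feed above; the second entry is made up.
$xml = simplexml_load_string(
    '<stats>'
    . '<stat date="2005-08-01" estimated_posts="34454" recorded_posts="4290"/>'
    . '<stat date="2005-08-02" estimated_posts="35000" recorded_posts="4400"/>'
    . '</stats>'
);

// Walk every <stat> and read attributes by name; cast to get plain values.
foreach ($xml->stat as $stat) {
    echo $stat['date'], ': ', (int) $stat['estimated_posts'], "\n";
}

// attributes() lets you loop over every attribute of one element.
foreach ($xml->stat[0]->attributes() as $name => $value) {
    echo $name, ' = ', $value, "\n";
}

// The XPath route. XPath positions start at 1, so stat[2] is the second
// element, the one the bracket notation calls stat[1].
$dates = $xml->xpath('/stats/stat[2]/@date');
echo $dates[0], "\n";
```

The off-by-one difference between the two notations is easy to trip over when you mix them in the same script.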

Wrapup

Hopefully this gives you a good look into the world of XML parsing with PHP5’s SimpleXML extension. With the powerful ability to parse any XML file, you can start tinkering away at various APIs and mashups. I wrote this all while watching TV so let me know if you see any errors.



{ 36 comments… read them below or add one }

1 Dustin Bachrach April 18, 2007 at 12:00 am

Wow, great resource. Will really come in handy once I make the switch over to PHP5.


2 Marvin Sum April 18, 2007 at 2:13 am

What a coincidence, I was just trying to figure out how to use feedburner’s api when this article popped into my feed reader. Thanks!


3 Adam April 18, 2007 at 6:44 am

Great timing, Paul, many thanks.


4 Abhijit Nadgouda April 18, 2007 at 7:32 am

PHP5 also introduces the class XMLReader which operates in streaming mode. This lets you implement pull parsing for huge XML documents.


5 Don Wilson April 18, 2007 at 3:58 pm

PHP5 has probably the best easy XML processor available. I love PHP 5 =)


6 yoda April 19, 2007 at 1:50 am

Great stuff! What about GeoRSS? Seems like it too would be a good resource for the given example.


7 Henry April 19, 2007 at 9:59 pm

Awesome guide, Paul. If only my host had PHP5 — SimpleXML looks great!


8 Justin Henry April 25, 2007 at 10:33 pm

If you do happen to be stuck with PHP4 and thus need to use a parsing library, I’ve found the PEAR XML_Serializer package to be pretty interchangeable. For example, its unserializer gives me data that I can handle in pretty much the same fashion as that delivered by SimpleXML.


9 James Cassell May 12, 2007 at 10:18 pm

I wish my host had PHP5. I’m building my own parser because I’m stuck with PHP4. It is very basic, but tailored to my exact needs.


10 devdaslover May 20, 2007 at 10:41 am

Hello , you have a great blog here! I’m definitely going to bookmark you ………..


11 rupert May 29, 2007 at 5:33 pm

Brilliant guide, covers everything I needed to know unlike some other guides elsewhere. Thanks!


12 Iain Grant June 2, 2007 at 6:26 pm

After searching for 3 or 4 days, you’ve solved every question I could have thought of in one page. Thank you so much.


13 Paul Stamatiou June 3, 2007 at 4:44 am

Glad I could be of help Iain!


14 Andrew June 11, 2007 at 5:26 pm

Thanks, Paul… You saved me hours of searching and testing with this post…


15 ers35 July 26, 2007 at 10:52 pm

Thank you very much for this guide. I had looked everywhere for the information I needed and only found it here.

May I recommend you not use the curly apostrophes in your code examples? They are not copy and paste friendly and will not execute.


16 Paul Stamatiou August 3, 2007 at 3:48 pm

@ers35 – thanks for the pointer, I usually avoid that with a web tool called Postable that makes my code webfriendly, but it appears that I forgot to use it with this post.


17 Jim August 25, 2007 at 12:57 am

Thank you so much! This has helped me get started on a rather malformed xml from an API.

One question I have is how to get a little deeper attribute? I can get the “temp” example you gave above. What about this?

<first>
<second>
<third this="something" />
<third this="something else" />
</second>
<second>
<third this="another something" />
<third this="something else still" />
</second>
</first>

I’m trying to get those third somethings. How do I know which “second” set I’m in, and how can I loop through each of these?

Many thanks!


18 Albert September 5, 2007 at 4:52 am

Hello, I have a problem in receiving XML Post in PHP. I have a script that would receive an XML Post, but I do not know how to get or transfer the xml to a variable so I can parse it. Any help would be appreciated.


19 Matthom September 25, 2007 at 9:58 am

Hey quick question – not sure if I saw this in the article above – how do you parse elements, for example?

Within my foreach loop, I tried {$item -> media:thumbnail}, but that bombs out.

Any ideas?


20 Matthom September 25, 2007 at 9:59 am

Woops forgot to encode the tag:

<media:thumbnail>


21 Ernast April 19, 2008 at 4:26 am

Many THANKS !!!!


22 Dayananda MR October 13, 2008 at 12:33 am

Thanks buddy!!
It is really very helpful info…


23 Dro Buddy November 23, 2008 at 4:03 am

Thanks for this great guide! You just helped me finish my first API project ever… Which I had been laxly working on for months! Truly, thank you very much…

Peace.


24 Chris January 19, 2009 at 10:42 pm

Excellent write up. Switching from PHP 4 to 5 wasn’t as straight forward as I thought it would be.


25 Jason January 22, 2009 at 8:57 pm

Hey there,

I’ve been working on XML -> XSL -> HTML for freaking hours and it blows. I ended up going back to trying to get XML -> PHP ->HTML.

Your guide has helped me A LOT. W3 Schools guide can goto hell. PHP.net can also goto hell. Your tutorial is much simpler and easier to understand.

Thanks for your help…

-Jason


26 Conrad January 26, 2009 at 3:11 pm

This is a great tutorial! Thanks! I have run into a stumbling block, though. When I pull numerical data from my xml file, I only get the digits before the decimal point. Could it be related to how the data formatted in the xml file?

the code to set the variable looks like this:

$myNumber = $xml->Row[i]->Cell[9]->Data;

where one of the many rows looks like this:

Kentucky
8.6
7.48
7.34
6.58
5.34
4.33
--
--
6.5
5.57

Any guesses on what I am doing wrong?

TIA -

Conrad


27 Chris April 20, 2009 at 4:29 pm

Thanks a ton this is a great introductory argument to XML.

I think you should revisit this and discuss XML attributes. Parsing them is fairly difficult without a starting point. So far I can’t find any real good help on this.

For example multiple attributes on a node.

I’ve seen a lot more of this.


28 Paul May 1, 2009 at 8:51 pm

Great tutorial–thanks for sharing.

I think I have brain lock: how do you receive POST data in an XML files/stream? I have a PDF form that POST’s an XML stream and for the life of me I cant figure out how to get it. var_dump($_POST) returns nothing…or:
$data = $_POST;
$xml = simplexml_load_file($data) or die("feed not loading");
//returns feed not loading


29 Paul Stamatiou May 1, 2009 at 9:09 pm

I believe you’ll need to do something like $_POST['name'] on your first line, with name being the name of the field for your form. more info: http://www.w3schools.com/php/php_post.asp


30 Jonathan Lyon May 2, 2009 at 11:23 am

Thanks! Very Very Useful. This will help me enormously!

Thanks

Jonathan


31 ContentChris May 11, 2009 at 7:33 am

This tutorial obviously is good. No complaints :)

However I’ve run into a problem reading tags which include dashes (-).
$extracted_data['original_title'] = $xml->original_title works like a charm, however $extracted_data['original_title'] = $xml->original-title won’t extract the value.

Does anyone have a workaround/fix for this so tags which have dashes in them are readable ?

Thanks in advance.


32 γιωργος ταρσουλης May 26, 2009 at 4:24 pm

σας ευχαριστω παρα πολυ για ολη την βοηθεια που συνησφαιρατε


33 Mark August 6, 2009 at 10:45 am

http://www.yappydo.com
Been trying to get through complicated XML file.

Thanks for making me go back to simpleXML


34 miah October 14, 2009 at 8:51 pm

Is this line $date = $xml-&gt;stat['date']; supposed to have the &gt; in there, and if so, what is that for? I’ve never seen syntax like that in PHP 5 before. I guess it is the HTML code for the > (greater than) character, but that seems weird to me so thought I would ask.


35 miah October 14, 2009 at 8:52 pm

Ah nevermind it was your word press thing or whatever I guess, it’s supposed to be -> as the object syntax. Duh, my bad.


36 Philip Arthur Moore October 20, 2009 at 7:33 pm

@ContentChris: I ran into the same problem you were having and found this handy tip (link). Use dashes by putting curly braces around your tag names, for example:

$authorname = $xml->commit->{'author-name'};

Otherwise, when you use terms without dashes, just call them like this:

$message = $xml->commit->message;
