Reading an RSS feed from PHP

I wanted to insert a feed of this blog on Keyboard Playing’s homepage. I did it quite simply. Here’s how…

Understanding XML in PHP

Parsing

This is actually much more straightforward than I thought it would be…

The main thing to know is that PHP comes with a class to quickly parse an XML document: SimpleXMLElement.

How to do this? First, let us imagine we have the following XML:
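The original sample document is not reproduced here. As an assumption, a minimal document that matches the rest of this section (a root element, repeated <item> elements, one <attr> child each) could look like the following. It is shown already wrapped in a PHP string, with invented values:

```php
<?php
// Hypothetical sample document: a root element wrapping repeated <item>
// elements, each with a single <attr> child. The values are made up.
$xml_string = <<<XML
<root>
    <item>
        <attr>first</attr>
    </item>
    <item>
        <attr>second</attr>
    </item>
</root>
XML;
```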

Quite basic, but enough for the purpose. Now, let us imagine this XML is stored as a string in a variable named $xml_string. All it takes to parse it is the following:
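Here is a minimal sketch of that step, reusing the hypothetical sample document so the snippet runs on its own. The constructor throws an Exception on malformed XML. If you would rather get false back, simplexml_load_string() is an alternative.

```php
<?php
// Same hypothetical sample document, kept inline so this runs on its own.
$xml_string = <<<XML
<root>
    <item><attr>first</attr></item>
    <item><attr>second</attr></item>
</root>
XML;

// Parse the string into an object tree; throws an Exception on malformed XML.
$xml = new SimpleXMLElement($xml_string);
```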

There, you are done! Was it hard?

Understanding

OK, some more explanation may be welcome: the XML is now an object. Calling print_r on it would output the following:
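The exact dump from the original post is not reproduced here. This sketch makes the call on the hypothetical sample document, and the comment describes the overall shape of what print_r shows rather than its exact layout:

```php
<?php
$xml = new SimpleXMLElement(
    '<root><item><attr>first</attr></item><item><attr>second</attr></item></root>'
);

// Dump the structure: the repeated <item> children appear under an [item]
// key as an Array, each entry a SimpleXMLElement Object with an [attr] field.
print_r($xml);
```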

So, $xml->item behaves like an array of SimpleXMLElement instances, one per <item> element. Each of these instances has one field called attr.

Exploiting

Quick example: to print the content of each <attr> element, you would just need to do the following:
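A minimal sketch of that loop, run on the hypothetical sample document. The helper name print_attrs() is made up for illustration. The foreach over $xml->item is the technique described above.

```php
<?php
/**
 * Print the text of every <attr> element, one per line.
 * print_attrs() is a hypothetical helper name.
 */
function print_attrs(SimpleXMLElement $xml): void
{
    // $xml already represents the root element, so its <item> children
    // are reached directly as $xml->item.
    foreach ($xml->item as $item) {
        // Casting to string extracts the element's text content.
        echo (string) $item->attr, "\n";
    }
}

$xml = new SimpleXMLElement(
    '<root><item><attr>first</attr></item><item><attr>second</attr></item></root>'
);
print_attrs($xml); // prints "first" then "second", one per line
```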

Notice that the root element does not appear in the path: $xml itself represents the root, so items are reached as $xml->item, not $xml->root->item.

Now let us put this to use with an RSS feed.

Application to RSS feed

Loading the feed

Loading a URL’s content into a PHP string? There’s a function for that:
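The function is presumably file_get_contents(), which reads a file or, when the allow_url_fopen setting is enabled, a URL into a string. Here is a minimal sketch with basic error handling. The helper name load_feed() and the feed URL are placeholders, not taken from the original.

```php
<?php
/**
 * Fetch a feed's raw XML into a string.
 * load_feed() is a hypothetical helper; file_get_contents() accepts http(s)
 * URLs only when allow_url_fopen is enabled.
 */
function load_feed(string $url): string
{
    $content = file_get_contents($url);
    if ($content === false) {
        throw new RuntimeException("Could not load feed from $url");
    }
    return $content;
}

// Usage (placeholder URL, replace with the real feed address):
// $rss_string = load_feed('https://example.com/feed/');
```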

Now that we have it, we just have to print it.

Printing the feed

We actually already showed how to do that at the end of the XML part: an RSS feed is just another XML document, structured as follows:
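The original listing is not reproduced here. As an illustration, this is a minimal RSS 2.0 document with invented titles and URLs: an <rss> root wraps one <channel>, which holds the feed metadata and one <item> per entry. Because the parsed object is the <rss> root, entries sit under $rss->channel->item.

```php
<?php
// Minimal RSS 2.0 document with hypothetical content.
$rss_string = <<<XML
<rss version="2.0">
    <channel>
        <title>My blog</title>
        <link>https://example.com/</link>
        <description>Example feed</description>
        <item>
            <title>First post</title>
            <link>https://example.com/first-post/</link>
            <pubDate>Sat, 01 Dec 2012 17:27:00 +0200</pubDate>
        </item>
        <item>
            <title>Second post</title>
            <link>https://example.com/second-post/</link>
            <pubDate>Sun, 02 Dec 2012 09:00:00 +0200</pubDate>
        </item>
    </channel>
</rss>
XML;

// $rss is the <rss> root, so entries are one level down, under <channel>.
$rss = new SimpleXMLElement($rss_string);
foreach ($rss->channel->item as $item) {
    echo $item->title, "\n";
}
```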

So you just have to adapt what we wrote earlier and we are good to go!

Putting it all together

So now we display a list of titles which link to the original article.
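A minimal sketch of such a list, assuming the feed's XML is already in a string. The helper name render_feed() and the exact markup are illustrative choices. Escaping titles and links with htmlspecialchars() keeps feed content from injecting HTML into the page.

```php
<?php
/**
 * Render an RSS feed as an HTML list of titles linking to each article.
 * render_feed() is a hypothetical helper name; the markup is an example.
 */
function render_feed(string $rss_string): string
{
    $rss = new SimpleXMLElement($rss_string);
    $html = "<ul>\n";
    foreach ($rss->channel->item as $item) {
        // Escape feed content before inserting it into the page.
        $title = htmlspecialchars((string) $item->title, ENT_QUOTES, 'UTF-8');
        $link = htmlspecialchars((string) $item->link, ENT_QUOTES, 'UTF-8');
        $html .= "  <li><a href=\"$link\">$title</a></li>\n";
    }
    return $html . "</ul>\n";
}

// Usage, with $rss_string holding the feed's XML:
// echo render_feed($rss_string);
```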

But what of the publication date, and the author?

Please, one problem at a time…

The date problem (or RFC822 dates in PHP)

Dates in RSS must follow the RFC822 format. What is that? Well, at the time of writing, now is Sat, 01 Dec 2012 17:27 +0200 in RFC822.

Though this is quite pleasant for a human to read, it is difficult to work with programmatically, for instance to compare or sort dates.

Fortunately, turning it into a date is straightforward enough:
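Presumably this means strtotime(), which turns such a string into a Unix timestamp that date() or gmdate() can then format. Here is a minimal sketch. The sample date includes seconds so that the result is deterministic.

```php
<?php
// An RSS <pubDate> value (RFC 822 style), here with seconds.
$pub_date = 'Sat, 01 Dec 2012 17:27:00 +0200';

// strtotime() returns a Unix timestamp, or false if the string is not understood.
$timestamp = strtotime($pub_date);
if ($timestamp === false) {
    throw new RuntimeException("Unrecognised date: $pub_date");
}

// Format it as needed; gmdate() formats in UTC.
echo gmdate('Y-m-d H:i', $timestamp), "\n"; // prints 2012-12-01 15:27
```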

Why a problem, then?

The catch: seconds are not mandatory in RFC822, and in several versions of PHP, when they are missing, the current seconds are used instead. Hence, the timestamp you get depends on the moment you called the function.

An example will be clearer:
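The original example is not reproduced here. A sketch along these lines parses the same seconds-less date twice, a couple of seconds apart. Whether the two results differ depends on the PHP version, so no output is claimed.

```php
<?php
// The date from the text, without seconds (allowed by RFC 822).
$pub_date = 'Sat, 01 Dec 2012 17:27 +0200';

// Parse the same string twice, a couple of seconds apart.
$first = strtotime($pub_date);
sleep(2);
$second = strtotime($pub_date);

// On PHP versions affected by the issue described above, the missing
// seconds come from the clock, so these two lines can differ.
echo date('r', $first), "\n";
echo date('r', $second), "\n";
```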

will output:

This could be problematic if you want to know which entries have already been read, for instance.

Solution

Straightforward enough once more: the strtotime manual provides it, in the form of an optional second argument, a base timestamp. Quickly adapt, and you’re good to go…
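A sketch of that adaptation. It assumes a fixed base timestamp (0 here) is passed as strtotime()'s second argument, so that no date component is filled in from the current time. The exact value the original used is not shown, and the helper name parse_rss_date() is made up.

```php
<?php
/**
 * Parse an RSS date reproducibly.
 * Assumption: a fixed base timestamp (0) as strtotime()'s second argument
 * keeps missing components from being taken from the clock.
 * parse_rss_date() is a hypothetical helper name.
 */
function parse_rss_date(string $pub_date): int
{
    $timestamp = strtotime($pub_date, 0);
    if ($timestamp === false) {
        throw new InvalidArgumentException("Unrecognised date: $pub_date");
    }
    return $timestamp;
}

echo gmdate('Y-m-d H:i:s', parse_rss_date('Sat, 01 Dec 2012 17:27 +0200')), "\n";
```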

will output:

Now, onto the next…

The creator problem (or namespaces and SimpleXMLElement)

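Quickly put: the author of a page is stored as <dc:creator/>. But $item->dc:creator is not valid PHP, and $item->creator comes back empty, since the element belongs to the dc namespace…

The easiest way is to gather all children belonging to a given namespace into a separate object, using the children() method with the namespace URL.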

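So, what is the dc namespace? It is the URL bound to the dc prefix by an xmlns:dc attribute, usually on the root <rss> element. For Dublin Core, it is http://purl.org/dc/elements/1.1/.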
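Rather than reading the URL off the feed by hand, you can ask SimpleXML for the namespaces declared in the document. Here is a minimal sketch on a hypothetical feed excerpt that declares xmlns:dc on its root, using getDocNamespaces():

```php
<?php
// Hypothetical feed excerpt: the dc prefix is declared on the <rss> root.
$rss_string = <<<XML
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <item>
            <title>First post</title>
            <dc:creator>Jane Doe</dc:creator>
        </item>
    </channel>
</rss>
XML;

$rss = new SimpleXMLElement($rss_string);

// getDocNamespaces() maps each prefix declared on the root to its URL.
$namespaces = $rss->getDocNamespaces();
echo $namespaces['dc'], "\n"; // prints http://purl.org/dc/elements/1.1/
```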

Having this URL, all that is left to do is the following:
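A minimal sketch with an invented feed excerpt. children() takes the namespace URL and returns only the children in that namespace, so creator becomes a plain property. Calling children('dc', true) would select them by prefix instead.

```php
<?php
// Dublin Core namespace URL, as declared by xmlns:dc in the feed.
const DC_NS = 'http://purl.org/dc/elements/1.1/';

// Hypothetical feed excerpt.
$rss = new SimpleXMLElement(
    '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel>'
    . '<item><title>First post</title><dc:creator>Jane Doe</dc:creator></item>'
    . '</channel></rss>'
);

foreach ($rss->channel->item as $item) {
    // $item->creator would be empty: <dc:creator> is not in the default
    // namespace. children(DC_NS) returns only the dc-namespaced children.
    $dc = $item->children(DC_NS);
    echo $item->title, ' by ', $dc->creator, "\n"; // First post by Jane Doe
}
```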

Now for the final RSS printer…

Just assembling the bricks (I love LEGO):
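Here is one way the bricks could fit together. This is a sketch, not the original code: the helper name, markup and Y-m-d date format are assumptions, and dates are shown in UTC via gmdate(). It reuses the escaping, the fixed base timestamp for strtotime() and the dc namespace lookup from the sections above.

```php
<?php
/**
 * Render an RSS feed as an HTML list: linked title, publication date and
 * author (dc:creator). render_feed_full() is a hypothetical helper name.
 */
function render_feed_full(string $rss_string): string
{
    $dc_ns = 'http://purl.org/dc/elements/1.1/';
    $rss = new SimpleXMLElement($rss_string);
    $html = "<ul>\n";
    foreach ($rss->channel->item as $item) {
        $title = htmlspecialchars((string) $item->title, ENT_QUOTES, 'UTF-8');
        $link = htmlspecialchars((string) $item->link, ENT_QUOTES, 'UTF-8');
        // Author lives in the Dublin Core namespace.
        $author = htmlspecialchars((string) $item->children($dc_ns)->creator, ENT_QUOTES, 'UTF-8');
        // Fixed base timestamp so missing seconds never come from the clock.
        $timestamp = strtotime((string) $item->pubDate, 0);
        $date = $timestamp === false ? '' : gmdate('Y-m-d', $timestamp);
        $html .= "  <li><a href=\"$link\">$title</a> ($date, $author)</li>\n";
    }
    return $html . "</ul>\n";
}

// Usage (placeholder URL):
// echo render_feed_full(file_get_contents('https://example.com/feed/'));
```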

This should give you all the tools you need to go further…


Published by

Cyrille Chopelet

Programming addict, UX philosopher, casual gamer, sci-fi enthusiast, hi-tech dilettante, ... Some people even call me a geek.

"Wit beyond measure is man's greatest treasure." − Rowena Ravenclaw