Friday, March 21, 2008

PHP and XML: Parsing RSS 1.0

A Brief Tour Of RSS 1.0

RSS originally stood for Rich Site Summary, a format developed by Netscape; it now refers to RDF Site Summary, an updated, XML-compliant version of the Netscape technology. It is an XML document format intended to describe, summarize, and distribute the contents of a Web site as a 'channel'. Sites such as MoreOver.com and O'Reilly's Meerkat process RSS feeds provided by news and other content sites and offer combined headline newsfeed services. RSS is currently developed by the RSS-DEV Working Group.

As with most XML document formats, the meaning of an RSS document can be gleaned fairly easily by looking over a sample. SitePoint.com provides summaries of its front-page articles in RSS format at http://www.sitepoint.com/rss.php. If you are using Internet Explorer 5 or later, you can view the current version of this XML document directly in your browser. For everyone else, here is the SitePoint.com RSS file as it stood at the time of this writing:


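<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/">

  <channel rdf:about="http://www.sitepoint.com/rss.php">
    <title>SitePoint.com</title>
    <description>Master the Web!</description>
    <link>http://www.sitepoint.com/</link>

    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.PromotionBase.com/article/551" />
        <rdf:li rdf:resource="http://www.WebmasterBase.com/article/541" />
        <rdf:li rdf:resource="http://www.eCommerceBase.com/article/552" />
        <rdf:li rdf:resource="http://www.eCommerceBase.com/article/505" />
        <rdf:li rdf:resource="http://www.PromotionBase.com/article/556" />
        <rdf:li rdf:resource="http://www.eCommerceBase.com/article/508" />
      </rdf:Seq>
    </items>
  </channel>

  <item rdf:about="http://www.PromotionBase.com/article/551">
    <title>Escape Search Engine Caching</title>
    <description>Did you know that many search engines cache your pages?
    While this practice can speed up a search, users might not see your
    most recent site updates! Ralph shows how you can stop search engines
    caching your pages.
    </description>
    <link>http://www.PromotionBase.com/article/551</link>
  </item>

  <item rdf:about="http://www.WebmasterBase.com/article/541">
    <title>Add JavaScript to Fireworks</title>
    <description>Does your design need more pizazz? Add interactivity to
    your site without learning JavaScript! Matt explains the creation of
    JavaScript effects in Fireworks, and explores in detail the use of
    this program's tools.
    </description>
    <link>http://www.WebmasterBase.com/article/541</link>
  </item>

  <item rdf:about="http://www.eCommerceBase.com/article/552">
    <title>eMail Campaigns in 8 Steps - Part 2</title>
    <description>Ok, so you've reeled in your prospects and they're on
    your mailing list. Now what? How do you communicate effectively, and
    turn them into customers? Jason reveals all...
    </description>
    <link>http://www.eCommerceBase.com/article/552</link>
  </item>

  <item rdf:about="http://www.eCommerceBase.com/article/505">
    <title>The Need for a Written Website Contract</title>
    <description>A written agreement is essential if you pay others to
    design, build or maintain your Websites. Ivan explains the necessity
    of contracts to those who work on the Web.
    </description>
    <link>http://www.eCommerceBase.com/article/505</link>
  </item>

  <item rdf:about="http://www.PromotionBase.com/article/556">
    <title>Search Engine Strategies 2001 - Conference Report</title>
    <description>Sinewave Interactive's Gavin Appel talks to Matt about
    this year's Search Engine Strategies conference. He outlines the
    discussions and predictions of industry leaders.
    </description>
    <link>http://www.PromotionBase.com/article/556</link>
  </item>

  <item rdf:about="http://www.eCommerceBase.com/article/508">
    <title>Better eCommerce Questionnaire</title>
    <description>Overhaul your ecommerce strategy now! Face up to the
    tough questions with Lee, as he guides you through a simple process
    to optimize your ecommerce strategy.
    </description>
    <link>http://www.eCommerceBase.com/article/508</link>
  </item>

</rdf:RDF>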

As you can see, the file begins with a <channel> tag that contains the title, description, and URL of the site that the RSS file describes, as well as a list of the <items> that the channel currently contains. This tag is then followed by an <item> tag for each of the articles that appear on the front page of SitePoint.com. For each, the title, description, and URL are provided. It should be noted that this is a bare-bones RSS file. Many sites make use of standard extensions to the RSS format to include things like author names, images, and publication dates for the items in their channel, but for the purposes of this article this basic RSS file will do.
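
The structure just described maps naturally onto an event-based parser. The following is a minimal sketch, not necessarily the approach this article goes on to take, that uses PHP's built-in expat-based XML functions (xml_parser_create, xml_set_element_handler, xml_set_character_data_handler, xml_parse) to collect the title, description, and link of each item. The function name parse_rss_items, the trimmed one-item sample feed, and the rdf:about values in it are assumptions made for illustration, and the code assumes PHP 7 or later with the XML extension enabled.

```php
<?php
// Hypothetical helper: returns one array per <item> in an RSS 1.0 document.
function parse_rss_items(string $xml): array
{
    $items = [];      // finished items
    $current = null;  // item being built, or null while outside an <item>
    $field = null;    // lowercase name of the child element currently open

    $parser = xml_parser_create();

    // Case folding is on by default, so tag names arrive in uppercase.
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$current, &$field) {
            if ($name === 'ITEM') {
                $current = ['title' => '', 'description' => '', 'link' => ''];
            } elseif ($current !== null) {
                $field = strtolower($name);
            }
        },
        function ($parser, $name) use (&$items, &$current, &$field) {
            if ($name === 'ITEM') {
                $items[] = array_map('trim', $current);
                $current = null;
            }
            $field = null;
        }
    );

    // Character data may arrive in several chunks, so append rather than assign.
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$current, &$field) {
            if ($current !== null && $field !== null && isset($current[$field])) {
                $current[$field] .= $data;
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException((string) xml_error_string(xml_get_error_code($parser)));
    }
    return $items;
}

// Trimmed sample feed (illustrative; rdf:about values are assumed).
$feed = <<<XML
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.sitepoint.com/rss.php">
    <title>SitePoint.com</title>
    <description>Master the Web!</description>
    <link>http://www.sitepoint.com/</link>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.eCommerceBase.com/article/508" />
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://www.eCommerceBase.com/article/508">
    <title>Better eCommerce Questionnaire</title>
    <description>Overhaul your ecommerce strategy now!</description>
    <link>http://www.eCommerceBase.com/article/508</link>
  </item>
</rdf:RDF>
XML;

foreach (parse_rss_items($feed) as $item) {
    echo $item['title'], ' -> ', $item['link'], "\n";
}
```

Because the handlers only record text while an item is open, the channel's own title, description, and link are ignored; a fuller version would presumably track those as well.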

Most Web browsers can't read XML pages, and those that can display only the code of the page (Internet Explorer 5+) or its textual portions (Netscape 6+) by default. To show this RSS document to users, you therefore need some intermediate technology to convert it into something presentable. Other uses include reading the file and storing the headlines in a database, or emailing subscribed users when particular keywords appear in the descriptions of new articles. In any case, you're going to need something that can read XML. Of the many options available, this article examines the use of PHP to parse an XML document.
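
To make two of these uses concrete, here is a hedged sketch of what might happen once the feed has been read: turning items into an HTML list, and picking out items whose description mentions a keyword. The array shape and the function names render_headlines and items_matching are hypothetical, chosen only for this illustration; the sample data is taken from the feed shown earlier, and PHP 7 or later is assumed.

```php
<?php
// Hypothetical item list; in practice this would come from parsing the feed.
$items = [
    [
        'title' => 'Escape Search Engine Caching',
        'link' => 'http://www.PromotionBase.com/article/551',
        'description' => 'Did you know that many search engines cache your pages?',
    ],
    [
        'title' => 'Add JavaScript to Fireworks',
        'link' => 'http://www.WebmasterBase.com/article/541',
        'description' => 'Does your design need more pizazz?',
    ],
];

// Render the items as an HTML list of links, escaping all feed text.
function render_headlines(array $items): string
{
    $html = "<ul>\n";
    foreach ($items as $item) {
        $html .= sprintf(
            "  <li><a href=\"%s\">%s</a></li>\n",
            htmlspecialchars($item['link'], ENT_QUOTES),
            htmlspecialchars($item['title'], ENT_QUOTES)
        );
    }
    return $html . "</ul>\n";
}

// Return the items whose description contains the keyword (case-insensitive).
function items_matching(array $items, string $keyword): array
{
    return array_values(array_filter($items, function ($item) use ($keyword) {
        return stripos($item['description'], $keyword) !== false;
    }));
}

echo render_headlines($items);
```

Escaping with htmlspecialchars matters here because the titles and links come from someone else's feed and should not be trusted as markup.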
