Although this is one of our older projects, it remained our favourite for many years, and it gives a great taste of the problem solving we like to do and the projects that enthuse us!
Way back in 2002, the Arkansas Democrat-Gazette (a regional newspaper) used ActivePaper, a proprietary XML-based publishing platform that displayed all stories on the website using Flash and images, exactly mimicking a newspaper's layout. A keen advocate for accessibility, their technical director wanted to display the newspaper in an accessible and search-engine-friendly form.
ActivePaper uses a convoluted XML schema and no documentation was available, so we had to work out the meaning of the elements, and which were semantic rather than presentational, by trial and error. At the time, PHP 4 was relatively new and its support for parsing XML wasn't yet mature. After trying PHP's XML extension and running into problems, we chose to parse the files using regular expressions, an approach that worked successfully.
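To give a flavour of the regular-expression approach, here is a minimal sketch. It is not the original PHP code, and it is written in Python purely for illustration. The element names (`Article`, `Headline`, `Body`), their attributes, and the `article_to_html` helper are all hypothetical, since the real ActivePaper schema isn't reproduced here. The idea is simply to pick out the elements that carry meaning, ignore their presentational attributes, and emit plain HTML.

```python
import re

# Hypothetical sample: tag and attribute names are invented for illustration
# and are NOT taken from the real ActivePaper schema.
SAMPLE = """<Article id="a1">
  <Headline font="Times" size="24">Council approves new bridge</Headline>
  <Body x="10" y="40">Work starts in <b>May</b> and ends in June.</Body>
</Article>"""

# Match only the elements we treat as semantic; presentational attributes
# inside the opening tag are skipped by [^>]*.
ELEMENT = re.compile(
    r"<(?P<tag>Headline|Body)\b[^>]*>(?P<text>.*?)</(?P=tag)>",
    re.DOTALL,
)

# Any markup left inside the captured text is dropped.
INLINE_TAG = re.compile(r"<[^>]+>")

# Assumed mapping from source elements to HTML elements.
HTML_TAG = {"Headline": "h1", "Body": "p"}


def article_to_html(xml_text):
    """Return a plain HTML fragment for the semantic elements found."""
    parts = []
    for match in ELEMENT.finditer(xml_text):
        text = INLINE_TAG.sub("", match.group("text")).strip()
        html_tag = HTML_TAG[match.group("tag")]
        parts.append("<{0}>{1}</{0}>".format(html_tag, text))
    return "\n".join(parts)


if __name__ == "__main__":
    print(article_to_html(SAMPLE))
```

A pattern like this only copes with flat, predictable markup and would break on nested elements of the same name, which is why a proper XML parser is usually the better choice where one is available and reliable.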
The code we wrote to parse the variety of XML files (table of contents, page content and layout, individual articles, etc.) allowed a complete HTML version of the website to be produced. The code has continued in daily use across four regional newspapers and, to the best of our knowledge, is still being used.
|Client:||North-West Arkansas News|