Surprise: W3C has borked your XML app

Have you ever wondered why your beautiful XML application stopped working, and why you now always get the error “parsing error: Server returned HTTP response code: 503 for URL: http://www.w3.org….”? I guess you never noticed that your app has been accessing the W3C pages for every XML file it opens. Neither had I.

At least not until recently. As soon as your XML app stops working, you will definitely notice!

What has happened?

W3C decided to block all Java and Python based XML libraries that keep fetching lots of DTDs (Wikipedia) from W3C sites whenever you open any XML file. (see W3C fights DTD traffic)
I do understand their reasons. Even Apache’s XML parser Xerces fetches the DTDs every time instead of loading them from a local copy. But they could have moved all DTDs to a subdomain and worked with Akamai to serve them.

They must have activated their server changes recently, because my app definitely worked in January.

  • Why is there no transition plan in place to mitigate the impact on XML
    apps?
  • Why have the XML parser vendors failed to incorporate a local copy of
    the DTDs into their libraries – the DTDs haven’t changed for years!
  • Why is there no download (zip) file for all necessary DTDs?

What to do

1) Use a Squid proxy with squirm to
redirect all DTD traffic to your own server, where you have saved a
copy of all the DTDs you need.
Unfortunately, I have found neither a comprehensive list of available
XML DTDs nor an archive file to download them.
So you have to use trial and error to fetch all the files your
application needs. I have done this for XHTML, and you may download a ZIP
file of the result (it may be incomplete).

Be careful to set the proxy server on the command line of your Java app
by adding: “-Dhttp.proxyHost=PROXY -Dhttp.proxyPort=PROXYPORT”
I have never managed to set these values using
“System.setProperty(…);”

Use the following rules with squirm:

regexi .*/DTD/(xhtml[-_a-z0-9A-Z]+\.(dtd|mod|ent))$ http://www.yourserver.com/your-dtds-copy/\1 ^http://www.w3.org/

regexi .*/TR/xhtml-basic/(xhtml[-_a-z0-9A-Z]+\.(dtd|mod|ent))$ http://www.yourserver.com/your-dtds-copy/\1 ^http://www.w3.org/
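
Before deploying the rules, it can help to check which URLs they would rewrite. The following is only a rough sketch in Java that mimics the first rule with java.util.regex. It assumes that “regexi” means a case-insensitive match, that “\1” inserts the first capture group, and that the trailing “^http://www.w3.org/” acts as a plain prefix check applied before the pattern. The class name DtdRewriteCheck and the method rewrite are made up for this illustration and are not part of squirm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that imitates the first squirm rule above.
class DtdRewriteCheck {
    // Same pattern as in the rule; CASE_INSENSITIVE stands in for "regexi".
    private static final Pattern RULE = Pattern.compile(
            ".*/DTD/(xhtml[-_a-z0-9A-Z]+\\.(dtd|mod|ent))$",
            Pattern.CASE_INSENSITIVE);

    // Placeholder target taken from the rule; replace with your own server.
    private static final String TARGET = "http://www.yourserver.com/your-dtds-copy/";

    static String rewrite(String url) {
        // Assumption: the last field of the rule is a prefix filter.
        if (!url.startsWith("http://www.w3.org/")) {
            return url;
        }
        Matcher m = RULE.matcher(url);
        // group(1) is the file name, i.e. what "\1" refers to in the rule.
        return m.find() ? TARGET + m.group(1) : url;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"));
        System.out.println(rewrite("http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent"));
    }
}
```

URLs that do not start with http://www.w3.org/ or do not match the pattern are returned unchanged, which corresponds to squirm passing them through untouched.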

2) Or use a catalog file for your XML parser, if you are able to modify the
source code of your app.
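
A closely related option, not a catalog file as such, is to register a custom org.xml.sax.EntityResolver that hands the parser a local copy of each DTD. The sketch below is only an illustration under some assumptions: the class name LocalDtdResolver and its register and parse methods are made up, and the DTD text is kept in memory to keep the example short. A real application would more likely map each system ID to a DTD file shipped with the app.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver that serves DTDs from local copies instead of the network.
class LocalDtdResolver implements EntityResolver {
    // System ID (the DTD URL in the DOCTYPE) mapped to the local DTD text.
    private final Map<String, String> localCopies = new HashMap<>();
    // Counts how many lookups were answered locally (for illustration only).
    int hits = 0;

    void register(String systemId, String dtdText) {
        localCopies.put(systemId, dtdText);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtd = localCopies.get(systemId);
        if (dtd == null) {
            // Returning null lets the parser fall back to its default
            // behaviour, which may still fetch the URL over the network.
            return null;
        }
        hits++;
        InputSource source = new InputSource(new StringReader(dtd));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(this);
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

To use it, register every system ID your documents reference (for example http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd) together with the content of your saved copy, then parse through the resolver. Any DTD you have not registered still falls through to the default lookup, so the trial and error mentioned above applies here too.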

(nr)
