Have you ever wondered why your beautiful XML application stopped working and now always fails with the error “parsing error: Server returned HTTP response code: 503 for URL: http://www.w3.org….”? I guess you never noticed that your app accesses the W3C site for every XML file it opens. Neither had I.
At least not until recently. As soon as your XML app stops working, you will definitely notice!
What has happened?
The W3C decided to block all Java- and Python-based XML libraries that keep fetching DTDs (Wikipedia) from W3C sites whenever you open an XML file (see W3C fights DTD traffic).
I do understand their reasons. Even Apache’s XML parser Xerces fetches the DTDs every time instead of loading them from a local copy. But they could have moved all DTDs to a subdomain and worked with Akamai to serve them.
They must have activated their server changes recently, because my app definitely still worked in January.
- Why is there no transition plan in place to mitigate the impact on XML apps?
- Why have the XML parser vendors failed to incorporate a local copy of the DTDs into their libraries? The DTDs haven’t changed for years!
- Why is there no download (zip) file for all necessary DTDs?
What to do
1) Use a Squid proxy with Squirm to
redirect all DTD traffic to your own server, where you have saved a
copy of all the DTDs you need.
Unfortunately, I have found neither a comprehensive list of available
XML DTDs nor an archive file for downloading them.
So you have to use “trial and error” to fetch all the files your
application needs. I have done this for XHTML, and you may download a ZIP file of
the result (it may be incomplete).
Be sure to set the proxy server on the command line of your Java app
by adding: “-Dhttp.proxyHost=PROXY -Dhttp.proxyPort=PROXYPORT”
I have never managed to set these values using
“System.setProperty(…);”
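If you cannot change the command line, one alternative that might be worth trying is to install a java.net.ProxySelector from inside the app. The following is only a minimal sketch, not something I have run against the W3C servers: the class name W3cProxySelector, the host "proxy.example" and the port 3128 are assumptions for illustration. It sends only plain HTTP requests for www.w3.org through the proxy and leaves all other traffic direct.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of any library): routes only
// http://www.w3.org/ requests through the local proxy.
class W3cProxySelector extends ProxySelector {
    private final Proxy proxy;

    W3cProxySelector(String proxyHost, int proxyPort) {
        // createUnresolved avoids a DNS lookup when the selector is built.
        this.proxy = new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(proxyHost, proxyPort));
    }

    @Override
    public List<Proxy> select(URI uri) {
        if (uri == null) {
            throw new IllegalArgumentException("uri must not be null");
        }
        boolean isW3c = "http".equalsIgnoreCase(uri.getScheme())
                && "www.w3.org".equalsIgnoreCase(uri.getHost());
        return Collections.singletonList(isW3c ? proxy : Proxy.NO_PROXY);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        // This sketch has no failover; a real app might log the failure here.
    }

    // Call once at startup, before the first XML file is parsed.
    static void install(String proxyHost, int proxyPort) {
        ProxySelector.setDefault(new W3cProxySelector(proxyHost, proxyPort));
    }
}
```

A call such as W3cProxySelector.install("proxy.example", 3128) at the start of main() would then play the role of the two -D options, with your own proxy host and port filled in.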
Use the following rules with Squirm:
regexi .*/DTD/(xhtml[-_a-z0-9A-Z]+\.(dtd|mod|ent))$ http://www.yourserver.com/your-dtds-copy/\1 ^http://www.w3.org/
regexi .*/TR/xhtml-basic/(xhtml[-_a-z0-9A-Z]+\.(dtd|mod|ent))$ http://www.yourserver.com/your-dtds-copy/\1 ^http://www.w3.org/
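Before pointing a proxy at these rules, it can help to check offline which URLs they would catch. The sketch below imitates the two rules with java.util.regex. It is not Squirm itself, and it assumes that the trailing “^http://www.w3.org/” acts as a prefix filter and that “\1” stands for the first capture group. The class name DtdRewriteCheck is made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical offline check that mimics the two rewrite rules above.
class DtdRewriteCheck {
    // Assumption: the trailing "^http://www.w3.org/" means "URL must start with this".
    static final String PREFIX = "http://www.w3.org/";
    static final String LOCAL_BASE = "http://www.yourserver.com/your-dtds-copy/";

    // Same patterns as the rules, compiled case-insensitively like "regexi".
    static final Pattern[] RULES = {
        Pattern.compile(".*/DTD/(xhtml[-_a-z0-9A-Z]+\\.(dtd|mod|ent))$",
                Pattern.CASE_INSENSITIVE),
        Pattern.compile(".*/TR/xhtml-basic/(xhtml[-_a-z0-9A-Z]+\\.(dtd|mod|ent))$",
                Pattern.CASE_INSENSITIVE),
    };

    // Returns the rewritten URL, or the original URL if no rule applies.
    static String rewrite(String url) {
        if (!url.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
            return url;
        }
        for (Pattern rule : RULES) {
            Matcher m = rule.matcher(url);
            if (m.find()) {
                return LOCAL_BASE + m.group(1); // group 1 is the file name
            }
        }
        return url;
    }
}
```

Feeding it the URLs from your parser's error messages shows quickly which files still need a rule or a local copy.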
2) Or use a catalog file for your XML parser if you are able to modify the
source code of your app.
- http://www.sagehill.net/docbookxsl/UseCatalog.html
- http://xml.apache.org/commons/components/resolver/resolver-article.html#ctrlresolver
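If a full catalog setup is more than you need, a hand-written org.xml.sax.EntityResolver can do a similar job for a handful of DTDs. This is a minimal sketch under my own assumptions, not the catalog resolver described in the links above: the class LocalDtdResolver and the idea of keeping all DTD files flat in one local directory are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical resolver: serves W3C DTDs from a local directory
// (assumption: all files are stored flat under dtdDir by file name).
class LocalDtdResolver implements EntityResolver {
    private final File dtdDir;

    LocalDtdResolver(File dtdDir) {
        this.dtdDir = dtdDir;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        if (systemId == null || !systemId.startsWith("http://www.w3.org/")) {
            return null; // let the parser handle everything else as usual
        }
        String name = systemId.substring(systemId.lastIndexOf('/') + 1);
        File local = new File(dtdDir, name);
        if (!local.isFile()) {
            // Fail loudly instead of silently going out to the network.
            throw new IOException("No local copy of " + systemId + " in " + dtdDir);
        }
        // Using the local file URI as system ID lets relative references
        // inside the DTD (.mod, .ent) resolve against the same directory.
        InputSource source = new InputSource(local.toURI().toString());
        source.setPublicId(publicId);
        return source;
    }

    // Parses an XML string with the resolver installed.
    static Document parse(String xml, File dtdDir) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new LocalDtdResolver(dtdDir));
        return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The resolver throws when a file is missing, so during the “trial and error” phase the exception message tells you which DTD, module or entity file to fetch next.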
(nr)