Adapting the XHTML Simple schema for use in validating weblog comments

23 August 2003

Comment validation using a modified (see below) XHTML Simple schema. Test it, break it, tell me about it.

Yesterday I briefly mentioned the idea of using a schema to validate weblog comments. It’s only really useful if you use XHTML on your site, but hey, that’s one of the benefits of using an XML format — you pays your money, you takes your tools.

Now I’m sure I’m not the first person to have thought of this, so I searched for an implementation. If there are any out there, they’ve been a bit backward about coming forward, and so I’ve built my own. In case of disaster (I have just installed a hard drive that dwarfs its new partner), here’s what I did at the schema level. The actual VBScript code is incredibly simple once the schema’s right.

  1. Take one XHTML Simple schema. It’s modular, so make sure to get all of the files.
  2. Whoa, what the hell does this mean? Let’s start with something easy: we don’t need html, body, or head, so remove them from structure.xsd.
  3. …and if you don’t have them, then link and title aren’t very useful.
  4. That’s right, we’re wrapping the comment in a div for validation. Unlike entries in RSS feeds, comments are divs. If you’ve got some kind of weird system — and I’m not talking mailing lists here — where they’re shown on individual pages, then perhaps they aren’t. But really this doesn’t matter.
  5. Remove div from the block elements allowed: what’s the point? Also get rid of the semantically-dead span.
  6. While we’re at it, no need for class or id attributes, they’re bound to be bad news.
  7. Headings aren’t such a great idea, either.
  8. Oh, and the address element isn’t that useful.
  9. XHTML Simple is nice, but we want more. If we’re assuming our readers have a weblog of their own, then they’ll know something about markup.
  10. Give the a element its full complement of attributes: hreflang is of type language; rel and rev are more complex. Sean Palmer, who wrote the XHTML Simple schema, seemingly never received an answer to his question about how to reference them, so we’ll fake it.
  11. One new simpleType later, with an NMTOKEN restriction and enumerations listing the link types, and we’re done.
  12. Add abbr, acronym, cite, code, del, dfn, ins, kbd, samp, and var to the ‘inlinenoht’ (no hypertext) group.
  13. (To make the elements, I found it easier to declare a ‘phrase.content’ group and a ‘phrase.attlist’ attributeGroup to reference in the element declarations.)
  14. Many wasted words later, that appears to be that. Let’s go!
  15. ‘What the fuck do you mean, “Undeclared XSD type”? uriReference is a type, damnit.’
  16. Change every instance of uriReference to anyURI.
  17. Try again.
  18. Doesn’t work.
  19. Spend many hours trying.
  20. ‘A-ha. So a choice only works like I want it to when you feel like it, Mr Parser.’
  21. Copy the contents of the ‘inlinenoht’ choice into the inline one, then change it to all.
  22. Rest. All is done.
  23. No it isn’t — all won’t work, silly. Change back to choice, set minOccurs and maxOccurs on that.
  24. Doesn’t work. MSXML complains that ‘The minOccurs attribute is not supported in this context.’ Is too.
  25. Still doesn’t work.
  26. Shake fist at screen, yelling ‘you win for now, MSXML, but I’ll get you next time.’
  27. Go crazy with complexTypes and substitutionGroups.
  28. Note that li elements can only contain inline elements.
  29. Don’t even go there.
  30. Test it with this entry: it works! Hooray! Clearly there are no bugs.
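
Steps 10 and 11 can be sketched as follows. This is a guess at the shape of the new simpleType, not the schema actually used: the name LinkType is made up, the 2001 XML Schema namespace is assumed, and the values are the link types listed in the HTML 4.01 spec. The fragment is held in a Python string so that a few lines of standard-library code can read the enumeration back and show what it would accept.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical type name; the values are the HTML 4.01 link types.
LINK_TYPE_XSD = """\
<xsd:simpleType name="LinkType" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:restriction base="xsd:NMTOKEN">
    <xsd:enumeration value="alternate"/>
    <xsd:enumeration value="stylesheet"/>
    <xsd:enumeration value="start"/>
    <xsd:enumeration value="next"/>
    <xsd:enumeration value="prev"/>
    <xsd:enumeration value="contents"/>
    <xsd:enumeration value="index"/>
    <xsd:enumeration value="glossary"/>
    <xsd:enumeration value="copyright"/>
    <xsd:enumeration value="chapter"/>
    <xsd:enumeration value="section"/>
    <xsd:enumeration value="subsection"/>
    <xsd:enumeration value="appendix"/>
    <xsd:enumeration value="help"/>
    <xsd:enumeration value="bookmark"/>
  </xsd:restriction>
</xsd:simpleType>
"""


def allowed_link_types(xsd_text=LINK_TYPE_XSD):
    """Read the enumeration values back out of the schema fragment."""
    root = ET.fromstring(xsd_text)
    return [e.get("value") for e in root.iter(f"{{{XSD_NS}}}enumeration")]


def accepts(value):
    """Mimic the enumeration facet: the whole value must be one listed token."""
    return value.strip() in allowed_link_types()


if __name__ == "__main__":
    for candidate in ("next", "next chapter", "nofollow"):
        print(candidate, accepts(candidate))
```

An enumeration like this admits exactly one token, so rel="next chapter" would be rejected even though HTML 4.01 lets rel and rev hold a space-separated list of link types. Wrapping the type in an xsd:list would be one way around that, at the cost of yet more schema.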

I fucking hate XML schemas. That was a waste of a few hours I could have spent punching myself in the face.
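
Of everything above, step 16 is the part most easily automated. The name uriReference comes from the pre-Recommendation drafts of XML Schema; the 2001 Recommendation calls the same type anyURI, which would explain the ‘Undeclared XSD type’ complaint. A minimal Python sketch of the rename, assuming the type only ever appears in type or base attributes; the function name and the sample declaration are made up for illustration and are not taken from the XHTML Simple schema.

```python
import re

# Match uriReference only where it is the local name inside a type="..." or
# base="..." attribute value, with or without a prefix such as xsd:.
URI_REFERENCE = re.compile(
    r"""(\b(?:type|base)\s*=\s*["'](?:[\w.-]+:)?)uriReference(["'])"""
)


def rename_uri_reference(schema_text):
    """Return schema_text with the draft-era uriReference type renamed to anyURI."""
    return URI_REFERENCE.sub(r"\g<1>anyURI\g<2>", schema_text)


if __name__ == "__main__":
    # Hypothetical declaration, used only to show the substitution.
    sample = '<xsd:attribute name="href" type="xsd:uriReference"/>'
    print(rename_uri_reference(sample))
```

Restricting the match to attribute values leaves comments and documentation that mention uriReference untouched, which a blind search-and-replace would not.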