Writing a DTD


Write a DTD for journal articles based on the readings for this week. You should also document the DTD you write (block comments will suffice). DTD documentation needs to address several issues. First, the description of a tag should give enough information that a person attempting to apply the DTD can recognize when the tag should be used. In cases where tag abuse seems especially likely, or would have particularly bad effects, an explicit mention of how not to tag something may be in order.
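As a rough illustration of documentation by block comment, a declaration might be annotated as in the sketch below. The element names (abstract, p) are invented for this example and are not taken from the readings; your own DTD may divide things differently.

```xml
<!-- ============================================================
     abstract  (hypothetical element name)
     Use for the author-supplied summary that precedes the body
     of the article. At most one per article.
     Do NOT use for an editor's introduction, or for the opening
     paragraph of the body, even when it reads like a summary.
     ============================================================ -->
<!ELEMENT abstract (p+)>

<!-- p: an ordinary paragraph of running text -->
<!ELEMENT p (#PCDATA)>
```

Note that the comment says both when to use the tag and, where misuse seems likely, when not to.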

For some elements, and especially in the case of attributes, there are often data normalization issues to address. For example, bibliographic databases are less useful when names are not consistently analyzed. Information intended for machine processing (such as dates and numbers) may need to be encoded in a precise format, and this may need to be described.
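One possible way to handle these normalization issues is sketched below. The names author, given-names, surname, pub-date and the iso attribute are assumptions made for illustration. Since a DTD cannot check that an attribute value follows a pattern such as YYYY-MM-DD, the comment is the only place the required format can be stated.

```xml
<!-- author  (hypothetical element name)
     Analyze every personal name into its parts, so that
     bibliographic tools can sort and match names consistently. -->
<!ELEMENT author (given-names, surname)>
<!ELEMENT given-names (#PCDATA)>
<!ELEMENT surname (#PCDATA)>

<!-- pub-date  (hypothetical element name)
     Content: the date exactly as printed in the source.
     iso:     the same date written as YYYY-MM-DD.
              NMTOKEN rules out spaces, but the DTD cannot
              enforce the pattern itself; encoders must. -->
<!ELEMENT pub-date (#PCDATA)>
<!ATTLIST pub-date iso NMTOKEN #REQUIRED>
```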

Sometimes you may want encoders to preserve inconsistent or incorrectly normalized data, but supplement it with a controlled value of use to a mechanical processor. Most journals, for instance, have a unique identifier (the ISSN, or International Standard Serial Number), and author-created abbreviations of journal titles might be preserved and supplemented by an attribute containing the ISSN.
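The ISSN case might look something like the following small document, which carries its DTD in the internal subset so that it can be checked with any validating parser. The element names are invented for this sketch, and both the abbreviated title and the ISSN value are placeholders, not a real journal.

```xml
<?xml version="1.0"?>
<!DOCTYPE citation [
  <!-- journal-title  (hypothetical element name)
       Content: the title exactly as the author wrote it,
                including any abbreviation.
       issn:    the journal's ISSN in the form NNNN-NNNN,
                supplied by the encoder for machine matching. -->
  <!ELEMENT citation (journal-title)>
  <!ELEMENT journal-title (#PCDATA)>
  <!ATTLIST journal-title issn NMTOKEN #IMPLIED>
]>
<citation>
  <!-- placeholder title and ISSN, for illustration only -->
  <journal-title issn="0000-0000">J. Doc. Eng.</journal-title>
</citation>
```

The author's wording survives in the element content, while a processor can rely on the attribute alone.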

You should think about the opportunities, and also the dangers, of automatically generated text in marking structural divisions and cross-references. While features of this kind can enhance the consistency of a document's presentation in different media, they can also get in the way of an author's freedom to use alternative phrasing. Punctuation is one area where text-generation solutions often create difficulties.
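To make the trade-off concrete, here is one possible design, with invented element and attribute names. An empty xref leaves the wording (for example "Section 1") to the stylesheet, while an xref with content keeps the author's own phrasing. The comma after the first xref sits in the source text, so any generated text must not supply punctuation of its own.

```xml
<?xml version="1.0"?>
<!DOCTYPE article [
  <!-- Section numbers are not typed into titles; a stylesheet
       is assumed to generate them. -->
  <!ELEMENT article (section+)>
  <!ELEMENT section (title, p+)>
  <!ATTLIST section id ID #IMPLIED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT p (#PCDATA | xref)*>
  <!-- xref: leave empty to request generated text, or supply
       content to override it with the author's phrasing. -->
  <!ELEMENT xref (#PCDATA)>
  <!ATTLIST xref target IDREF #REQUIRED>
]>
<article>
  <section id="methods">
    <title>Methods</title>
    <p>We describe the corpus here.</p>
  </section>
  <section>
    <title>Results</title>
    <p>As noted in <xref target="methods"/>, the corpus is small.</p>
    <p>The corpus (<xref target="methods">described earlier</xref>) is small.</p>
  </section>
</article>
```

Whichever convention you choose, the DTD documentation should say who is responsible for the visible text and the surrounding punctuation: the encoder or the processing software.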

Posted: 29 Jan