In ordinary English, a schema is defined as "an outline or image universally applicable to a general conception, under which it is likely to be presented to the mind; as, five dots in a line are a schema of the number five; a preceding and succeeding event are a schema of cause and effect" (Websters).
The English-language definition of schema does not really apply to XML schema languages. Most schema languages are too complex to "present to the mind", or to a program, the instance documents that they describe. More importantly and less subjectively, they often focus on defining validation rules rather than on modeling a class of documents.
All XML schema languages define transformations to apply to a class of instance documents, and XML schemas should be thought of as transformations. These transformations take instance documents as input and produce a validation report. The report includes at least a return code stating whether the document is valid and, optionally, a Post Schema Validation Infoset (PSVI), which updates the original document's infoset (the information obtained from the XML document by the parser) with additional information (default values, datatypes, etc.).
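To make the idea of "validation as a transformation" concrete, here is a minimal sketch in Python. It is not how any real schema processor works: the rules (a `book` root, a required `title` child, an `available` attribute defaulting to `"true"`) and the names `ValidationReport` and `validate` are hypothetical, chosen only to show an instance document going in and a return code plus a PSVI-like augmented tree coming out.

```python
import copy
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValidationReport:
    """Output of the transformation: a return code plus an optional PSVI-like tree."""
    valid: bool
    errors: List[str] = field(default_factory=list)
    augmented: Optional[ET.Element] = None  # only set when the document is valid


def validate(xml_text: str) -> ValidationReport:
    """Transform an instance document into a validation report.

    The rules below are hypothetical and stand in for a real schema.
    """
    root = ET.fromstring(xml_text)
    errors = []

    if root.tag != "book":
        errors.append(f"unexpected root element <{root.tag}>")
    if root.find("title") is None:
        errors.append("missing <title> child")
    if errors:
        return ValidationReport(valid=False, errors=errors)

    # Work on a copy so the original infoset is left untouched, then add
    # the information a schema would contribute (here, a default value).
    augmented = copy.deepcopy(root)
    if augmented.get("available") is None:
        augmented.set("available", "true")
    return ValidationReport(valid=True, augmented=augmented)
```

Calling `validate('<book><title>XML Schema</title></book>')` yields a report whose `valid` flag is true and whose `augmented` tree carries the defaulted `available` attribute, which is not present in the input document itself.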
One important consequence of realizing that XML schemas define transformations is that one should consider general purpose transformation languages and APIs as alternatives when choosing a schema language.
Before we dive into the features of XML schema languages, I'd like to step back and look at the downsides of the use of any schema language.
One of the key strengths of XML, sometimes called "late binding," is the decoupling of the writer and the reader of an XML document: this gives the reader the ability to have its own interpretation and understanding of the document. By being more prescriptive about the way to interpret a document, XML schema languages reduce the possibility of erroneous interpretation but also create the possibility of unexpectedly adding "value" to the document by creating interpretations not apparent from an examination of the document itself.
Furthermore, modeling an XML tree is very complex, and the schema languages often make a judgment on "good" and "bad" practices in order to limit their complexity and consequent validation processing times. Such limitations also reduce the set of possibilities offered to XML designers. Reducing the set of possibilities offered by a still relatively young technology, that is, premature optimization, is a risk, since these "good" or "bad" practices are still ill-defined and rapidly evolving.
The many advantages of using and widely distributing XML schemas must be balanced against the risk of narrowing the flexibility and extensibility of XML.
A document conforming to a particular schema is said to be valid, and the process of checking that conformance is called validation. We can differentiate between at least four levels of validation enabled by schema languages:
- The validation of the markup -- controlling the structure of a document.
- The validation of the content of individual leaf nodes (datatyping).
- The validation of integrity, i.e., of the links between nodes within a document or between documents.
- Any other tests (often called "business rules").
Markup and datatype validation are the most powerful of these levels (or the most dangerous, since they often imply a kind of modeling that limits the diversity of markup and datatypes). Link validation, especially between different documents, is poorly covered by the current schema languages.
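The four levels can be told apart more easily with a small example. The following Python sketch checks a made-up `library` vocabulary by hand, using a general-purpose API rather than a schema language. Every element name, attribute name, and rule in it is an assumption made for illustration (including the "pocket books may not exceed 400 pages" business rule), and the function name `check_levels` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_levels(xml_text):
    """Return the errors found at each of the four validation levels."""
    root = ET.fromstring(xml_text)
    report = {"structure": [], "datatype": [], "integrity": [], "business": []}
    author_ids = {a.get("id") for a in root.findall("author")}

    for book in root.findall("book"):
        label = book.get("id", "?")

        # 1. Markup: every book needs a <title> and a <pages> child.
        for required in ("title", "pages"):
            if book.find(required) is None:
                report["structure"].append(f"book {label}: missing <{required}>")

        # 2. Datatype: the content of the <pages> leaf node must be an integer.
        pages = None
        pages_text = book.findtext("pages")
        if pages_text is not None:
            try:
                pages = int(pages_text)
            except ValueError:
                report["datatype"].append(f"book {label}: pages is not an integer")

        # 3. Integrity: the author attribute must point at an existing author id.
        if book.get("author") not in author_ids:
            report["integrity"].append(f"book {label}: unknown author reference")

        # 4. Business rule (hypothetical): pocket books may not exceed 400 pages.
        if book.get("format") == "pocket" and pages is not None and pages > 400:
            report["business"].append(f"book {label}: pocket book too long")

    return report


SAMPLE = """
<library>
  <author id="a1"/>
  <book id="b1" author="a1" format="pocket"><title>T</title><pages>900</pages></book>
  <book id="b2" author="a9"><title>U</title><pages>many</pages></book>
  <book id="b3" author="a1"><pages>10</pages></book>
</library>
"""
```

Running `check_levels(SAMPLE)` reports one problem per level: `b3` fails the markup check, `b2` fails both the datatype and the integrity checks, and `b1` is well-formed and well-typed but breaks the business rule.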