SGML documents are always stored as plain ASCII text, which means such files can be used universally.
SGML markings, known as "tags", are used in these documents to record information about the structure of the content (hierarchies), the meaning of the content (description) and, occasionally, its formatting. Software recognizes the tags by their delimiters, usually "<" and ">". The permitted tags and structural relationships of an SGML document are specified in a DTD (Document Type Definition). A document is recognized as "valid" only if its markings conform to these rules. HTML, which is used for this page, is an application of SGML. The start of the German version of this page looks like this as unformatted source text:
<span class="blue">SGML</span><br>
<hr>
<br>
<h1>SGML</h1>
<h2>Standard Generalized Markup Language<br>
Standardisierte Allgemeine Auszeichnungssprache<br>
ISO 8879:1986</h2>
<h2>Beschreibung</h2>
<p>Das Dateiformat eines SGML-Dokuments ist grunds&auml;tzlich ASCII-Text.
Diese Dateien sind daher universell einsetzbar.</p>
<p>Mit den SGML-Auszeichnungen ("Tags" genannt) werden
in diesen Dokumenten Informationen &uuml;ber die Inhaltsstruktur
(Hierarchien), die Inhaltsbedeutung (Beschreibung) und seltener
Formatinformationen abgelegt. Die Tags erkennt eine Software an
den Begrenzungszeichen, normalerweise "&lt;" und
"&gt;". Die zul&auml;ssigen Tags und
Strukturzusammenh&auml;nge eines SGML-Dokuments werden in einer DTD
(Dokument Type Definition) festgelegt. Nur wenn die Auszeichnungen
eines Dokuments diesen Regeln entsprechen, wird es als
"valide" (gültig) anerkannt. HTML, wie es für
diesen Text verwendet wird, ist eine Anwendung von SGML. Der Anfang
dieser Seite sieht im nicht formatierten Quelltext so aus:</p>
Because this source text is in German, you can see that characters outside standard ASCII are converted into so-called entities: "ä", for instance, becomes "&auml;".
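The conversion into entities can be illustrated with a minimal Python sketch. It assumes the entity names from Python's standard `html.entities.codepoint2name` table (the HTML 4 names, e.g. `auml` for "ä") and falls back to a numeric character reference when no name exists; the function name `to_entities` is hypothetical. Characters that are significant to the markup itself ("<", ">", "&") are escaped as well, since a literal "<" in the text would otherwise be mistaken for the start of a tag.

```python
from html import unescape
from html.entities import codepoint2name


def to_entities(text):
    """Replace markup-significant and non-ASCII characters with entity references."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 127 or ch in "<>&":
            name = codepoint2name.get(code)
            # Named entity if one is defined, otherwise a numeric reference such as &#8364;
            out.append(f"&{name};" if name else f"&#{code};")
        else:
            out.append(ch)  # plain ASCII passes through unchanged
    return "".join(out)


print(to_entities('grundsätzlich, "<" und ">"'))  # grunds&auml;tzlich, "&lt;" und "&gt;"
print(unescape(to_entities("über")))              # über
```

The standard-library `html.unescape` reverses the conversion, which is what a browser effectively does when it displays the page.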
An element generally consists of a start tag ("<h2>"), an end tag ("</h2>") and the content between them. The start tag often also contains attributes, each of which has a value: one example is "class" in "<span class="blue">" at the beginning of the source example above. The value of "class" is "blue"; it causes the browser to look up, in a stylesheet file, the format it should apply to the content of the "span" element.
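To make the parts of an element concrete, here is a small sketch that uses Python's standard `html.parser.HTMLParser` (an HTML parser, not a full SGML parser) to split markup into element name, attributes and text content. The class name `ElementCollector` is hypothetical, and the sketch only handles the simple case where each element contains text and ends with a matching end tag.

```python
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Collects (tag, attributes, text content) for each element that is closed."""

    def __init__(self):
        super().__init__()
        self.stack = []     # open elements as [tag, attributes, text parts]
        self.elements = []  # finished elements, in the order they are closed

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs, e.g. [("class", "blue")]
        self.stack.append([tag, dict(attrs), []])

    def handle_data(self, data):
        # Text belongs to the innermost open element
        if self.stack:
            self.stack[-1][2].append(data)

    def handle_endtag(self, tag):
        # Only an end tag matching the innermost start tag closes an element
        if self.stack and self.stack[-1][0] == tag:
            name, attrs, parts = self.stack.pop()
            self.elements.append((name, attrs, "".join(parts)))


collector = ElementCollector()
collector.feed('<span class="blue">SGML</span>')
collector.close()
print(collector.elements)  # [('span', {'class': 'blue'}, 'SGML')]
```

Note that the parser only reports the attribute and its value; deciding what "blue" means (looking it up in a stylesheet) is left to the application, which mirrors the separation described above.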
Structural relationships are given less emphasis in HTML, since its aim is simply to achieve a proper display in a web browser. "<h1>" is merely a formatting instruction to the browser, similar to a paragraph style in a word processing program; the structure does not show directly that the text following it actually "belongs" to it. "Proper" SGML might look like this:
<span class="blue">SGML</span><br>
<hr>
<br>
<h1><title>SGML</title>
<p font=large>Standard Generalized Markup Language<br>
Standardisierte Allgemeine Auszeichnungssprache<br>
ISO 8879:1986</p>
<h2><title>Beschreibung</title>
<p>Das Dateiformat eines SGML-Dokuments ist grunds&auml;tzlich ASCII-Text.
Diese Dateien sind daher universell einsetzbar.</p>
<p>Mit den SGML-Auszeichnungen ("Tags" genannt) werden
in diesen Dokumenten Informationen &uuml;ber die ...
</p>...</h2>...</h1>
Here "h1" contains the "title" and a subsection, "h2". "h2" in turn contains its own "title" and paragraph elements ("p"). The name "title" immediately indicates that its content is a heading.
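How a program might check such a nested structure against declared rules can be sketched in Python. This is a toy validator, not an SGML parser: it recognizes tags only by their "<" and ">" delimiters, ignores attributes, tag omission and entities, and stands in for a DTD with hypothetical content models written as regular expressions over the sequence of child element names. The page itself shows no DTD, so the models below (and the helper names `parse` and `validate`) are assumptions chosen to fit the "proper SGML" example.

```python
import re

# A tag is anything between the delimiters "<" and ">"; group 1 marks an end tag.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>")

# Elements declared EMPTY: they have a start tag but no end tag.
EMPTY = {"br"}

# Hypothetical content models over child element names (each followed by a space).
# In real DTD syntax the first would read: <!ELEMENT h1 - - (title, p*, h2*)>
CONTENT_MODELS = {
    "h1": r"title (p )*(h2 )*",  # a title, then paragraphs, then subsections
    "h2": r"title (p )+",        # a title, then at least one paragraph
    "title": r"",                # text only
    "p": r"(br )*",              # text, optionally with line breaks
    "br": r"",                   # empty
}


def parse(text):
    """Build a tree of {'name', 'children'} nodes from the tags in text."""
    root = {"name": "#root", "children": []}
    stack = [root]
    for match in TAG.finditer(text):
        is_end, name = match.group(1) == "/", match.group(2).lower()
        if is_end:
            if stack[-1]["name"] != name:
                raise ValueError(f"</{name}> does not close <{stack[-1]['name']}>")
            stack.pop()
        else:
            node = {"name": name, "children": []}
            stack[-1]["children"].append(node)
            if name not in EMPTY:  # EMPTY elements never stay open
                stack.append(node)
    if len(stack) > 1:
        raise ValueError(f"<{stack[-1]['name']}> is never closed")
    return root


def validate(node, errors=None):
    """Check every element below node against CONTENT_MODELS; return error messages."""
    errors = [] if errors is None else errors
    for child in node["children"]:
        model = CONTENT_MODELS.get(child["name"])
        if model is None:
            errors.append(f"<{child['name']}> is not declared")
            continue
        sequence = "".join(c["name"] + " " for c in child["children"])
        if re.fullmatch(model, sequence) is None:
            shown = sequence.strip() or "(none)"
            errors.append(f"<{child['name']}> has invalid content: {shown}")
        validate(child, errors)
    return errors


example = """<h1><title>SGML</title>
<p font=large>Standard Generalized Markup Language<br>
ISO 8879:1986</p>
<h2><title>Beschreibung</title>
<p>Das Dateiformat eines SGML-Dokuments ...</p>
</h2></h1>"""

print(validate(parse(example)))  # []
print(validate(parse("<h1><title>SGML</title><h2><p>x</p></h2></h1>")))
# ['<h2> has invalid content: p']
```

Because "h2" is declared to start with a "title", the second document is rejected even though every tag in it is properly closed: validity depends on the declared structure, not just on well-formed tags.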
Images are included only by reference. Complex layouts, whether for print or for screen display, are generated only by the processing application, on the basis of the SGML markings.
The term "mark-up" originates with classic typesetting technology.
In traditional document mark-up, the copy editor writes instructions for the typesetter on the typescript, specifying the typeface, type size, type area and so on. These instructions are written as annotations at the points in the text where the format is to change.
Later, these markings were made electronically, using special coding systems agreed between the publisher and the printing works, since almost every printer used its own marking scheme. The confusion grew as more and more authors began to prepare their documents with word processing systems, which naturally used coding languages that differed from the typesetters' systems. Towards the end of the 1960s, therefore, more general markup languages began to appear, such as IBM Script and, later, TeX. In all these cases, the principle was to write specific markings into the document in the form of formatting commands. But because this method mixed data with formatting commands, it was still too specialized to serve as acceptable input for every typesetting system. Efforts to develop a standard mark-up language therefore continued.
In the late 1960s, Stanley Rice developed the idea of a universal catalog of editorial structure markings. Norman Scharpf, director of the GCA (Graphic Communications Association), recognized the significance of this idea and initiated a process that led to the "GenCode®" concept. On this basis it became recognized that different kinds of document required different sets of markings.
In 1969 Charles Goldfarb led a research project at IBM concerned with office information technology. Together with Edward Mosher and Raymond Lorie he developed the Generalized Markup Language (GML; not by coincidence, also the initials of its developers) for word processing, formatting and search systems. The formally defined rules describing the nested document structure were recorded separately from the documents themselves.
This idea then provided the foundation for an ANSI project titled "Computer Languages for the Processing of Text", which developed multi-part standards covering the entire process from input through to printing. The work soon extended beyond the national level, and the part concerned with generic text markup was published in 1986 as ISO Standard 8879, under the title Standard Generalized Markup Language.