About the Specification

Reading the specification

This specification has been written for two groups of people: documentation authors and CHM capability implementors. We hope it will give documentation authors all the resources they need to get the most out of the features of HTML Help, including its valuable undocumented ones, while allowing them to skip over implementation details. Those who wish to implement alternative CHM compilers, decompilers and/or viewers should find enough information here to create such software.

This specification begins with a general introduction to HTML Help and proceeds to more specific technical information.

This specification has been designed for both screen and print media, with formatting and content tailored to the capabilities of each medium. For example, in print media the locations of external links are given in brackets after the link text, and internal links are replaced by chapter.section.subsection numbers. In electronic media, various styles are used to convey contextual information.

All versions come with a table of contents, a glossary and an index for quick access to information.

Organisation

This specification begins with a discussion of help systems in general, examines MS' help solutions and gives a brief introduction to HTML Help.

Following this is a broad discussion of the features of HTML Help.

This is followed by a discussion of all the data files and MS' executable files involved in HTML Help.

Following this is an in-depth analysis of the formats of the data files.

The specification concludes with miscellaneous topics and appendices.

Conventions

Please read the glossary, as all frequently used abbreviations in this document are defined there.

All binary data in the structures described in this document is little-endian unless otherwise indicated.

Where part of a structure is variable-length, the offset of the item that follows it is shown as +0, because its absolute offset cannot be known in advance.

BYTE, WORD & DWORD have their normal meanings (8, 16 & 32-bit integers) and QWORD indicates a 64-bit integer.
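As a minimal sketch of these conventions, the Python below reads each fixed-size type from a byte buffer in little-endian order using the standard struct module. The helper name read_field is hypothetical, and treating all four types as unsigned is an assumption; the specification itself only gives their sizes.

```python
import struct
from typing import Tuple

# struct codes for the fixed-size types; "<" selects little-endian byte
# order with standard sizes. Reading them as unsigned is an assumption.
FORMATS = {
    "BYTE": "<B",   # 8-bit
    "WORD": "<H",   # 16-bit
    "DWORD": "<I",  # 32-bit
    "QWORD": "<Q",  # 64-bit
}

def read_field(kind: str, data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Hypothetical helper: return (value, offset just past the field)."""
    fmt = FORMATS[kind]
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, offset + struct.calcsize(fmt)
```

For example, read_field("DWORD", b"\x78\x56\x34\x12") yields (0x12345678, 4). Returning the offset just past each field lets a parser carry on from wherever a variable-length item ended instead of relying on fixed absolute offsets.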

The FILETIME structure is a QWORD representing the number of 100-nanosecond intervals since January 1, 1601 (UTC). This is one of several timestamp formats used in Windows.
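The sketch below converts a FILETIME value to a Python datetime and to a Unix timestamp. It assumes the QWORD has already been read as an unsigned integer; the function names are hypothetical. The constant 116444736000000000 is the number of 100-nanosecond ticks between 1601-01-01 and 1970-01-01 (11644473600 seconds multiplied by 10^7).

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts from this instant (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
# 100-ns ticks between 1601-01-01 and 1970-01-01: 11644473600 s * 10**7.
EPOCH_DIFF_TICKS = 116444736000000000

def filetime_to_datetime(ft: int) -> datetime:
    # timedelta resolves to 1 microsecond (10 ticks), so any leftover
    # sub-microsecond ticks are discarded.
    return FILETIME_EPOCH + timedelta(microseconds=ft // 10)

def filetime_to_unix(ft: int) -> int:
    # Shift to the Unix epoch, then convert ticks to whole seconds.
    return (ft - EPOCH_DIFF_TICKS) // 10**7
```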

The time_t type is a DWORD representing the number of seconds elapsed since midnight (00:00:00), January 1, 1970, coordinated universal time. This is the standard Unix timestamp, and is also used on Windows.
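A minimal sketch of decoding a stored time_t, assuming the four raw bytes are at hand. Reading the DWORD as unsigned follows the DWORD definition above; the function name is hypothetical.

```python
import struct
from datetime import datetime, timezone

def time_t_to_datetime(raw: bytes) -> datetime:
    # Little-endian unsigned DWORD holding seconds since 1970-01-01 UTC.
    (seconds,) = struct.unpack("<I", raw)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

For instance, the bytes 80 51 01 00 hold 0x00015180 = 86400 seconds, which decodes to 1970-01-02 00:00:00 UTC.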

An LCID is a DWORD locale identifier made up of a language ID in the lower WORD and a sort ID in the upper WORD; only the low four bits of the upper WORD hold the sort ID, and the rest are reserved.

Bits Explanation
0-9 Primary Language ID
10-15 Secondary Language ID
16-19 Sort ID
20-31 Reserved
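The bit fields in the table can be extracted with shifts and masks. This is a sketch; the function and dictionary key names are illustrative only. As a check, 0x0409 (English, United States) splits into primary language 0x09 and secondary language 0x01 with sort ID 0.

```python
def decode_lcid(lcid: int) -> dict:
    """Split an LCID DWORD into the bit fields listed in the table above."""
    lang_id = lcid & 0xFFFF  # lower WORD: the language ID
    return {
        "primary_language": lang_id & 0x3FF,           # bits 0-9
        "secondary_language": (lang_id >> 10) & 0x3F,  # bits 10-15
        "sort_id": (lcid >> 16) & 0xF,                 # bits 16-19
        "reserved": (lcid >> 20) & 0xFFF,              # bits 20-31
    }
```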

A GUID is a "Globally Unique IDentifier": 0x10 bytes, arranged as one DWORD, two WORDs and eight BYTEs. GUIDs are based on the UUIDs of DCE and combine a timestamp; a clock sequence and related persistent state, to cope with clocks being set backwards; a forcibly incremented counter, to cope with high-frequency allocations; and the truly globally unique IEEE machine identifier obtained from a network card. A network card is not actually required: if none is present, a machine identifier can be synthesized from highly variable machine state and stored persistently.
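Following the DWORD, WORD, WORD, 8 BYTE layout and the little-endian convention above, a stored GUID can be rendered in the familiar registry-style text form as sketched below. The function name is hypothetical; Python's standard uuid.UUID(bytes_le=...) performs the same byte reordering.

```python
import struct

def format_guid(data: bytes) -> str:
    """Render a 0x10-byte stored GUID as {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}."""
    if len(data) != 16:
        raise ValueError("a GUID is exactly 0x10 bytes")
    # One little-endian DWORD, two little-endian WORDs, then 8 BYTEs kept in order.
    d1, d2, d3, d4 = struct.unpack("<IHH8s", data)
    # By convention the 8 BYTEs are printed as a group of 2 then a group of 6.
    return "{%08X-%04X-%04X-%s-%s}" % (d1, d2, d3, d4[:2].hex().upper(), d4[2:].hex().upper())
```

For example, the bytes 78 56 34 12 34 12 78 56 00 11 22 33 44 55 66 77 format as {12345678-1234-5678-0011-223344556677}: the integer parts are byte-swapped, while the final eight bytes are not.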

This specification is extensively hyperlinked, so unfamiliar concepts and reference information are only a click away. For those who prefer to read it printed onto dead trees (or marijuana), there is a glossary and an index, all intra-document hyperlinks are given chapter.section.subsection numbers and all external hyperlinks have their URLs printed.

The table below lists the styles used in this document and what each one indicates.

Example(s) Explanation
0x??000000, 0 (unknown), ?? This style indicates that the highlighted information is unknown. Please let us know if you decipher any of the unknown information.
Custom tab This style indicates the name of a variable in the [OPTIONS] section of the hhp file.
[WINDOWS] This style indicates the name of a section in the hhp file.
ShowWindow This style indicates a Win32 API function.
MSDN This style indicates a link to an external site.
<foo@bar.org> This style indicates a link to an email address.
MVPs, 19??; Far, 19?? This style indicates one or more citations.

Contributions

Name Website Contact Contribution
Pabs http://zip.to/pabs3 <pabs3@zip.to> Editor. Also reverse-engineered the internal files, gathered information from the Net, wrote all the sections other than the ITSF & LZX sections, and cleaned up those two sections.
Matthew Russotto http://www.speakeasy.org/~russotto <mrussotto@speakeasy.net> Reverse-engineered the ITSF format & its LZX compression. Wrote the predecessor to the ITSF section.
Olivier Sannier http://obones.free.fr <obones@meloo.com> Provided documentation on CHM Samples as used by MSDN.
Unknown http://www.vivid-creations.com Unknown Provided just enough information in a Usenet message to allow the overall structure of CHS files to be reverse-engineered.
Please send your fixes, suggestions, content & anything else to Pabs.

This document is copyright © Pabs et al. All Rights Reserved.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being "GNU Free Documentation License" and with no Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".