Spell checking HTML pages is not the same as spell checking a plain text document.
On screen, an HTML page may look just like any other text document,
but the HTML source code contains many markup tags in addition to the text.
The tags must be separated from the visible text, yet in some cases the text they carry must still be spell checked.
Additionally, some text structures must be ignored because they are frequently composed of arbitrary strings.
To understand the results of the spell check service, the following points should be kept in mind:
- text in URL formats is ignored: specifically, unbroken text prefixed by a protocol identifier (such as http://) is ignored, as are dotted names
- text enclosed by tags that hold non-visible and non-indexed strings is ignored, for example script tags, object tags, and comment markers
- text enclosed by address tags is also ignored because it frequently contains names, zip codes, and telephone numbers
- text in tags with language significance on the internet is spell checked; examples include meta description tags, meta keywords tags, alt attributes in image tags, and title attributes in anchor tags
- HTML character entities are fully expanded and substituted before spell checking
- spell check jobs specify a character encoding, such as ISO-8859-1, to ensure that the correct character substitutions are made during processing; the character encoding and the dictionary language are also sent in the HTTP request headers so that web servers that use them have the information available
This specialised handling of HTML documents reduces false alerts,
which in turn makes the spell check reports more accurate and more useful to the human editor.
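
As a rough illustration of the rules listed above, the following Python sketch extracts the words that would be handed to a dictionary lookup. It is not the service's actual implementation: the class name CheckableText, the function words_to_check, the tag sets, and the URL pattern are all assumptions made for this example, and it relies only on Python's standard html.parser module.

```python
import re
from html.parser import HTMLParser

# Assumed rule set, based on the points listed above; a real service may differ.
SKIPPED_TAGS = {"script", "object", "address"}      # enclosed text is ignored
CHECKED_ATTRS = {("img", "alt"), ("a", "title")}    # attribute text is checked
CHECKED_META = {"description", "keywords"}          # meta content is checked

# Unbroken text with a protocol prefix, or a dotted name (hypothetical pattern).
URL_LIKE = re.compile(r"^(?:[a-z][a-z0-9+.-]*://\S+|\w+(?:\.\w+)+)$", re.IGNORECASE)


class CheckableText(HTMLParser):
    """Collect the strings that a spell checker should see."""

    def __init__(self):
        # convert_charrefs=True expands character entities before handle_data.
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        if self._skip_depth:
            return  # inside an ignored element: collect nothing
        attr_map = dict(attrs)
        for name, value in attrs:
            if value and (tag, name) in CHECKED_ATTRS:
                self.chunks.append(value)
        if tag == "meta" and (attr_map.get("name") or "").lower() in CHECKED_META:
            if attr_map.get("content"):
                self.chunks.append(attr_map["content"])

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

    # handle_comment is not overridden, so comment text is dropped.


def words_to_check(html):
    """Return the words to spell check, in document order."""
    parser = CheckableText()
    parser.feed(html)
    parser.close()
    words = []
    for chunk in parser.chunks:
        for token in chunk.split():
            token = token.strip(".,;:!?()\"'")
            if token and not URL_LIKE.match(token):
                words.append(token)
    return words
```

Given a fragment such as <p>Visit http://example.com for caf&eacute; menus.</p>, the sketch returns the words Visit, for, café and menus: the URL is dropped and the entity is expanded before the word is checked.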