lexer
that converts characters from an HTML page into a linear sequence of nodes
parser
that provides a hierarchical document model of an HTML page
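To make the difference between the two stages concrete, here is a minimal, self-contained sketch in plain Java. It does not use the HTML Parser classes: `LexParseSketch`, `lex`, `parse`, `outline`, and `TreeNode` are hypothetical names invented for illustration, and the sketch assumes well-nested markup with no void or self-closing tags. The first stage yields a flat list of tokens; the second nests them into a tree.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy illustration only; none of these names come from the HTML Parser library.
class LexParseSketch {

    /** Lexer stage: split markup into a flat, linear list of tag and text tokens. */
    static List<String> lex(String html) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < html.length()) {
            // A tag runs to the next '>'; text runs to the next '<'.
            boolean tag = html.charAt(i) == '<';
            int end = tag ? html.indexOf('>', i) : html.indexOf('<', i);
            if (end < 0) {
                end = html.length();   // no delimiter left: take the rest
            } else if (tag) {
                end++;                 // include the closing '>'
            }
            tokens.add(html.substring(i, end));
            i = end;
        }
        return tokens;
    }

    /** A node in the hierarchical model: a label plus child nodes. */
    static final class TreeNode {
        final String label;
        final List<TreeNode> children = new ArrayList<>();
        TreeNode(String label) { this.label = label; }
    }

    /** Parser stage: nest the flat tokens between matching open and close tags. */
    static TreeNode parse(List<String> tokens) {
        TreeNode root = new TreeNode("#document");
        Deque<TreeNode> open = new ArrayDeque<>();
        open.push(root);
        for (String t : tokens) {
            if (t.startsWith("</")) {
                if (open.size() > 1) open.pop();   // close the current element
            } else if (t.startsWith("<")) {
                TreeNode n = new TreeNode(t.replace("<", "").replace(">", ""));
                open.peek().children.add(n);
                open.push(n);                      // later tokens become its children
            } else {
                open.peek().children.add(new TreeNode(t));  // text is a leaf
            }
        }
        return root;
    }

    /** Render the tree as nested brackets, e.g. p[Hi, b[there]]. */
    static String outline(TreeNode n) {
        if (n.children.isEmpty()) return n.label;
        StringBuilder sb = new StringBuilder(n.label).append('[');
        for (int i = 0; i < n.children.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(outline(n.children.get(i)));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<String> tokens = lex("<p>Hi<b>there</b></p>");
        System.out.println(tokens);                  // flat sequence
        System.out.println(outline(parse(tokens)));  // nested structure
    }
}
```

For the sample input, the first line printed is the flat list `[<p>, Hi, <b>, there, </b>, </p>]` and the second is the nested form `#document[p[Hi, b[there]]]`. A real lexer and parser must also cope with attributes, comments, and malformed markup, which this sketch deliberately ignores.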
Several task lists are used to track items that are not perceived as bugs but that developers consider in need of attention. The following list summarizes the purpose and target issues of each list.
The Request For Enhancement list contains items that are proposed for future versions of the parser. Users may add to this list anything they consider an extension beyond simple bug fixing. Some user-entered bugs are also transferred to this list if the fix would be too significant a change for the current version, or would involve API changes that need to be vetted against the current user community.
Package | Description |
---|---|
org.htmlparser | The basic API classes which will be used by most developers when working with the HTML Parser. |
org.htmlparser.beans | The beans package contains Java Beans using the HTML Parser. |
org.htmlparser.filters | The filters package contains example filters to select only desired nodes. |
org.htmlparser.http | The http package is responsible for HTTP connections to servers. |
org.htmlparser.lexer | The lexer package is the base level I/O subsystem. |
org.htmlparser.lexerapplications.tabby | The Tabby program is a demonstration of how to use the underlying Lexer classes to perform file I/O. |
org.htmlparser.lexerapplications.thumbelina | Extract the images behind thumbnail images. |
org.htmlparser.nodes | The nodes package has the concrete node implementations. |
org.htmlparser.parserapplications | Example applications. |
org.htmlparser.parserapplications.filterbuilder | |
org.htmlparser.parserapplications.filterbuilder.layouts | |
org.htmlparser.parserapplications.filterbuilder.wrappers | |
org.htmlparser.sax | The sax package implements a SAX (Simple API for XML) parser for HTML. |
org.htmlparser.scanners | The scanners package contains classes responsible for the tertiary identification of tags. |
org.htmlparser.tags | The tags package contains specific tags. |
org.htmlparser.util | Code which can be reused by many classes is located in this package. |
org.htmlparser.util.sort | Provides generic sorting and searching. |
org.htmlparser.visitors | The visitors package contains classes that use the Visitor pattern. |
HTML Parser is an open-source library released under the LGPL.