Package | Description |
---|---|
org.htmlparser |
The basic API classes which will be used by most developers when working with
the HTML Parser.
|
org.htmlparser.filters |
The filters package contains example filters to select only desired nodes.
|
org.htmlparser.lexer |
The lexer package is the base level I/O subsystem.
|
org.htmlparser.nodes |
The nodes package has the concrete node implementations.
|
org.htmlparser.parserapplications.filterbuilder | |
org.htmlparser.parserapplications.filterbuilder.wrappers | |
org.htmlparser.sax |
The sax package implements a SAX (Simple API for XML) parser for HTML.
|
org.htmlparser.scanners |
The scanners package contains classes responsible for the tertiary
identification of tags.
|
org.htmlparser.tags |
The tags package contains specific tags.
|
org.htmlparser.util |
Code that can be reused by many classes is located in this package.
|
org.htmlparser.visitors |
The visitors package contains classes that use the Visitor pattern.
|
Modifier and Type | Interface | Description |
---|---|---|
interface |
Remark |
This interface represents a comment in the HTML document.
|
interface |
Tag |
This interface represents a tag (<xxx yyy="zzz">) in the HTML document.
|
interface |
Text |
This interface represents a piece of the content of the HTML document.
|
Modifier and Type | Method | Description |
---|---|---|
Node |
Node.getFirstChild() |
Get the first child of this node.
|
Node |
Node.getLastChild() |
Get the last child of this node.
|
Node |
Node.getNextSibling() |
Get the next sibling to this node.
|
Node |
Node.getParent() |
Get the parent of this node.
|
Node |
Node.getPreviousSibling() |
Get the previous sibling to this node.
|
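
The navigation methods listed above can be pictured with a minimal sketch. It does not use the library: `SimpleNode` is a hypothetical stand-in that only mirrors the method names from the table, and returning `null` when there is no child or sibling is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

class NavigationSketch {
    /** Hypothetical stand-in for a node; not the library's Node interface. */
    static class SimpleNode {
        final String name;
        SimpleNode parent;
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name) { this.name = name; }

        /** Attach a child and return this node so calls can be chained. */
        SimpleNode add(SimpleNode child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        SimpleNode getParent() { return parent; }

        SimpleNode getFirstChild() {
            return children.isEmpty() ? null : children.get(0);
        }

        SimpleNode getLastChild() {
            return children.isEmpty() ? null : children.get(children.size() - 1);
        }

        SimpleNode getNextSibling() {
            if (parent == null) return null;
            int i = parent.children.indexOf(this);
            return i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
        }

        SimpleNode getPreviousSibling() {
            if (parent == null) return null;
            int i = parent.children.indexOf(this);
            return i > 0 ? parent.children.get(i - 1) : null;
        }
    }

    /** Collect the names of the direct children by following sibling links. */
    static List<String> childNames(SimpleNode node) {
        List<String> names = new ArrayList<>();
        for (SimpleNode c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            names.add(c.name);
        }
        return names;
    }

    public static void main(String[] args) {
        SimpleNode body = new SimpleNode("body")
                .add(new SimpleNode("h1"))
                .add(new SimpleNode("p"))
                .add(new SimpleNode("div"));
        System.out.println(childNames(body));
    }
}
```
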
Modifier and Type | Method | Description |
---|---|---|
boolean |
NodeFilter.accept(Node node) |
Predicate to determine whether or not to keep the given node.
|
void |
Node.setParent(Node node) |
Sets the parent of this node.
|
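
The `accept` predicate above decides, node by node, what is kept. A rough sketch of that idea follows, assuming a hypothetical stand-in tree (`Item`) and filter interface (`ItemFilter`) rather than the library's own types.

```java
import java.util.ArrayList;
import java.util.List;

class AcceptSketch {
    /** Hypothetical stand-in node: a name plus child nodes. */
    static class Item {
        final String name;
        final List<Item> children = new ArrayList<>();

        Item(String name, Item... kids) {
            this.name = name;
            for (Item k : kids) children.add(k);
        }
    }

    /** Hypothetical stand-in for a filter with a single accept predicate. */
    interface ItemFilter {
        boolean accept(Item item);
    }

    /** Visit root and everything below it, keeping each item the filter accepts. */
    static List<String> collect(Item root, ItemFilter filter) {
        List<String> kept = new ArrayList<>();
        collectInto(root, filter, kept);
        return kept;
    }

    private static void collectInto(Item item, ItemFilter filter, List<String> kept) {
        if (filter.accept(item)) kept.add(item.name);
        for (Item child : item.children) collectInto(child, filter, kept);
    }

    public static void main(String[] args) {
        Item page = new Item("html",
                new Item("body", new Item("a"), new Item("p", new Item("a"))));
        // Keep only items named "a", much as a tag-name filter might.
        System.out.println(collect(page, item -> item.name.equals("a")).size());
    }
}
```
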
Modifier and Type | Field | Description |
---|---|---|
protected Node |
IsEqualFilter.mNode |
The node to match.
|
Modifier and Type | Method | Description |
---|---|---|
boolean |
AndFilter.accept(Node node) |
Accept nodes that are acceptable to all of its predicate filters.
|
boolean |
CssSelectorNodeFilter.accept(Node node) |
Accept nodes that match the selector expression.
|
boolean |
HasAttributeFilter.accept(Node node) |
Accept tags with a certain attribute.
|
boolean |
HasChildFilter.accept(Node node) |
Accept tags with children acceptable to the filter.
|
boolean |
HasParentFilter.accept(Node node) |
Accept tags with parent acceptable to the filter.
|
boolean |
HasSiblingFilter.accept(Node node) |
Accept tags with a sibling acceptable to the filter.
|
boolean |
IsEqualFilter.accept(Node node) |
Accept the node.
|
boolean |
LinkRegexFilter.accept(Node node) |
Accept nodes that are a LinkTag and have a URL
that matches the regex pattern supplied in the constructor.
|
boolean |
LinkStringFilter.accept(Node node) |
Accept nodes that are a LinkTag and
have a URL that matches the pattern supplied in the constructor.
|
boolean |
NodeClassFilter.accept(Node node) |
Accept nodes that are assignable from the class provided in
the constructor.
|
boolean |
NotFilter.accept(Node node) |
Accept nodes that are not acceptable to the predicate filter.
|
boolean |
OrFilter.accept(Node node) |
Accept nodes that are acceptable to any of its predicate filters.
|
boolean |
RegexFilter.accept(Node node) |
Accept string nodes that match the regular expression.
|
boolean |
StringFilter.accept(Node node) |
Accept string nodes that contain the string.
|
boolean |
TagNameFilter.accept(Node node) |
Accept nodes that are tags and have a matching tag name.
|
boolean |
XorFilter.accept(Node node) |
Accept nodes that are acceptable to an odd number of its predicate filters.
|
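
The logical filters in the table above (all, any, not, odd number) could be modelled roughly as follows. This is a sketch only: `Filter` is a hypothetical interface over plain strings, and the combinator methods are illustrative helpers, not the constructors of `AndFilter`, `OrFilter`, `NotFilter`, or `XorFilter`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterCombinatorSketch {
    /** Hypothetical stand-in for a node predicate; a node is just a String here. */
    interface Filter {
        boolean accept(String node);
    }

    /** Accept when every predicate accepts. */
    static Filter and(Filter... fs) {
        return node -> Arrays.stream(fs).allMatch(f -> f.accept(node));
    }

    /** Accept when at least one predicate accepts. */
    static Filter or(Filter... fs) {
        return node -> Arrays.stream(fs).anyMatch(f -> f.accept(node));
    }

    /** Accept when the predicate rejects. */
    static Filter not(Filter f) {
        return node -> !f.accept(node);
    }

    /** Accept when an odd number of predicates accept. */
    static Filter xor(Filter... fs) {
        return node -> Arrays.stream(fs).filter(f -> f.accept(node)).count() % 2 == 1;
    }

    /** Return the nodes the filter accepts, in their original order. */
    static List<String> keep(List<String> nodes, Filter filter) {
        List<String> kept = new ArrayList<>();
        for (String n : nodes) {
            if (filter.accept(n)) kept.add(n);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "img", "div", "span", "link");
        Filter isShort = s -> s.length() <= 3;
        Filter hasI = s -> s.contains("i");
        System.out.println(keep(names, and(isShort, hasI)));
        System.out.println(keep(names, xor(isShort, hasI)));
        System.out.println(keep(names, not(or(isShort, hasI))));
    }
}
```
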
Constructor | Description |
---|---|
IsEqualFilter(Node node) |
Creates a new IsEqualFilter that accepts only the node provided.
|
Modifier and Type | Method | Description |
---|---|---|
protected Node |
Lexer.makeRemark(int start,
int end) |
Create a remark node based on the current cursor and the one provided.
|
protected Node |
Lexer.makeString(int start,
int end) |
Create a string node based on the current cursor and the one provided.
|
protected Node |
Lexer.makeTag(int start,
int end,
java.util.Vector attributes) |
Create a tag node based on the current cursor and the one provided.
|
Node |
Lexer.nextNode() |
Get the next node from the source.
|
Node |
Lexer.nextNode(boolean quotesmart) |
Get the next node from the source.
|
Node |
Lexer.parseCDATA() |
Return CDATA as a text node.
|
Node |
Lexer.parseCDATA(boolean quotesmart) |
Return CDATA as a text node.
|
protected Node |
Lexer.parseJsp(int start) |
Parse a java server page node.
|
protected Node |
Lexer.parsePI(int start) |
Parse an XML processing instruction.
|
protected Node |
Lexer.parseRemark(int start,
boolean quotesmart) |
Parse a comment.
|
protected Node |
Lexer.parseString(int start,
boolean quotesmart) |
Parse a string node.
|
protected Node |
Lexer.parseTag(int start) |
Parse a tag.
|
Modifier and Type | Class | Description |
---|---|---|
class |
AbstractNode |
The concrete base class for all types of nodes (tags, text, remarks).
|
class |
RemarkNode |
The remark tag is identified and represented by this class.
|
class |
TagNode |
TagNode represents a generic tag.
|
class |
TextNode |
Normal text in the HTML document is represented by this class.
|
Modifier and Type | Field | Description |
---|---|---|
protected Node |
AbstractNode.parent |
The parent of this node.
|
Modifier and Type | Method | Description |
---|---|---|
Node |
AbstractNode.getFirstChild() |
Get the first child of this node.
|
Node |
AbstractNode.getLastChild() |
Get the last child of this node.
|
Node |
AbstractNode.getNextSibling() |
Get the next sibling to this node.
|
Node |
AbstractNode.getParent() |
Get the parent of this node.
|
Node |
AbstractNode.getPreviousSibling() |
Get the previous sibling to this node.
|
Modifier and Type | Method | Description |
---|---|---|
void |
AbstractNode.setParent(Node node) |
Sets the parent of this node.
|
Modifier and Type | Field | Description |
---|---|---|
protected Node |
HtmlTreeModel.mRoot |
The root Node.
|
Modifier and Type | Method | Description |
---|---|---|
boolean |
AndFilterWrapper.accept(Node node) |
Predicate to determine whether or not to keep the given node.
|
boolean |
HasAttributeFilterWrapper.accept(Node node) |
Predicate to determine whether or not to keep the given node.
|
boolean |
HasChildFilterWrapper.accept(Node node) |
Predicate to determine whether or not to keep the given node.
|
boolean |
HasParentFilterWrapper.accept(Node node) |
Predicate to determine whether or not to keep the given node.
|
boolean |
HasSiblingFilterWrapper.accept(Node node) |
Predicate to determine whether or not to keep the given node.
|
boolean |
NodeClassFilterWrapper.accept(Node node) |
Predicate to determine whether or not to keep the given node.
|
boolean |
NotFilterWrapper.accept(Node node) |
Predicate to determine whether or not to keep the given node.
|
boolean |
OrFilterWrapper.accept(Node node) |
Predicate to determine whether or not to keep the given node.
|
boolean |
RegexFilterWrapper.accept(Node node) |
Predicate to determine whether or not to keep the given node.
|
boolean |
StringFilterWrapper.accept(Node node) |
Predicate to determine whether or not to keep the given node.
|
boolean |
TagNameFilterWrapper.accept(Node node) |
Predicate to determine whether or not to keep the given node.
|
protected void |
HasAttributeFilterWrapper.addAttributes(java.util.Set set,
Node node) |
Add the attribute names from the node to the set of attribute names.
|
protected void |
HasAttributeFilterWrapper.addAttributeValues(java.util.Set set,
Node node) |
Add the attribute values from the node to the set of attribute values.
|
protected void |
TagNameFilterWrapper.addName(java.util.Set set,
Node node) |
Add the tag name and its children's tag names to the set of tag names.
|
Modifier and Type | Method | Description |
---|---|---|
protected void |
XMLReader.doSAX(Node node) |
Process nodes recursively on the DocumentHandler.
|
Modifier and Type | Method | Description |
---|---|---|
protected void |
CompositeTagScanner.addChild(Tag parent,
Node child) |
Add a child to the given tag.
|
Modifier and Type | Class | Description |
---|---|---|
class |
AppletTag |
AppletTag represents an <Applet> tag.
|
class |
BaseHrefTag |
BaseHrefTag represents a <Base> tag.
|
class |
BodyTag |
A Body Tag.
|
class |
Bullet |
A bullet tag.
|
class |
BulletList |
A bullet list tag.
|
class |
CompositeTag |
The base class for tags that have an end tag.
|
class |
DefinitionList |
A definition list tag (dl).
|
class |
DefinitionListBullet |
A definition list bullet tag (either DD or DT).
|
class |
Div |
A div tag.
|
class |
DoctypeTag |
The HTML Document Declaration Tag can identify <!DOCTYPE> tags.
|
class |
FormTag |
Represents a FORM tag.
|
class |
FrameSetTag |
Identifies a frame set tag.
|
class |
FrameTag |
Identifies a frame tag.
|
class |
HeadingTag |
A heading (h1 - h6) tag.
|
class |
HeadTag |
A head tag.
|
class |
Html |
An html tag.
|
class |
ImageTag |
Identifies an image tag.
|
class |
InputTag |
An input tag in a form.
|
class |
JspTag |
The JSP/ASP tags like <%...%> can be identified by this class.
|
class |
LabelTag |
A label tag.
|
class |
LinkTag |
Identifies a link tag.
|
class |
MetaTag |
A meta tag.
|
class |
ObjectTag |
ObjectTag represents an <Object> tag.
|
class |
OptionTag |
An option tag within a form.
|
class |
ParagraphTag |
A paragraph (p) tag.
|
class |
ProcessingInstructionTag |
The XML processing instructions like <?xml ...
|
class |
ScriptTag |
A script tag.
|
class |
SelectTag |
A select tag within a form.
|
class |
Span |
A span tag.
|
class |
StyleTag |
A StyleTag represents a <style> tag.
|
class |
TableColumn |
A table column tag.
|
class |
TableHeader |
A table header tag.
|
class |
TableRow |
A table row tag.
|
class |
TableTag |
A table tag.
|
class |
TextareaTag |
A text area tag within a form.
|
class |
TitleTag |
A title tag.
|
Modifier and Type | Method | Description |
---|---|---|
Node |
CompositeTag.childAt(int index) |
Get the child at the given index.
|
Node |
CompositeTag.getChild(int index) |
Get the child of this node at the given position.
|
Node[] |
CompositeTag.getChildrenAsNodeArray() |
Get the children as an array of Node objects.
|
Modifier and Type | Method | Description |
---|---|---|
int |
CompositeTag.findPositionOf(Node searchNode) |
Returns the node number of a child node given the node object.
|
Modifier and Type | Field | Description |
---|---|---|
protected Node |
NodeTreeWalker.mCurrentNode |
The current Node element, which will be a child of the root Node, or null.
|
protected Node |
NodeTreeWalker.mNextNode |
The next Node element after the current Node element.
|
protected Node |
NodeTreeWalker.mRootNode |
The root Node element which defines the scope of the current tree to walk.
|
Modifier and Type | Method | Description |
---|---|---|
Node |
NodeList.elementAt(int i) |
|
static Node[] |
ParserUtils.findTypeInNode(Node node,
java.lang.Class type) |
Search given node and pick up any objects of given type.
|
Node |
NodeTreeWalker.getCurrentNode() |
Get the Node in the tree that the NodeTreeWalker is currently at.
|
protected Node |
NodeTreeWalker.getNextNodeBreadthFirst() |
Traverses to the next Node from the current Node using breadth-first tree traversal.
|
protected Node |
NodeTreeWalker.getNextNodeDepthFirst() |
Traverses to the next Node from the current Node using depth-first tree traversal.
|
Node |
NodeTreeWalker.getRootNode() |
Get the root Node that defines the scope of the tree to traverse.
|
Node |
IteratorImpl.nextNode() |
Get the next node.
|
Node |
NodeIterator.nextNode() |
Get the next node.
|
Node |
NodeTreeWalker.nextNode() |
Traverses to the next Node from the current Node, using either depth-first or breadth-first tree traversal as appropriate.
|
Node |
SimpleNodeIterator.nextNode() |
Get the next node.
|
Node |
NodeList.remove(int index) |
Remove the node at index.
|
Node[] |
NodeList.toNodeArray() |
|
Modifier and Type | Method | Description |
---|---|---|
void |
NodeList.add(Node node) |
|
boolean |
NodeList.contains(Node node) |
Check to see if the NodeList contains the supplied Node.
|
void |
NodeList.copyToNodeArray(Node[] array) |
|
static Node[] |
ParserUtils.findTypeInNode(Node node,
java.lang.Class type) |
Search given node and pick up any objects of given type.
|
int |
NodeList.indexOf(Node node) |
Finds the index of the supplied Node.
|
protected void |
NodeTreeWalker.initRootNode(Node rootNode) |
Sets the root Node to be the given Node.
|
void |
NodeList.prepend(Node node) |
Insert the given node at the head of the list.
|
boolean |
NodeList.remove(Node node) |
Remove the supplied Node from the list.
|
void |
NodeTreeWalker.setRootNode(Node rootNode) |
Sets the specified Node as the root Node.
|
Constructor | Description |
---|---|
NodeList(Node node) |
Create a one element node list.
|
NodeTreeWalker(Node rootNode) |
Creates a new instance of NodeTreeWalker using depth-first tree traversal, without limits on how deep it may traverse.
|
NodeTreeWalker(Node rootNode,
boolean depthFirst) |
Creates a new instance of NodeTreeWalker using the specified type of tree traversal, without limits on how deep it may traverse.
|
NodeTreeWalker(Node rootNode,
boolean depthFirst,
int maxDepth) |
Creates a new instance of NodeTreeWalker using the specified type of tree traversal and maximum depth from the root Node to traverse.
|
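
The constructor options above (depth-first or breadth-first, with an optional maximum depth) can be illustrated with a small sketch. `TreeNode`, `Entry`, and `walk` are hypothetical names, not the `NodeTreeWalker` implementation. The sketch assumes the root itself is not visited, that direct children of the root are at depth 1, and that `Integer.MAX_VALUE` stands for "no limit".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TreeWalkSketch {
    /** Hypothetical stand-in node: a name plus child nodes. */
    static class TreeNode {
        final String name;
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String name, TreeNode... kids) {
            this.name = name;
            for (TreeNode k : kids) children.add(k);
        }
    }

    /** A pending node together with its depth below the root. */
    private static class Entry {
        final TreeNode node;
        final int depth;

        Entry(TreeNode node, int depth) {
            this.node = node;
            this.depth = depth;
        }
    }

    /** Return the names below root in visiting order, at most maxDepth levels deep. */
    static List<String> walk(TreeNode root, boolean depthFirst, int maxDepth) {
        List<String> order = new ArrayList<>();
        Deque<Entry> pending = new ArrayDeque<>();
        enqueueChildren(pending, root, 1, depthFirst, maxDepth);
        while (!pending.isEmpty()) {
            Entry e = pending.pollFirst();
            order.add(e.node.name);
            enqueueChildren(pending, e.node, e.depth + 1, depthFirst, maxDepth);
        }
        return order;
    }

    private static void enqueueChildren(Deque<Entry> pending, TreeNode parent,
                                        int depth, boolean depthFirst, int maxDepth) {
        if (depth > maxDepth) return; // children would lie beyond the depth limit
        List<TreeNode> kids = parent.children;
        if (depthFirst) {
            // Add to the front in reverse so the leftmost child is taken next.
            for (int i = kids.size() - 1; i >= 0; i--) {
                pending.addFirst(new Entry(kids.get(i), depth));
            }
        } else {
            // Add to the back so earlier levels are finished first.
            for (TreeNode k : kids) {
                pending.addLast(new Entry(k, depth));
            }
        }
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode("root",
                new TreeNode("a", new TreeNode("a1"), new TreeNode("a2")),
                new TreeNode("b", new TreeNode("b1")));
        System.out.println(walk(root, true, Integer.MAX_VALUE));
        System.out.println(walk(root, false, Integer.MAX_VALUE));
        System.out.println(walk(root, true, 1));
    }
}
```

With the tree built in `main`, the depth-first order is a, a1, a2, b, b1, the breadth-first order is a, b, a1, a2, b1, and a depth limit of 1 leaves only a and b.
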
Modifier and Type | Method | Description |
---|---|---|
Node[] |
ObjectFindingVisitor.getTags() |
|
Node[] |
TagFindingVisitor.getTags(int index) |
|
HTML Parser is an open source library released under LGPL.