module HTree

Public Class Methods

parse(input)

HTree.parse parses input and returns a document tree represented by HTree::Doc.

input should be a String or an object that responds to the read or open method. For example, IO, StringIO, Pathname, URI::HTTP, and URI::FTP are acceptable. Note that URI objects can only be read when open-uri is loaded.
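The three kinds of accepted input (a String, something with read, something with open) can be pictured as a small dispatcher. This is a minimal sketch, not HTree's implementation: read_input is a hypothetical helper name, and it only illustrates the String / read / open distinction described above.

```ruby
require "stringio"
require "pathname"

# Hypothetical helper (not part of HTree): turns each documented input
# kind into the document text as a String.
def read_input(input)
  if input.is_a?(String)
    input                          # already the document text
  elsif input.respond_to?(:read)
    input.read                     # IO, StringIO, ...
  elsif input.respond_to?(:open)
    input.open { |f| f.read }      # Pathname; URI::HTTP/FTP once open-uri is loaded
  else
    raise ArgumentError, "cannot read input of class #{input.class}"
  end
end
```

The String check comes first because a String neither reads nor opens itself; the read check precedes open so that an already-open IO is consumed directly rather than reopened.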

HTree.parse guesses whether input is HTML and whether it is XML.

If input is guessed to be HTML, the default namespace in the result is set to http://www.w3.org/1999/xhtml, regardless of whether input has an XML namespace declaration, and even if it is pre-XML HTML.

If input is guessed to be HTML and not XML, all element and attribute names are downcased.
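Given the two guesses, the namespace and case-folding rules above reduce to two small decisions. The helpers below are hypothetical (they are not HTree's API) and assume the guesses have already been made; they restate the documented rules rather than HTree's internal code.

```ruby
# Namespace URI that HTree.parse uses as the default for HTML input.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical helper: HTML always gets the XHTML default namespace,
# whatever namespace (if any) the input declared.
def default_namespace(html:, declared: nil)
  html ? XHTML_NS : declared
end

# Hypothetical helper: only input that is HTML but not XML is case-folded.
def normalize_name(name, html:, xml:)
  html && !xml ? name.downcase : name
end
```

For example, an element written as DIV in plain HTML becomes div, while the same name in XHTML (guessed as both HTML and XML) keeps its case.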

If the opened file or the read content has a charset method, HTree.parse decodes it according to $KCODE before parsing. Otherwise, HTree.parse assumes that the character encoding of the content is compatible with $KCODE. Note that open-uri provides the charset method for content read through URI::HTTP.
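$KCODE is a Ruby 1.8 global; Ruby 1.9 and later ignore it, so the sketch below uses an explicit target Encoding as a stand-in. decode_content is a hypothetical helper illustrating the rule "decode if a charset is reported, otherwise assume compatibility"; it is not how HTree itself performs the conversion.

```ruby
require "stringio"

# Hypothetical helper: target plays the role that $KCODE plays in HTree.
def decode_content(src, target = Encoding::UTF_8)
  text = src.read
  if src.respond_to?(:charset) && src.charset
    # The source reports its charset: relabel the bytes, then transcode.
    text.force_encoding(src.charset).encode(target)
  else
    # No charset available: assume the bytes are already compatible.
    text
  end
end
```

An object opened with open-uri over HTTP would supply charset from the response's Content-Type; the test below fakes this with a singleton method on a StringIO.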

# File htree/parse.rb, line 34
def HTree.parse(input)
  HTree.with_frozen_string_hash {
    parse_as(input, false)
  }
end

parse_xml(input)

HTree.parse_xml parses input as XML and returns a document tree represented by HTree::Doc.

It behaves almost the same as HTree.parse, but it assumes input is XML even if there is no XML declaration. This assumption causes the following differences:

  • Element names are not downcased.

  • The content of <script> and <style> elements is PCDATA, not CDATA.
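The PCDATA/CDATA difference matters for entity references inside <script> and <style>: treated as CDATA (the HTML case), "&lt;" stays literal text; treated as PCDATA (the XML case), it denotes "<". The hypothetical element_text helper below shows only that distinction and handles just the five predefined XML entities; real XML parsing also covers numeric character references and CDATA sections, which this sketch omits.

```ruby
# The five entities predefined by XML.
XML_ENTITIES = { "lt" => "<", "gt" => ">", "amp" => "&", "quot" => "\"", "apos" => "'" }.freeze

# Hypothetical helper: raw is the text between <script> and </script>.
def element_text(raw, xml:)
  return raw unless xml   # CDATA content: no entity decoding at all
  # PCDATA content: replace each predefined entity in a single pass.
  raw.gsub(/&(lt|gt|amp|quot|apos);/) { XML_ENTITIES[Regexp.last_match(1)] }
end
```

Because gsub makes a single pass, "&amp;lt;" decodes to "&lt;" rather than "<", matching how one level of XML escaping is undone.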

# File htree/parse.rb, line 48
def HTree.parse_xml(input)
  HTree.with_frozen_string_hash {
    parse_as(input, true)
  }
end

Public Instance Methods

==(other)

Compares the tree structure of self with that of other.

# File htree/equality.rb, line 10
def ==(other)
  check_equality(self, other, :usual_equal_object)
end
Also aliased as: eql?

eql?(other)

Alias for: ==

hash()

Returns a hash value for the tree structure. The value is computed on the first call and cached in @hash_code.

# File htree/equality.rb, line 16
def hash
  return @hash_code if defined? @hash_code
  @hash_code = usual_equal_object.hash
end