dhtmlparser API

The most important function here is parseString(), which processes a string and creates a Document Object Model (DOM) from it.

class dhtmlparser.StateEnum
content = 0
tag = 1
parameter = 2
comment = 3

dhtmlparser.first(inp_data)

Return the first element of inp_data, or raise StopIteration.


This function exists because it works the same way for generators, lists, iterators, tuples and other iterables, which indexing does not.

It is also cheaper than list(generator)[0], because it does not convert the whole of inp_data to a list.

Parameters: inp_data (iterable) – Any iterable object.
Raises: StopIteration – When inp_data is empty.
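
The reasoning above can be illustrated with a minimal sketch. The names first_element and counting_squares are hypothetical and exist only for this example; the sketch assumes the described behaviour can be reproduced with next(iter(...)) and is not taken from the dhtmlparser source.

```python
def first_element(inp_data):
    """Hypothetical stand-in: return the first item of any iterable.

    Raises StopIteration when the iterable is empty.
    """
    return next(iter(inp_data))


def counting_squares(limit, log):
    """Hypothetical generator that records every item it actually produces."""
    for n in range(limit):
        log.append(n)
        yield n * n


# The same call works for lists, tuples and generators.
from_list = first_element([3, 4, 5])
from_tuple = first_element(("a", "b"))
from_generator = first_element(c for c in "xyz")

# Indexing, by contrast, is not supported by generators.
indexing_failed = False
try:
    (c for c in "xyz")[0]
except TypeError:
    indexing_failed = True

# Only one item is produced, unlike list(generator)[0],
# which would consume all 1000 items first.
produced = []
value = first_element(counting_squares(1000, produced))
```

After running the sketch, produced holds a single entry, which shows that the rest of the generator was never evaluated.
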
dhtmlparser.parseString(txt, cip=True)

Parse the string txt and return a DOM tree consisting of single-linked HTMLElement objects.

Parameters:
  • txt (str) – HTML/XML string to be parsed into a DOM.
  • cip (bool, default True) – Case Insensitive Parameters. Use a special dictionary to store HTMLElement.params case-insensitively.

Returns: A single container HTMLElement with a blank tag, which holds the whole DOM in its HTMLElement.childs property. This element can be queried using the HTMLElement.find() functions.

Return type: HTMLElement
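
To make the cip option more concrete, here is a rough sketch of what a case-insensitive parameter dictionary could look like. The class name CaseInsensitiveParams is hypothetical, and keeping the originally written key for iteration is a design choice of this sketch; the special dictionary that dhtmlparser actually uses may be implemented differently.

```python
from collections.abc import MutableMapping


class CaseInsensitiveParams(MutableMapping):
    """Hypothetical mapping that ignores key case on lookup.

    Not dhtmlparser's own class; an illustration of the idea only.
    """

    def __init__(self, items=()):
        # Maps lowercased key -> (key as originally written, value).
        self._data = {}
        for key, value in dict(items).items():
            self[key] = value

    def __getitem__(self, key):
        return self._data[key.lower()][1]

    def __setitem__(self, key, value):
        self._data[key.lower()] = (key, value)

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        # Iterate over keys in the form they were last written.
        return (original for original, _ in self._data.values())

    def __len__(self):
        return len(self._data)


params = CaseInsensitiveParams({"HREF": "index.html"})
lookup = params["href"]        # found despite the different case
present = "Href" in params     # membership test also ignores case
params["href"] = "other.html"  # overwrites the existing entry
```

With such a mapping, attributes written as HREF, href or Href in the markup all resolve to the same entry.
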


dhtmlparser.makeDoubleLinked(dom, parent=None)

The standard output of dhtmlparser is a single-linked tree. This function makes it double-linked.

Parameters:
  • dom (obj) – HTMLElement instance.
  • parent (obj, default None) – Do not use this; it is used only in the recursive call.
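
The difference between a single-linked and a double-linked tree can be sketched as follows. Node and make_double_linked are hypothetical stand-ins rather than dhtmlparser code, and the parent attribute is an assumption of this sketch; only the childs name is borrowed from the HTMLElement.childs property mentioned above.

```python
class Node:
    """Hypothetical tree node; stands in for an element of the DOM."""

    def __init__(self, name, childs=None):
        self.name = name
        self.childs = childs or []  # links pointing down the tree
        self.parent = None          # link pointing up, unset at first


def make_double_linked(node, parent=None):
    """Set the upward link on every node by walking the tree recursively.

    The parent argument is filled in by the recursion; callers pass
    only the root node.
    """
    node.parent = parent
    for child in node.childs:
        make_double_linked(child, node)


paragraph = Node("p")
root = Node("html", [Node("body", [paragraph])])

# Before the call the tree can only be walked from the root downwards.
had_parent_before = paragraph.parent is not None

make_double_linked(root)

# Afterwards every node can also reach its ancestors.
ancestors = []
current = paragraph.parent
while current is not None:
    ancestors.append(current.name)
    current = current.parent
```

This also shows why the second argument is meant for the recursive call only: the root has no parent, so the default of None is the correct value for the initial call.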

dhtmlparser.removeTags(dom)

Remove all tags from dom and return its plain-text representation.

Parameters: dom (str, obj, array) – A str, an HTMLElement instance, or an array of elements.
Returns: Plain string without tags.
Return type: str
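
The idea of reducing markup to plain text can be sketched with the Python standard library alone. The names strip_tags and _TextCollector are hypothetical, and the sketch uses html.parser instead of dhtmlparser, so it handles only string input and its treatment of edge cases may differ from the function documented above.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper that keeps text nodes and ignores all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; tags themselves are skipped.
        self.chunks.append(data)


def strip_tags(markup):
    """Return the text content of an HTML/XML string without its tags."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.chunks)


plain = strip_tags("<p>Hello <b>world</b>!</p>")
```

Input without any tags passes through unchanged, which matches the documented return value of a plain string without tags.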