dhtmlparser API

The most important function here is parseString(), which processes a string and builds a Document Object Model (DOM).

class dhtmlparser.StateEnum
content = 0
tag = 1
parameter = 2
comment = 3
dhtmlparser.first(inp_data)

Return the first element of inp_data, or raise StopIteration.

Note

This function exists because it works the same way for generators, lists, iterators, tuples and other iterables, which indexing does not.

It is also cheaper than list(generator)[0], because it does not convert the whole inp_data to a list.

Parameters: inp_data (iterable) – Any iterable object.
Raises: StopIteration – When inp_data is empty.
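The behaviour described above can be sketched in a few lines. This is a minimal sketch, assuming first() only needs to pull a single item from an iterator; it is not necessarily dhtmlparser's exact code.

```python
def first(inp_data):
    """Sketch of first(): return the first item of any iterable.

    Raises StopIteration when inp_data is empty.
    """
    # iter() works for lists, tuples, strings, generators and iterators alike;
    # next() pulls only one item instead of materializing the whole input.
    return next(iter(inp_data))


print(first([3, 4, 5]))   # 3
print(first("abc"))       # a

squares = (n * n for n in range(10))
print(first(squares))     # 0
print(next(squares))      # 1  (the generator was advanced by one item only)

try:
    first(())
except StopIteration:
    print("empty input")  # empty input
```

Because the generator is advanced by exactly one step, the remaining items stay available to the caller, which is what makes this cheaper than list(generator)[0].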
dhtmlparser.parseString(txt, cip=True)

Parse the string txt and return a DOM tree made of single-linked HTMLElement objects.

Parameters:
  • txt (str) – HTML/XML string to be parsed into a DOM.
  • cip (bool, default True) – Case Insensitive Parameters. When True, HTMLElement.params is stored in a special dictionary whose keys are case insensitive.
Returns:

Single container HTMLElement with a blank tag, which holds the whole DOM in its HTMLElement.childs property. This element can be queried using methods such as HTMLElement.find().

Return type:

obj
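To illustrate what the cip option means, the following minimal sketch shows one way a case-insensitive parameter dictionary can behave. The class name CaseInsensitiveDict is hypothetical and this is not dhtmlparser's own implementation; keeping the spelling used at first insertion is an assumed design choice.

```python
from collections.abc import MutableMapping


class CaseInsensitiveDict(MutableMapping):
    """Hypothetical stand-in for the dictionary used when cip=True.

    Keys are compared case-insensitively, but the spelling used at
    first insertion is preserved for iteration.
    """

    def __init__(self, *args, **kwargs):
        # Maps lowercased key -> (original key, value)
        self._store = {}
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        lowered = key.lower()
        # Keep the original spelling if the key already exists
        original = self._store[lowered][0] if lowered in self._store else key
        self._store[lowered] = (original, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)


params = CaseInsensitiveDict({"HREF": "http://example.com"})
print(params["href"])    # http://example.com
print("Href" in params)  # True
print(list(params))      # ['HREF']
```

With a mapping like this, a lookup such as params["href"] succeeds whether the source document wrote href, HREF or Href.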

dhtmlparser.makeDoubleLinked(dom, parent=None)

The standard output of dhtmlparser is a single-linked tree. This function makes it double-linked, so that each element also references its parent.

Parameters:
  • dom (obj) – HTMLElement instance.
  • parent (obj, default None) – Do not pass this; it is only used by the recursive call.
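Turning a single-linked tree into a double-linked one can be sketched as a recursive walk that stores each node's parent. Node and make_double_linked below are hypothetical stand-ins (Node keeps only name, childs and parent), not the real HTMLElement; the blank-named root mimics the container element returned by parseString().

```python
class Node:
    """Hypothetical minimal stand-in for HTMLElement."""

    def __init__(self, name, childs=None):
        self.name = name
        self.childs = list(childs) if childs else []
        self.parent = None  # not set in a single-linked tree


def make_double_linked(dom, parent=None):
    """Sketch of the idea behind makeDoubleLinked(): set .parent everywhere."""
    dom.parent = parent
    for child in dom.childs:
        # Recurse, passing the current node as the child's parent
        make_double_linked(child, dom)
    return dom


root = Node("", [Node("html", [Node("body", [Node("p")])])])
make_double_linked(root)

p = root.childs[0].childs[0].childs[0]
print(p.parent.name)         # body
print(p.parent.parent.name)  # html
print(root.parent)           # None
```

Like the parent argument described above, the second parameter exists only to carry state through the recursion. For extremely deep documents, an explicit stack would avoid Python's default recursion limit.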
dhtmlparser.removeTags(dom)

Remove all tags from dom and return its plain-text representation.

Parameters: dom (str, obj, array) – A string, an HTMLElement instance, or an array of elements.
Returns: Plain string without tags.
Return type: str
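A rough approximation of removeTags() for string input can be built on the standard library's html.parser module instead of dhtmlparser: the parser reports tags and text separately, and only the text is kept. This swaps in a different parser, handles only str input (not HTMLElement instances or arrays), and remove_tags is a hypothetical name.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data and ignores tags and comments."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; tags themselves never reach here
        self.parts.append(data)


def remove_tags(html):
    """Hypothetical string-only analogue of dhtmlparser.removeTags()."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any buffered trailing text
    return "".join(collector.parts)


print(remove_tags("<p>Hello <b>world</b>!</p>"))  # Hello world!
```

Note that html.parser reports the contents of script and style elements as data, so this sketch keeps that text as well.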