dhtmlparser API

The most important function here is parseString(), which parses a string and builds a Document Object Model (DOM) from it.

class dhtmlparser.StateEnum

dhtmlparser.first(inp_data)

Return the first element of inp_data, or raise StopIteration if it is empty.

Note

This function exists because it works the same way for generators, lists, iterators, tuples and other iterables, which indexing does not.

It is also cheaper than list(generator)[0], because it does not convert the whole of inp_data to a list.

Parameters:	inp_data (iterable) – Any iterable object.
Raises:	`StopIteration` – When inp_data is empty.
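
The behaviour described above can be sketched in a few lines. This is a minimal stand-in written for illustration, not the library's own source; the helper `numbers()` and the result variable names are hypothetical.

```python
def first(inp_data):
    """Return the first item of any iterable without building a list."""
    # iter() accepts lists, tuples, generators and iterators alike;
    # next() raises StopIteration when there is nothing to return.
    return next(iter(inp_data))


def numbers():
    # Hypothetical generator used only for this example.
    yield 1
    yield 2


first_from_list = first([10, 20])
first_from_tuple = first(("a", "b"))
first_from_generator = first(numbers())

# An empty iterable raises StopIteration instead of returning a value.
try:
    first([])
    empty_raised = False
except StopIteration:
    empty_raised = True
```

Only the first item is ever pulled from the generator, which is where the saving over list(generator)[0] comes from.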

dhtmlparser.parseString(txt, cip=True)

Parse the string txt and return a DOM tree consisting of singly linked HTMLElement objects.

Parameters:	txt (str) – HTML/XML string to be parsed into a DOM.
	cip (bool, default True) – Case Insensitive Parameters. Use a special dictionary to store `HTMLElement.params` case-insensitively.
Returns:	A single container HTML element with a blank tag, which holds the whole DOM in its `HTMLElement.childs` property. This element can be queried using the `HTMLElement.find()` functions.
Return type:	obj
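
To illustrate what case-insensitive parameter lookup means in practice, here is a small sketch. The class `CaseInsensitiveDict` is hypothetical and is not the dictionary dhtmlparser uses internally; it assumes that normalizing keys to lowercase is acceptable, which the real implementation may not do.

```python
class CaseInsensitiveDict(dict):
    """Hypothetical stand-in: a dict that ignores the case of string keys."""

    @staticmethod
    def _norm(key):
        # Only strings are normalized; other key types pass through unchanged.
        return key.lower() if isinstance(key, str) else key

    def __setitem__(self, key, value):
        super().__setitem__(self._norm(key), value)

    def __getitem__(self, key):
        return super().__getitem__(self._norm(key))

    def __contains__(self, key):
        return super().__contains__(self._norm(key))


params = CaseInsensitiveDict()
params["HREF"] = "http://example.com"

# Lookups succeed regardless of how the attribute name was capitalized.
href_value = params["href"]
has_mixed_case = "Href" in params
```

With such a mapping, markup like `<a HREF="...">` and `<a href="...">` can be queried the same way.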

dhtmlparser.makeDoubleLinked(dom, parent=None)

The standard output of dhtmlparser is a single-linked tree. This function makes it double-linked.

Parameters:	dom (obj) – `HTMLElement` instance.
	parent (obj, default None) – Do not set this; it is used by the recursive call.
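
The idea of turning a single-linked tree into a double-linked one can be sketched on a toy tree. The `Node` class and `make_double_linked()` below are hypothetical illustrations, assuming each node keeps its children in `childs` and should gain a `parent` reference; they are not the library's implementation.

```python
class Node:
    """Hypothetical minimal tree node, standing in for HTMLElement."""

    def __init__(self, name, childs=None):
        self.name = name
        self.childs = childs or []
        self.parent = None  # unset in a single-linked tree


def make_double_linked(node, parent=None):
    """Set .parent on node and, recursively, on all of its descendants."""
    node.parent = parent
    for child in node.childs:
        # The current node becomes the parent of each of its children.
        make_double_linked(child, node)


leaf = Node("b")
body = Node("body", [leaf])
root = Node("", [body])  # blank-tag container, as returned by a parser

make_double_linked(root)
```

The `parent` argument exists only so that the recursion can pass the current node down, which is why callers leave it at its default.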

dhtmlparser.removeTags(dom)

Remove all tags from dom and return its plain-text representation.

Parameters:	dom (str, obj, array) – A str, an `HTMLElement` instance, or an array of elements.
Returns:	Plain string without tags.
Return type:	str
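
As a rough illustration of what tag removal produces, the sketch below strips tags from a string using Python's standard `html.parser` module instead of dhtmlparser. The names `_TextCollector` and `strip_tags()` are hypothetical, and the handling of whitespace, entities and comments in the real function may differ.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect only the text nodes and ignore every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of an HTML string, with tags removed."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)


plain = strip_tags("<p>Hello <b>world</b>!</p>")
```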

Submodules