dhtmlparser API
The most important function here is parseString(), which processes a string and builds a Document Object Model (DOM) from it.
dhtmlparser.first(inp_data)
Return the first element of inp_data, or raise StopIteration.
Note
This function exists because it works the same way for generators, lists, iterators, tuples and so on, which indexing does not.
It is also cheaper than list(generator)[0], because it does not convert the whole inp_data to a list.
Parameters: inp_data (iterable) – Any iterable object.
Raises: StopIteration – When inp_data is empty.
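The behaviour described above can be reproduced with the built-ins iter() and next(). This is a minimal sketch of the idea, not dhtmlparser's own source; the name first is reused only for illustration.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def first(inp_data: Iterable[T]) -> T:
    """Return the first item of any iterable; raise StopIteration if it is empty."""
    # iter() treats lists, tuples, generators and iterators uniformly, and
    # next() pulls a single item instead of building a whole list.
    return next(iter(inp_data))


print(first([1, 2, 3]))                   # 1
print(first(x * x for x in range(2, 5)))  # 4
```

Since Python 3.7 (PEP 479), a StopIteration that escapes inside a generator body is turned into RuntimeError, so when calling this kind of helper from a generator, catch the exception explicitly.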
dhtmlparser.parseString(txt, cip=True)
Parse the string txt and return a DOM tree consisting of single-linked HTMLElement instances.
Parameters:
- txt (str) – HTML/XML string to be parsed into a DOM.
- cip (bool, default True) – Case Insensitive Parameters. Use a special dictionary to store HTMLElement.params case-insensitively.
Returns: A single container HTML element with a blank tag, which holds the whole DOM in its HTMLElement.childs property. This element can be queried using HTMLElement.find() and related methods.
Return type: obj
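To picture the shape of the returned tree, here is a toy sketch built on Python's standard html.parser module rather than on dhtmlparser's own parser. Element, ToyParser and parse_string are hypothetical names that only mimic the documented layout: a container with a blank tag whose childs hold the DOM, with no links back to parents. Unlike the cip dictionary, html.parser simply lowercases tag and attribute names.

```python
from html.parser import HTMLParser


class Element:
    """Hypothetical stand-in for HTMLElement, not the library's class."""

    def __init__(self, tag="", params=None, text=""):
        self.tag = tag              # "" for the container and for text nodes
        self.params = params or {}  # attribute name -> value
        self.text = text            # character data, used by text nodes only
        self.childs = []            # downward links only: a single-linked tree


class ToyParser(HTMLParser):
    """Builds an Element tree from html.parser callbacks."""

    def __init__(self):
        super().__init__()
        self.root = Element()       # container element with a blank tag
        self._open = [self.root]    # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        element = Element(tag, dict(attrs))
        self._open[-1].childs.append(element)
        self._open.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, plus anything left open inside it.
        for i in range(len(self._open) - 1, 0, -1):
            if self._open[i].tag == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        self._open[-1].childs.append(Element(text=data))


def parse_string(txt):
    """Return the blank-tag container holding the parsed tree."""
    parser = ToyParser()
    parser.feed(txt)
    parser.close()
    return parser.root


dom = parse_string('<div ID="x"><p>Hi</p></div>')
div = dom.childs[0]
print(repr(dom.tag), div.tag, div.params, div.childs[0].tag)  # '' div {'id': 'x'} p
```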
dhtmlparser.makeDoubleLinked(dom, parent=None)
The standard output of dhtmlparser is a single-linked tree. This function makes it double-linked.
Parameters:
- dom (obj) – HTMLElement instance.
- parent (obj, default None) – Don't use this; it is only used in the recursive call.
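Conceptually, making a tree double-linked means giving every node a reference back to its parent, which is why the recursive call passes the current node down as parent. The sketch below assumes, hypothetically, that the back link is stored in an attribute named parent and that children live in childs; Element and make_double_linked are illustrative names, not the library's API.

```python
class Element:
    """Hypothetical minimal node with only downward childs links."""

    def __init__(self, tag="", childs=None):
        self.tag = tag
        self.childs = childs or []


def make_double_linked(dom, parent=None):
    """Recursively store an upward parent link on every node (sketch)."""
    dom.parent = parent             # None for the root of the tree
    for child in dom.childs:
        make_double_linked(child, dom)


p = Element("p")
root = Element("", [Element("div", [p])])
make_double_linked(root)
print(p.parent.tag, p.parent.parent is root, root.parent)  # div True None
```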
dhtmlparser.removeTags(dom)
Remove all tags from dom and return its plain-text representation.
Parameters: dom (str, obj, array) – A string, an HTMLElement instance, or an array of elements.
Returns: Plain string without tags.
Return type: str
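A minimal way to get similar plain-text output from an HTML string using only the standard library is to collect the character data that html.parser reports between tags. This sketch handles str input only (not HTMLElement instances or arrays), and remove_tags and _TextCollector are hypothetical names, not part of dhtmlparser.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects only the character data found between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def remove_tags(html):
    """Return the plain text of an HTML string (sketch; strings only)."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.parts)


print(remove_tags("<p>Hello <b>world</b>!</p>"))  # Hello world!
```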