Hello,
I would like to understand how this method works.
I subclassed the parser and overrode its handle_starttag method:
from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    # Called by the parser for every start tag it encounters
    def handle_starttag(self, tag, attrs):
        print "Encountered the beginning of a %s tag" % tag
Then I do:
p = MyHTMLParser()
test = '<A HREF="http://www.cwi.nl/"> <td class="src"> Oyo <td> <tr> <td> ceci est un </td> test'
p.feed(test)
p.handle_starttag('a', [('href', 'http://www.cwi.nl/')])
and this is what I get back:
Encountered the beginning of a a tag
Encountered the beginning of a td tag
Encountered the beginning of a td tag
Encountered the beginning of a tr tag
Encountered the beginning of a td tag
Encountered the end of a td tag
Encountered the beginning of a a tag
Shouldn't I get only "Encountered the beginning of a a tag"?
Thanks
See you,
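To make the question concrete, here is a minimal sketch of what seems to be going on, assuming Python 3's html.parser module (the snippets above use the older Python 2 HTMLParser module). The class name RecordingParser and its wanted parameter are hypothetical, made up for illustration. It suggests that feed() already calls handle_starttag once for every start tag in the input, so a handler that should react only to 'a' tags has to filter on the tag name itself.

```python
from html.parser import HTMLParser  # Python 3 location (assumption); Python 2 used the HTMLParser module


class RecordingParser(HTMLParser):
    """Hypothetical parser that records start tags instead of printing them."""

    def __init__(self, wanted=None):
        super().__init__()
        self.wanted = wanted  # None means "keep every start tag"
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # The parser calls this once per start tag found by feed()
        if self.wanted is None or tag == self.wanted:
            self.seen.append(tag)


test = '<A HREF="http://www.cwi.nl/"> <td class="src"> Oyo <td> <tr> <td> ceci est un </td> test'

everything = RecordingParser()
everything.feed(test)

only_a = RecordingParser(wanted='a')
only_a.feed(test)

print(everything.seen)  # ['a', 'td', 'td', 'tr', 'td']
print(only_a.seen)      # ['a']
```

Under that reading, the explicit p.handle_starttag(...) call after p.feed(test) is just one more ordinary method call, which would explain the extra "a" line at the end of the output.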
dje-dje
PS:
I had a look here:
http://www.python.org/doc/current/lib/module-HTMLParser.html
This is what it says:
handle_starttag( tag, attrs)
This method is called to handle the start of a tag. It is intended to be overridden by a derived class; the base class implementation does nothing.
The tag argument is the name of the tag converted to lower case. The attrs argument is a list of (name, value) pairs containing the attributes found inside the tag's <> brackets. The name will be translated to lower case and double quotes and backslashes in the value have been interpreted. For instance, for the tag <A HREF="http://www.cwi.nl/">, this method would be called as "handle_starttag('a', [('href', 'http://www.cwi.nl/')])".
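The documented call can be checked with a small sketch, again assuming Python 3's html.parser rather than the Python 2 module linked above. ArgParser and its calls list are hypothetical names used only for this illustration. The handler stores the arguments it receives, so the lowercased tag name and the list of (name, value) pairs can be inspected after parsing.

```python
from html.parser import HTMLParser  # Python 3 module (assumption)


class ArgParser(HTMLParser):
    """Hypothetical parser that stores the arguments passed to handle_starttag."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) tuples
        self.calls.append((tag, attrs))


p = ArgParser()
p.feed('<A HREF="http://www.cwi.nl/">')
p.close()

print(p.calls)  # [('a', [('href', 'http://www.cwi.nl/')])]
```

So the call shown in the documentation looks like a description of what the parser does on its own during feed(), not something the caller is expected to run by hand.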
There are 10 types of people in the world:
those who understand binary and those who don't
