web_programming.emails_from_url

Get the site emails from URL.

Attributes

__author__

__email__

__license__

__maintainer__

__status__

__version__

emails

Classes

Parser

Find tags and other markup and call handler functions.

Functions

emails_from_url(→ list[str])

This function takes a URL and returns all valid email addresses found on the site

get_domain_name(→ str)

This function gets the main domain name

get_sub_domain_name(→ str)

Module Contents

class web_programming.emails_from_url.Parser(domain: str)

Bases: html.parser.HTMLParser

Find tags and other markup and call handler functions.

Usage:

    p = HTMLParser()
    p.feed(data)
    ...
    p.close()

Start tags are handled by calling self.handle_starttag() or self.handle_startendtag(); end tags by self.handle_endtag(). The data between tags is passed from the parser to the derived class by calling self.handle_data() with the data as argument (the data may be split up in arbitrary chunks). If convert_charrefs is True the character references are converted automatically to the corresponding Unicode character (and self.handle_data() is no longer split in chunks), otherwise they are passed by calling self.handle_entityref() or self.handle_charref() with the string containing respectively the named or numeric reference as the argument.
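The callback flow described above can be seen with a minimal tracing subclass. `TraceParser` is a hypothetical name used only for illustration. The example relies on the standard-library `html.parser.HTMLParser` with `convert_charrefs=True`, so `&amp;` reaches `handle_data()` already converted to `&`, and the text between two tags arrives as one chunk.

```python
from html.parser import HTMLParser


class TraceParser(HTMLParser):
    """Hypothetical subclass that records which handler receives what."""

    def __init__(self) -> None:
        # convert_charrefs=True is already the default; passed explicitly for clarity.
        super().__init__(convert_charrefs=True)
        self.events: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


p = TraceParser()
p.feed("<p>a &amp; b</p>")
p.close()
print(p.events)  # [('start', 'p'), ('data', 'a & b'), ('end', 'p')]
```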

handle_starttag(tag: str, attrs: list[tuple[str, str | None]]) → None

This function parses HTML to extract URLs from tags

domain
urls: list[str] = []
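A sketch of how a parser with a `domain` attribute and a `urls` list might collect links in `handle_starttag`. This is an assumption, not the module's actual code: `LinkCollector` is a hypothetical stand-in for `Parser`, and the choices to inspect only `<a>` tags, skip bare `#` links, and resolve relative links against `domain` with `urllib.parse.urljoin` are illustrative.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib import parse


class LinkCollector(HTMLParser):
    """Hypothetical stand-in for Parser: collects absolute URLs from href attributes."""

    def __init__(self, domain: str) -> None:
        super().__init__()
        self.domain = domain  # base URL used to resolve relative links (assumption)
        self.urls: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # Only anchor tags are inspected in this sketch.
        if tag != "a":
            return
        for name, value in attrs:
            # Skip missing values, empty values and bare "#" fragment links.
            if name == "href" and value not in (None, "", "#"):
                url = parse.urljoin(self.domain, value)
                if url not in self.urls:  # keep each URL once, in document order
                    self.urls.append(url)


collector = LinkCollector("https://example.com")
collector.feed('<a href="/about">About</a> <a href="#">Top</a> <a href="/about">Again</a>')
print(collector.urls)  # ['https://example.com/about']
```

Keeping `urls` as a list rather than a set preserves the order in which links appear on the page, at the cost of a linear membership check.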
web_programming.emails_from_url.emails_from_url(url: str = 'https://github.com') → list[str]

This function takes a URL and returns all valid email addresses found on the site
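The documented function starts from a URL; the offline sketch below skips any fetching and only illustrates the matching step on page text that is already in memory. `extract_emails` is a hypothetical helper, and the regular expression for the local part is an assumption, not the pattern the module necessarily uses. Results are de-duplicated and sorted.

```python
import re


def extract_emails(html: str, domain: str) -> list[str]:
    """Hypothetical helper: return sorted, de-duplicated addresses at `domain`."""
    # Assumed local-part pattern; the domain is matched literally and must end on a word boundary.
    pattern = r"[A-Za-z0-9._%+-]+@" + re.escape(domain) + r"\b"
    return sorted(set(re.findall(pattern, html)))


page = "Write to info@example.com or sales@example.com; not bob@other.org. info@example.com"
print(extract_emails(page, "example.com"))  # ['info@example.com', 'sales@example.com']
```

Escaping the domain keeps its dots literal, and the trailing `\b` stops a match such as `x@example.community` from being reported as an `example.com` address.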

web_programming.emails_from_url.get_domain_name(url: str) → str

This function gets the main domain name

>>> get_domain_name("https://a.b.c.d/e/f?g=h,i=j#k")
'c.d'
>>> get_domain_name("Not a URL!")
''
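One way to reproduce the doctests above, assuming the standard-library `urllib.parse.urlparse` and keeping the last two dot-separated labels of the network location. This is a sketch consistent with the documented examples, not necessarily the module's implementation.

```python
from urllib import parse


def get_domain_name(url: str) -> str:
    # netloc is "" for strings without a "scheme://" prefix, e.g. "Not a URL!".
    netloc = parse.urlparse(url).netloc
    # Keep only the last two dot-separated labels ("a.b.c.d" -> "c.d").
    return ".".join(netloc.split(".")[-2:])
```

Taking the last two labels is a simple heuristic: for a host such as `www.example.co.uk` it yields `co.uk`, so a public-suffix list would be needed for such domains.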
web_programming.emails_from_url.get_sub_domain_name(url: str) → str
>>> get_sub_domain_name("https://a.b.c.d/e/f?g=h,i=j#k")
'a.b.c.d'
>>> get_sub_domain_name("Not a URL!")
''
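The doctests above match the `netloc` component returned by the standard-library `urllib.parse.urlparse`. The following is a sketch under that assumption, not necessarily the module's own code.

```python
from urllib import parse


def get_sub_domain_name(url: str) -> str:
    # The network location is everything between "//" and the path;
    # it is "" when the string has no "scheme://" prefix.
    return parse.urlparse(url).netloc
```

Note that `netloc` also keeps any user info and port, so `"https://user@host:8080/"` yields `"user@host:8080"`; the `hostname` attribute of the same result would give just `"host"`.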
web_programming.emails_from_url.__author__ = 'Muhammad Umer Farooq'
web_programming.emails_from_url.__email__ = 'contact@muhammadumerfarooq.me'
web_programming.emails_from_url.__license__ = 'MIT'
web_programming.emails_from_url.__maintainer__ = 'Muhammad Umer Farooq'
web_programming.emails_from_url.__status__ = 'Alpha'
web_programming.emails_from_url.__version__ = '1.0.0'
web_programming.emails_from_url.emails