Jared Foy Teach me good judgement and knowledge

Custom classes and other cool things

Published December 20th, 2018 4:38 pm

Python has packages galore, to do basically whatever there is to do, and the Object-Oriented beauty of Python is exceptional!

We can get all the strings that have been written to the io.StringIO object by calling io.StringIO.getvalue(). Because we previously assigned sys.stdout = io.StringIO(), we can access the .getvalue() method by using sys.stdout.getvalue(). This will return a string containing all the lines that have been written. This returned string could be printed, saved to a log, or sent over a network connection like any other string. Later I'm sure we'll encounter another example using io.StringIO().
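As a quick illustration (the captured text here is made up), this is roughly what the capture-and-restore dance looks like:

```python
import io
import sys

# Temporarily redirect sys.stdout so print() writes into a StringIO buffer.
original_stdout = sys.stdout
sys.stdout = io.StringIO()
print("captured line")
captured = sys.stdout.getvalue()  # everything written so far, as one string
sys.stdout = original_stdout      # always restore the real stdout!
print(repr(captured))
```

In real code it would be even safer to restore sys.stdout in a finally block so an exception can't leave it redirected.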
Command-Line Programming
Perhaps we need a program to be able to process text that may have been redirected in the console or that may be in files listed on the command line. We can use the fileinput module, especially the fileinput.input() function. This func iterates over all the lines redirected from the console and over all the lines in the files listed on the command line, as one continuous sequence of lines. The fileinput module can report the current filename and line number at any time using fileinput.filename() and fileinput.lineno(), and it can also handle some kinds of compressed files.
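As a small self-contained sketch (the sample file and its contents are made up), fileinput.input() also accepts an explicit list of filenames, which makes it easy to try out without touching sys.argv:

```python
import fileinput
import os
import tempfile

# Create a small sample file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as fh:
    fh.write("alpha\nbeta\n")

# With no arguments fileinput.input() reads the files named in sys.argv[1:],
# or stdin if there are none; here we pass the file list explicitly instead.
lines = []
for line in fileinput.input(files=[path]):
    lines.append("{0}:{1}: {2}".format(os.path.basename(fileinput.filename()),
                                       fileinput.lineno(), line.rstrip()))
fileinput.close()
print(lines)
```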
We are given two different modules for handling command-line options, optparse and getopt. The getopt module is popular because it is simple to use and has been standardized for a long time. However, optparse is newer and more powerful. (Worth noting: since Python 3.2, both have been superseded by the argparse module, but the book predates that.)
An example of the optparse module is as follows:
import optparse

def main():
    parser = optparse.OptionParser()
    parser.add_option('-w', '--maxwidth', dest='maxwidth', type='int',
            help=('the maximum number of characters that can be '
                  'output to string fields [default: %default]'))
    parser.add_option('-f', '--format', dest='format',
            help=('the format used for outputting numbers '
                  '[default: %default]'))
    parser.set_defaults(maxwidth=100, format='.0f')
    opts, args = parser.parse_args()

We do not need to explicitly provide -h and --help options because these are handled by the optparse module to produce a suitable usage message using the texts from the help keyword arguments that we gave. Any '%default' text in the help strings is replaced with the option's default value. We should also note that the options use the conventional Unix style, with short names (one hyphen) and long names (two hyphens). We can use the short names for interaction in the console, while the long names are more understandable when used in bash scripts. If we want to set the max width to 80 we can use any of these forms on the command line:
-w80, -w 80, --maxwidth=80, or --maxwidth 80

After the command line is parsed, the options are available using the 'dest' names that we gave, such as 'maxwidth' and 'format'. We can access these by doing opts.maxwidth and opts.format. Any command-line arguments that haven't been processed as options (usually filenames) will be in the args list. If an error occurs when parsing the command line, the optparse parser will call sys.exit(2). This causes a clean program termination and returns 2 to the os as the program's result value. Usually, a return value of 2 means a usage error occurred, 1 means any other kind of error, and 0 means success. When sys.exit() is called with no args it returns 0 to the operating system.
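Here is a minimal sketch of that behavior; the option and the 'data.txt' argument are invented, and parse_args() is given an explicit list instead of reading sys.argv[1:]:

```python
import optparse

parser = optparse.OptionParser()
parser.add_option("-w", "--maxwidth", dest="maxwidth", type="int",
                  help="maximum field width [default: %default]")
parser.set_defaults(maxwidth=100)
# Passing a list to parse_args() is handy for experimenting; normally it
# defaults to sys.argv[1:].
opts, args = parser.parse_args(["-w", "80", "data.txt"])
print(opts.maxwidth)  # 80, converted to int because of type="int"
print(args)           # ['data.txt']: the leftover non-option arguments
```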

Maths and Numbers
In addition to float, int, and complex types, the lib provides the decimal.Decimal and fractions.Fraction types. There are three numeric libraries available: math for the standard mathematical functions, cmath for complex number math functions, and random which provides many functions for random number generation. Python's numeric abstract base classes (these are classes that can be inherited from but cannot be used directly) are in the numbers module. They are useful for checking that an object is any kind of number using the syntax isinstance(x, numbers.Number), or is a specific kind of number, for example isinstance(x, numbers.Rational) or isinstance(x, numbers.Integral). For the folks involved in scientific and engineering programming, the third-party NumPy package should be particularly handy. NumPy provides highly efficient 'n-dimensional' arrays, basic linear algebra functions, and Fourier transforms, as well as tools for integration with C, C++, and Fortran code. The SciPy package incorporates NumPy and extends it to include modules for statistical computations, signal and image processing, genetic algorithms, and a great deal more. These are available at scipy.org.
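A small sketch of those isinstance() checks against the numbers ABCs:

```python
import numbers
from decimal import Decimal
from fractions import Fraction

# Each concrete numeric type registers with the appropriate abstract base
# classes, so isinstance() works uniformly across all of them.
print(isinstance(7, numbers.Integral))               # True: ints are Integral
print(isinstance(Fraction(3, 4), numbers.Rational))  # True
print(isinstance(2.5, numbers.Real))                 # True
print(isinstance(2.5, numbers.Integral))             # False: floats are not
print(isinstance(3 + 4j, numbers.Number))            # True: complex is a Number
```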

Times and Dates
The calendar and datetime modules give us functions and classes for date and time handling. They are based on an idealized Gregorian calendar so they are not suitable for dealing with 'pre-Gregorian' dates (such a common accommodation too! haha). Time and date handling is quite complex. The calendars in use today have varied a lot across places and times. A day is not exactly 24 hours, a year is not exactly 365 days, and daylight saving time and time zones vary. The datetime.datetime class (not to be confused with the datetime.date class) has provisions for handling time zones, however this must be configured. To do so, third-party modules like dateutil and mxDateTime are available.
The time module handles timestamps. These are simply numbers that hold the number of seconds since the epoch (1970-01-01T00:00:00 on Unix). The time module can be used to get a timestamp of the machine's current time in UTC, or as a local time that accounts for DST, and also to create date, time, and date/time strings formatted in many different ways. With the time module we can also parse strings that have dates and times.
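For example, parsing a date/time string and converting it to seconds since the epoch might look like this (the date string is made up):

```python
import calendar
import time

# time.strptime() parses the string into a struct_time; calendar.timegm()
# interprets that struct_time as UTC and returns seconds since the epoch.
when = time.strptime("1970-01-02T00:00:00", "%Y-%m-%dT%H:%M:%S")
seconds = calendar.timegm(when)
print(seconds)  # 86400: exactly one day (24 * 60 * 60 s) after the epoch
```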
Here we show an example of the calendar, datetime, and time modules. Objects of type datetime.datetime are created programmatically in most cases whereas objects that hold UTC date/times are usually received from external sources like file timestamps. Here are some examples:
import calendar, datetime, time
moon_datetime_a = datetime.datetime(1969, 7, 20, 20, 17, 40)
moon_time = calendar.timegm(moon_datetime_a.utctimetuple())
moon_datetime_b = datetime.datetime.utcfromtimestamp(moon_time)
moon_datetime_a.isoformat() #returns: '1969-07-20T20:17:40'
moon_datetime_b.isoformat() #returns: '1969-07-20T20:17:40'
time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(moon_time))

Wow, that was certainly a lot of ways to display dates! The moon_datetime_a variable is of type datetime.datetime and holds the date and time that Apollo 11 landed on the moon. The moon_time variable is of type int and holds the number of seconds since the epoch to the moon landing. (This is provided by calendar.timegm(), which takes a time_struct object returned by datetime.datetime.utctimetuple() and returns the number of seconds that the time_struct represents.) Because the moon landing occurred before the Unix epoch, the number is negative. The moon_datetime_b variable is of type datetime.datetime and is created from the moon_time variable to show how to convert from the number of seconds since the epoch to a datetime.datetime object. The last three lines all produce ISO 8601-format date/time strings. If we want the current UTC date/time we can get it with datetime.datetime.utcnow(), and we can get the current time counted from the Unix epoch by calling time.time(). For local date/time, use datetime.datetime.now() or time.mktime(time.localtime()).

Algorithms and Collection Data Types
The bisect module gives us the ability to search sorted sequences like sorted lists, and to insert items while preserving the sort order. This module's functions use the binary search algorithm, which makes them very fast. The heapq module gives us the ability to turn a sequence like a list into a 'heap'. A heap is a collection data type where the first item (at index 0) is always the smallest item. The heapq module also lets us insert and remove items while keeping the sequence as a heap.
The collections package gives the collections.defaultdict dictionary which we have previously seen as well as the collections.namedtuple. Additionally the collections package provides the collections.UserList and collections.UserDict types, although subclassing the built-in list and dict types is probably more common than using these collections types. Another type is collections.deque, which is similar to a list, but whereas a list is very fast for adding and removing items at the end, a collections.deque is very fast for adding and removing items both at the beginning and at the end.
Python 3.1 gave us the collections.OrderedDict and the collections.Counter classes. OrderedDicts have the same API as normal dicts, although when iterated the items are always returned in insertion order, and the .popitem() always returns the most recently added item. The Counter class is a dict subclass used to provide a fast and easy way of keeping various counts. Given an iterable or a mapping (like a dictionary) a Counter instance can return a list of the unique elements or a list of the most common elements as (element, count) 2-tuples.
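A small sketch of both classes (the sample data is made up):

```python
import collections

# Counter tallies the elements of any iterable or mapping.
counts = collections.Counter("mississippi")
print(counts["s"])            # 4: there are four 's' characters
print(counts.most_common(2))  # the two most frequent (element, count) pairs

# OrderedDict preserves insertion order, and popitem() returns the most
# recently added item by default (LIFO).
d = collections.OrderedDict([("a", 1), ("b", 2), ("c", 3)])
item = d.popitem()
print(item)  # ('c', 3)
```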
Python's non-numeric abstract base classes (these are classes that can be inherited from but not used directly) are also in the collections package. They are discussed more later.
The array module provides the array.array sequence type that can store numbers or characters in a very space-efficient way. It has similar behavior to lists except that the type of object it can store is fixed when it is created. Therefore array.array can only store objects of a single type. The third-party package NumPy discussed previously also provides efficient arrays.
The weakref module gives us ability to create weak references. Weak references behave like normal object references except that if the only reference to an object is a weak reference, the object can still be scheduled for garbage collection. This prevents objects from being kept in memory simply because we have a reference for them. With the weakref module we can check if the object a weak reference refers to still exists, and we can access the object if it does.
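A minimal sketch (the Point class is a throwaway example; note that instances of many built-in types such as int, str, and list cannot be weakly referenced directly, but instances of user-defined classes can):

```python
import weakref

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(3, 4)
ref = weakref.ref(p)       # a weak reference; it does not keep p alive
alive = ref() is p         # calling the reference returns the object...
del p                      # ...until the last strong reference is dropped
collected = ref() is None  # True in CPython, which reclaims it immediately
print(alive, collected)
```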
An example of the heapq module. The heapq module lets us convert a list into a heap and allows us to add and remove items from the heap while preserving the 'heap property'. A heap is a binary tree that respects the heap property, which is that the first item (at index pos. 0) is always the smallest item. Each of the heap's subtrees is also a heap, so they also respect the heap property. Here is how we could start heaping on our own:
import heapq
heap = []
heapq.heappush(heap, (5, 'rest'))
heapq.heappush(heap, (2, 'work'))
heapq.heappush(heap, (4, 'study'))

If we happen to already have a list we can turn it into a heap by doing: heapq.heapify(alist). This will do any necessary reordering in-place. The smallest item can be removed from the heap using heapq.heappop(heap)
for x in heapq.merge([1, 3, 5, 8], [2, 4, 7], [0, 1, 6, 8, 9]):
    print(x, end=" ") #prints: 0 1 1 2 3 4 5 6 7 8 8 9
The heapq.merge() function takes any number of sorted iterables as args and returns an iterator that iterates over all the items from the iterables in order.
File Formats, Encodings, and Data Persistence
The std lib has extensive support for a variety of standard file formats and encodings. The base64 module has functions for reading and writing using the Base16, Base32, and Base64 encodings which are specified in RFC 3548. The quopri module has functions for reading and writing 'quoted-printable' format, which is defined in RFC 1521 and is used for MIME (Multipurpose Internet Mail Extensions) data. The uu module has functions for reading and writing uuencoded data. RFC 1832 defines the External Data Representation Standard, and the xdrlib module provides functions for reading and writing data in this format.
Some modules are also provided for reading and writing archive files in the most popular formats. The bz2 module can handle .bz2 files, the gzip module handles .gz files, and the tarfile module handles .tar, .tar.gz, .tgz, and .tar.bz2 files. The zipfile module handles .zip files. We will see an example of using the tarfile module below.
There is also support for handling some audio formats, with the aifc module for AIFF (Audio Interchange File Format) and the wave module for uncompressed .wav files. Some forms of audio data can be manipulated using the audioop module, and the sndhdr module provides a couple of functions for determining what kind of sound data is stored in a file and some of its properties, such as the sampling rate.
Sidenote: RFCs (Requests for Comments) are the documents through which Internet standards and conventions are proposed, discussed, and defined.
A format for configuration files (like old Windows .ini files), based on the header syntax described in RFC 822, is handled by the configparser module, which provides functions for reading and writing these files.
Many applications like Excel can read and write CSV (Comma Separated Value) data, or variants such as tab-delimited data. The csv module can read and write these formats, and also account for the idiosyncrasies that prevent CSV files from being straightforward to handle directly.
The std lib has packages and modules that provide data persistence. The pickle module is used to store and retrieve arbitrary Python objects (including entire collections) to and from disk; this module is covered in chp 7. The library also supports DBM files of various kinds. DBM files are like dictionaries except that their items are stored on disk rather than in memory, and both their keys and their values must be bytes objects or strings. The shelve module, covered in chp 12, can be used to provide DBM files with string keys and arbitrary Python objects as values. The shelve module seamlessly converts the Python objects to and from bytes objects behind the scenes. The DBM modules, Python's database API, and using the built-in SQLite database are all covered in chp 12.
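A sketch of the simplest persistence tool, pickle (the record below is made up):

```python
import pickle

# pickle.dumps() serializes nearly any Python object graph to bytes, and
# pickle.loads() reconstructs it; dump()/load() do the same with open files.
record = {"name": "survey", "heights": [1.2, 3.4], "valid": True}
blob = pickle.dumps(record)
restored = pickle.loads(blob)
print(restored == record)  # True: a faithful round trip
```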
Here is an example using the base64 module. The base64 module is used mostly for handling binary data that is embedded in emails as ASCII text. It can also be used to store binary data inside .py files. The first thing to do is to get the binary data into Base64 format.
import base64

binary = open(left_align_png, 'rb').read()
ascii_text = ""
for i, c in enumerate(base64.b64encode(binary)):
    if i and i % 68 == 0:
        ascii_text += "\\\n"
    ascii_text += chr(c)

This snippet reads the file in binary mode and converts it to a Base64 string of ASCII characters. Every sixty-eighth character a backslash-newline combo is added. Doing this limits the width of the lines of ASCII characters to 68, but ensures that when the data is read back the newlines will be ignored. The way this is done is clever: the newlines are ignored only because each one is escaped by the backslash that precedes it. The ASCII text obtained like this can be stored as a bytes literal in a .py file.
If we like, we can convert it back to its original binary form like this:
binary = base64.b64decode(LEFT_ALIGN_PNG)
If we want, the binary data could be written to a file using: open(filename, 'wb').write(binary). Keeping binary data in .py files is much less compact than keeping it in its original form, but can be useful if we want to provide a program that requires some binary data as a single .py file.
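The essential round trip can be sketched in a few lines (the byte values below are made up to stand in for real image data):

```python
import base64

binary = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
ascii_bytes = base64.b64encode(binary)  # ASCII-only bytes, safe for .py files
print(ascii_bytes)
print(base64.b64decode(ascii_bytes) == binary)  # True: the trip is lossless
```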
Here is an example of the tarfile module. The .tar format is widely used on Unix systems, but Windows doesn't usually provide support. We can fix this using Python's tarfile module which can create and unpack .tar and .tar.gz archives (tarballs). If we have the correct libraries installed we can also do .tar.bz2 archives as well.

import string
import sys
import tarfile
BZ2_AVAILABLE = True
try:
    import bz2
except ImportError:
    BZ2_AVAILABLE = False

Some Python binaries for Unix don't have the bz2 library. Therefore we put it in a try block.
UNTRUSTED_PREFIXES = tuple(["/", "\\"] + [c + ":" for c in string.ascii_letters])
In this snippet we have created the tuple ('/', '\', 'A:', 'B:', …, 'Z:', 'a:', 'b:', …, 'z:').
These are all untrusted path prefixes. Tarballs shouldn't contain absolute paths, since unpacking them could then overwrite system files. Therefore, as a precaution, we don't unpack any file whose name starts with one of the prefixes in the tuple.

def untar(archive):
    tar = None
    try:
        tar = tarfile.open(archive)
        for member in tar.getmembers():
            if member.name.startswith(UNTRUSTED_PREFIXES):
                print('untrusted prefix, ignoring', member.name)
            elif ".." in member.name:
                print('suspect path, ignoring', member.name)
            else:
                tar.extract(member)
                print('unpacked', member.name)
    except (tarfile.TarError, EnvironmentError) as err:
        error(str(err))
    finally:
        if tar is not None:
            tar.close()
Each file in a tarball is called a 'member'. The tar.getmembers() method returns a list of tarfile.TarInfo objects, one for each member. The member's filename, including its path, is in the tarfile.TarInfo.name attribute. If the name begins with a prefix we think is suspect, we output an error message; otherwise we call tar.extract(), which saves the member to disk. The tarfile module has its own list of exceptions; here we have simplified things and just did a sloppy catchall.
The untar() function reports problems through a small helper:

def error(message, exit_status=1):
    print(message)
    sys.exit(exit_status)
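tarfile can also create archives, not just unpack them. Here is a self-contained round-trip sketch (the file names are made up, and everything happens in a temporary directory):

```python
import os
import tarfile
import tempfile

# Create a small file to archive.
tmpdir = tempfile.mkdtemp()
textfile = os.path.join(tmpdir, "hello.txt")
with open(textfile, "w") as fh:
    fh.write("hello\n")

# Write a gzip-compressed tarball; arcname keeps the stored path relative,
# which is exactly what a well-behaved tarball should contain.
archive = os.path.join(tmpdir, "demo.tar.gz")
tar = tarfile.open(archive, "w:gz")
tar.add(textfile, arcname="hello.txt")
tar.close()

# Reopen it and list the members.
tar = tarfile.open(archive)
names = [member.name for member in tar.getmembers()]
tar.close()
print(names)  # ['hello.txt']
```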
File, Directory, and Process Handling
The shutil module gives high-level functions for file and directory handling, including shutil.copy() and shutil.copytree() for copying files and entire directory trees. shutil.move() will move directory trees, and shutil.rmtree() will remove entire directory trees, including nonempty ones (use with caution!).
If we want to create temporary files and directories we should use the tempfile module, which provides the necessary functions, like tempfile.mkstemp(), and creates its temporaries in the most secure manner possible.
The filecmp module is used to compare files with the filecmp.cmp() function and to compare the files in two directories with the filecmp.cmpfiles() function.
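For instance (the files are created in a temporary directory purely for the demonstration):

```python
import filecmp
import os
import tempfile

# Two files with identical contents.
tmpdir = tempfile.mkdtemp()
a = os.path.join(tmpdir, "a.txt")
b = os.path.join(tmpdir, "b.txt")
for name in (a, b):
    with open(name, "w") as fh:
        fh.write("identical contents\n")

# shallow=False forces a byte-by-byte comparison rather than trusting
# os.stat() metadata alone.
same = filecmp.cmp(a, b, shallow=False)
print(same)  # True
```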
We can very powerfully use Python by orchestrating the running of other programs. This can be done with the subprocess module which can start other processes, communicate with them using 'pipes', and retrieve their results. This is covered in chp 10.
An even more powerful module is the multiprocessing module which gives us facilities for offloading work to multiple processes and for accumulating results, and can often be used as an alternative to 'multithreading'.
The os module gives platform-independent access to os functionality. The os.environ variable holds a mapping object whose items are environment variable names along with their values. The program's working directory is provided by os.getcwd() and can be changed using os.chdir(). The os module also provides functions for low-level file-descriptor-based file handling. The os.access() function can be used to determine whether a file exists or whether it's readable or writable. The os.listdir() function returns a list of the entries (these are the files and subdirs, excluding the . and .. entries) in the directory it is given. The os.stat() function returns various items of info about a file or dir, like its mode, access time, and size. We can create directories using os.mkdir(), or, if intermediate dirs need to be created, os.makedirs(). We can remove empty dirs using os.rmdir(), and dir trees that contain only empty dirs can be removed using os.removedirs(). We can remove files using os.remove() and rename them with os.rename(). The os.walk() function iterates over an entire directory tree, retrieving the name of every file and dir in turn. The os module also gives many low-level, platform-specific functions, for example for working with file descriptors, and for forking (on Unix), spawning, and exec-ing.
The os module gives functions for interacting with the operating system, but the os.path module gives a mixture of string manipulation (of paths) and some file system convenience functions. The os.path.abspath() function returns the absolute path of its argument, with redundant path separators and .. elements removed. os.path.split() returns a 2-tuple with the first element containing the path and the second the filename (which will be empty if a path with no filename was given). These two parts are also available directly using os.path.dirname() and os.path.basename(). A filename can also be split into two parts, the name and the extension, using os.path.splitext(). The os.path.join() function takes any number of path strings and returns a single path using the platform-specific path separator.
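These are pure string manipulations; none of the following touches the file system (the path itself is made up, and the separators shown assume Unix):

```python
import os.path

path = "/home/user/archive/photo.png"
print(os.path.basename(path))         # 'photo.png'
print(os.path.dirname(path))          # '/home/user/archive' on Unix
print(os.path.split(path))            # ('/home/user/archive', 'photo.png')
print(os.path.splitext("photo.png"))  # ('photo', '.png')
```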
If we want several pieces of info about a file or dir we can use os.stat(). However, if we want just one piece, we can use the relevant os.path function like these: os.path.exists(), os.path.getsize(), os.path.isfile(), or os.path.isdir().
The mimetypes module gives the mimetypes.guess_type() function which tries to guess the given file's MIME type.
Here is an example of the os and os.path modules in action. We can use them to create a dictionary where each key is a filename. This will include the path. And each value is the timestamp (this is in seconds since the Unix epoch) when the file was last modified, for those files in the given path:
import os

date_from_name = {}
for name in os.listdir(path):
    fullname = os.path.join(path, name)
    if os.path.isfile(fullname):
        date_from_name[fullname] = os.path.getmtime(fullname)

This code can only be used for the files in a single directory. If we want to traverse an entire dir tree we can use the os.walk() function. Here is a snippet that creates a dictionary where each key is a 2-tuple (containing file size, filename) in which the filename excludes the path, and where each value is a list of the full filenames that match their key's filename and have the same file size:
import collections
import os

data = collections.defaultdict(list)
for root, dirs, files in os.walk(path):
    for filename in files:
        fullname = os.path.join(root, filename)
        key = (os.path.getsize(fullname), filename)
        data[key].append(fullname)

For each dir, os.walk() returns the root and two lists, one of the subdirs in the dir and the other of the files in the dir. To get the full path for a filename we combine the root and the filename. Note that we don't have to recurse into the subdirs ourselves; os.walk() does that for us. Once the data has been gathered, we can iterate over it to produce a report of possible duplicate files like so:
for size, filename in sorted(data):
    names = data[(size, filename)]
    if len(names) > 1:
        print('{filename} ({size} bytes) may be duplicated '
              '({0} files):'.format(len(names), **locals()))
        for name in names:
            print('\t{0}'.format(name))

Because the dictionary keys are (size, filename) tuples, we don't need to use a key function to get the data sorted in size order. If any (size,filename) tuple has more than one filename in its list, these are possible duplicates.

Networking and Internet Programming
Packages and modules for networking and Internet programming encompass a great deal of the std lib. At the lowest level, the socket module gives the most fundamental network functionality, with functions for creating sockets, doing DNS lookups, and handling IP addresses. Encrypted and authenticated sockets can be set up using the ssl module. The socketserver module gives TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) servers. These servers can handle requests directly, or they can create a separate process (by forking) or a separate thread to handle each request. 'Asynchronous' client and server socket handling can be achieved using the asyncore module, and the higher-level asynchat module is built on top of asyncore.
Python has defined the WSGI (Web Server Gateway Interface) to provide a std interface between web servers and web apps written in Python. In support of the WSGI standard, the wsgiref package gives a reference implementation of WSGI that has modules for providing WSGI-compliant HTTP servers, and for handling response headers and CGI (Common Gateway Interface) scripts. In addition, the http.server module gives an HTTP server which can be given a request handler (a standard one is provided) in order to run CGI scripts. The http.cookies and http.cookiejar modules give functions for managing cookies, and CGI script support is provided by the cgi and cgitb modules.
If we want client access to HTTP requests we can use the http.client module, however the higher-level urllib package's modules urllib.parse, urllib.request, urllib.response, urllib.error and urllib.robotparser give us easier and more convenient access to URLs. We can grab a file from the interwebs by simply:
import urllib.request

fh = urllib.request.urlopen('http://www.python.org/index.html')
html = fh.read().decode('utf8')

Python is telling me that urllib.request doesn't exist, though the book says otherwise. I'll have to take a look at this. (The likely cause: the submodule must be imported explicitly with import urllib.request; a bare import urllib doesn't bring it in, and in Python 2 the equivalent module was called urllib2.) Anyway, the urllib.request.urlopen() function returns an object that behaves much like a file object opened in read binary mode. Here we retrieve the website's index.html file (as a bytes object) and store it as a string in the html variable. We can also grab files and store them in local files with the urllib.request.urlretrieve() function. We can parse HTML and XHTML docs using the html.parser module. URLs can be parsed and created using the urllib.parse module, and robots.txt files can be parsed with the urllib.robotparser module. Data represented using JSON (JavaScript Object Notation) can be read and written using the json module. Additionally, the std lib gives XML-RPC (Remote Procedure Call) support with the xmlrpc.client and xmlrpc.server modules. There is additional client functionality provided for FTP (File Transfer Protocol) by the ftplib module, for NNTP (Network News Transfer Protocol) by the nntplib module, and for TELNET by the telnetlib module.
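Unlike urlopen(), the urllib.parse module works entirely offline; here is a sketch of pulling a URL apart (the URL itself is made up):

```python
import urllib.parse

url = "http://www.python.org/ftp/python/?name=index&lang=en#top"
parts = urllib.parse.urlparse(url)
print(parts.scheme)    # 'http'
print(parts.netloc)    # 'www.python.org'
print(parts.path)      # '/ftp/python/'
print(parts.query)     # 'name=index&lang=en'
print(parts.fragment)  # 'top'
# parse_qs() turns the query string into a dict mapping names to value lists.
print(urllib.parse.parse_qs(parts.query))
```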
The smtpd module gives an SMTP (Simple Mail Transfer Protocol) server, and the email client modules are smtplib for SMTP, imaplib for IMAP4 (Internet Message Access Protocol), and poplib for POP3 (Post Office Protocol). Mailboxes in various formats can be accessed using the mailbox module. Individual messages can be created and manipulated using the email module.
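Building a message with the email module requires no network at all; a small sketch (the addresses are invented):

```python
from email.message import EmailMessage

# Compose a simple text message; smtplib could then send it, but nothing
# here touches the network.
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Greetings"
msg.set_content("Hello from the email module.\n")
wire = str(msg)  # the RFC 5322 text that smtplib would actually transmit
print(msg["Subject"])
```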
If you find the std lib packages and modules insufficient here, Twisted provides a comprehensive third-party networking library. Many third-party web programming libs are available too, including Django and TurboGears for creating web applications, while Plone and Zope provide complete web frameworks and content management systems. These are all written in Python.

XML
There exist two widely used approaches to parsing XML documents. One is the DOM (Document Object Model) and the other is SAX (Simple API for XML). Two DOM parsers are provided, one by the xml.dom module and the other by the xml.dom.minidom module. A SAX parser is provided by the xml.sax module. We already used xml.sax.saxutils for its xml.sax.saxutils.escape() function. Remember, we did this to XML-escape text, replacing characters like '&', '<', and '>' with their entities.
Two other parsers are available. The xml.parsers.expat module can parse XML docs with expat (providing the expat lib is available), and xml.etree.ElementTree can be used to parse XML documents using a kind of dictionary/list interface. Note that the DOM and element tree parsers use the expat parser under the hood.
Writing XML manually, writing XML using DOM and element trees, and parsing XML using the DOM, SAX, and element tree parsers are covered in chp 7.
We also have the third-party lib, lxml from codespeak.net/lxml that is supposedly the most feature-rich and easiest library to use for working with XML and HTML in Python. This lib gives an interface that is essentially a superset of what the element tree module provides, as well as many additional features like support for XPath, XSLT, and other XML technologies.

Here we will do an example of the xml.etree.ElementTree module. Python's DOM and SAX parsers give the APIs that experienced XML programmers are used to, while the xml.etree.ElementTree module offers a more Pythonic approach to parsing and writing XML. The element tree module is a fairly recent addition to the std lib, so perhaps you are not familiar with it. Here is a very short example; chp 7 gives a more substantial one along with comparative code using the DOM and SAX. The US government's NOAA website gives a wide variety of data, including an XML file that lists the US weather stations. The XML file is more than 20,000 lines long and contains details of about 2,000 stations. A typical entry is like so:
This example, which may not be visible because it is in an HTML comment, has been reduced in size, but you get the gist. It's a big file, so we compress it using gzip. The element tree parser requires either a filename or a file object to read, but we can't give it the compressed file because that would just appear to be random binary data. This can be solved in two steps:
import gzip
import io

binary = gzip.open(filename).read()
fh = io.StringIO(binary.decode('utf-8'))

The gzip module's gzip.open() is similar to the built-in open() function except that it reads gzip-compressed files as raw binary data. We need the data available as a file the element tree parser can work with, so we use the bytes.decode() method to convert the binary data to a string using the UTF-8 encoding (which is what this XML file uses), then we create a file-like io.StringIO object with the string containing the entire XML file as its data.
import xml.etree.ElementTree

tree = xml.etree.ElementTree.ElementTree()
root = tree.parse(fh)
stations = []
for element in tree.getiterator('station_name'):
    stations.append(element.text)

In this snippet we created a new xml.etree.ElementTree.ElementTree object and gave it a file object from which to read the XML we want it to parse. As far as the element tree parser is concerned it has been passed a file object open for reading, although in reality it is reading a string inside the io.StringIO object. We want to extract the names of all the weather stations, which we can do using the xml.etree.ElementTree.ElementTree.getiterator() method. This returns an iterator that yields all the xml.etree.ElementTree.Element objects that have the given tag name, and we use each element's text attribute to retrieve the text. Like with os.walk(), we don't have to do any recursion ourselves; the iterator does that for us. And if we don't specify a tag, the iterator will return every element in the entire XML doc.
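The same pattern works with an in-memory document; here is a self-contained sketch using a tiny stand-in for the stations file (the structure is assumed for illustration, not the real NOAA schema):

```python
import xml.etree.ElementTree

XML = ("<wx_station_index>"
       "<station><station_name>Point Alpha</station_name></station>"
       "<station><station_name>Point Beta</station_name></station>"
       "</wx_station_index>")

# fromstring() parses a string directly and returns the root element;
# iter() walks the whole tree, yielding elements with the given tag.
root = xml.etree.ElementTree.fromstring(XML)
stations = [element.text for element in root.iter("station_name")]
print(stations)  # ['Point Alpha', 'Point Beta']
```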

Other Modules
There's something like two hundred modules and packages available in the std lib. This overview should give a good idea of what the std lib provides and of some of the key packages in the major areas it serves. Here we will discuss a few more things of interest.

Previously, we saw how easy it was to create tests in docstrings and to run them using the doctest module. The std lib also has a unit-testing framework provided by the unittest module; this is a Python version of the Java JUnit test framework. The doctest module also gives some basic integration with the unittest module. Testing is covered more in chp 9. There are also several third-party testing frameworks, like py.test, and nose from code.google.com/p/python-nose.

Apps that are noninteractive, like servers, often report problems by writing to log files. The logging module gives a uniform interface for logging, and in addition to being able to log to files, it can log using HTTP GET or POST requests, using email, or using sockets.

The std lib gives many modules for introspection and code manipulation, and though most of these modules are beyond the scope here, it is worth mentioning pprint. The pprint module has functions for 'pretty printing' Python objects, including collection data types, and is sometimes useful for debugging. In chp 8 we'll see a simple use of the inspect module that introspects live objects.

The threading module gives support for creating threaded applications, and the queue module provides three different kinds of thread-safe queues. Threading is covered later in chp 10.

Python doesn't have native support for GUI programming, however several GUI libs can be utilized in Python programs. For instance, the Tk library is available using the tkinter module, and it is usually installed as standard. GUI programming is introduced in chp 15. The abc (Abstract Base Class) module gives functions for creating abstract base classes; this is covered in chp 8.
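Since pprint is so easy to show off, here is a tiny sketch (the data is made up):

```python
import pprint

data = {"gamma": list(range(10)),
        "alpha": {"nested": (1, 2)},
        "beta": "a fairly long string value to force wrapping"}
# pformat()/pprint() sort dictionary keys and wrap output at the given width.
text = pprint.pformat(data, width=40)
print(text)
```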
The copy module gives us copy.copy() and copy.deepcopy() which we have seen previously. Access to 'foreign functions' (these are functions in shared libs) --.dll files on Windows, .dylib files on MacOSX, and .so files on Linux-- is available using the ctypes module. Python also gives a C API, so it is possible to create custom data types and functions in C and then make them available to Python. However, the ctypes module and the C API are beyond our scope here.
Even if we can't find the functionality we need in the std lib, before writing anything from scratch it's worth checking the Python docs Global Module Index to see if there is a suitable module available for our needs. If that fails, take a look at the PyPI (Python Package Index) at pypi.python.org/pypi. It contains several thousand Python add-ons ranging from small one-file modules to large libs and framework packages.
Chapter 6: Object Oriented Programming
We've already used objects a lot. But we've been doing procedural programming. Python, however, is a 'multiparadigm' language which allows us to do procedural, object-oriented, and functional programming, or we can mix all these up. We can write programs in procedural style, and for programs of about 500 lines or less, this isn't a problem. However, for most programs, and for medium and large ones, OOP gives us advantages. Here we cover the fundamental concepts and techniques for doing OOP in Python. Firstly, we cover stuff for people like me (inexperienced) and for folks coming from C or Fortran. We start by looking at some problems that procedural programming can cause that OOP can solve. We then describe Python's approach to OOP and explain the relevant terminology. Then come the two main sections. The first of these covers the creation of custom data types that hold single items (however these items may have several attributes). The second covers the creation of custom collection data types that can hold any number of objects of any types. These sections cover most aspects of OOP in Python; some more advanced stuff is in chp 8.

The Object-Oriented Approach
Here we see some of the problems which come with procedural programming by considering a situation where we need to represent circles, potentially many of them. The minimum data required to represent a circle is its (x,y) position and its radius. We could do this with a 3-tuple for each circle:
circle = (11, 68, 8)
However, a drawback of this is that it isn't obvious what each element means in the tuple. It could be x,y, radius or something else. Another bummer is that we can access the elements by index position only. If we have two functions say: distance_from_origin(x,y) and say: edge_distance_from_origin(x,y,radius), we would have to use tuple unpacking to call them with a circle tuple:
distance = distance_from_origin(*circle[:2])
distance = edge_distance_from_origin(*circle)
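The two functions themselves aren't shown here, so as a plausible sketch (assuming 'edge distance' means the distance from the origin to the nearest point on the circle's edge):

```python
import math

# Hypothetical implementations of the two functions the text assumes.
def distance_from_origin(x, y):
    return math.hypot(x, y)

def edge_distance_from_origin(x, y, radius):
    # Distance from the origin to the circle's edge, never negative.
    return abs(math.hypot(x, y) - radius)

circle = (11, 68, 8)
distance = distance_from_origin(*circle[:2])   # unpack only x and y
edge = edge_distance_from_origin(*circle)      # unpack all three elements
```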
Both of these statements assume that the circle tuples are of the form (x,y,radius). We can solve the problem of having to know the element order and of using tuple unpacking by using a named tuple like so:
import collections
Circle = collections.namedtuple('Circle', 'x y radius')
circle = Circle(13, 84, 9)
distance = distance_from_origin(circle.x, circle.y)

Using this object-oriented approach, we create Circle 3-tuples with named attributes, which makes function calls much easier to understand, because elements can be accessed by their names, and this gives helpful context! However, we still have problems: for instance, there is nothing to stop an invalid circle from being created like so:
circle = Circle(33, 56, -5) # negative radius value: Error
The circle named tuple of class Circle is created here with a negative radius, and it doesn't raise an exception. However, the error will be noticed if we call edge_distance_from_origin(), and then only if that function checks for a negative radius. This inability to validate when creating an object is the worst aspect of taking a purely procedural approach. If we want circles to be changeable so that we can move them by changing their coordinates or resize them by changing their radius, we can do so by using the private collections.namedtuple._replace() method:
circle = circle._replace(radius=12)
In making use of this private method, just as when creating the circle in the first place, there is nothing stopping us from replacing the radius value with bad data. If the circles were going to need a bunch of changes, we might use a mutable data type like a list, which would probably be more convenient:
circle = [36,77, 8]
However, this still doesn't protect us from inputting bad data, and the best we can do about accessing elements by name is to create some constants so that we can write things like: circle[RADIUS] = 5. But using a list brings additional problems, like the fact that we can call circle.sort() (oops). We could perhaps use a dictionary: circle = dict(x=36, y=77, radius=8), but again there is no way to ensure a valid radius and no way to prevent inappropriate methods from being called.
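The constants workaround mentioned above could look like this (X, Y, and RADIUS are names we introduce here, not anything standard):

```python
# Index constants make list access readable, but provide no real protection.
X, Y, RADIUS = 0, 1, 2

circle = [36, 77, 8]
circle[RADIUS] = 5     # more readable than circle[2] = 5
# ...but nothing stops nonsense like circle.sort() or circle[RADIUS] = -4
```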
Object-Oriented Concepts and Terminology
The solution lies in packaging up the data needed to represent a circle together with some way to restrict the methods that can be applied to that data, so that only valid operations are possible. Both of these things can be done by creating a custom 'Circle' data type. We will see how to create a Circle data type later in this section, but first let's cover the prelims and explain some terms.
The terms class, type, and data type are interchangeable. In Python we can create custom classes that are fully integrated and that can be used just like the built-in data types. We have already mentioned many classes like dict, int, str. We use the term object, and occasionally the term instance, to refer to an instance of a particular class. 5 is an int, and 'oblong' is a str. Most classes encapsulate both data and the methods that can be applied to that data. For example, the str class holds a string of Unicode characters as its data and supports methods like str.upper(). Many classes also support other features, like being able to concatenate two strings (or any two sequences) using the + operator. We can also find a sequence's length using len(). These features are provided by 'special methods'. Special methods are like normal methods except that their names always begin and end with two underscores, and they are predefined. For example, if we want to create a class that supports concatenation using the + operator and also the len() function, we can do so by implementing the __add__() and __len__() special methods in our custom class. Conversely, we should never define any method with a name that begins and ends with two underscores unless it is one of the predefined special methods and is appropriate to our class. This ensures that we don't get conflicts with later versions of Python if they introduce new predefined special methods.
Generally, objects will have attributes. Methods are callable attributes, and the other attributes are data. For example, a 'complex' object has 'imag' and 'real' attributes and a bunch of methods, including special methods like __add__() and __sub__() (these support the binary + and - operators) and normal methods like .conjugate(). Data attributes (usually just called attributes, with no further qualification) are normally implemented as 'instance variables', that is, variables that are unique to a particular object. Later we'll see examples of this, and also some examples of how to provide data attributes as 'properties'. A property is an item of object data that is accessed like an instance variable but where the accesses are handled by methods behind the scenes. Using properties makes it easier to do data validation.
Inside a method --which is simply a function whose first arg is the instance on which it is called to operate-- several kinds of variables are potentially accessible. The object's instance variables can be accessed by qualifying their name with the instance itself. Local variables can be created inside the method, these can be accessed without qualification. Class variables which are sometimes called static variables can be accessed by qualifying their name with the class name, and global variables, which are module variables, are accessed without qualification.
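The four kinds of variables can be seen in a small sketch (the Tally class, count variable, and so on are illustrative names, not from the text):

```python
count = 0                 # global (module) variable

class Tally:
    total = 0             # class ('static') variable

    def add(self, amount):
        doubled = amount * 2       # local variable: no qualification needed
        self.last = doubled        # instance variable: qualified by self
        Tally.total += doubled     # class variable: qualified by the class name
        global count
        count += 1                 # global variable: accessed unqualified

t = Tally()
t.add(5)
```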
Some Python literature uses the concept of a 'namespace', which is a mapping from names to objects. Modules are namespaces. For example, after the statement: import math, we can access objects in the math module by qualifying them with their namespace name (such as math.pi and math.sin()). In like manner, classes and objects are also namespaces. If we have: z = complex(1, 2), the z object's namespace has two attributes which we can access (z.real and z.imag).
One of the advantages of object orientation is that if we have a class, we can 'specialize' it. Specializing means making a new class that inherits all the attributes (the data and methods) of the original class. We do this so that we can add or replace methods or add more instance variables. We can 'subclass' (just another name for specializing) any Python class, whether it is a built-in class, a class from the std lib, or one of our own custom classes. Note: some library classes that are implemented in C cannot be subclassed; these will be specified in their respective docs. The ability to subclass is one of the great advantages that OOP gives us because it makes it straightforward to use an existing class that has tried and tested functionality as the basis for a new class that extends the original, adding new data attributes or new functionality in a very clean and direct way. Additionally, we can pass objects of our new class to functions and methods that were written for the original class and they will work correctly. Cool!
The term 'base class' is used to refer to a class that is inherited. A base class may be the immediate ancestor, or may be further up the inheritance tree. Base classes may also be referred to as super classes. We use the term subclass, derived class, or derived to describe a class that inherits from another class. In Python, every built-in class, every lib class, and every class we create is derived directly or indirectly from the ultimate base class: object. Can I please mention how beautiful this is?
Any method can be overridden which is the same as saying it can be reimplemented, in a subclass; this works the same way as in Java. If we have an object of class MyDict (a class that inherits dict) and we call a method that is defined by both dict and MyDict, Python will call the more specific MyDict version of the method. This is called 'dynamic method binding' or alternatively 'polymorphism'. If we need to call the base class version of a method inside a reimplemented method we can do so by using the built-in super() function.
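A minimal sketch of dynamic method binding, using a hypothetical MyDict subclass that overrides one dict method:

```python
class MyDict(dict):
    # Reimplement (override) dict.update() to count how often it is called
    def update(self, *args, **kwargs):
        self.updates = getattr(self, "updates", 0) + 1
        super().update(*args, **kwargs)   # call the base class version

d = MyDict(a=1)
d.update(b=2)   # Python binds to the more specific MyDict.update()
```

Because d is a MyDict, Python calls MyDict.update() even though dict defines a method of the same name; the override then delegates to dict.update() via super().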
Python allows us to use 'duck typing': "If it walks like a duck and quacks like a duck, it is a duck". Meaning that if we want to call certain methods on an object, it doesn't matter what class the object is, only that it has the methods we want to call. We already saw that when we needed a file object we could provide one by calling the built-in open() function -- or by creating an io.StringIO object and providing that instead, because io.StringIO objects have the same API, that is, the same methods as the file objects returned by the built-in open() function when it is used in text mode.
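A tiny duck-typing sketch: the function below (shout() is our own illustrative name) assumes only that its argument has a read() method, so a real file or an io.StringIO object works equally well:

```python
import io

def shout(stream):
    # Works on any object with a .read() method, regardless of its class
    return stream.read().upper()

fake_file = io.StringIO("quack")
print(shout(fake_file))   # QUACK
```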
Inheritance is used to model 'is-a' relationships, in which a class's objects are essentially the same as some other class's objects, but with some variations, like extra data attributes and extra methods. We can also use an approach called 'aggregation' or 'composition'. This is where a class includes one or more instance variables that are of other classes. Aggregation is used to model 'has-a' relationships. In Python, every class uses inheritance because all custom classes have 'object' as their ultimate base class. Most classes also use aggregation because most classes have instance variables of various types.
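A minimal sketch of aggregation, using hypothetical Point and Circle classes: a Circle has-a Point as its center.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

class Circle:
    def __init__(self, center, radius):
        self.center = center      # aggregation: an instance of another class
        self.radius = radius

c = Circle(Point(11, 68), 8)
```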
Some OOP languages have two features which Python lacks. The first is 'overloading', which is having methods with the same name but with different parameter lists in the same class. Because Python has versatile arg-handling capabilities this doesn't seem to be a practical limitation. The second lacking feature in Python is 'access control'. There aren't any bulletproof mechanisms for enforcing data privacy. However, if we create attributes (instance variables or methods) that begin with two leading underscores, Python will prevent unintentional accesses so that they can be considered private. This is done by 'name mangling', covered later.
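A quick sketch of the name mangling just mentioned (Account and __balance are illustrative names):

```python
class Account:
    def __init__(self, balance):
        self.__balance = balance   # two leading underscores: name-mangled

a = Account(100)
# a.__balance would raise AttributeError; Python stored the attribute
# under the mangled name _Account__balance instead:
print(a._Account__balance)   # 100
```

The mangled name still being reachable shows this is protection against accidents, not a bulletproof privacy mechanism.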
Just like we have an uppercase letter as the first letter of custom modules, we will do the same for custom classes. We can define as many classes as we like, either directly in a program or in modules. Class names don't have to match module names, and modules may contain as many class definitions as we like. Now that we've seen some of the problems that classes can solve for us, done some terminology work, and covered some background stuff, let's create some custom classes.

Custom Classes
Earlier we created custom classes which were custom exceptions. Here are two new syntaxes for creating custom classes:
class className:
class className(base_classes):
Since the exception subclasses we previously created didn't add any new attributes (no instance data or methods) we used the suite 'pass', which means that nothing is added, and since the suite was just one statement we put it on the same line as the class statement itself. Note that just like def statements, class is a statement, so we can create classes dynamically if we desire. A class's methods are created using def statements in the class's suite. Class instances are created by calling the class with any necessary args, like so: x = complex(4, 8) creates a complex number and sets x to be an object reference to it.

Attributes and Methods
Let's begin with a simple class 'Point' which holds an (x,y) coordinate. The class is in file Shape.py and its complete implementation is shown here (note that it needs math imported at the top of the module):
import math

class Point:

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def distance_from_origin(self):
        return math.hypot(self.x, self.y)

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

    def __repr__(self):
        return "Point({0.x!r}, {0.y!r})".format(self)

    def __str__(self):
        return "({0.x!r}, {0.y!r})".format(self)

Because no base classes are specified, Point is a direct subclass of object, just as though we had written: class Point(object): Before we discuss these methods, let's see some examples of their use:
import Shape
a = Shape.Point()
repr(a) #returns 'Point(0, 0)'
b = Shape.Point(3,4)
str(b) #returns '(3, 4)'
b.distance_from_origin() #returns 5.0
b.x = -19
str(b) #returns '(-19, 4)'
a == b, a != b #returns (False, True)

The Point class has two data attributes: self.x and self.y. It has five methods (not counting the ones it inherits), four of which are special methods. Once we have imported the Shape module, the Point class can be used like any other. The data attributes can be accessed directly (y = a.y), and the class integrates nicely with all of Python's other classes by providing support for the equality operator (==). It also produces representational (repr) and string forms. Python is smart enough to supply the inequality operator (!=) based on the equality operator. (We can also specify each operator individually if we want total control.)
Python automatically supplies the first arg in method calls because it is an object reference to the object itself (called 'this' in C++ and Java). We must include this arg in the parameter list, and by convention the parameter is called self. This requires a little bit more typing compared with some other languages, but has the advantage of providing absolute clarity. For instance, we always know that we are accessing an object attribute if we qualify with self.
In order to create an object there are two necessary steps. First a raw or uninitialized object must be created, and then the object must be initialized, making it ready for use. Some OOP languages like C++ and Java combine these two steps into one, but Python separates them. When an object is created like so: p = Shape.Point(), first the special method called __new__() is called to create the object, and then the special method __init__() is called to initialize it. In practice almost every Python class we create will require us to reimplement only the __init__() method, because the object.__new__() method is almost always sufficient and is automatically called if we don't provide our own __new__() method. Later we will see a rare example where we do need to reimplement __new__(). Not having to reimplement methods in a subclass is another benefit of OOP. If the base class method is sufficient we don't have to reimplement it in our subclass. This works because if we call a method on an object and the object's class doesn't have an implementation of that method, Python will automatically go through the object's base classes, and their base classes, and so on, until it finds the method. If the method is not found an AttributeError exception will be raised.
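To watch the two-step creation happen, here is a toy class (Record is an illustrative name) that logs when each special method runs:

```python
class Record:
    calls = []

    def __new__(cls, *args):
        cls.calls.append("__new__")       # step 1: create the raw object
        return super().__new__(cls)

    def __init__(self, value):
        Record.calls.append("__init__")   # step 2: initialize it
        self.value = value

r = Record(7)
print(Record.calls)   # ['__new__', '__init__']
```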
For example, if we execute: p = Shape.Point(), Python begins by looking for the method Point.__new__(). Because we have not reimplemented this method, Python looks for it in Point's base classes. In this case there is only one base class, that being 'object'. This has the required method, so Python calls object.__new__() and creates a raw and uninitialized object. Then Python looks for the initializer, __init__(), and because we have reimplemented it inside our custom class, Python doesn't need to look further, so it calls Point.__init__(). Finally, Python sets p to be an object reference to the newly created and initialized object of the custom type Point. Let's go over those methods defined in the class Point:
def __init__(self, x=0, y=0):
self.x = x
self.y = y

The two instance variables, self.x and self.y, are created in the initializer, __init__(), where they are assigned the values of the x and y parameters. Since Python will find this initializer when we create a new Point object, the object.__init__() method will not be called. This is because as soon as Python has found the required method it calls it and doesn't look further up the inheritance tree.
An OOP purist might start the method off with a call to the base class __init__() method by calling super().__init__(). The effect of calling the super() function like this is to call the base class's __init__() method. For classes that directly inherit 'object' there is no need to do such a silly thing, and we should really only do this when it is necessary, like when creating classes that are designed to be subclassed, or when creating classes that don't directly inherit object. This is kind of a coding style peculiarity and we can call super().__init__() at the beginning of a custom class's __init__() method.
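As a sketch of when the super().__init__() call is genuinely useful, here is a hypothetical Point3D class that does not directly inherit object, and delegates part of its initialization to its base class:

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

class Point3D(Point):
    def __init__(self, x=0, y=0, z=0):
        super().__init__(x, y)   # let the base class set up x and y
        self.z = z               # then add our own instance variable

p = Point3D(1, 2, 3)
```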
def distance_from_origin(self):
return math.hypot(self.x, self.y)

This is a conventional method that performs a computation based on the object's instance variables. It is common for methods to be fairly short and to have only the object they are called on as an arg, since often all the data the method needs is available inside the object.
def __eq__(self, other):
return self.x == other.x and self.y == other.y

Methods should not have names that begin and end with two underscores unless they are one of the predefined special methods. Python gives special methods for all the comparison operators, as shown in the list below. All instances of custom classes support == by default, and the comparison returns False unless we compare a custom object with itself. We can override this default behavior by reimplementing the __eq__() special method as we have done here. Python will supply the __ne__() (not equal, !=) method automatically if we implement the equality operator but don't reimplement the inequality operator. By default, all instances of custom classes are hashable, so hash() can be called on them and they can be used as dictionary keys and stored in sets. But if we reimplement __eq__(), instances are no longer hashable (interesting). Later we'll see how to fix this. By implementing this special method we can compare Point objects, but if we were to try to compare a Point with an object of a different type like int, we would get an AttributeError, simply because ints don't have an x attribute.
Here is something really cool! If an object of some other class happens to have the same attributes as our custom class (say x and y), we can still compare the two objects even though they are of different classes. This is where Python's fantastically flexible duck typing is witnessed. We may get surprising results from this kind of comparison, though.
If we want to avoid inappropriate comparisons there are a few approaches we can take. One is to use an assertion, like: assert isinstance(other, Point). Another is to raise a TypeError to indicate that comparisons between the two types aren't supported, like so: if not isinstance(other, Point): raise TypeError(). The third way (which is most Pythonic) is to do this: if not isinstance(other, Point): return NotImplemented. In this third case, if NotImplemented is returned, Python will then try calling other.__eq__(self) to see whether the other type supports the comparison with the Point type, and if there is no such method or if that method also returns NotImplemented, Python gives up: for the ordering operators this raises a TypeError exception, while for == and != Python falls back to an identity comparison. Note also that only reimplementations of the comparison special methods may return NotImplemented.
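A minimal sketch of the third (most Pythonic) approach, using a stripped-down Point:

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented   # let Python try other.__eq__(self)
        return self.x == other.x and self.y == other.y

p = Point(3, 4)
print(p == Point(3, 4))   # True
print(p == 5)             # False: both sides return NotImplemented,
                          # so == falls back to an identity comparison
```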

__lt__(self, other) x < y Returns True if x is less than y
__le__(self, other) x <= y Returns True if x is less than or equal to y
__eq__(self, other) x == y Returns True if x is equal to y
__ne__(self, other) x != y Returns True if x is not equal to y
__ge__(self, other) x >= y Returns True if x is greater than or equal to y
__gt__(self, other) x > y Returns True if x is greater than y

The built-in repr() func calls the __repr__() special method for the object it is given and returns the result. The string returned is one of two kinds. One kind is a string that can be evaluated using the built-in eval() func in order to produce an object equivalent to the one repr() was called on. The other kind is used where this is not possible; we will see an example later. Here is how we can go from a Point object to a string and back to a Point object:

p = Shape.Point(3, 9)
q = eval(p.__module__ + '.' + repr(p))