Jared Foy Teach me good judgement and knowledge

Lambda functions and modules, packages, etc...

Published December 19th, 2018 5:10 pm

Python is extremely capable of doing basically whatever you could possibly want to do. My brain is swimming with Python.

Using 'global variable_name', we can refer to a variable that has been defined in the global scope of the .py file instead of creating a variable that is local to the function we are currently writing. Essentially we bring the global variable into the function by declaring, say, global Language (or what have you); we can then assign to it in the following lines. For non-trivial programs it's best not to use the global keyword, except for variables that are constants, in which case we don't need the global statement anyway because we never assign to them.

Lambda Functions
Lambda functions are functions created using the following syntax:
lambda parameters: expression
The parameters are optional; if they are given they are normally just comma-separated variable names, that is, positional args, although the complete arg syntax supported by def statements can be used. The expression can't contain branches or loops (although conditional expressions are allowed), and lambda functions cannot have a return or yield statement. The result of a lambda expression is an anonymous function. When a lambda function is called it returns the result of computing the expression as its result. If the expression is a tuple it should be enclosed in parentheses. Here is a lambda function for adding an 's' (or not) depending on whether its argument is 1:
s = lambda x: "" if x == 1 else 's'
This lambda expression returns an anonymous function which we assign to the variable 's'. Any (callable) variable can be called using parentheses, so given the count of files processed in some operation we could output a message using the s() function like so: print('{0} file{1} processed'.format(count, s(count)))
Lambda functions are often used as the key function for the built-in sorted() function and for the list.sort() method. Perhaps we have a list of elements as 3-tuples of (group,number,name) and we wanted to sort this list in various ways:
elements = [(2, 12, 'Mg'), (1, 11, 'Na'), (1, 3, 'Li'), (2, 4, 'Be')]
Sorted, this list would be:
[(1, 3, 'Li'), (1, 11, 'Na'), (2, 4, 'Be'), (2, 12, 'Mg')]
Earlier, we saw that we can provide a key function to alter the sort order of sorted(). If we wanted to sort the list by number and name, rather than the natural ordering of group, number, name, we could write a tiny little function like so:
def ignore0(e): return e[1], e[2]
which could be provided as the key function. Creating a bunch of little functions like this is inconvenient, so we might as well employ the lambda function:
elements.sort(key=lambda e: (e[1], e[2]))
Here the key function is lambda e: (e[1], e[2]) where e is each 3-tuple element in the list. The parens around the lambda function are required when the expression is a tuple and the lambda function is created as a function's argument. We could use slicing in order to achieve the same result:
elements.sort(key=lambda e: e[1:3])
A slightly more elaborate version gives us sorting in case-insensitive name, number order:
elements.sort(key=lambda e: (e[2].lower(), e[1]))
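To make these key functions concrete, here is a quick runnable check of the number, name ordering, using the same elements list from above:

```python
elements = [(2, 12, 'Mg'), (1, 11, 'Na'), (1, 3, 'Li'), (2, 4, 'Be')]

# sort by number then name, ignoring the group
elements.sort(key=lambda e: (e[1], e[2]))
print(elements)  # [(1, 3, 'Li'), (2, 4, 'Be'), (1, 11, 'Na'), (2, 12, 'Mg')]
```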
Here are two equivalent ways to create a function that calculates the area of a triangle using the conventional 1/2 x base x height formula:
area = lambda b, h: 0.5 * b * h
# or:
def area(b, h):
    return 0.5 * b * h
We can call area(6, 5) whether we created the function using a lambda or a def statement; we get the same result, although the lambda is certainly more succinct. Another neat use of lambda is when we want to create default dictionaries. Remember that when we access a default dictionary with a nonexistent key, a suitable item is created with the given key and a default value, like so:
import collections

minus_one_dict = collections.defaultdict(lambda: -1)
point_zero_dict = collections.defaultdict(lambda: (0,0))
message_dict = collections.defaultdict(lambda: 'No message available')
If we access the minus_one_dict dictionary with a nonexistent key, a new item will be created with the given key and a value of -1. Similarly for point_zero_dict, where the value will be the tuple (0, 0), and for message_dict, where the value will be the string 'No message available'.
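Here's a quick runnable check of that behavior:

```python
import collections

minus_one_dict = collections.defaultdict(lambda: -1)
point_zero_dict = collections.defaultdict(lambda: (0, 0))

print(minus_one_dict["missing"])    # -1 (item created on first access)
print(point_zero_dict["origin"])    # (0, 0)

# the created items really are stored in the dictionary
print("missing" in minus_one_dict)  # True
```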
What happens if a func receives args with invalid data? Or when we make a mistake in the implementation of an algorithm and perform an incorrect computation? The worst thing that could happen is that the program executes without any (apparent) problem and no one is any the wiser. In order to avoid this we can write tests (covered in chp 5). Another way is to state the preconditions and postconditions and to indicate an error if any of these are not met. Ideally we should use tests and also state preconditions and postconditions. These can be specified using 'assert' statements, which have a syntax like so:
assert boolean_expression, optional_expression
If the boolean_expression evaluates to False, an AssertionError is raised. If the optional_expression is given, it is used as the arg to the AssertionError exception, which is useful for providing error messages. Note that assertions are designed for developers, not end users. Problems that occur in normal program use, such as missing files or invalid command-line args, should be handled by other means, like logging a message. Take a look at these equivalent snippets that require that all the args passed to them are nonzero, and consider a call with a 0 arg to be a coding error:
def product(*args):
    assert all(args), '0 argument'
    result = 1
    for arg in args:
        result *= arg
    return result

def product(*args):
    result = 1
    for arg in args:
        result *= arg
    assert result, '0 argument'
    return result
The difference between these two snippets is where the assert statement goes. In the first snippet, assert all(args) forces a boolean evaluation that all inputs are nonzero, and provides a string to be shown on failure. The second snippet simply asserts that the result is nonzero, because a boolean evaluation of 0 returns False, and then returns the result. Of course, if one of the args is zero the product will be zero, so we might as well evaluate the result instead of all the args. If one of these product() functions is called with a 0 argument, an AssertionError exception will be raised, and output similar to the following will be written to the error stream (sys.stderr), which is usually the console. When an assertion is tripped, Python provides a traceback that gives the filename, function, and line number, as well as the error message we specified.

When a program is ready for public release (meaning it passes all its tests and assertion statements), what do we do with all our assert statements? We can tell Python not to execute assert statements, which effectively throws them away at runtime. This can be done by running the program at the command line with the -O option, like so: python -O program.py. Alternatively, we can set the PYTHONOPTIMIZE environment variable to O. If we don't want docstrings either, we can use the -OO option, which strips out both assert statements and docstrings (note that there is no environment variable for setting this option). We can take a simpler approach as well: produce a copy of our program with all assert statements commented out, test that, and then release the assertion-free version.
Chapter 5: Modules
Functions allow us to parcel up pieces of code so they can be reused in a program. Modules provide a means of collecting sets of functions (and also custom data types) together so they can be used by any number of programs. Python also has facilities for creating 'packages': sets of modules that are grouped together, usually because their modules provide related functionality or because they depend on each other. Here we will describe the syntaxes for importing functionality from modules and packages. We then show how to create custom packages and modules. The first module example serves as an introduction and the second illustrates how to handle practical issues like platform independence and testing. In the second section we review Python's standard library. This is good to know because using the standard library is much more efficient than baking our own functions. Also, the standard library's modules are widely used, well tested, and robust. A few small examples are used to illustrate some common use cases.
A Python module is a .py file. It can contain any Python code we like. All the programs we've written so far have been contained in a single .py file, so they are modules as well as programs. The difference is that programs are designed to be run, whereas modules are designed to be imported and used by programs. Not all modules have associated .py files. For example, the sys module is built into Python, and some modules are written in other languages (mostly C). However, much of Python's library is written in Python, so when we import collections we can create named tuples by calling collections.namedtuple(), and the functionality that we are accessing is in the collections.py module file. It doesn't make a difference to our programs what language a module is written in, because all modules are imported and used in the same way. Here are a few syntaxes we can use to import:
import importable
import importable, importable2, ..., importableN
import importable as preferred_name
Importable is usually a module such as sys, but it could also be a package or a module in a package, in which case each part is separated with a dot (.), like os.path. The first two syntaxes are the ones used throughout; they are the simplest and safest because they avoid the possibility of naming conflicts, since they force us to always use fully qualified names. The third syntax lets us give a name of our choosing to the package or module. This can lead to name clashes, but the syntax is useful, for example, when experimenting with different implementations of a module. If we had two modules ModA and ModB that had the same API, we could write import ModA as MyModule in a program, and then later seamlessly switch to using import ModB as MyModule.
It's common practice to put all the import statements at the beginning of a .py file, right after the shebang line and the module's documentation. Import standard library modules first, then third-party modules, then custom modules. Here are some more import syntaxes:
from importable import object as preferred_name
from importable import object1, object2 ..., objectN
from importable import (object1, object2, object3)
from importable import *
These syntaxes can cause name conflicts because they make the imported objects directly accessible. If we want to use the from ... import syntax to import a lot of objects, we can use multiple lines, either by escaping each newline except the last, or by enclosing the object names in parentheses as the third syntax shows. In the last syntax the * means import everything that isn't private. In practical terms this means either that every object in the module is imported except for those whose names begin with a leading underscore, or, if the module has a global __all__ variable that holds a list of names, that all the objects named in the __all__ variable are imported. Because of the potential for name collisions, some programming teams specify in their guidelines that only the import importable syntax may be used. However, some large packages, like GUI libraries, are often imported this way because they have large numbers of functions and classes that would be tedious to type out by hand.
How does Python know where to look for the modules and packages that are imported? The built-in sys module has a list called sys.path that holds the directories that constitute the Python path. The first directory is the one that contains the program itself, even if the program was invoked from another directory. If the PYTHONPATH environment variable is set, the paths specified in it are the next ones in the list, and the final paths are those needed to access Python's standard library; these are set when Python is installed.
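We can inspect the path ourselves. The exact entries vary from machine to machine, so no particular output is shown here:

```python
import sys

# sys.path[0] is the directory containing the running program
print(sys.path[0])

# the remaining entries come from PYTHONPATH and the standard library
for directory in sys.path:
    print(directory)
```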
When we first import a module (and it's not built in), Python looks for the module in each path listed in sys.path. One consequence of this is that if we create a module or program with the same name as one of Python's library modules, ours will be found first, which would surely cause problems. In order to avoid this, never create a program or module with the same name as one of the Python library's top-level directories or modules (unless you are providing your own implementation of that module and are deliberately overriding it). A top-level module is one whose .py file is in one of the directories in the Python path, rather than in one of those directories' subdirectories. For example, on Windows the Python path usually includes a directory called C:\Python32\Lib, so on that platform we shouldn't create a module called Lib.py, nor a module with the same name as any of the modules in the Lib directory. We can quickly check to see whether a module name is in use by trying to import it. This can be done at the console by calling the interpreter with the -c 'execute code' command-line option followed by an import statement. If we want to see whether there is a module called Music.py (or a top-level directory in the Python path called Music), we can type the following at the console: python -c "import Music"
If we get an ImportError exception then we know that no module or top-level directory of that name is in use. If we get any other output, or no output, the name is already being used. This doesn't guarantee that the name will always be OK, since we may later install a third-party module that causes a conflict, though this is rare. In here we use an uppercase letter for the first letter of custom module filenames; this allows us to avoid name conflicts (at least on Unix) because standard library module filenames are lowercase. It's possible that a program might import some modules which in turn import modules of their own, some of which may already have been imported. This won't cause problems. When a module is imported, Python first checks to see whether it has already been imported. If it hasn't, Python executes the module's byte-code compiled code, which creates the variables, functions, and other objects it provides, and internally records that the module has been imported. At every subsequent import of the module Python will detect that the module has already been imported and will do nothing. When Python needs a module's byte-code compiled code, it generates it automatically (this differs from Java, where compiling to byte code must be done explicitly). First Python looks for a file with the same name as the module's .py file but with the extension .pyo (this is an optimized byte-code compiled version of the module). If there is no .pyo file (or the .pyo is older than the .py, which would indicate that it is out of date), Python then looks for a file with the extension .pyc (this is a nonoptimized byte-code compiled version of the module). If a suitable .pyc is found, Python loads it; otherwise Python loads the .py file and compiles a byte-code version itself. Either way, Python ends up with the module in memory in byte-code compiled form.
In the event that Python had to compile the .py file itself, it saves a .pyc version (or a .pyo version if -O was specified on the command line or is set in the PYTHONOPTIMIZE environment variable), providing the directory is writable. We can choose not to save the byte code by using -B at the console, or by setting the PYTHONDONTWRITEBYTECODE environment variable.
Using byte-code compiled files leads to faster start-up times since the interpreter only has to load and run the code, rather than load, compile, maybe save if possible, and run the code. Runtimes aren't affected. When Python is installed, the standard library modules are usually byte-code compiled as part of the installation process.
A package is a directory that contains a set of modules and a file called __init__.py. Suppose we had a fictitious set of module files for reading and writing various graphics file formats, like Bmp.py, Jpeg.py, Png.py, Tiff.py, and Xpm.py, all of which provide the functions load(), save()... We could keep the modules in the same directory as our program, but for a large program that uses scores of custom modules the graphics modules would be dispersed. If we put them in their own subdirectory, like 'Graphics', we can keep them together. And if we put an empty __init__.py file in the Graphics directory along with them, the directory is now a package! boom. As long as the Graphics dir is a subdir inside our program's dir or is in the Python path, we can import any of these modules and make use of them. We must be careful to ensure that our top-level module name (Graphics) is not the same as any top-level name in the standard library, in order to avoid name conflicts. Here is how we can import and use our module:
import Graphics.Bmp
image = Graphics.Bmp.load('bashful.bmp')
For short programs some devs like to use shorter names, and Python makes this possible using two slightly different approaches:
import Graphics.Jpeg as Jpeg
image = Jpeg.load('doc.jpeg')
Here we imported the Jpeg module from the Graphics package and told Python we want to call it Jpeg. We can also do:
from Graphics import Png
This will import Png directly from the Graphics package. Making the Png module directly accessible. We can even do this:
from Graphics import Tiff as picture
Which directly imports Tiff under the alias picture; we would now access it by calling picture.load(). Sometimes it's convenient to load all of a package's modules using a single statement. In order to do so we must edit the package's __init__.py file to contain a statement which specifies which modules we want to load. This statement must assign a list of module names to the special variable __all__. For example, here is the necessary line for the Graphics/__init__.py file:
__all__ = ['Bmp', 'Jpeg', 'Png', 'Tiff']
This is all that is required in order to do so. We are also free to put other code in the __init__.py file. This allows us to write a different kind of import statement:
from Graphics import *
The from package import * syntax directly imports all the modules named in the __all__ list given in the __init__.py file. As noted previously, this syntax can also be applied to a module: from module import *. In such a case all the functions, variables, and other objects defined in the module (except those whose names begin with a leading underscore) will be imported. In order to control exactly what is imported when the from module import * syntax is used, we can define an __all__ list in the module's .py file itself; then only the objects named in the __all__ list are made available by the * syntax.
So far we've dealt with only one level of nesting, however Python allows for nesting of packages as deeply as we please. Therefore we can have a subdirectory inside the Graphics directory (say Vector), with module files inside that, such as Eps.py and Svg.py. Here is a relevant tree:
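Based on the modules named above, the layout looks something like this (a sketch, using the example filenames):

```
Graphics/
    __init__.py
    Bmp.py
    Jpeg.py
    Png.py
    Tiff.py
    Xpm.py
    Vector/
        __init__.py
        Eps.py
        Svg.py
```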
For the 'Vector' dir to be a package it must have an __init__.py file. In order to access a nested package we just build on the syntax we have already used: import Graphics.Vector.Eps
This is called a 'fully qualified' name, and it is long, so some developers try to keep their module hierarchies fairly flat to avoid this. We can also shorten the name ourselves: import Graphics.Vector.Svg as Svg
We can always use our own short name for a module, as we have done here, although this does increase the risk of a name conflict. All of these imports are called 'absolute' imports, meaning that every module we import is in one of sys.path's dirs (or subdirs, if the import name includes one or more periods, which effectively serve as path separators). When creating large 'multimodule', 'multidirectory' packages, it is often useful to import other modules that are part of the same package. For example, in Eps.py or Svg.py we could get access to the Png module using a conventional import, or using a relative import:
import Graphics.Png as Png # conventional import
from .. import Png # relative import
These two snippets are equivalent; they both make the Png module directly available inside the module in which they are used. Note the relative import: relative imports use the from ... import syntax with leading dots (each dot represents stepping up one directory), and they can be used only in modules that are nested inside a package. Using relative imports makes it easier to rename the top-level package, and prevents accidentally importing standard modules rather than the ones we have created inside our own packages.
Custom Modules
Because modules are just .py files, they can be created without formality. Here we look at two custom modules. The first shows how to execute the code in docstrings as unit tests; the second shows some techniques that are more typical of larger, more complex modules.
The TextUtil Module
The structure of this module differs little from that of a program. First we have the shebang line, then some comments. Next it is common to have a triple-quoted string that provides an overview of the module's contents, sometimes including examples. This is called the module's docstring.
#!/usr/bin/env python3
# Copyright info
"""Here is the docstring, which spans several lines and may include examples"""
import string
The module's docstring is available to programs (or other modules) that import the module; it is available as the module's __doc__ attribute (for this module, TextUtil.__doc__). After the module docstring come the imports, then the rest of the module.
When declaring a function in a module we can then use a docstring:
def simplify(text, whitespace=string.whitespace, delete=""):
    """This is a docstring in a function; it can span multiple lines and may include examples of the function's use."""
A function's docstring is conventionally laid out with a single-line description, a blank line, further description, and then some examples written as though they were typed into the console. Because the quoted strings in the examples are inside a docstring, we must either escape the backslashes inside them, or do what we have done before and use a raw triple-quoted string.
    result = []
    word = ""
    for char in text:
        if char in delete:
            continue
        elif char in whitespace:
            if word:
                result.append(word)
                word = ""
        else:
            word += char
    if word:
        result.append(word)
    return " ".join(result)
The result list is used to hold 'words' (strings that have no whitespace or deleted characters). The given text is iterated over character by character, with deleted characters skipped. If a whitespace character is encountered and a word is in the making, the word is added to the result list and reset to an empty string; otherwise the whitespace is skipped. Any other character is added to the word being built up. At the end, a single string is returned consisting of all the words in the result list joined with a single space between each one.
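Putting the pieces together, here is a self-contained version of simplify() with a couple of usage examples (the second shows the delete parameter in action):

```python
import string

def simplify(text, whitespace=string.whitespace, delete=""):
    """Returns the text with runs of whitespace reduced to single spaces"""
    result = []
    word = ""
    for char in text:
        if char in delete:
            continue          # skip deleted characters entirely
        elif char in whitespace:
            if word:
                result.append(word)
                word = ""
        else:
            word += char
    if word:
        result.append(word)
    return " ".join(result)

print(simplify(" this    and\n that\t too"))          # this and that too
print(simplify("Washington D.C.\n", delete=",;:."))   # Washington DC
```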
I skipped over the second function in this example.py file. Nevertheless, if we want to make use of the functions in our example.py file, we have a couple of options. If we put the module example.py in the same subdir as the program we want to use it in, all we have to do is import it into our other program: import example. If we want example.py to be available to all our programs, there are a few approaches that can be taken. One is to put the module in the Python distribution's site-packages subdir; this dir is in the Python path, so any module that is here will always be found. A second way is to create a dir specifically for the custom modules we want to use in all our programs, and to set the PYTHONPATH environment variable to this directory. A third way is to put the module in the local site-packages subdir (%APPDATA%\Python\Python31\site-packages on Windows, ~/.local/lib/python3.1/site-packages on Unix), which is also in the Python path.
Having our example module is great, but if we have a lot of programs using it, we might want to be more confident that it works as it should. In order to ensure this we can execute the examples that we have given in the docstrings and make sure they produce the expected results. This can be done by adding three lines at the end of the example module's .py file:
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Whenever a module is imported, Python creates a variable for the module called __name__ and stores the module's name in this variable. A module's name is simply the name of its .py file but without the extension. So when the module is imported, __name__ will have the value 'example', the if condition will not be met, and the last two lines will not be executed. This means that these last three lines have nearly no cost when the module is imported. Whenever a .py file is run as a program, Python sets __name__ to '__main__', so if we were to run example.py as though it were a program, the if condition will evaluate to True and the last two lines of code will be executed. The doctest.testmod() func uses Python's 'introspection' features to discover all the functions in the module, and their docstrings, and then attempts to execute all the docstring snippets it finds. Running a module like this produces output only if there are errors. This can be disconcerting because it doesn't look like anything happened at all, but if we pass a command-line flag of -v, we will get output like this:
Trying:
    is_balanced("(Python(is (not (lisp))))")
Expecting:
    True
ok
If there are functions (or classes or methods) that don't have tests, they are listed when the -v option is used. Notice that the doctest module found the tests in the module's docstring as well as those in the function's docstrings. Examples in docstrings that can be executed as tests are called doctests. Note that when we write doctests, we are able to call simplify() and the other functions unqualified (because doctests occur inside the module itself). Outside the module, assuming we have done import example we must use the qualified names such as: example.is_balanced().
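Here is a minimal sketch of a module set up this way; double() is a made-up function that exists just to carry the doctests:

```python
"""This module provides a trivial doubling function.

>>> double(5)
10
"""

def double(x):
    """Returns x doubled

    >>> double(21)
    42
    """
    return x * 2

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # silent on success; pass -v on the command line for detail
```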
The CharGrid Module
This module holds a grid of characters in memory. It provides functions for 'drawing' lines, rectangles, and text on the grid, and for rendering the grid onto the console. The CharGrid.add_rectangle() function takes at least four arguments, the top-left corner's row and column and the bottom-right corner's row and column. The character used to draw the outline can be given as a fifth argument, and a Boolean indicating whether the rectangle should be filled (with the same character as the outline) as the sixth argument. The first time we call it we pass the third and fourth arguments by unpacking the 2-tuple (width, height), returned by the CharGrid.get_size() function. By default the CharGrid.render() func clears the screen before printing the grid, but this can be prevented by passing False as we have done so here.
This module uses the sys module and the subprocess module which is covered more later. There are two error-handling policies in place. Several functions have a char parameter whose actual arg must always be a string containing exactly one character, assert statements are used in order to maintain this. If we pass out-of-range row or column numbers it is considered erroneous but normal, therefore custom exceptions are raised when this happens. Let's now review some key parts of the module's code, beginning with the custom exceptions:
class RangeError(Exception): pass
class RowRangeError(RangeError): pass
class ColumnRangeError(RangeError): pass
None of the functions in the module that raise an exception ever raise a RangeError, instead they always raise the specific exception depending on whether an out-of-range row or column was given. However, by using a hierarchy, we give users of the module the choice of catching the specific exception, or to catch either of them by catching their RangeError base class. Note also that inside doctests the exception names are used as they appear here. But if the module is imported with import CharGrid, the exception names are going to be fully qualified in order to make use of them such as: CharGrid.RangeError, and so forth.
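A quick sketch of why the hierarchy is convenient: catching the RangeError base class handles both specific exceptions with a single handler.

```python
class RangeError(Exception): pass
class RowRangeError(RangeError): pass
class ColumnRangeError(RangeError): pass

for exc in (RowRangeError, ColumnRangeError):
    try:
        raise exc("out of range")
    except RangeError as err:        # one handler catches both subclasses
        print(type(err).__name__, err)
```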
We can define some 'private' data for internal use by the module by prefixing the variable name with an underscore, like so:
_CHAR_ASSERT_TEMPLATE = ("char must be a single character: '{0}' " "is too long")
_max_rows = 25
_max_columns = 80
_grid = []
_background_char = " "
We use leading underscores so that if the module is imported using 'from CharGrid import *', none of these variables will be imported. Alternatively, we could create an __all__ list in which these particular variables are not included.
if sys.platform.startswith('win'):
    def clear_screen():
        subprocess.call(['cmd.exe', '/C', 'cls'])
else:
    def clear_screen():
        subprocess.call(['clear'])
clear_screen.__doc__ = """Clears the screen using the underlying window system's clear screen command"""
Here we have shown a platform-dependent means of clearing the console screen. On Windows we have to execute the cmd.exe program with appropriate args, and on most Unix systems we execute the clear program. The subprocess module's call() function lets us run an external program, so we can use it to clear the screen in the appropriate platform-specific way. The sys.platform string holds the name of the operating system the program is running on, for example 'win32' or 'linux2'. Therefore, one way of handling the platform differences is to have a single clear_screen() func like so:
def clear_screen():
    command = (['clear'] if not sys.platform.startswith('win')
               else ['cmd.exe', '/C', 'cls'])
    subprocess.call(command)
The disadvantage here is that we perform the check every time the function is called. Therefore, to avoid checking which platform the program is being run on every time clear_screen() is called, we instead create a platform-specific clear_screen() func once, when the module is imported, and from then on we always use it. This is possible because the def statement is a Python statement like any other; when the interpreter reaches the 'if' it executes either the first or the second def statement, 'dynamically' creating one or the other clear_screen() function (how cool!). Because the clear_screen() function is not defined inside another function (or inside a class, as we will see in the next chapter), it is still a global function and is accessible like any other function in the module. After creating the function we explicitly set its docstring; this avoids having to write the same docstring in two places, and it also shows that a docstring is just one of the attributes of a function. Other attributes include the function's module and its name.
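The same conditional-def trick in a tiny runnable form. greeting() is a made-up function; the point is just that def is an ordinary statement and __doc__ is a plain attribute we can set afterwards:

```python
import sys

if sys.platform.startswith('win'):
    def greeting():
        return "hello from Windows"
else:
    def greeting():
        return "hello from a Unix-like system"

# set the docstring once, after whichever def actually ran
greeting.__doc__ = "Returns a platform-appropriate greeting"

print(greeting())
print(greeting.__doc__)   # Returns a platform-appropriate greeting
print(greeting.__name__)  # greeting
```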
def resize(max_rows, max_columns, char=None):
    """Changes the size of the grid, wiping out the contents and
    changing the background if the background char is not None
    """
    assert max_rows > 0 and max_columns > 0, 'too small'
    global _grid, _max_rows, _max_columns, _background_char
    if char is not None:
        assert len(char) == 1, _CHAR_ASSERT_TEMPLATE.format(char)
        _background_char = char
    _max_rows = max_rows
    _max_columns = max_columns
    _grid = [[_background_char for column in range(_max_columns)]
             for row in range(_max_rows)]
In this function we used assert statements to enforce the policy that it is a coding error to attempt to resize the grid smaller than 1x1. If a background char is specified, an assert is used to guarantee that it is a string of exactly one character; if it is not, the assertion error message is the _CHAR_ASSERT_TEMPLATE's text with the {0} replaced by the given char string. Here it is necessary to use the global statement because we need to update a number of global variables inside this function. This is something that using an OOP approach can help us avoid, as we'll see later. The _grid is created using a list comprehension inside a list comprehension. Using list replication such as [[char] * columns] * rows won't work here because the inner list would be shared (shallow-copied). In order to get the intended functionality we could also have used nested for...in loops:
_grid = []
for row in range(_max_rows):
    _row = []
    for column in range(_max_columns):
        _row.append(_background_char)
    _grid.append(_row)
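To see why list replication is unsuitable here, consider this small demonstration (the grid contents are made up for illustration):

```python
# Why [[char] * columns] * rows fails: the outer * copies
# references, so every "row" is the same inner list object.
rows, columns = 2, 3
shared = [["."] * columns] * rows
shared[0][0] = "X"
print(shared)        # both rows change

# A comprehension (or nested loops) builds a fresh inner list
# for each row, so the rows are independent.
independent = [["." for column in range(columns)] for row in range(rows)]
independent[0][0] = "X"
print(independent)   # only the first row changes
```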
Here we also show a function with its docstring:
def add_horizontal_line(row, column0, column1, char="-"):
    """Adds a horizontal line to the grid using the given char

    >>> add_horizontal_line(8, 20, 25, "=")
    >>> char_at(8, 20) == char_at(8, 24) == "="
    True
    >>> add_horizontal_line(31, 11, 12)
    Traceback (most recent call last):
    ...
    RowRangeError
    """
Here the docstring has two tests, one that is expected to work and one that is expected to raise an exception. When dealing with exceptions in doctests the pattern is to specify the 'Traceback' line, since that is always the same and tells the doctest module that an exception is expected, then to add an ellipsis to stand for the intervening lines (which vary), and to end with the exception line we expect to get, which in this case is RowRangeError. The char_at() func is one of those provided by the module; it returns the character at the given row and column position in the grid.
    assert len(char) == 1, _CHAR_ASSERT_TEMPLATE.format(char)
    try:
        for column in range(column0, column1):
            _grid[row][column] = char
    except IndexError:
        if not 0 <= row <= _max_rows:
            raise RowRangeError()
        raise ColumnRangeError()
The code above begins with the same char length check that is used in the resize() func. Rather than explicitly checking the row and column args, the func works by assuming that the args are valid. If an IndexError occurs because a nonexistent row or column is accessed, we catch the exception and raise the appropriate module-specific exception in its place. This style of programming is known as "it's easier to ask forgiveness than permission" and is considered 'Pythonic' (which is good), as opposed to 'look before you leap', where checks are made in advance. Relying on exceptions to be raised rather than checking in advance is more efficient when exceptions are rare. Using assertions doesn't count as 'look before you leap' because assertions should never be triggered, and they are often disabled or commented out when it's time to deploy code.
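The EAFP style can be sketched in isolation. Here grid and set_char() are hypothetical stand-ins, not part of the module described above: we attempt the assignment and translate the IndexError, instead of range-checking the arguments first:

```python
# EAFP: try the operation and handle the failure, rather than
# checking the indexes in advance (LBYL).
grid = [["." for column in range(4)] for row in range(3)]

def set_char(row, column, char):
    try:
        grid[row][column] = char      # assume the indexes are valid
    except IndexError:
        # Translate the low-level exception into one that is
        # meaningful to users of this (hypothetical) API.
        raise ValueError("cell out of range: ({0}, {1})".format(row, column))

set_char(1, 2, "X")
```

When most calls succeed, this costs nothing extra; the exception machinery only runs in the rare failure case.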
Near the end of the program we have a single call to resize(): resize(_max_rows, _max_columns)
This call initializes the grid to the default size of 25x80 and ensures that code that imports the module can safely make use of it immediately. Without this call, every time the module was imported the importing program or module would have to call resize() to initialize the grid, which would force programmers to remember that fact and could also lead to multiple initializations.
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Here we have the previously covered snippet for the doctest module which will check our doctests.
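Putting the pieces together, here is a minimal self-contained module showing the pattern. plural() is a hypothetical helper, not part of the grid module:

```python
# A small module whose docstring tests are checked by doctest.testmod()
# when the file is run as a script.
def plural(count):
    """Returns '' if count is 1, otherwise 's'.

    >>> plural(1)
    ''
    >>> plural(3)
    's'
    """
    return "" if count == 1 else "s"

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```

Running the file directly prints nothing if all the doctests pass; importing it as a module skips the tests entirely.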
Overview of Python's Standard Library
Python's standard lib is described as 'batteries included': there are about two hundred packages and modules. Even so, the library can only include so much, and its contents reflect the history and interests of its developers more than any concerted or systematic effort to create a 'balanced' library. Also, some modules have proved very difficult to maintain within the library (notably the Berkeley DB module), so some have been taken out of the library and are now maintained independently. This means that many excellent third-party modules are available for Python that, despite their quality and usefulness, are not in the standard library. Among these are the PyParsing and PLY modules reviewed later on. Here we present a broad overview of what is offered in the std lib, taking a thematic approach and excluding packages and mods that are of very specialized interest as well as those which are platform specific.
String Handling
The string module gives useful constants like string.ascii_letters and string.hexdigits. It also gives the string.Formatter class which we can subclass to provide custom string formatters. The textwrap module can be used to wrap lines of text to a specified width, and to minimize indentation. The struct module gives functions for packing and unpacking numbers, Booleans, and strings to and from bytes objects using their binary representations. This is useful when handling data to be sent to or received from low-level libraries written in C. The struct and textwrap modules are used by the convert-incidents.py program that is covered in chp 7. The difflib module gives classes and methods for comparing sequences, like strings. It is able to produce output both in standard 'diff' formats and in HTML. Python's most powerful string handling module is the 're' (stands for regular expression) module. This is covered in chp 13. The io.StringIO class provides a string-like object that behaves like an in-memory text file. This is convenient if we want to use the same code that writes to a file to write to a string.
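A quick taste of struct and textwrap. The format string and sample text are made up for illustration; "<" means little-endian, "i" a 4-byte int, and "5s" a 5-byte bytes string:

```python
import struct
import textwrap

# struct: pack an int and a 5-byte string into raw bytes, then
# unpack them again; round-tripping recovers the original values.
data = struct.pack("<i5s", 42, b"hello")
number, text = struct.unpack("<i5s", data)
print(number, text)          # 42 b'hello'

# textwrap: reflow a long line so no output line exceeds 40 columns.
message = ("Python's standard library is often described as "
           "batteries included.")
print(textwrap.fill(message, width=40))
```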
The io.StringIO Class
Python lets us write text to files in two different ways. Firstly, we can use the write() method, and secondly we can use the print() function with the file keyword arg set to a file object that is open for writing. Like so:
print('An error message', file=sys.stdout)
sys.stdout.write('Another error message\n')
Both of these are printed to sys.stdout, which is a file object that represents the 'standard output stream' (which is usually the console); this differs from sys.stderr, the 'error output stream', only in that the error stream is 'unbuffered' (learn more about that). Python automatically creates and opens sys.stdin, sys.stdout, and sys.stderr at program start-up (cool!). The print() func adds a newline by default (as we previously saw), but we can curtail this behavior by setting the end keyword arg to an empty string.
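For example, two prints can be joined on one line (the messages are made up):

```python
# Passing end="" suppresses print()'s trailing newline, so the
# next print() continues on the same line.
print("3 files", end="")
print(" processed")
```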
Sometimes it is useful to capture into a string the output that is intended to go to a file. This can be achieved using the io.StringIO class which gives us an object that can be used like a file object, but which holds any data written to it in a string. If the io.StringIO object is given an initial string, it can also be read as though it were a file. We can access io.StringIO if we import io. We can use it to capture output destined for a file object such as sys.stdout like so:
sys.stdout = io.StringIO()
If we put this line at the beginning of a program, after the imports of course but before any use is made of sys.stdout, any text that would usually be sent to the output stream will actually be sent to the io.StringIO file-like object which this line has created and which has replaced the standard sys.stdout file object. Now, if the print() and sys.stdout.write() lines shown earlier are executed, their output will go to the io.StringIO object instead of the console. We can retrieve the accumulated text at any time by calling the io.StringIO object's getvalue() method. In order to restore output to normal we can do: sys.stdout = sys.__stdout__
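The whole capture-and-restore cycle, using the two output lines from earlier, looks like this:

```python
import io
import sys

sys.stdout = io.StringIO()              # redirect standard output
print("An error message")
sys.stdout.write("Another error message\n")
captured = sys.stdout.getvalue()        # everything written so far
sys.stdout = sys.__stdout__             # restore the real stdout
print(captured, end="")                 # now actually reaches the console
```

Nothing appears on the console until the final print(), because the intervening output was held in the io.StringIO object's internal string.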