Jared Foy Before I was afflicted I went astray

Mapping types and dictionaries and other things too

Published December 14th, 2018 4:20 pm

This is becoming exceptionally interesting, my knowledge of computer science is expanding in general, and in Python specifically. I really think this is quite a good book.

The del statement can also be used to remove individual items. We can use the slice operator to do removing if we give the right-hand side a list as an empty literal. What is cool about the slice operator and lists is that we can stride with them to access every nth item which can often be useful. Suppose we have the list: x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] and we want to set every odd-indexed item to 0. We can access every second item by striding, like so x[::2] This gives us the items at 0,2,4... We can fix this by giving an initial starting index like so: x[1::2] The slice x[1::2] gives us a slice of the items we want, to set each item in the slice to 0 we need a list of 0's and this list must have exactly the same number of 0's as there are items in the slice. Using len(x[1::2]) gives us the number of items in this particular slice. We use the replication operator (*), to produce a list consisting of the number of 0's we needed based on the length of the stride segment we took. The interesting aspect is that when we assign the list [0, 0, 0, 0, 0, 0] to the strided slice, Python correctly replaces x[1]'s value with the first zero and so on and so forth. So our full operation with look like this:
x= [0,1,2,3,4,5,6,7,8,9,10]
x[1::2] = [0] * len(x[1::2])
In this operation Python is extracting the strided items and then building a list full of zeros that is as long as the strided list and then pumping it full of zeros. Lists can be reversed and sorted in the same way as any other iterable using the built-in reversed() and sorted() functions which are covered later. Lists also have equivalent methods list.reverse(), list.sort(), both of which work in-place (therefore they don't return anything), list.sort() accepts the same optional args as sorted(). One common idiom is to case-insensitively sort a list of strings like so woods.sort(key=str.lower) The 'key' argument is used to specify a function which is applied to each item, and the return value is used to perform the comparisons used when sorting. As noted previously, for non-English languages, sorting strings in a way that is meaningful to humans can be quite challenging. For inserting items, lists perform best when items are added or removed at the end do this this we could use list.append() and list.pop() The worst performance occurs when we search for items in a list, like using list.remove() or list.index()or using 'in' to test for membership. If fast searching or membership testing is required, a set or a dict (covered later) may be a better option. Alternatively, lists can provide fast searching if they are kept in order by sorting them, Python's sort algorithm is well optimized for sorting partially sorted lists, and using a binary search which is provided by the bisect module, to find items. Later we will create an 'intrinsically sorted' custom list class.
List Comprehensions
Small lists are often created using list literals, but longer lists are usually created programmatically. For a list of integers we can use list(range(n)), or if we just need an integer iterator range() is sufficient. For lists using a for ... in loop is very common. If we wanted to produce a list of the leap years in a given range, we might do this:
leaps = []
for year in range(1900, 1940):
if (year % 4 == 0 and year % 100 != 0) or (year % 400 == 0):
When the built-in range() func is given two integer args, 'n', and 'm' the iterator that range() returns produces the integers n,n + 1, ....,m -1 (again this needs more study). If we knew the exact range beforehand we could use a list literal like:
leaps = [1904, 1908, 1912, 1916, 1920, 1924, 1928]
A list comprehension is an expression and a loop with an optional condition enclosed in brackets where the loop is used to generate items for the list, and where the condition can filter out unwanted items. The simplest form of a list comprehension is:
[item for item in iterable]
This will return a list of every item in the iterable, and it is semantically no different than list(iterable). Two things that make list comprehensions more interesting and powerful are that we can use expressions, and we can attach a condition. This take us to the two general syntaxes for list comprehensions:
[expression for item in iterable]
[expression for item in iterable if condition]
The second of these two sytaxes is equivalent to:
temp = []
for item in iterable:
if condition:
These are really neat because they are so compact. Normally, the expression will either be or involve the 'item'. Of course, the list comprehension does not need the 'temp' variable that is needed by the 'for...in' loop version. Now we can rewrite the code to generate the leaps list using a list comprehension. We wil develop the code in three stages. First we will generate a list that has all the years in the given range:
leaps = [y for y in range(1900, 1940)]
This could also be done using: leaps = list(range(1900,1940))
Now we can add a simple condition to get every fourth year:
leaps = [y for y in range(1900,1940) if y % 4 == 0]
(make a note to study how the modulo operator is working) In order to get all the leaps years with the years 1900 and 1940 we can use a list comprehension like so:
leaps = [y for y in range(1900,1940) if (y % 4 == 0 and y %100 != 0)or(y % 400 == 0)]
I am supposing that using 'y for y' means that we are going to iterate over everything in the range. Using this list comprehension reduced the code from four lines to two, this may seem insignificant but I'm sure it adds up. Because list comprehensions produce lists which are iterables, and because the syntax for list comprehensions requires an iterable it is possible to nest list comprehensions (gee wiz). This would be the same as having nested for..in loops. For example, if we wanted to generate all the possible clothing label codes for given sets of sexes, sizes, and colors, but excluding labels for the ... other people... we could do so using nested for...in loops:
codes = []
for sex in 'MF':
for size in "SMLX"
if sex == 'F' and size == 'X':
for color in 'BGW':
codes.append(sex + size + color)
This will produce the 21 item list, however we could do the same thing using a list comprehension:
codes = [s + z + c for s in 'MF' for z in 'SMLX' for c in 'BGW' if not (s == 'F' and z == 'X')]
Here, each item in the list is produced by the expression s + z + c. Also we have used subtly different logic for the list comprehension where we skip invalid sex/size combos in the innermost loop, whereas in the nested for...in loops version skips invalid combos in its middle loop (by using the continue keyword). We can rewrite any list comprehension using one or more for...in loops. If the generated list is very large, it may be more efficient to generate each item as it is needed rather than produce the whole list at once. This can be achieved by using a generator rather than a list comprehension, which is discussed later.
Set Types
A set type is a collection data type that supports the membership operator 'in', the size function 'len()' and is iterable. Additionally, set types at least provide a set.isdisjoint() method, and support for comparisons, as well as support for the bitwise operators (which when used in sets are used for union, intersection, and more). Python provides two built-in set types: the mutable set type and the immutable 'frozenset'. When iterated, set types provide their items in an arbitrary order.
Only 'hashable' objects may be added to a set. These are objects which have a __hash__() special method whose return value is always the same throughout the object's lifetime. It can be compared for equality using the __eq__() special method. Special methods begin and end with '__'. All the built-in immutable data types like float, frozenset, int, str, and tuple are hashable, which means they can be added to sets. The built-in mutable types like dict, list, and set, can't be hashed because their hash value changes depending on the items they contain, therefore they can't be added to sets. (this is interesting because sets aren't hashable yet can contain only hashable objects). Set types can be compared using the standard comparison operators which we have previously covered. Note that although == and != have their usual meanings, with comparisons being applied item by item (and also recursively) the other comparison operators perform subset and superset comparisons, which we will see shortly.
A 'set' is an unordered collection of zero or more object references that refer to hashable objects. Sets are mutable, we can add and remove items from them, however because they are unordered they have no notion of index position, and cannot be sliced or strided. We can create a set like this:
S = {7, 'well', 'helloworld', 12, -29, ('x', 11), frozenset({8, 4, 6}), 913}
The set data type can be called as a function set(). When we give no args it returns an empty string, with a set arg it returns a shallow copy (that is if we give it an argument that is a set literal itself). If we give it an arg of anyother type, the function will attempt to convert it to a set, it accepts no more than one argument. Nonempty sets can also be created without using the set() function. However, the empty set must be created using set(). We can't just use empty braces in similar fashion as to creating lists and tuples which are empty. A set of one or more items can be created by using a comma-separated sequence of items inside braces, as we have previously shown. We can also create a set using a 'set comprehension' which I assume it similar to that of a list comprehension. Sets always contain unique items, we can add duplicate items, but it is pointless. Very interesting this bit here, check it out, these sets are all identical:
set('apple') set('aple') set('a','e','l',p)
Because of this unique characteristic of sets, they are often used to eliminate duplicates. For example, if 'x' is a list of strings, after executing x = list(set(x)), all of x's strings will be unique, and in arbitrary order. Fascinating!
Sets support len(), and fast membership testing with 'in' and 'not in'. They also provide the usual set operators.
The Union operator | puts all the same unique items together
The Intersection & gives you only the characters which are similar
The Difference / gives you that which is different in each
The Symmetric Difference ^ gives you that which is different and also that which is unique
All the 'update' methods like set.update(), set.intersection_update() accept any iterable as their argument. But the equivalent operator version |= &= require both of their operands to be sets. One common use case for sets is when we want fast membership testing. For example, we might want to give the user a usage message if they don't enter any command-line arguments, or if they enter -h or --help:
if len(sys.argv) == 1 or sys.argv[1] in {'-h', '--help'}:
Interesting, it looks like the set is composed of just -h or --help. Another common use case for sets is to ensure that we don't process duplicate data. For example, suppose we had an iterable (such as a list), containing the IP addresses from a web server's log files, and we wanted to perfrom some processing, once for each unique address. Assuming that the IP addresses are hashable and are in iterable 'ips' and that the function we want called for each one is called process_ip() and is already defined, the following code snippets will do what we want, although with subtly different behavior:
seen = set() #declaring set()
for ip in ips #for loop with membership
if ip not in seen: #if item in iterable ips is not in the set()
seen.add(ip) #add it to the set
process_ip(ip) #then call function on it
for ip in set(ips): #here we are converting the list to a set
process_ip(ip) #because sets are composed of unique items we don't need to worry about membership testing!
In the first snippet, if we haven't processed the IP address before, we add it to the seen set and process it; other we ignore it. I guess every time it adds somethings, it is going back and checking if it exists in the set already. If the membership test doesn't give a true boolean value then the item in the iterated data is ignored. For the second snippet we only ever get each unique IP address to process in the first place. The differences between the snippets are first that first snippet creates the seen set which the second snippet doesn't need, and secondly, the first snippet processes the IP addresses in the order they are encountered in the ips iterable while the second snippet processes them in an arbitrary order (fascinating!) The second snippet is easier to code, but if the ordering of the ips iterable is important we must either use the first snipper approach or change the second snipets first line to something like: for ip in sorted(set(ips)):
In theory, that second snippet might be slower if the number of items in ips is very large, since it creates the set in one go rather than incrementally. Sets are also used to eliminate unwanted items. If we have a list of filenames but don't want any makefiles included we might write:
filenames = set(filenames)
for makefile in {"MAKEFILE", 'Makefile', 'makefile'}:
First we convert the filenames iterable to a set and then assign it again to the identifier filenames. Then we do a for in loop which searches for membership of something that we will give the identifier makefile inside the iterable (is it a dictionary?) which contains various naming conventions in the form of strings. If such a membership is found, then we use the set.discard() method which excludes makefile.
Note that we can also use set.remove() to remove items, although this method raises a KeyError exception if the item it is asked to remove is not in the set.
Set Comprehensions
In addition to creating sets by calling set(), or by using the set literal, we can also create sets using set comprehensions. A set comprehension is an expression and a loop with an optional condition enclosed in braces. In similar fashion to that of list comprehensions, we have two syntaxes which are supported:
{expression for item in iterable}
{expression for item in iterable if condition}
We can use these to achieve a filtering effect:
html = {x for x in files if x.lower().endswith(('.htm', '.html'))}
Given a list of filenames in files, this set comprehension makes the set html hold only filenames that end in .html or .htm, regardless of upper or lower case. Just like list comprehensions, the iterable used in a set comprehension can itself be a set comprehension (or any other kind of comprehension), this allows for sophisticated set comprehensions to be created.
Frozen Sets
A frozen set is a set that once created cannot be changed. We can rebind the variable that refers to a frozen set to refer to something else, however. Frozen sets can only be created using the frozenset data type, it is called as a function frozenset(). If we give it no args it returns an empty literal, if the arg is a frozenset it returns a shallow copy, with any other ar it attemps to convert it to a frozenset. No more than one arg can be given. Frozensets are immutable, therefore they support only those methods and operators that produce a result without affecting the frozen set or sets to which they are applied. Frozen sets support: .copy(), .difference(), .intersection(), isdisjoint(), issubset(), .issuperset(), .union(), .symmetric_difference(). If a binary operator is used with a set and a frozen set, the data type of the result is the same as the left-handed operand's data type. In the case of == and != the order of the operands doesn't matter comparing a set and a frozenset with an equivalency will produce True if both sets contain the same items. Additionally, because frozensets are immutable they meet the hashable criterion for set items, so sets and frozen sets can contain frozen sets.
Mapping Types
A 'mapping type' is one that supports the membership operator 'in' and the size function 'len()', and is also iterable. Mappings are collections of key-value items and provide methods for accessing items and their keys and values. When iterated, unordered mapping types provide their items in an arbitrary order. Since Python 3.0 we are provided with two unordered mapping types: the built-in dict type and the standard library's collections.defaultdict type. A new ordered mapping type called collections.OrderedDict was introduced in 3.1, it is a dictionary that has the same methods and properties (ie the same API) as the built-in dict, but it stores its items in 'insertion' order. Here we use the term 'dictionary' to refer to any of these types when the difference doesn't matter.
Only hashable objects may be used as dictionary types. so immutable data types like floats,int,str,frozenset,tuple can be used as keys, but mutable types cant be used as keys. However, each key's associated value can be an object reference referring to an object of any type. Dict types can be compared using the standard equality comparisons == !=, with comparisons being applied item by item as well as recursively. Comparisons using other operators such as <,<=,>=,> are not supported since they don't make sense for unordered collections like dictionaries.
A dict is an undordered collection of zero or more key-value pairs whose keys are object references that refer to hashable objects, and whose values are object references referring to any data type. They are mutable, so we can easily add and remove items, but since they are unordered they have no notion of index pos, therefore they cannot be sliced or strided.
Side note: API stands for Application Programming Interface, it is a generic term used to refer to the public methods and properties that classes provide, and to the parameters and return values of functions and methods. For example, Python's docs contain the API's that Python provides.