by Jason Tackaberry
Wednesday, August 30th, 2000
This tutorial is the first in a series that will introduce you to Python, a dynamic, object-oriented language
that is rapidly gaining popularity. Since I had acquired a modest Perl
background before learning Python, I will target this tutorial at the Perl
programmer. However, anyone with a little programming experience should find
this text useful.
In the interest of full disclosure, Python is my language of choice. I have a
reasonably solid background in Perl, C, and C++, but when I feel Python will do
the job, I tend to favor it. So, despite my care to the contrary, some of the
comments in this series may be subjective. If you have any strong objections,
please mail me. We might be able to spark
some interesting discussions.
In the Beginning...
Some time in the 80s, Guido van Rossum, Python's lead developer, co-authored
a programming language called ABC, geared toward teaching programming concepts.
ABC, while never really becoming very popular, was overhauled in its entirety,
and became the proud father of Python. One of Python's core principles was and
still is simplicity and elegance. More than 10 years later, Python has retained
its elegance, and has only grown in features and popularity.
...and, yes, Python was indeed named after Monty Python.
So what is Python, anyway? Python is every one of these things: interpreted,
dynamically typed, object-oriented, portable, clean, elegant, easy to
learn, powerful, embeddable, extendable, freely available, actively developed,
and widely used. Because of its simplicity and clean syntax, Python makes an
excellent first language. And because of its incredibly diverse library of
modules, it is an excellent language for experienced developers. Python is
suitable for small projects, and scales beautifully for very large projects.
Python is commonly used to create software prototypes: versions that are used
to model and prove a design, and then ported to lower-level languages or
discarded. You may be surprised that one would simply discard a prototype;
throw-away prototyping is a common practice in software engineering, and
Python's simplicity makes this practice feasible. This case study documents
Python's role in a commercial environment, used to prototype and ultimately
build a modestly sized system.
Getting Down to Business
Okay, enough chitchat; let's get our hands dirty. Like Perl, Python is
interpreted, and so runs using an interpreter. Also like Perl, Python code is
compiled to byte-code before execution. For modules that you import, this
byte-code is saved to disk (in .pyc files) so that subsequent executions need not
recompile the code, unless the source code has been modified, of course.
Our first Python program can be typed directly into the interpreter, run by
passing the source file to the interpreter as an argument, or run on its own by
starting the source file with a #! line pointing to the Python interpreter:
  #!/usr/bin/env python
Once this file is made executable, it can be run directly. The Python version
of Hello World is rather uninteresting, but in the spirit of tradition, here it
is, in all its glory:
  print "Hello World!"
A programming language is pretty useless without any form of flow control or
iteration. However, before I introduce these constructs, I should gently ease
you into one of Python's most controversial features. Whereas in C and Perl,
groups of statements (blocks) are sandwiched in braces, in Python blocks are
denoted solely by their indentation. Consider the following code in Perl:
  if ($foo == 1) {
    $foo = 4;
    print "Changed foo to 4";
    do_something_else();
  }
The same code would look like this in Python:
  if foo == 1:
    foo = 4
    print "Changed foo to 4"
    do_something_else()
In the Perl snippet, the whitespace preceding the lines in the code block is
arbitrary. I could use any number of spaces or tabs; the choice is purely
stylistic. In Python, the prefixed whitespace is mandatory. A certain degree of
style is permitted: you can use any number of spaces or tabs. The only
requirement is that you be consistent in your use. So, if the first line in the
block is prefixed with a tab, you had better make sure the next line is as well,
or else the interpreter will generate an error.
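To make the consistency rule concrete, here is a minimal sketch that hands two small snippets to the built-in compile function: one indented consistently, and one whose last line sits at a level that matches nothing around it. It is written for a current Python 3 interpreter, and the snippet strings and the helper name compiles are made up for illustration.

```python
# Sketch for a current Python 3 interpreter; the snippets and the helper
# name `compiles` are illustrative and do not come from the article.

consistent = (
    "if True:\n"
    "    foo = 4\n"
    "    bar = 5\n"
)

inconsistent = (
    "if True:\n"
    "    foo = 4\n"
    "  bar = 5\n"      # indented less than the block, more than the `if`
)


def compiles(source):
    """Return True if the interpreter accepts the source, False otherwise."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


print(compiles(consistent))
print(compiles(inconsistent))
```

The second snippet is rejected before any of it runs, which is the error the paragraph above warns about.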
If you're accustomed to Perl or C, at this point you must be crying, "that's
just absurd!" I know I did. But once you start using Python you quickly become
at ease with this. And once you start weeding through thousands of lines of
code, you begin to appreciate it. It may seem absurd now, but unless you have
years of experience sorting through nests of braces, it makes the code much more
readable. 3 or 4 nested blocks in Perl will look to the beginner like Lisp looks
to us mere mortals.
The Basics
Now we know how to start a Python program, and hopefully by now have gotten
over the shock of the indentation requirement. There are a few things we need to
have under our belts before we can dive in.
First, in Python, almost everything is an object. Strings are objects,
integers are objects, functions are objects, lists are objects, and so on. Some
objects may be acted on directly; that is, these objects have methods that may
be invoked. For example, list objects have an append method that lets you append
an object to the list. Other objects, such as strings and integers, must be
operated on indirectly. So, you won't call a string's split method, but instead
will use the split function offered by the string library (module). This
idiosyncrasy has been addressed somewhat in Python 1.6 (at least, string objects
now have methods), but in general knowing what's what is just a matter of memory
work.
Python objects come in two flavors: mutable, and immutable. Immutable objects
are those objects which cannot be directly changed, such as strings or integers.
If you want to concatenate some text onto a string, you don't modify a string
object. Instead, you concatenate two string objects together and produce a new
string object. Mutable objects can be modified directly. Lists, for example, are
mutable. Adding an item to a list does not produce a new list object, it just
modifies the list you're working with.
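The difference between the two flavors is easiest to see with two names that refer to the same object. The following is a small sketch, written for a current Python 3 interpreter; the variable names are invented for the example.

```python
# Sketch for a current Python 3 interpreter; variable names are illustrative.

# Strings are immutable: "appending" builds a new string object.
greeting = "foo"
alias = greeting              # both names refer to the same string object
greeting = greeting + "bar"   # greeting is rebound to a brand-new string
print(alias)                  # the original object is untouched
print(greeting)

# Lists are mutable: append changes the existing list object in place.
items = [1, 2]
same_items = items            # both names refer to the same list object
items.append(3)
print(same_items)             # the change is visible through either name
print(items is same_items)
```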
Like Perl, Python has all the high level data types we've come to know and
love. Tuples are much like lists in an array context in Perl, except that they
are immutable. For example, (1, 2, 3, "foo", "bar", "baz") is a tuple construct.
Python lists are more like Perl lists because they are mutable. What's the point
of the tuple data type, you ask? Immutable types are easier to build internally,
they have a simpler interface, and they are much more efficient. Why incur the
overhead of a list when only a tuple is needed, such as with return values for
example? Finally, Python offers dictionaries, associative arrays which Perl
coders will know as hash tables. Let's have a look at some code that uses all
three of these data structures:
  import types                            # provides StringType, used below
  stuff = (1, 2, 3, "foo", "bar", "baz")  # this is a tuple with 6 elements
  list = []                               # initialize an empty list
  ages = { "Dick": 36, "Jane": 25, "John": 20 }  # a dictionary
  # Go through the tuple and append only the strings onto the list
  for element in stuff:
    if type(element) == types.StringType:
      list.append(element)
  # Now list all the people and their ages from the ages dictionary
  for person in ages.keys():
    print person, "is", ages[person], "years old"
This example shows us not only tuples, lists, and dictionaries, but
introduces some iteration and selection constructs. The for statement
iterates through any sequence object (either a list, tuple, or string) and
executes the code block that follows it. In the first instance, the for
loop will iterate across each of the items in the stuff tuple. The first
time through, element becomes 1, and then 2, and so on,
until it finally finishes with baz. The second for loop iterates over
all the keys in the ages dictionary. The keys are those things that you
want to look up, or index, in the dictionary. So, the person variable takes on
the values of Dick, Jane, and John (in no particular
order).
The block under the first for loop shows an if construct. The
syntax of this line should be fairly intuitive, especially if you have a Perl or
C background. Equality comparisons are done using the == operator;
inequality is tested using !=; other unary, binary, bit-wise, and shifting
operators work as you would expect. The only operators Python lacks that you may
miss are the increment and decrement operators (++ and --,
respectively), and the shorthand assignment operators, like +=,
-=, and so on. I cringe every time I am forced to use a = a + 1
instead of a++; supposedly these operators were not included to improve
readability, but the jury's still out on that one. However, the augmented
assignment operators (+=, -=, and so on, though not ++ and --) will at long
last be included in Python 2.0. Comparison
operators may act on any object, and will behave differently depending on the
context. Comparisons between integers will do arithmetic comparisons;
comparisons between strings will do lexical comparisons; comparisons between
tuples will perform element-by-element comparisons; and so on. There's nothing
particularly magical about an if statement. The code block that follows
it is executed if the expression is evaluated to true.
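As a quick illustration of how the same comparison operator behaves on different kinds of objects, here is a short sketch for a current Python 3 interpreter. The variable names are only there so the results can be printed together.

```python
# Sketch for a current Python 3 interpreter; variable names are illustrative.

# Integers are compared arithmetically.
ints = 2 < 10

# Strings are compared lexically, character by character, so "2" sorts
# after "10" because "2" comes after "1".
strs = "2" < "10"

# Tuples are compared element by element, left to right.
tups = (1, 2, 3) < (1, 2, 4)

# If one tuple is a prefix of the other, the shorter one is the smaller.
prefix = (1, 2) < (1, 2, 0)

print(ints, strs, tups, prefix)
```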
Another iteration method you're probably familiar with is the while
statement:
  i = 10
  while i > 0:
    i = i - 1
And this does precisely what you'd expect. The while statement
evaluates the expression (in this case i > 0), and executes the
block of code that follows if it evaluates to true.
The syntax for defining functions is equally simple. For example:
  def say_hello(who, what):
    print "Hello,", who, "! ", what
Calling say_hello("Fred", "How are you?") will output Hello, Fred !  How are you?
(the print statement puts a space between each of the items it prints).
Just in case you haven't noticed yet (I'm sure you have), variables in Python
aren't explicitly assigned types, as is the case with any dynamically typed
language. The type, be it string, list, integer, or whatever, is bound to the
variable on the fly. If you're a Perl coder, this may seem a little backwards to
you. In Perl, the type of a variable is determined by the lvalue in an
expression (the part on the left side of the assignment, in this case). For
example:
  my %result = get_some_value();
Here the type of the variable result is determined by the lvalue of this
expression, in this case a hash table. If the return value of
get_some_value() is not a hash, then Perl will try to coerce it to one.
Also, in Perl, $result is distinct and different from %result and
@result. If you're not a Perl coder and don't know what any of this
means, don't worry. Just understand that in Python, the type of a variable is
determined by the rvalue of an expression (the part on the right side of the
assignment). So if get_some_value() returns a tuple, the type of
result is bound to a tuple. If it returns a string, result becomes
a string type, and so on.
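Here is a minimal sketch of that idea for a current Python 3 interpreter. The body of get_some_value and its kind parameter are hypothetical stand-ins, since the article never defines the function; the point is only that result takes on whatever type the right-hand side produces.

```python
# Sketch for a current Python 3 interpreter. get_some_value and its `kind`
# parameter are hypothetical; the article does not define this function.

def get_some_value(kind):
    """Return a value whose type depends on the argument."""
    if kind == "tuple":
        return (1, 2, 3)
    if kind == "string":
        return "foo"
    return {"Dick": 36}


result = get_some_value("tuple")
first_type = type(result).__name__
print(first_type)

# The same name can later be bound to an object of a different type.
result = get_some_value("string")
second_type = type(result).__name__
print(second_type)

result = get_some_value("anything else")
third_type = type(result).__name__
print(third_type)
```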
If you're coming from strictly a C++ background, you're about to discover the
way polymorphism was meant to be. I'm going to conveniently sidestep a heated
debate about whether static typing makes for better software engineering.
Personally, I find dynamically typed languages a pleasure to work with.
Doing Something Useful
Up until now we've mostly been looking at Python's syntax, and learning about
the basic building blocks. There's still plenty more to learn about, but at this
point we're ready to look at some code that actually does something. Let's look
at some code that reads /etc/passwd and prints out the names of users
whose group id is greater than 100.
  import string
  pwdfile = open("/etc/passwd")
  lines = pwdfile.readlines()
  for entry in lines:
    fields = string.split(entry, ":")
    if int(fields[3]) > 100:
      print fields[0]
Only a few select functions are built into Python's core. In order to do
something useful, you'll need to use one or more modules. Python is distributed
with a standard library containing a vast number of
modules. In the first line, we import the string
module, which allows us to perform common string operations. In our example,
we're interested in the string module's split function.
After importing the string module, we then open the /etc/passwd file.
The open function is a built-in function that returns a file object. File
objects are one of the few built-in types. A second optional argument passed to
open specifies whether the file should be opened in read-only or read-write
mode. In the absence of this argument, read-only is assumed. Next, we call the
readlines() method of the file object, which returns a list whose elements
correspond to the lines in the file. Then, in the for loop, we iterate
over each of the lines in the lines list, each of which corresponds to an entry
in the passwd file.
The first line in the for code block calls the string module's split
function. Perl coders will know right away what's happening here; this function
separates the given string by a separation string, in this case ":", and
returns a list of all the strings between (but not including) the separation
string. Now we have a list called fields that holds the individual
fields in the passwd entry. The group id is held in the fourth field, which is
at index 3 (indices start at 0, like in Perl or C). First we must coerce that
field to an integer (because it's a string right now), and do the comparison. If
it's greater than 100, we print the user name, which is the first field. "But
wait!" the Perl coder exclaims. Why do we have to explicitly convert the string
to an integer? In Perl, this is done for you behind the scenes. In Python, you
need to do this yourself.
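To try the same logic without touching a real /etc/passwd, here is a sketch for a current Python 3 interpreter that runs over made-up sample data. SAMPLE_PASSWD and the function name users_with_gid_above are assumptions for illustration, and the split call uses the string method form rather than the string module.

```python
# Sketch for a current Python 3 interpreter. SAMPLE_PASSWD is made-up data in
# /etc/passwd format, and users_with_gid_above is an illustrative name.

SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/sh
alice:x:1000:1000:Alice:/home/alice:/bin/sh
bob:x:1001:50:Bob:/home/bob:/bin/sh
"""


def users_with_gid_above(passwd_text, threshold):
    """Return the user names whose group id field exceeds threshold."""
    names = []
    for entry in passwd_text.splitlines():
        fields = entry.split(":")
        # fields[3] is a string such as "1000"; convert it before comparing.
        if int(fields[3]) > threshold:
            names.append(fields[0])
    return names


print(users_with_gid_above(SAMPLE_PASSWD, 100))

# Python does not convert between strings and integers behind the scenes.
try:
    "1000" + 1
except TypeError:
    print("a string and an integer cannot be added without converting")
```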
A little bit of shorthand can be used in the above example. In particular, I
would write the three lines after the import statement as:
  for entry in open("/etc/passwd").readlines():
One feature of Python is that users needn't worry about freeing memory.
Internally, Python objects use reference counting as a means of garbage
collection. When an object is created, its reference count is initialized to 1.
Each new reference to the object increments the count, and each reference that
goes away decrements it. When the reference count reaches zero, the object is
destroyed. The third tutorial in this series will go into more detail on
reference counting. It's more of an implementation detail that you don't need to
worry too much about, except that you should be aware that it exists. In the
compressed code snippet above, the open call returns a file object that isn't
being assigned anywhere. So, once readlines() has returned, nothing refers to
the file object any longer. Its reference count drops to zero, the file object
is destroyed, and the file is closed.
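One way to watch this happen is with a weak reference, which lets us observe an object without keeping it alive. The sketch below is for a current Python 3 interpreter and relies on the reference-counting behaviour of the standard C implementation; other implementations may reclaim objects later. The Resource class is a hypothetical stand-in for something like an open file.

```python
# Sketch for a current Python 3 interpreter running the standard C
# implementation. Resource is a hypothetical stand-in for a file object.
import weakref


class Resource:
    """An empty class; instances exist only to be created and destroyed."""
    pass


first = Resource()
probe = weakref.ref(first)    # observes the object without owning it
second = first                # a second reference to the same object

del first                     # one reference gone, one remains
alive_after_first_del = probe() is not None
print(alive_after_first_del)

del second                    # last reference gone, object is destroyed
alive_after_second_del = probe() is not None
print(alive_after_second_del)
```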
A Classy Example
An object-oriented language like Python must provide a way to create classes
of objects. In essence, a class is a description of the methods (or member
functions) an instance of that class provides, and of the semantics of those
methods.
First things first: we need to clear up some definitions. Python objects are
not just instances of classes. Instances are objects, but not all objects are
instances. Many of Python's built-in types are objects, but not classes. Earlier
I talked about string objects. There is no string class from which these string
objects are created. And to make matters even more confusing, classes are
objects too. So, for you C++ programmers, if I talk about an instance in Python,
I'm talking about what you'd call an object. But when I talk about an object,
this is not necessarily an instance, unless of course it's an instance object.
Clear as mud, right? This paragraph reads like Abbott & Costello's Who's on
First, I realize. Reading it once more couldn't hurt.
The simplest class definition looks like this:
  class coordinate:
    pass
The pass keyword is one we haven't seen before. This keyword is a
no-op. Where the Python syntax requires something and you don't actually want to
do anything, pass is what you want to use. You may be wondering what good
this coordinate class is? There are no methods for this class, but Python does
not require variable declarations. So, we can use this class to hold any
variable, or attribute, we want. Consider:
  plot = coordinate() # Create a new coordinate instance
  plot.x = 10
  plot.y = 5
If you're thinking this seems to be functionally similar to structs in C,
you'd be right. Empty classes provide a place to assign attributes. If we want a
3D coordinate, we can just assign a value to plot.z without any
additional work to the class definition. Let's see what a more fleshed out
version of the coordinate class might look like:
  class coordinate:
    def __init__(self, x, y, z = 0):
      self.x, self.y, self.z = x, y, z
    def translate(self, xoff, yoff, zoff = 0):
      self.x = self.x + xoff
      self.y = self.y + yoff
      self.z = self.z + zoff
    # rotation about the origin (in the x-y plane)
    def rotate(self, angle):
      from math import pi, sin, cos  # import the names we need from math
      rad = angle * pi / 180         # convert to radians
      # compute both new values from the old x and y before assigning
      x = self.x * cos(rad) - self.y * sin(rad)
      y = self.x * sin(rad) + self.y * cos(rad)
      self.x, self.y = x, y
This example introduces some new syntax to us. First, we see right away that
the coordinate class defines three methods: __init__, translate,
and rotate. The __init__ method, as you may have guessed, is
special. This is the constructor for the class -- the method that is called when
a new instance of a class is created. The z = 0 that appears in
__init__'s parameter list will be familiar to C++ programmers. It denotes
that this argument is optional, and if it is not specified, it will default to
0. Also, the first line in the body of the constructor is a tuple assignment. It
would more explicitly be written as (self.x, self.y, self.z) = (x, y, z),
which will be friendlier to Perl programmers. This does an element-by-element
assignment; so self.x becomes x, self.y becomes y,
and self.z becomes z.
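Default arguments and tuple assignment are not specific to classes. Here is a small sketch of both in a plain function, written for a current Python 3 interpreter; make_point is a hypothetical helper invented for the example.

```python
# Sketch for a current Python 3 interpreter; make_point is a hypothetical
# helper, not something defined in the article.

def make_point(x, y, z=0):
    """Return the coordinates as a tuple; z defaults to 0 when omitted."""
    return (x, y, z)


flat = make_point(1, 2)        # z is not given, so it defaults to 0
solid = make_point(1, 2, 3)    # z is given explicitly
print(flat, solid)

# Tuple assignment unpacks element by element.
a, b, c = make_point(4, 5)
print(a, b, c)

# The right-hand side is evaluated first, so this swaps the two values.
a, b = b, a
print(a, b)
```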
Turn your attention to the first parameter of each of the three methods. The
first parameter of any method is an instance object -- the instance from which
this method was invoked. The parameter name self is arbitrary. You could
call it this if you prefer. However, the name self is a Python
convention, and I strongly recommend against using anything else.
Scoping rules in Python will be discussed in detail in part 2 of this
tutorial. For now, it will have to suffice to say that in order to reference an
attribute of an instance, you must reference it through the instance object, or
self. So while in a C++ member function this->attr and
attr are the same (assuming attr has not been redeclared in the
member function's scope), in Python self.attr and attr are not.
Now your first instinct might be to say this is a silly way of doing it. Why not
just make a this keyword, like in C++? By passing the instance as a
parameter, we more clearly and explicitly couple a method to an instance.
Methods are also objects, and can be either bound to instances, or not bound to
anything (that is, unbound). Let's see what I mean:
  plot = coordinate(2, 2)
  print coordinate.rotate
  print plot.rotate
Whose output is:
  <unbound method coordinate.rotate>
  <method coordinate.rotate of coordinate instance at 80cabb0>
Here we demonstrate a neat feature of Python objects: they can represent
themselves as strings. This is quite useful for debugging purposes. The built-in
function repr returns a string containing such information; for objects like
these, the print statement outputs that same string.
So the first line of output tells us we have an unbound method. That is, it
isn't associated with any instance of the coordinate class. The second line
tells us it's a method of an instance, and so it is bound. The numbers at the
end of the string represent the address at which this instance is located.
Obviously it's going to be different for you.
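The practical difference between the two shows up when you call them. The sketch below is for a current Python 3 interpreter, where a method looked up on the class is a plain function rather than a separate unbound method object, but the calling convention is the same: through the class you pass the instance yourself, through the instance it is supplied for you. The Coordinate class here is a trimmed-down, hypothetical version of the one above.

```python
# Sketch for a current Python 3 interpreter. Coordinate is a trimmed-down,
# hypothetical version of the article's class.

class Coordinate:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def translate(self, xoff, yoff):
        self.x = self.x + xoff
        self.y = self.y + yoff


plot = Coordinate(2, 2)

plot.translate(1, 1)              # bound: plot is passed as self for us
Coordinate.translate(plot, 1, 1)  # via the class: we pass the instance

move = plot.translate             # a bound method is an object too
move(10, 10)                      # it remembers which instance it belongs to

print(plot.x, plot.y)
print(move.__self__ is plot)
```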
In order to make my previous statement that "classes are objects" more
obvious, let's look at this snippet:
  print repr(coordinate)
  fooclass = coordinate
  print repr(fooclass)
  plot = fooclass(2, 2)
  print plot
Which will output:
  <class __main__.coordinate at 80cbda8>
  <class __main__.coordinate at 80cbda8>
  <__main__.coordinate instance at 80bf228>
The first two print statements output the same thing. And this really does
make sense, because the variable fooclass is bound to the class object
coordinate. When we call fooclass(), we're really invoking
coordinate's constructor, and the result is an instance of
coordinate.
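Because a class is just another object, it can be bound to a second name or handed to a function like any other value. A minimal sketch for a current Python 3 interpreter follows; the build function and the trimmed-down Coordinate class are hypothetical.

```python
# Sketch for a current Python 3 interpreter; build and this trimmed-down
# Coordinate class are hypothetical.

class Coordinate:
    def __init__(self, x, y):
        self.x, self.y = x, y


fooclass = Coordinate                  # a second name for the same class
print(fooclass is Coordinate)

plot = fooclass(2, 2)                  # calling the alias creates an instance
print(isinstance(plot, Coordinate))


def build(cls, *args):
    """Create an instance of whatever class object is passed in."""
    return cls(*args)


other = build(Coordinate, 3, 4)
print(other.x, other.y)
```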
The last piece of the puzzle is the occurrence of __main__ in the output.
We'll be getting into scopes and namespaces in part 2 and we'll address this in
more detail then. For now, it's easiest to think of __main__ as the
top-most, or global scope. It's where all objects that aren't part of another
module or class end up.
An Exception to the Rule
Error handling in Python is done using exceptions. When a statement or
expression is executed, it may need to signal some sort of error message. It
does this by raising an exception. For example, suppose we try to open a file
that doesn't exist:
  file = open("/etc/password")
Executing this code will output:
  Traceback (innermost last):
    File "example.py", line 1, in ?
      file = open("/etc/password")
  IOError: [Errno 2] No such file or directory: '/etc/password'
Obviously it's not acceptable to have your program bail out every time it
encounters an error as simple as a non-existent file. In Python, you first
try the code, and can specify a block of code to be executed upon one,
many, or all exceptions:
  try:
    file = open("/etc/password")
  except:
    print "Open failed!"
Now, if the open fails, it will print "Open failed!" and move on. It's
entirely possible (and likely) that an expression can raise more than one kind
of exception. So in our example, we may just want to handle IOError exceptions. We
can do this by specifying the exception type after the except keyword:
  try:
    file = open("/etc/password")
  except IOError:
    print "Open failed!"
But there are different kinds of IOErrors; the file may not exist, or perhaps
it is not readable by the user trying to open it. Note that in the traceback
shown above, the error number for No such file or directory was 2.
Exceptions may have arguments associated with them, and we can fetch these
arguments like so:
  try:
    file = open("/etc/password")
  except IOError, (errno, message):
    if errno == 2:
      print "File does not exist, cannot open"
    else:
      print "Unhandled error", errno, message
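For comparison, here is how the same check might look on a current Python 3 interpreter, where IOError is an alias of OSError, the exception is captured with the as keyword, and the errno module supplies a name for error number 2. The function name describe_open_failure and the path are made up for this sketch.

```python
# Sketch for a current Python 3 interpreter; describe_open_failure and the
# path below are illustrative.
import errno


def describe_open_failure(path):
    """Try to open path and report what happened as a string."""
    try:
        handle = open(path)
    except OSError as err:
        if err.errno == errno.ENOENT:      # ENOENT is error number 2
            return "File does not exist, cannot open"
        return "Unhandled error %s: %s" % (err.errno, err.strerror)
    handle.close()
    return "Opened fine"


print(describe_open_failure("/no/such/directory/no-such-file"))
```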
The IOError is one of many built-in
exceptions. You can create your own custom exceptions, as well. In fact, the
exception may be identified by either strings or instance objects. (You can
identify an exception by a class object, but Python will just use that class
object to create an instance before raising the exception.) A simple example of
raising an exception identified by a string is:
  def bailout():
    raise "MyError", (1, 2, "x y z")
  try:
    bailout()
  except "MyError", info:
    print "Handled MyError exception with", info
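The other option mentioned above, identifying the exception by a class, is the form that still works on current interpreters, which no longer accept strings as exceptions. The following sketch for Python 3 mirrors the string example; deriving MyError from the built-in Exception class is how it is done today and is an assumption relative to the article's era.

```python
# Sketch for a current Python 3 interpreter. Deriving from Exception is the
# present-day convention, not something the article itself prescribes.

class MyError(Exception):
    """A custom exception; its arguments are stored in the args attribute."""
    pass


def bailout():
    raise MyError(1, 2, "x y z")


try:
    bailout()
except MyError as info:
    handled_args = info.args
    print("Handled MyError exception with", handled_args)
```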
More to Come...
Now that wasn't so bad, was it? By now you hopefully have a reasonably good
feel for Python's syntax and some of the basics. If you're craving more, Python's home page is the best starting point.
If you're an experienced Perl programmer, you're probably by now thinking,
"Okay, so what?" Part 1 is intended just as a gentle primer, so you may be left
wondering what Python can really do.
In the next part of this series, we'll explore some of the finer nuances of
Python. We'll take a look at packages, scope rules, string handling, and class
inheritance. We'll also crack open a few of the modules provided in the Python
library and work through some examples, including sockets, the XML parser,
Perl-compatible regular expressions (rejoice, Perl programmers!), and more.
After Part 2 you should have enough under your belt to tackle almost any project
in Python.
In Part 3, we're going to see how we can extend Python using the Python/C
API. A lot of Python's internals will be covered, including reference counting,
creating new Python types, and creating new class objects from C. We'll discover
that the XML parser we toyed with in Part 2 doesn't meet our performance
requirements, and create a module that uses gnome-xml to parse XML and make some
functions available to Python space for accessing data.
Who knows what's in store for Parts 4 and beyond. I may cover creating GNOME applications in Python using an
incredibly cool library called libglade. I may also talk about CORBA, and
examine a way to create CORBA objects in Python using the lightning fast CORBA
ORB ORBit. Please email me your comments and suggestions; these
tutorials are meant for you, a member of the Linux development community!
Jason Tackaberry (tack@linux.com)
works in Ontario, Canada as a Unix/Network administrator. He is the author of ORBit-Python, Python bindings
for ORBit, and several soon to be
released projects. Having over 12 years of development experience in C and
C++, and hacking with Perl for 4 years