This chapter discusses how to store data, and how to compute values from data. The topics are variables, memory, data types, expressions, and operators.
It’s useful to be able to store values in the computer’s memory for later use. A variable is a name that refers to a location in the computer’s memory where a piece of information can be stored. A simple example:
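Here is a minimal version of the example, assuming it consists of the assignment and the print statement discussed in the next paragraph:

```python
# Store the value 42 in memory under the name meaning_of_life...
meaning_of_life = 42
# ...then fetch the value stored there and print it.
print( meaning_of_life )
```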
The statement meaning_of_life = 42 is called a variable assignment, and it causes Python to do three things: it finds a location in memory where a value can be stored, copies the value 42 into that location, and associates the name meaning_of_life with that location. Later, when the value of the variable is needed (in the print( meaning_of_life ) statement), the central processing unit of the computer (the CPU) determines the location in memory with which the name meaning_of_life is associated, goes to that location, and fetches the value it finds there, making that value available to Python. Because this value is used in a print statement, Python prints the value 42.
In Python, each variable must be assigned a value before that variable can be used.
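To see what happens otherwise, here is a small sketch that uses a name that was never assigned. The name never_assigned is hypothetical, chosen only for this illustration; Python reports the problem as a NameError.

```python
caught_error = False
try:
    # never_assigned is a hypothetical name that was never given a value.
    print( never_assigned )
except NameError:
    caught_error = True
print( caught_error )   # True
```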
Strings have quotation marks around them so that Python treats them as string values rather than as variable names. What does the following code print? Is there an error in the code?
We’ll see soon how to use variables to compute and store values. But one important use of variables is to change the behavior of code. First, we write the code in terms of some variables. Changing the values of those variables then changes the behavior of the code. The example below shows how this works. First, run the code. Then change the value of x to 60 and y to 120 and run the code again. How does the behavior of the program change?
Objective: Use variables to allow behavior of code to be changed easily.
The code below will draw a smiley face on the screen, and you can use the variables x and y to change where the smiley face is drawn. But the smiley face is always the same size. Add a new variable, scale, that allows you to change the size of the smiley face to make the face either larger or smaller. For example, if scale had the value 2, then the code would draw the smiley face twice as large (but still centered on x and y).
Test your code by changing variable values a few times to draw the smiley face at different locations at both small and large scales.
Variable names are sequences of letters, digits, and underscores. The first character cannot be a digit, and variable names cannot contain spaces. Python is case-sensitive: uppercase and lowercase letters are considered to be different characters in variable names, so the names Earth and earth refer to different variables. You should choose variable names that describe the value that will be stored.
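A quick check of these rules. The names Earth and earth come from the paragraph above; their values, and the names planet_2 and 2nd_planet, are hypothetical. Because a name that starts with a digit stops Python before it runs anything, we ask the built-in compile function to parse that line separately.

```python
Earth = 1   # capital E
earth = 2   # lowercase e: a different variable
print( Earth )   # 1
print( earth )   # 2

planet_2 = "Mars"   # digits are fine after the first character

try:
    # A name that starts with a digit is rejected when the code is parsed.
    compile("2nd_planet = 'Mars'", "<example>", "exec")
    digit_first_ok = True
except SyntaxError:
    digit_first_ok = False
print( digit_first_ok )   # False
```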
By convention, letters appearing in variable names in Python are all lowercase. If you want to make up a variable name from multiple words, use the underscore character _ to replace spaces, as in the variable name meaning_of_life. (Some programmers favor “camel case”: meaningOfLife. This is not the time to strike a bold blow for independence; you should follow Python conventions in your Python code, which includes using underscores to separate words in variable names.)
The equals sign used to assign a value to a variable is not the same as the equals sign you see in mathematics.
In mathematics, the equals sign is used to write down facts: the expression or variable on the left-hand side of the equation is now, and always will be, equal to the expression or variable on the right-hand side.
In mathematics, x = 5 is a fine equation. So is x = 6. But if I gave you both equations, you’d say I screwed up, because then x would equal both 5 and 6, and it just can’t. But in Python, the following works just fine:
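Judging from the walkthrough that follows, the code is presumably just two assignments to the same variable:

```python
x = 5   # copy the value 5 into x
x = 6   # copy the value 6 into x; the 5 is gone
print( x )   # 6
```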
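The = operator in Python does not mean “mathematically equals.” In Python, the assignment operator, written with an equals sign, does two things: it evaluates the expression on its right-hand side, and then it copies the resulting value into the variable on its left-hand side.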
So the first line of code copies the value 5 into the variable x. The second line of code copies the value 6 into the variable x. When the program is finished, x has the value 6.
Here are a few statements that wouldn’t work in mathematics, but do work in Python:
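For instance, assuming x starts at 5 as in the walkthrough below:

```python
x = 5
x = x + 1   # evaluate x + 1 (which is 6), then store 6 in x
print( x )  # 6
```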
In mathematics, x = x + 1 would mean something like “x is the number that is one greater than itself”. That sounds like something that Captain Kirk used to make a computer explode in Star Trek. There is no such number.
In Python, there’s no problem. First, evaluate the expression on the right-hand side. Fine: because x has the value 5, we know that x + 1 evaluates to 6. Then assign that value to the variable on the left-hand side. Fine: now the variable x has the value 6. The computer does not explode.
When you see or write an equals sign in Python, do not think “mathematically equals.” Say in your mind, “assignment operator.” Compute the value on the right-hand side. Put the computed value into the variable on the left-hand side.
Will the code 5 = x work in Python? No. The left-hand operand of the assignment operator must be a variable name, and 5 is not a variable name.
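We can confirm this without crashing a program by asking the built-in compile function to parse the line; Python rejects it with a SyntaxError before anything runs.

```python
try:
    compile("5 = x", "<example>", "exec")
    parsed = True
except SyntaxError:
    parsed = False   # 5 is not a variable name, so it can't be assigned to
print( parsed )      # False
```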
Memory in a computer is basically a long sequence of 0s and 1s, or binary digits: bits. You can think of a very long row of on/off switches. Since each switch value doesn’t give much information (only a 0 or a 1), it’s useful to refer to groups of bits. A group of eight sequential bits is called a byte.
On a Mac or PC, each byte has its own address, a number associated with it that we can use to refer to that byte. The byte at address 0 is the first byte in memory; the byte at address 1 is the second, and so forth. (Computer scientists often start counting at 0. Believe it or not, we often find it easier to start counting at 0 than at 1.)
Wait – how do we refer to a location in memory? Do we use variable names or addresses? The answer is both. You have a Hinman box number. I can refer to your mailbox directly by number (address), or I can use your name to refer to the box, assuming that I have a directory that tells me what box number is associated with your name. Python acts as the directory that keeps track of the correspondence between variable names and memory addresses where the contents of the variables can be found.
I just said that memory is a string of 0s and 1s. How in the world are we going to represent the number 42 by a string of 0s and 1s?
Python uses a special code, relying on how to represent numbers in base 2, or binary. Let’s not worry too much about how to represent numbers in binary, but I’ll just tell you that 42 in base 2 is 101010. Here are various integers in binary:
Integer | Binary |
---|---|
6 | 110 |
18 | 10010 |
42 | 101010 |
90 | 1011010 |
999 | 1111100111 |
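Python’s built-in bin function can check the table. It returns a string that begins with the prefix 0b, which we strip off here.

```python
numbers = [6, 18, 42, 90, 999]
binary = {}
for n in numbers:
    # bin(42) returns the string '0b101010'; [2:] drops the '0b' prefix.
    binary[n] = bin(n)[2:]
    print( str(n) + " | " + binary[n] )
```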
The first thing to notice about these binary representations is that their lengths differ. The integer 6 needs only three bits, but the integer 999 needs ten bits. To be safe, Python allocates a fixed number of bytes of space in memory for each variable of a normal integer type, which is known as int in Python. On many systems, an int occupies four bytes, or 32 bits. Integers whose binary representations require fewer than 32 bits are padded on the left with 0s.
Let’s say you had only one byte of memory. How many different patterns of 0s and 1s can represent integers in eight bits? Let’s count them:
00000000
00000001
00000010
00000011
00000100
00000101
...
11111011
11111100
11111101
11111110
11111111
It looks like there are $2^{8} = 256$ different patterns. I could use my one byte to represent 256 different integers, because each integer would need its own bit pattern. If I had two bytes, I could represent $2^{16} = 65{,}536$ different integers.
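Rather than trust the “...” in the list above, we can have Python generate every pattern. This sketch uses itertools.product from the standard library to build all sequences of eight characters, each one either '0' or '1'.

```python
import itertools

# Every sequence of eight characters, each '0' or '1', in counting order.
patterns = [''.join(bits) for bits in itertools.product('01', repeat=8)]
print( len(patterns) )   # 256, the same as 2 ** 8
print( 2 ** 16 )         # 65536 patterns with two bytes
```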
With four bytes (the usual amount of memory allocated to each int variable), we could store $2^{32}$ different integers. If the leftmost bit is a 1, the number is construed as negative. If the leftmost bit is a 0, then the number is construed as either 0 (if all the bits are 0) or positive (if the leftmost bit is 0 but there’s at least one 1 somewhere). So we expect half of the int values to be negative, one of them to be 0, and the rest to be positive. Therefore, we expect the largest positive int to be $2^{31} - 1$, or 2,147,483,647, and the most negative int to be $-2^{31}$, or -2,147,483,648.
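The standard struct module can pack an integer into exactly four bytes, which lets us check these limits. The format string '<i' (a four-byte signed integer, little-endian) is our choice for illustration; the point is only that four bytes hold every value from $-2^{31}$ to $2^{31} - 1$ and nothing beyond.

```python
import struct

largest = 2 ** 31 - 1
smallest = -2 ** 31      # ** binds tighter than unary minus: -(2 ** 31)
print( largest )         # 2147483647
print( smallest )        # -2147483648

# Both limits fit in four bytes...
struct.pack('<i', largest)
struct.pack('<i', smallest)

# ...but one more than the largest does not.
try:
    struct.pack('<i', largest + 1)
    fits = True
except struct.error:
    fits = False
print( fits )            # False

# -1 is stored as all 1s, so its leftmost bit is 1 (negative).
print( struct.pack('<i', -1) == b'\xff\xff\xff\xff' )   # True
```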
Note: When you are typing in large integers, whether as part of your program, as console input, or as input anywhere else, do not include commas. Do not type 100,000; instead, type 100000. I included the commas above so that you could easily see that two bytes give us a little more than 65,000 different numbers and four bytes give us integer values with magnitudes of 2 billion and change.
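One reason commas cause trouble inside a program: Python reads a comma as separating two values, so 100,000 quietly becomes a pair of numbers (a tuple, a type we haven’t covered yet) instead of one hundred thousand. The variable names here are our own.

```python
with_comma = 100,000      # a pair of values: 100 and 0
without_comma = 100000    # one hundred thousand
print( with_comma )
print( without_comma )
```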
If you want to store a number larger than 2,147,483,647, Python can do it, using the long int data type. Rather than allocating a fixed four bytes of memory for a long int variable, Python decides how many bytes to assign based on the actual size of the number. Larger integers require more memory, because a number with a larger magnitude needs more bits to represent in binary. In addition to the memory cost, computations with long ints are slower than computations with ints.
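This section describes Python 2.7, which has separate int and long types. The sketch below assumes Python 3, where the two were merged into a single int type that grows as needed; the memory cost of large values is still visible through sys.getsizeof. The exact byte counts depend on the Python implementation (this assumes CPython) and platform, so we only compare them.

```python
import sys

big = 2 ** 31          # one more than the largest four-byte int
print( big + 1 )       # 2147483649: no overflow
huge = 2 ** 1000       # needs more than 1000 bits
# A number with a larger magnitude takes more bytes of memory.
print( sys.getsizeof(huge) > sys.getsizeof(1) )   # True
```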
In Python, every datum has a type. But we don’t have to say what the type is; Python can figure it out for itself.
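The built-in type function reports the type Python worked out for a value. The values below are arbitrary examples of our own:

```python
print( type(42) )        # the int type
print( type(6.02e23) )   # the float type
print( type("hello") )   # the str (string) type
```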
The floating-point type is like scientific notation, e.g., $6.02 \times 10^{23}$. Since you can’t type a superscript in plain text for Python, if an exponent on the 10 is needed, you would write it like this:
6.02e23
Notice that we omit the 10, but it’s understood; 6.02e23 is not $6.02^{23}$ (6.02 raised to the power 23), but instead is $6.02 \times 10^{23}$.
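A short check of the e notation. The variable name avogadro and the value 2.5e3 are our own choices for illustration:

```python
avogadro = 6.02e23        # 6.02 times 10 to the 23rd power
print( avogadro )
print( 2.5e3 == 2500.0 )  # True: 2.5e3 means 2.5 times 10 ** 3
print( 6.02 ** 23 )       # 6.02 raised to the 23rd power: a different number
```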
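Floating-point numbers are stored with three parts: a sign (positive or negative), a significand that holds the significant digits (like the 6.02 in $6.02 \times 10^{23}$), and an exponent. Inside the computer, the exponent is a power of 2 rather than a power of 10, but the idea is the same.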
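Python stores floats in binary, so the exponent applies to 2 rather than 10. The standard library function math.frexp splits a float into a fraction (scaled to lie between 0.5 and 1, and carrying the sign) and a power-of-2 exponent, which gives a glimpse of these parts. The value 6.0 is just an example.

```python
import math

significand, exponent = math.frexp(6.0)
print( significand )        # 0.75
print( exponent )           # 3, because 6.0 == 0.75 * 2 ** 3
print( math.frexp(-6.0) )   # the sign shows up in the fraction: -0.75
```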
The floating-point type (or “float type” for short) is used for numbers that have fractional parts or are too large to store in a long int that takes up a reasonable amount of memory.
Typically, eight bytes are used for the Python float type. Notice that this means that there are only $2^{64}$ different floating-point numbers that can be represented. This might seem like a lot of numbers, but remember that the real number line has infinitely many numbers. Floating-point numbers, therefore, allow only limited precision.
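Two quick checks of this limited precision, assuming a typical platform where a Python float is an eight-byte C double: struct.calcsize('d') reports that size, and 0.1 + 0.2 is not exactly 0.3, because neither 0.1 nor 0.2 has an exact binary representation.

```python
import struct

print( struct.calcsize('d') )   # 8 bytes on common platforms
total = 0.1 + 0.2
print( total == 0.3 )           # False
print( repr(total) )            # shows the tiny rounding error
```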
The difference between 0.01 and 0.02 is, relatively, a lot (100%), but the relative difference between (6.02e23 + 0.01) and (6.02e23 + 0.02) is not a lot, at least compared with the size of the numbers involved.
For this reason, the $2^{64}$ floating-point numbers are not evenly distributed on the real number line. More of them are allocated near 0 than near numbers with larger magnitudes. When you type in a number with a decimal point, or create one through some mathematical operation, the computer finds a floating-point number close to the correct number. Small fractions are likely to be lost in this rounding process for real numbers that are very large, since the nearest floating-point number that the computer can represent may be relatively quite far away.
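The 6.02e23 example from above makes this concrete. Near 6.02e23, neighboring eight-byte floats are tens of millions apart, so adding 0.01 or 0.02 rounds back to the very same float, while near 0 the two sums stay distinct.

```python
large = 6.02e23
print( large + 0.01 == large )          # True: the 0.01 is lost
print( large + 0.01 == large + 0.02 )   # True: both sums round to the same float
print( 0.0 + 0.01 == 0.0 + 0.02 )       # False: near 0, the difference survives
```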
You can use Python to compute. The program
print( 18 + 24 )
prints 42. The plus sign is called an operator. An operator takes one or more operands, computes a result, and makes that result available to Python for further use.
In this example, the operand on the left is 18, and the operand on the right is 24. An expression is an operator and its operands; we say that the expression can be evaluated to give a single value. In our example, the expression is 18 + 24; when evaluated, this expression’s value is 42.
An expression can be an operand to an operator:
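A minimal example, assuming (as the next paragraph describes) that the product 3 * 6 supplies the left operand of +. The right operand 24 is our choice, to match the earlier 18 + 24:

```python
# The parenthesized product is evaluated first; its value, 18,
# then becomes the left operand of +.
print( (3 * 6) + 24 )   # 42
```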
Here, the expression (3 * 6) is the left operand to the operator +.
The character * denotes multiplication in Python, to avoid confusion with the letter x. If Python needs the value of an expression, Python computes that value. In this example, the value of the expression (3 * 6) is needed before the addition can be done, and so the value 18 is computed first, by the multiplication operator *. That value, 18, can then be used as an operand to the addition operator.
Arithmetic operators such as +, -, *, and / (division) follow the same order of operations as you are used to from mathematics. * and / have higher precedence than + and -, meaning that they are evaluated first. Operators with the same precedence are evaluated left to right. Parentheses make the order of operations explicit. When in doubt, use parentheses to make your code as easy to read as possible.
For example:
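A few illustrative expressions (the numbers are our own):

```python
print( 2 + 3 * 4 )     # 14: * is evaluated before +
print( (2 + 3) * 4 )   # 20: parentheses are evaluated first
print( 10 - 4 - 3 )    # 3: same precedence, left to right, so (10 - 4) - 3
print( 10 - (4 - 3) )  # 9: parentheses change the grouping
```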
There are two types of division, and each is sometimes useful. Integer division takes two integers and evaluates to an integer. Floating-point division takes two floating-point numbers (numbers with a decimal point) and evaluates to a floating-point number.
Remember word problems like this? At the Lake Morey Skate-athon, I skate around an oval ice track that is four miles long. I carry a card, and for every full lap I skate, I get my card stamped. If I start at 12:00 noon and skate at 10 miles per hour, how many stamps will I have at 3:00 pm? At 3:00 pm, I will have skated 3 × 10 miles, or 30 miles. Dividing 4 into 30 gives 7 laps, and hence 7 stamps, with half a lap (2/4 of a lap) left over.
In Python version 2, the program:
print( (3 * 10) / 4 )
gives the result 7: integer division. In Python 3, the operator / indicates floating-point division, and the same code would print the value 7.5. If you want integer division in Python 3, you must use the operator //.
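The skate-athon calculation, written assuming Python 3 (where / is floating-point division), shows both kinds of division side by side. The % operator, described shortly, gives the miles left over after the last full lap. The variable names are our own.

```python
miles = 3 * 10        # three hours at 10 miles per hour
lap_length = 4        # the track is four miles around

print( miles / lap_length )    # 7.5: floating-point division
stamps = miles // lap_length   # integer division: whole laps only
print( stamps )                # 7
leftover = miles % lap_length  # miles skated past the last full lap
print( leftover )              # 2, half of a 4-mile lap
```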
Using integer division when you want floating-point division is a very common mistake in Python 2. The best solution is to start your code with the following statement, which makes / perform floating-point division, as in Python 3:
from __future__ import division
The operators defined on integer and floating-point types are:

Operation | Operator |
---|---|
addition | + |
subtraction | - |
multiplication | * |
division, either integer or floating, as just discussed | / |
modulus (usually just called “mod”), which gives the “remainder”: 9 % 4 equals 1 | % |

The mod operator in Python can also take floating-point operands. (Some other languages, such as C, allow it only on integer operands.) For example, 8.0 % 2.5 evaluates to 0.5, because 2.5 goes into 8 three times, with 0.5 left over.
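Both examples from the text, checked in Python:

```python
print( 9 % 4 )       # 1, since 9 == 2 * 4 + 1
print( 8.0 % 2.5 )   # 0.5, since 8.0 == 3 * 2.5 + 0.5
```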
There are many different types of data to store. Integers are one type; letters of the alphabet are another. The “string” type of data represents a sequence of characters, which can be letters of the alphabet, symbols (such as @ or ~), or digits.
When you type a string value into Python, it must be surrounded by quotes, so that the string does not look like the name of a variable.
We call "Z" a string literal, since it should be interpreted by Python literally as the character “Z” and not as some variable name or anything else. (In x = 42, the number 42 is an integer literal.)
The quotes say that “hello” is a string, and not a variable name. More examples:
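Some string values stored in variables; the variable names and values are our own. Notice that a string made of digits is still a string, not a number.

```python
greeting = "hello"      # letters
symbols = "@~"          # symbols
digits = "42"           # digits, but still just characters
print( greeting )       # hello (no quotes in the output)
print( len(greeting) )  # 5: the quotes are not part of the string
print( digits == 42 )   # False: the string "42" is not the int 42
```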
The quotes just identify the data as a string; the quotes aren’t part of the string.
Python is unusual in that you can use either single quotes or double quotes around the string, as long as you use the same kind of quotes before and after a given string. So the following lines do the same thing:
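For example (the variable names are our own):

```python
single = 'hello'
double = "hello"
print( single )
print( double )
print( single == double )   # True: both are the same five-character string
```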
But print( "hello' ) would be an error, because the opening and closing quotes do not match.
The plus sign (“+”) behaves differently depending on the types of the data on either side of it. If there are ints on both sides, the plus sign is the integer-addition operator, and it adds the two ints to get another int. If there are floating-point numbers on both sides, they are added to get another floating-point number.
What if there are strings on each side? Then the plus sign is the string-concatenation operator. Concatenation means to “combine two strings together”.
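The same + symbol, applied to three different types of operands (the values are arbitrary examples):

```python
print( 3 + 4 )              # 7: integer addition
print( 2.5 + 1.25 )         # 3.75: floating-point addition
print( "moon" + "light" )   # moonlight: string concatenation
print( "3" + "4" )          # 34: strings, so they are joined, not added
```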
Given that the memory of a computer stores only sequences of 0s and 1s, then how does Python store a string in memory? Python uses a code to convert a string into a binary representation of the string when the string is stored. When the string is retrieved, Python uses the code in reverse to convert back from binary into the string.
Which code does Python use? Python version 2.7 (the version we use for this class) uses a popular code called the American Standard Code for Information Interchange, or ASCII. Each character (string of length 1) uses eight bits, or one byte. For example, the ASCII code for the character A is 01000001, and the ASCII code for a is 01100001.
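The built-in functions ord (character to numeric code) and chr (code to character), together with format and its "08b" spec (eight binary digits, padded with 0s), let us check the codes quoted above:

```python
print( ord("A") )                  # 65
print( format(ord("A"), "08b") )   # 01000001
print( format(ord("a"), "08b") )   # 01100001
print( chr(97) )                   # a
```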
Full tables of ASCII character codes are widely available, but you rarely need to know a character’s ASCII code.
There are a few special functions that convert between types of data: int, float, and str are useful for the types of data we’ve seen so far.
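A few conversions with int, float, and str (the values are arbitrary examples):

```python
print( int("42") + 1 )     # 43: the string "42" converted to an int
print( float("2.5") * 2 )  # 5.0
print( str(42) + "!" )     # 42!: the int converted to a string
print( int(7.9) )          # 7: int() drops the fractional part
```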
Sometimes conversions are performed automatically by Python behind the scenes, as happens when one of the operands for division is a floating-point number and the other is an int. For example, if you try to add a float and an int, the int is converted to a float before the addition:
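A minimal sketch of that automatic conversion (the values are our own):

```python
result = 3 + 0.5        # the int 3 is converted to 3.0 before adding
print( result )         # 3.5
print( type(result) )   # the float type
```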
There’s one other type of data, called a boolean. There are only two possible values: True and False. Notice that these are capitalized.
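A short sketch; is_raining is a hypothetical variable name. Because the capitalization matters, a lowercase true is treated as an ordinary (and here unassigned) variable name.

```python
is_raining = False          # a hypothetical boolean variable
print( is_raining )         # False
print( type(is_raining) )   # the bool type

try:
    print( true )           # lowercase: Python looks for a variable named true
    lowercase_works = True
except NameError:
    lowercase_works = False
print( lowercase_works )    # False
```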
We’ve already discussed comments, which are one tool to make reading and understanding programs easier. Another way to improve understandability is to use meaningful names for variables. For instance, consider the following code:
What does it do? We could tell by running it, or we could add comments. Or we could use more descriptive names than x, y, and z.
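The original code isn’t reproduced here, so as a hypothetical stand-in, here is a short computation of our own written both ways: first with x, y, and z, then with descriptive names.

```python
# Hard to read: what do x, y, and z stand for?
x = 12
y = 5
z = x * y

# The same computation, with names that say what is stored.
width = 12
height = 5
area = width * height
print( area )   # 60
```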
Without using comments, we’ve improved the understandability of the code considerably. This is not to say that comments should be jettisoned completely in favor of meaningful names! Rather, the two strategies work together.
Next, consider this code, with meaningful names:
It’s easy to see what it does because of the names we’ve chosen. But it could be better, especially the constant floating-point values we’ve stuck in there. Presumably, we’re confident in our ability to calculate 4.0 / 3.0 * pi (notice that I have to make sure that at least one of 4 and 3 is a float so that I don’t get integer division), but if we make a mistake, it’s going to be very difficult to track down.
Here, we’ve replaced the number 3.14 with a variable pi and written out 4.0 / 3.0 instead of precomputing its value.
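A hypothetical reconstruction, assuming the code computes the volume of a sphere, $\frac{4}{3}\pi r^{3}$, which fits the 4.0 / 3.0 and pi mentioned above. The variable radius and its value are our own.

```python
pi = 3.14
radius = 2.0
# 4.0 / 3.0 is written out rather than precomputed; the floats also
# avoid integer division in Python 2.
volume = 4.0 / 3.0 * pi * radius ** 3
print( volume )
```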
Some constants are so widely used that Python’s standard math module defines them for us. For instance:
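A minimal sketch of importing pi from the standard math module and comparing it with our 3.14:

```python
from math import pi

print( pi )          # many more correct digits than 3.14
print( pi - 3.14 )   # the error we made by using 3.14
```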
An added benefit of using Python’s pi is that the Python designers have gone to the trouble to calculate a much more precise approximation of π than we did.
A function is itself a data type in Python. You can think of the name of the function as a variable that contains the address of the function’s lines of code.
What this means is that other variables can also store a reference to the function. Here is an example:
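A small sketch; the function say_hello and the variable greet are hypothetical names chosen for this illustration.

```python
def say_hello():
    return "hello"

greet = say_hello         # no parentheses: copy the reference, don't call it
print( greet() )          # hello
print( greet is say_hello )   # True: both names refer to the same function
```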
We have already seen another example. We wrote the draw_house function, and then gave that function to the start_graphics function. The start_graphics function was then able to use draw_house when it needed it.
Slick! The type of the data contained in draw_house is a “reference to a function.”