Strings
Some of the sections below use The Python Shell.
>>> "Hello" # This is a string
'Hello'
>>> message = "Hello" # Store the string in a variable
>>> message # Show the value in the variable
'Hello'
String videos
Python Tutorial for Beginners 2: Strings - Working with Textual Data
Storing strings in variables
Variable naming conventions
Escape character \
len() function
Accessing single characters with square brackets []
IndexError
Substrings with slicing
String methods
.lower()
.count()
.find()
.replace()
.format()
String concatenation
f-strings
dir()
help()
String Concatenation
Concatenation is a fancy word for sticking strings together. In Python, we use the addition operator (+) to accomplish this:
>>> "Hello" + "World"
'HelloWorld'
Notice how it doesn’t insert a space character automatically. You have to explicitly add a space character.
>>> "Hello" + " " + "World"
'Hello World'
Or you can just add a space in one of the strings if you are able.
>>> "Hello " + "World"
'Hello World'
If strings are stored in variables, you can treat the variables like strings and concatenate them.
>>> first = "Dave"
>>> last = "Smith"
>>> first + last
'DaveSmith'
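The + operator only joins strings with other strings. Here is a small sketch of the same example with a space added, plus a number converted with str() before joining. The names full_name, age and summary are made up for illustration.

```python
first = "Dave"
last = "Smith"

# Add the space yourself; Python will not insert one
full_name = first + " " + last
print(full_name)  # Dave Smith

# Numbers must be converted to strings before concatenating
age = 42  # hypothetical value, just for illustration
summary = full_name + " is " + str(age)
print(summary)  # Dave Smith is 42

# Joining a string and an integer directly raises a TypeError
try:
    broken = full_name + age
except TypeError:
    print("Cannot concatenate str and int without str()")
```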
Access a single character
Each character in a string is assigned an index location.
The characters in "Hello" are assigned indices as shown below.
0 1 2 3 4 # index values
H e l l o # each character in the string
Note
In Python, as in most programming languages, index values start at 0.
To access a single character at a specific index location, we can use bracket notation [].
>>> message = "Hello"
>>> message[0]
'H'
>>> message[1]
'e'
>>> message[4]
'o'
>>> message[5]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
Warning
If you try to access an index value that doesn’t exist in the string, you will get an IndexError.
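If a program should keep running when an index might be out of range, one option is to catch the IndexError. This is only a sketch: char_at is a hypothetical helper name, and returning None for a bad index is a design choice made for this example.

```python
def char_at(text, index):
    """Return the character at index, or None if the index is out of range."""
    try:
        return text[index]
    except IndexError:
        return None


message = "Hello"
print(char_at(message, 4))  # o
print(char_at(message, 5))  # None
```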
Get rear characters
Sometimes you need to grab the last or second-last character in a string. The problem is that most of the time you don’t know exactly how long the string is. There are a couple of approaches.
Use the len() function
Imagine we have a program that asks the user to enter a word. It would be impossible for the programmer writing the code to know exactly how long the word will be. For this, Python comes with a built-in length function, len().
>>> message = "Hello, World!"
>>> len(message)
13
>>> message[len(message)] # essentially: message[13]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
Python knows the length of the string is 13, using len(message).
When we try to use that number to access the last character, there is an IndexError. Because index values start at 0, the last index value for this string is 12.
>>> message[len(message)-1] # essentially: message[12]
'!'
>>> message[len(message)-2] # message[11]
'd'
>>> my_string = "ABCDEFG"
>>> my_string[len(my_string)-1]
'G'
>>> my_string[len(my_string)-2]
'F'
>>> my_string[len(my_string)-3]
'E'
Negative Index
Because the pattern of [len(message)-1] is so common, the creators of Python have created a shortcut: negative index values count backwards from the end of the string.
>>> my_string = "ABCDEFG"
>>> my_string[-1]
'G'
>>> my_string[-2]
'F'
>>> my_string[-3]
'E'
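To see that the shortcut really is the same as the len() approach, here is a quick check. It is only an illustration; the names last_char and first_char are made up.

```python
my_string = "ABCDEFG"

# For every k from 1 up to the length, both forms pick the same character
for k in range(1, len(my_string) + 1):
    assert my_string[-k] == my_string[len(my_string) - k]

last_char = my_string[-1]
first_char = my_string[-len(my_string)]  # same as my_string[0]
print(last_char)   # G
print(first_char)  # A
```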
Substrings and slicing
We can get a substring by slicing. Slicing uses square-bracket notation with multiple numbers separated by colons.
>>> message = "Hello"
>>> message[0:3] # slice the first three characters
'Hel'
>>> message[1:4]
'ell'
>>> message[:3]
'Hel'
>>> message[2:]
'llo'
The first slice (message[0:3]) essentially means: get a slice of the string starting at index 0, up to (but not including) index 3.
Looking closely at the string and its index positions, we can see how the result would be Hel in this case:
|-------------- Start from 0
|       |------- go up to, not including 3
| 0 1 2 | 3 4
| H e l | l o
There is another shortcut. If you leave out the first number (the "from" index), the slice starts from the beginning.
>>> some_string = "This is fun"
>>> some_string[:4]
'This'
If you leave out the second number (the "to" index), the slice goes to the end.
>>> some_string = "This is fun"
>>> some_string[8:]
'fun'
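Two properties of slices are worth trying out yourself. The sketch below uses made-up variable names (first_word, rest, too_far, empty) and shows that two slices split at the same index rebuild the original string, and that a slice running past the end does not raise an IndexError the way a single index does.

```python
some_string = "This is fun"

first_word = some_string[:4]
rest = some_string[4:]

# Splitting at the same index always reassembles the original
assert first_word + rest == some_string

# Unlike indexing, slicing past the end does not raise an IndexError
too_far = some_string[8:100]
empty = some_string[50:]
print(too_far)      # fun
print(repr(empty))  # ''
```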
Slice by step
There is one more piece to Python slices: the third number, the step.
>>> alphabet = "abcdefghijklmnopqrstuvwxyz"
>>> alphabet[0:10] # the default step is 1
'abcdefghij'
>>> alphabet[0:10:1] # from 0 to (not including) 10, by 1
'abcdefghij'
>>> alphabet[0:10:2] # from 0 to (not including) 10, by 2
'acegi'
>>> alphabet[0:10:3] # from 0 to (not including) 10, by 3
'adgj'
>>> alphabet[::2] # from the beginning to the end, by 2
'acegikmoqsuwy'
>>> alphabet[::-1] # from the end to the beginning, by -1
'zyxwvutsrqponmlkjihgfedcba'
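A common use of a step of -1 is checking whether a word is a palindrome. This is a minimal sketch; is_palindrome and odd_positions are hypothetical names, and the check is case-sensitive.

```python
def is_palindrome(word):
    """Return True if word reads the same forwards and backwards."""
    return word == word[::-1]


print(is_palindrome("racecar"))  # True
print(is_palindrome("python"))   # False

# Every second character, starting from index 1 instead of 0
odd_positions = "abcdefghij"[1::2]
print(odd_positions)  # bdfhj
```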
Convert to lower/upper case
Three important string methods are .lower(), .upper(), and .capitalize().
>>> my_string = "HelLO WoRLD!"
>>> my_string.lower()
'hello world!'
>>> my_string # Notice! The original string was not changed
'HelLO WoRLD!'
>>> my_string.upper()
'HELLO WORLD!'
>>> my_string.capitalize()
'Hello world!'
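One place .lower() comes in handy is comparing text without caring about capitalization. In this sketch, answer stands in for something a user might have typed, and result is a made-up variable name.

```python
answer = "YeS"  # hypothetical user input

# Convert to lower case first so "YES", "Yes" and "yes" all match
if answer.lower() == "yes":
    result = "confirmed"
else:
    result = "not confirmed"

print(result)  # confirmed
print(answer)  # YeS (the original string is unchanged)
```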
Replace characters or substrings
The .replace() method returns a copy of the string in which every occurrence of the first argument (if found) is replaced with the second argument.
>>> song = "The Long and Winding Road"
>>> song.replace("Long", "Short")
'The Short and Winding Road'
>>> song
'The Long and Winding Road'
Notice how the original string is not altered. To alter it, just re-assign the variable to the new string:
>>> song = "The Long and Winding Road"
>>> song = song.replace("Long", "Short")
>>> song
'The Short and Winding Road'
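When the text you are replacing appears more than once, .replace() changes every occurrence unless you pass an optional third argument, the maximum number of replacements. The string and variable names below are made up for illustration.

```python
lyric = "la la la"

# By default, every occurrence is replaced
all_replaced = lyric.replace("la", "na")
print(all_replaced)  # na na na

# The optional third argument limits how many replacements are made
first_only = lyric.replace("la", "na", 1)
print(first_only)  # na la la

# If the substring is not found, you get back an unchanged copy
not_found = lyric.replace("xyz", "na")
print(not_found)  # la la la
```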
Split a string into a list
Splitting is very useful for coding problems like those in the Canadian Computing Competition (CCC). Notice how .split() removes the character you are splitting on. Examples are done in The Python Shell.
>>> some_string = "There's someone on the wing! Some THING!"
>>> some_string.split()
["There's", 'someone', 'on', 'the', 'wing!', 'Some', 'THING!']
>>> some_string.split("e")
['Th', 'r', "'s som", 'on', ' on th', ' wing! Som', ' THING!']
>>> some_string.split("!")
["There's someone on the wing", ' Some THING', '']
You can split on a separator that is more than one character long. In this example, I am splitting on comma-space ", ".
>>> str_of_nums = "45, 123, 77, 323, 56"
>>> str_of_nums.split(", ")
['45', '123', '77', '323', '56']
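Keep in mind that .split() always gives you a list of strings, even when they look like numbers. A typical next step, sketched here with made-up names (pieces, numbers, total), is converting each piece with int() so you can do math with it.

```python
str_of_nums = "45, 123, 77, 323, 56"
pieces = str_of_nums.split(", ")

# Convert each string piece to an integer
numbers = []
for piece in pieces:
    numbers.append(int(piece))

total = sum(numbers)
print(numbers)  # [45, 123, 77, 323, 56]
print(total)    # 624
```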
Parsing
Sometimes you will have a string that contains valuable data as well as some unimportant things like formatting. The goal is to extract the important data from the string and leave the rest. This is called parsing the string.
Imagine we had a string like:
"x: 24, y: 35, z: 72"
We want to:
extract the 24 and place it in a variable called x.
extract the 35 and place it in a variable called y.
extract the 72 and place it in a variable called z.
For the sake of simplicity, we can assume that in every case, we will only get two-digit values for the numbers in our string. If that were not the case, we could make use of the str.index() method, or use regular expressions (advanced).
To extract the important data, we use string slicing.
formatted_info = "x: 24, y: 35, z: 72"
# index           0123456789111111111
#                           012345678
x = int(formatted_info[3:5])
y = int(formatted_info[10:12])
z = int(formatted_info[17:19])
print(x) # 24
print(y) # 35
print(z) # 72
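If the numbers are not always two digits, fixed slice positions stop working. One alternative, shown here only as a sketch, is to lean on .split() instead of counting index positions. It assumes the string always uses ", " between entries and ": " between a name and its value; the values dictionary is a made-up helper.

```python
formatted_info = "x: 7, y: 135, z: 72"  # values of different lengths

values = {}
for part in formatted_info.split(", "):  # "x: 7", "y: 135", "z: 72"
    name, number = part.split(": ")      # e.g. "x" and "7"
    values[name] = int(number)

x = values["x"]
y = values["y"]
z = values["z"]
print(x)  # 7
print(y)  # 135
print(z)  # 72
```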
Aligning output
We can left-adjust (<), right-adjust (>) and center (^) our values. Here is a brief example:
x = 0
# Left-justify
print('L {:<20} R'.format(x))
# Center
print('L {:^20} R'.format(x))
# Right-justify
print('L {:>20} R'.format(x))
The output of these examples is:
L 0                    R
L          0           R
L                    0 R
Pretty cool. We told Python to leave 20 spaces for the text we wanted to enter, and depending on the symbol we specified, we were able to change the justification of our text.
You can even specify the character you want to use instead of empty spaces.
print('{:=<20}'.format('hello'))
print('{:_^20}'.format('hello'))
print('{:.>20}'.format('hello'))
The output of this example is:
hello===============
_______hello________
...............hello
Credit: TAKE CONTROL OF YOUR PYTHON PRINT() STATEMENTS: PART 3
Aligning with f-strings
x = 0
print(f"L {x:<20} R")
print(f"L {x:^20} R")
print(f"L {x:>20} R")
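Alignment becomes really useful when you print several rows and want the columns to line up. Here is a rough sketch using f-strings; the data in rows and the column widths (10 and 5) are made up for this example.

```python
rows = [("apple", 3), ("kiwi", 12), ("banana", 150)]  # hypothetical data

# A fill character also works in f-strings: centre the header in 16 characters
header = "Fruit"
title = f"{header:_^16}"
print(title)  # _____Fruit______

lines = []
for name, count in rows:
    # Name left-justified in 10 characters, count right-justified in 5
    lines.append(f"{name:<10}|{count:>5}")

for line in lines:
    print(line)
```

Every line comes out 16 characters wide, so the | separator and the right-hand edge of the numbers line up.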
Useful string methods for alignment
str.ljust(): Left-justify
str.rjust(): Right-justify
str.center(): Center text
str.zfill(): Fill with zeros on the left
>>> "hello".ljust(10)
'hello     '
>>> "hello".ljust(10, ".")
'hello.....'
>>> "hello".rjust(10)
'     hello'
>>> "hello".rjust(10, "-")
'-----hello'
>>> "hello".center(10)
'  hello   '
>>> "hello".center(10, "-")
'--hello---'
>>> str(99).zfill(5)
'00099'
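A typical use for .zfill() is building a clock-style display where single digits need a leading zero. This is just a sketch with made-up values for hours and minutes; the last lines show that a format spec in an f-string can give the same result.

```python
hours = 7    # hypothetical values
minutes = 5

# zfill works on strings, so convert the numbers first
clock = str(hours).zfill(2) + ":" + str(minutes).zfill(2)
print(clock)  # 07:05

# The same result using a format spec: width 2, padded with zeros
clock_f = f"{hours:02}:{minutes:02}"
print(clock_f)  # 07:05
```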