Strings
=======

Some of the sections below use :ref:`about-python-shell`.

>>> "Hello"  # This is a string
'Hello'
>>> message = "Hello"  # Store the string in a variable
>>> message  # Show the value in the variable
'Hello'

String videos
-------------

- `Python Tutorial for Beginners 2: Strings - Working with Textual Data `_

  - Storing strings in variables
  - Variable naming conventions
  - Escape character ``\``
  - ``len()`` function
  - Accessing single characters with square brackets ``[]``
  - ``IndexError``
  - Substrings with slicing
  - String methods

    - ``.lower()``
    - ``.count()``
    - ``.find()``
    - ``.replace()``
    - ``.format()``

  - String concatenation
  - f-strings
  - ``dir()``
  - ``help()``

String Concatenation
--------------------

*Concatenation* is a fancy word for sticking strings together. In Python we use the addition operator (``+``) to accomplish this.

>>> "Hello" + "World"
'HelloWorld'

Notice how it doesn't insert a space character automatically. You have to add a space character explicitly.

>>> "Hello" + " " + "World"
'Hello World'

Or you can include the space in one of the strings, if you are able.

>>> "Hello " + "World"
'Hello World'

If strings are stored in variables, you can treat the variables like strings and concatenate them.

>>> first = "Dave"
>>> last = "Smith"
>>> first + last
'DaveSmith'

Access a single character
-------------------------

Each character in a string is assigned an index location. The characters in ``"Hello"`` are assigned indices as shown below.

::

    0 1 2 3 4    # index values
    H e l l o    # each character in the string

.. note::

   In Python, as in most programming languages, index values start at ``0``.

To access a single character at a specific index location, we can use bracket notation ``[]``.

>>> message = "Hello"
>>> message[0]
'H'
>>> message[1]
'e'
>>> message[4]
'o'
>>> message[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

.. warning::

   If you try to access an index value that doesn't exist in the string, you will get an ``IndexError``.

Get rear characters
^^^^^^^^^^^^^^^^^^^

Sometimes you need to grab the last, or second-last, character in a string. The problem is that most of the time you don't know exactly how long the string is. There are a couple of approaches.

Use the ``len()`` function
**************************

Imagine we have a program that asks the user to enter a word. It would be impossible for the programmer writing the code to know exactly how long the word will be. For this, Python comes with a built-in length function, ``len()``.

>>> message = "Hello, World!"
>>> len(message)
13
>>> message[len(message)]  # essentially: message[13]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

``len(message)`` tells us that the length of the string is ``13``. When we try to use that number to access the last character, there is an ``IndexError``. Because index values start at ``0``, the last index value for this string is ``12``.

>>> message[len(message)-1]  # essentially: message[12]
'!'
>>> message[len(message)-2]  # message[11]
'd'
>>> my_string = "ABCDEFG"
>>> my_string[len(my_string)-1]
'G'
>>> my_string[len(my_string)-2]
'F'
>>> my_string[len(my_string)-3]
'E'

Negative Index
**************

Because the pattern of ``[len(message)-1]`` is so common, the creators of Python have created a shortcut: negative index values count backwards from the end of the string.

>>> my_string = "ABCDEFG"
>>> my_string[-1]
'G'
>>> my_string[-2]
'F'
>>> my_string[-3]
'E'

Substrings and slicing
----------------------

We can get a substring by slicing. Slicing uses square-bracket notation, with multiple numbers separated by colons.

>>> message = "Hello"
>>> message[0:3]  # slice the first three characters
'Hel'
>>> message[1:4]
'ell'
>>> message[:3]
'Hel'
>>> message[2:]
'llo'

The first slice (``message[0:3]``) essentially means: get a slice of the string starting at index ``0`` and going up to (but not including) index ``3``.
Looking closely at the string and its index positions, we can see how the result would be ``Hel`` in this case::

    |-------------- Start from 0
    |       |------- go up to, not including 3
    | 0 1 2 | 3 4
    | H e l | l o

There is another shortcut. If you leave out the first number, the *from* index, the slice starts at the beginning.

>>> some_string = "This is fun"
>>> some_string[:4]
'This'

If you leave out the second number, the *to* index, the slice runs to the end.

>>> some_string = "This is fun"
>>> some_string[8:]
'fun'

Slice by step
^^^^^^^^^^^^^

There is one more piece to Python slices: the third number, the *step*.

>>> alphabet = "abcdefghijklmnopqrstuvwxyz"
>>> alphabet[0:10]  # the default step is 1
'abcdefghij'
>>> alphabet[0:10:1]  # from 0 to (not including) 10, by 1
'abcdefghij'
>>> alphabet[0:10:2]  # from 0 to (not including) 10, by 2
'acegi'
>>> alphabet[0:10:3]  # from 0 to (not including) 10, by 3
'adgj'
>>> alphabet[::2]  # from the beginning to the end, by 2
'acegikmoqsuwy'
>>> alphabet[::-1]  # from the end to the beginning, by -1
'zyxwvutsrqponmlkjihgfedcba'

Convert to lower/upper case
---------------------------

Three important string methods are ``.lower()``, ``.upper()``, and ``.capitalize()``.

>>> my_string = "HelLO WoRLD!"
>>> my_string.lower()
'hello world!'
>>> my_string  # Notice! The original string was not changed
'HelLO WoRLD!'
>>> my_string.upper()
'HELLO WORLD!'
>>> my_string.capitalize()
'Hello world!'

Replace characters or substrings
--------------------------------

The ``.replace()`` method returns a *copy* of the string in which every occurrence of the first argument (if found) is replaced with the second argument.

>>> song = "The Long and Winding Road"
>>> song.replace("Long", "Short")
'The Short and Winding Road'
>>> song
'The Long and Winding Road'

Notice how the original string is not altered.
To alter it, just re-assign the variable to the new string:

>>> song = "The Long and Winding Road"
>>> song = song.replace("Long", "Short")
>>> song
'The Short and Winding Road'

Split a string into a list
--------------------------

The ``.split()`` method is very useful for coding problems like :ref:`ccc/index:canadian computing competition (ccc)`. Notice how ``.split()`` removes the character you are splitting on. With no argument, it splits on whitespace. Examples are done in :ref:`about-python-shell`.

>>> some_string = "There's someone on the wing! Some THING!"
>>> some_string.split()
["There's", 'someone', 'on', 'the', 'wing!', 'Some', 'THING!']
>>> some_string.split("e")
['Th', 'r', "'s som", 'on', ' on th', ' wing! Som', ' THING!']
>>> some_string.split("!")
["There's someone on the wing", ' Some THING', '']

You can split on multiple characters. In this example, I am splitting on comma-space ``", "``.

>>> str_of_nums = "45, 123, 77, 323, 56"
>>> str_of_nums.split(", ")
['45', '123', '77', '323', '56']

Parsing
-------

Sometimes you will have a string that contains valuable data as well as some unimportant things like formatting. The goal is to extract the important data from the string and leave the rest. This is called *parsing* the string.

Imagine we had a string like::

    "x: 24, y: 35, z: 72"

We want to:

- extract the ``24`` and place it in a variable called ``x``.
- extract the ``35`` and place it in a variable called ``y``.
- extract the ``72`` and place it in a variable called ``z``.

For the sake of simplicity, we can assume that in every case, we will only get two-digit values for the numbers in our string. If that were not the case, we could make use of the `str.index() `_ method, or use `regular expressions `_ (advanced).

To extract the important data, we use string slicing.

.. code-block:: python

    formatted_info = "x: 24, y: 35, z: 72"
    # index           0123456789111111111
    #                           012345678

    x = int(formatted_info[3:5])
    y = int(formatted_info[10:12])
    z = int(formatted_info[17:19])

    print(x)  # 24
    print(y)  # 35
    print(z)  # 72

Aligning output
---------------

We can left-adjust (``<``), right-adjust (``>``) and center (``^``) our values. Here is a brief example:

.. code-block:: python

    x = 0

    # Left-justify
    print('L {:<20} R'.format(x))

    # Center
    print('L {:^20} R'.format(x))

    # Right-justify
    print('L {:>20} R'.format(x))

The output of these examples is::

    L 0                    R
    L          0           R
    L                    0 R

Pretty cool. We told Python to leave ``20`` spaces for the value we wanted to print, and depending on the symbol we specified, we were able to change the justification of our text.

You can even specify the character you want to use instead of empty spaces.

.. code-block:: python

    print('{:=<20}'.format('hello'))
    print('{:_^20}'.format('hello'))
    print('{:.>20}'.format('hello'))

The output of this example is::

    hello===============
    _______hello________
    ...............hello

Credit: `TAKE CONTROL OF YOUR PYTHON PRINT() STATEMENTS: PART 3 `_

Aligning with f-strings
^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    x = 0

    print(f"L {x:<20} R")
    print(f"L {x:^20} R")
    print(f"L {x:>20} R")

Useful string methods for alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- ``str.ljust()``: Left-justify
- ``str.rjust()``: Right-justify
- ``str.center()``: Center text
- ``str.zfill()``: Fill with zeros on the left

>>> "hello".ljust(10)
'hello     '
>>> "hello".ljust(10, ".")
'hello.....'
>>> "hello".rjust(10)
'     hello'
>>> "hello".rjust(10, "-")
'-----hello'
>>> "hello".center(10)
'  hello   '
>>> "hello".center(10, "-")
'--hello---'
>>> str(99).zfill(5)
'00099'
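
As a rough sketch of how the alignment tools in this section might be combined, the example below builds a small two-column table. The ``rows`` data and the column widths are made up for illustration; they are not part of any real program.

.. code-block:: python

    # Hypothetical data, invented for this example
    rows = [("apple", 3), ("kiwi", 12), ("banana", 7)]

    lines = []
    for name, count in rows:
        # Left-justify the name in 10 columns, then zero-fill the count
        # to 3 digits and right-justify it in 5 columns
        lines.append(f"{name:<10}|{str(count).zfill(3):>5}")

    print("\n".join(lines))

This prints::

    apple     |  003
    kiwi      |  012
    banana    |  007

Collecting the lines in a list before printing is only one way to do it; calling ``print()`` inside the loop would work just as well.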