8 Working with Composite Data Types
- Be able to work with lists, dictionaries, sets, and tuples
- Understand shallow and deep copies
- Understand the importance of benchmarking code
8.1 Overview of Composite Data Types
In Python, composite data types allow you to store multiple values in a single variable. They can hold a collection of items, which can be of different data types. The main composite data types in Python are:
Lists
Definition: Ordered, mutable collections that allow duplicate items.
Syntax: Defined using square brackets [].
Tuples
Definition: Ordered, immutable collections that can also contain duplicates.
Syntax: Defined using parentheses ().
Dictionaries
Definition: Collections of key-value pairs, where each key is unique and must be immutable (more precisely, hashable). Since Python 3.7, dictionaries keep their keys in insertion order.
Syntax: Defined using curly braces {}, with a colon : separating each key from its value.
Sets
Definition: Unordered collections of unique items.
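Syntax: Defined using curly braces {} around the items, or with the set() function. Empty braces {} create a dictionary, so an empty set must be written as set().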
Example Initialization and Differences:
Let’s initialize one of each composite data type with a list of 10 animals, and then demonstrate some of the main differences.
# Initialize a list of animals
animal_list = ["Dog", "Cat", "Elephant", "Lion", "Tiger", "Giraffe", "Zebra", "Monkey", "Snake", "Rabbit"]
# Initialize a tuple of animals
animal_tuple = ("Dog", "Cat", "Elephant", "Lion", "Tiger", "Giraffe", "Zebra", "Monkey", "Snake", "Rabbit")
# Initialize a dictionary of animals with their classifications
animal_dict = {
    "Dog": "Mammal",
    "Cat": "Mammal",
    "Elephant": "Mammal",
    "Lion": "Mammal",
    "Tiger": "Mammal",
    "Giraffe": "Mammal",
    "Zebra": "Mammal",
    "Monkey": "Mammal",
    "Snake": "Reptile",
    "Rabbit": "Mammal"
}
# Initialize a set of animals
animal_set = {"Dog", "Cat", "Elephant", "Lion", "Tiger", "Giraffe", "Zebra", "Monkey", "Bear", "Rabbit"}
# Show differences between data types
print("Original List:", animal_list)
print("Original Tuple:", animal_tuple)
print("Original Dictionary:", animal_dict)
print("Original Set:", animal_set)
# Mutability Demonstration
# Modifying the list (mutable)
animal_list[0] = "Wolf" # Change "Dog" to "Wolf"
print("Modified List:", animal_list)
# Attempting to modify the tuple (immutable) raises a TypeError
try:
    animal_tuple[0] = "Wolf"
except TypeError as error:
    print("Cannot modify tuple:", error)
# Adding a new key-value pair to the dictionary (mutable)
animal_dict["Fox"] = "Mammal"
print("Modified Dictionary:", animal_dict)
# Attempting to add a duplicate to the set (will be ignored)
animal_set.add("Dog") # This will not change the set
print("Modified Set (after trying to add 'Dog'):", animal_set)
Remember the key differences:
Mutability
Lists: Mutable - items can be changed or updated.
Tuples: Immutable - once defined, items cannot be changed.
Dictionaries: Mutable - keys and values can be added or updated.
Sets: Mutable - can add or remove items, but cannot contain duplicates.
Order
Lists and Tuples: Ordered collections (the order of elements matters).
Dictionaries: Maintain insertion order (guaranteed since Python 3.7).
Sets: Unordered - no guaranteed order of elements.
Duplicates
Lists and Tuples: Allow duplicates.
Dictionaries: Keys must be unique, but values can be duplicated.
Sets: Do not allow duplicates.
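The order and duplicate rules above can be checked with a few lines of code. This is a minimal sketch; the animals list and the variable names are chosen only for illustration.

```python
# Hypothetical sample data containing repeated items
animals = ["Dog", "Cat", "Dog", "Lion", "Cat"]

# A list keeps every item, including repeats, in the order given
print(len(animals))                  # 5

# A tuple built from the list also keeps duplicates
animal_tuple = tuple(animals)
print(animal_tuple.count("Dog"))     # 2

# A set keeps one copy of each item; the order it prints in is not guaranteed
unique_animals = set(animals)
print(len(unique_animals))           # 3

# Dictionary keys are unique and keep insertion order (Python 3.7+)
first_seen = dict.fromkeys(animals)
print(list(first_seen))              # ['Dog', 'Cat', 'Lion']
```

Converting a list to a set is a common way to remove duplicates when order does not matter; dict.fromkeys() does the same while keeping the order in which each item first appeared.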
8.2 Composite Data Types and Memory Management
How a program manages memory matters. We generally want to keep memory use low, and one way of doing this is to avoid copying data needlessly.
Understanding the behaviour of objects and how they are copied is crucial for managing memory and ensuring that your programs behave as expected. This is even more crucial when working with composite data types.
In the next sections we will work with different composite data types and investigate shallow and deep copies. We will use the id() function and lists to demonstrate this.
The id() function returns an integer identifier for an object. This integer is guaranteed to be unique and constant for the object during its lifetime. In CPython, the standard Python implementation, it is the object's memory address.
Python Object Behavior: Objects and References
In Python, variables are references to objects. When you assign a variable to another variable, you’re copying the reference, not the actual object.
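A short sketch of this behaviour, using id() and a small list. The variable names are illustrative only.

```python
original = [1, 2, 3]
alias = original                     # copies the reference, not the list

# Both names refer to the same object
print(id(original) == id(alias))     # True
print(alias is original)             # True

# A change made through one name is visible through the other
alias.append(4)
print(original)                      # [1, 2, 3, 4]

# list() builds a new list object, so changes no longer carry over
independent = list(original)
print(independent is original)       # False
independent.append(5)
print(original)                      # [1, 2, 3, 4]
```

The is operator compares identity (the same object), whereas == compares values, so two separate lists with equal contents are == but not is.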
8.3 Working with Lists in Python
Lists in Python are versatile and support various operations that allow you to manipulate and interact with the data they contain. Below are some common operations you can perform on lists.
Accessing Elements
You can access list elements using indexing (zero-based).
my_list = [1, 2, 3, 4, 5] # Example list used throughout this section
first_element = my_list[0] # 1
last_element = my_list[-1] # 5
Slicing a List
You can retrieve a portion of a list using slicing.
sub_list = my_list[1:4] # [2, 3, 4]
Adding Elements
Append: Add an element to the end of the list.
my_list.append(6) # [1, 2, 3, 4, 5, 6]
Insert: Insert an element at a specific index.
my_list.insert(2, 2.5) # [1, 2, 2.5, 3, 4, 5, 6]
Removing Elements
Remove: Remove the first occurrence of a specified value.
my_list.remove(2.5) # [1, 2, 3, 4, 5, 6]
Pop: Remove and return an element at a specified index (default is the last element).
last_item = my_list.pop() # list is now [1, 2, 3, 4, 5]
Modifying Elements
You can change the value of an element using its index.
my_list[1] = 20 # [1, 20, 3, 4, 5]
Extending a List
You can add multiple elements to the end of the list using extend().
my_list.extend([6, 7, 8]) # [1, 20, 3, 4, 5, 6, 7, 8]
Sorting a List
Sort the list in ascending order.
my_list.sort() # [1, 3, 4, 5, 6, 7, 8, 20]
Reversing a List
You can reverse the order of elements in a list.
my_list.reverse() # [20, 8, 7, 6, 5, 4, 3, 1]
Finding the Length
Get the number of elements in the list.
length = len(my_list) # 8
Repeating a List
You can repeat the elements of a list with the * operator. This returns a new list and leaves the original unchanged.
my_list * 2 # [20, 8, 7, 6, 5, 4, 3, 1, 20, 8, 7, 6, 5, 4, 3, 1]
Conclusion
Lists are a fundamental data structure in Python that allow for a variety of operations. By understanding these operations, you can effectively manage and manipulate data in your programs.
8.4 Shallow Copy versus Deep Copy
Shallow Copy: A shallow copy creates a new object but inserts references into it to the objects found in the original. This means that if the original object contains other mutable objects (like lists or dictionaries), those nested objects are not copied; they are shared between the original and the copied object.
Deep Copy: A deep copy creates a new object and recursively adds copies of nested objects found in the original. This means that all levels of the object hierarchy are duplicated, and changes made to the copied object do not affect the original.
You can create a deep copy using the copy module:
import copy
original = [1, 2, [3, 4]]
deep_copied = copy.deepcopy(original)
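The difference between the two kinds of copy becomes visible once a nested object is changed. The sketch below is a minimal illustration that uses copy.copy() for the shallow copy; the variable names are illustrative.

```python
import copy

original = [1, 2, [3, 4]]
shallow_copied = copy.copy(original)      # new outer list, shared inner list
deep_copied = copy.deepcopy(original)     # new outer list and new inner list

# Both copies are new outer objects
print(shallow_copied is original)         # False
print(deep_copied is original)            # False

# Only the shallow copy shares the nested list with the original
print(shallow_copied[2] is original[2])   # True
print(deep_copied[2] is original[2])      # False

# Changing the nested list affects the shallow copy but not the deep copy
original[2].append(5)
print(shallow_copied)                     # [1, 2, [3, 4, 5]]
print(deep_copied)                        # [1, 2, [3, 4]]

# Replacing a top-level item affects neither copy
original[0] = 100
print(shallow_copied[0])                  # 1
```

Slicing (original[:]) and list(original) also produce shallow copies, so the same sharing of nested objects applies to them.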
Memory Usage
Shallow Copy: Uses less memory as it only copies references.
Deep Copy: Uses more memory because it creates full copies of all objects.
Performance
Shallow Copy: Faster due to less overhead in copying.
Deep Copy: Slower because of the recursive copying of all elements.
Overview
Understanding shallow and deep copies is essential for effective memory management and to avoid unintended side effects when working with mutable objects in Python. Choose the appropriate type of copy based on whether you need to maintain shared references to nested objects or require complete independence between copies.
8.5 Working with Dictionaries
Dictionaries are an efficient way to store data and look it up by key. Keys must be immutable (more precisely, hashable), but values can be of any type.
Examples are presented below:
gene_info = {
    "gene": "BRCA1",
    "sequence": "ATCGGCCGTAAGCTAGCTAGCTAGC",
    "function": "DNA repair",
    "organism": "Homo sapiens"
}
Accessing Values
You can access a value by referring to its key.
print("Gene Name:", gene_info["gene"]) # Output: Gene Name: BRCA1
Adding a New Key-Value Pair
You can add new key-value pairs to a dictionary.
gene_info["chromosome"] = "17"
print("Updated gene info dictionary:", gene_info)
Modifying Existing Values
You can change the value of an existing key.
gene_info["function"] = "DNA repair and regulation"
print("Updated function:", gene_info["function"]) # Output: Updated function: DNA repair and regulation
Removing a Key-Value Pair
Use the del statement to remove a key-value pair.
del gene_info["organism"]
print("After removing organism:", gene_info)
Using get() Method
Use the get() method to avoid KeyError if the key does not exist.
chromosome = gene_info.get("chromosome", "Not available")
print("Chromosome:", chromosome) # Output: Chromosome: 17
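The example above looks up a key that exists. The sketch below shows what happens when the key is missing, which is the case get() is meant for. It uses a smaller, self-contained dictionary so that it can be run on its own.

```python
gene_info = {"gene": "BRCA1", "function": "DNA repair"}

# Square-bracket access raises KeyError when the key is missing
try:
    gene_info["organism"]
except KeyError as error:
    print("Missing key:", error)          # Missing key: 'organism'

# get() returns the supplied default instead of raising
organism = gene_info.get("organism", "Not available")
print("Organism:", organism)              # Organism: Not available

# Without a default, get() returns None
print(gene_info.get("organism"))          # None

# The in operator tests whether a key is present
print("gene" in gene_info)                # True

# items() yields each key-value pair in insertion order
for key, value in gene_info.items():
    print(key, "->", value)
```

Note that get() does not add the missing key to the dictionary; gene_info is unchanged after the calls above.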
8.6 Benchmarking
There are often several ways to implement the same task, so it is important to compare their performance through benchmarking.
Memory use and execution speed are two things that can be optimised.
cProfile and timeit are both standard-library modules that can be used for this. Here we will use timeit to compare element access in a list and in a dictionary.
import timeit
gene_info_list = ["BRCA1", "ATCGGCCGTAAGCTAGCTAGCTAGC", "DNA repair", "Homo sapiens", "17"]
gene_info_dictionary = {
    "gene": "BRCA1",
    "sequence": "ATCGGCCGTAAGCTAGCTAGCTAGC",
    "function": "DNA repair",
    "organism": "Homo sapiens",
    "chromosome": "17"
}
# Define functions to access list and dictionary elements - we will go over this later
def access_list_element():
    gene_info_list[1] # Accessing element using lists

def access_dictionary_element():
    gene_info_dictionary["sequence"] # Accessing element using dictionaries
# Measure execution time
execution_time_list = timeit.timeit(access_list_element, number=100000)
execution_time_dictionary = timeit.timeit(access_dictionary_element, number=100000)
print("execution time list", execution_time_list)
print("execution time dictionary", execution_time_dictionary)
Output (exact timings will vary between machines and runs)
execution time list 0.013154594002116937
execution time dictionary 0.009285816002375213
Try it yourself! For extra reading, you can look into optimising compilers: how they apply to Python and other programming languages, and what they can and cannot do.
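As a further exercise, the same timeit approach can compare a different operation: testing whether a value is present in a list versus in a set. This is a sketch under assumed conditions; the data size, the target value, and the number of repeats are arbitrary choices, and the timings will differ on your machine.

```python
import timeit

# Hypothetical data: 10,000 integer IDs stored in two different ways
ids_list = list(range(10_000))
ids_set = set(ids_list)
target = 9_999   # last item, so the list has to be scanned to the end

# Time 1,000 membership tests against each container
list_time = timeit.timeit(lambda: target in ids_list, number=1_000)
set_time = timeit.timeit(lambda: target in ids_set, number=1_000)

print(f"list membership: {list_time:.6f} s")
print(f"set membership:  {set_time:.6f} s")
```

A list is searched item by item, whereas a set uses hashing, so the set is normally much faster for this operation when the collection is large. Running the benchmark is the way to confirm this for your own data.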
8.7 Sets and Tuples
Although we won’t go through sets and tuples in detail, information can be found here:
https://www.w3schools.com/python/python_sets.asp
https://www.w3schools.com/python/python_tuples.asp
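As a brief taster before following the links, here is a minimal sketch of a few common set and tuple operations. All names and values are made up for illustration.

```python
# Sets support mathematical set operations
mammals = {"Dog", "Cat", "Lion"}
pets = {"Dog", "Cat", "Snake"}

# sorted() is used so the printed order is predictable
print(sorted(mammals & pets))    # intersection: ['Cat', 'Dog']
print(sorted(mammals | pets))    # union: ['Cat', 'Dog', 'Lion', 'Snake']
print(sorted(mammals - pets))    # difference: ['Lion']

# Tuples can be unpacked into separate variables
gene_record = ("BRCA1", "17")
gene, chromosome = gene_record
print(gene, chromosome)          # BRCA1 17

# Tuples of immutable items are hashable, so they can be dictionary keys
grid = {(0, 0): "start", (1, 2): "end"}
print(grid[(1, 2)])              # end
```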
8.8 Summary
Composite data types are data structures that group multiple elements, potentially of different types, into a single unit. In Python the main examples are lists, tuples, dictionaries, and sets; other languages offer arrays and records (such as structs or objects). They allow for more complex data organization by storing and managing collections of related information. There is often more than one way to implement a task and, as we have seen, benchmarking can be an effective way to choose between them.
- Different composite data types have different properties
- Benchmarking and understanding the different properties of the data types can be used to choose the best approach