So, we are nearly 1/5 of the way through this Python series! Today, we’re going to focus on sets, strings and ranges. These are very versatile tools that play a big role in problem-solving and managing data efficiently.
Sets: Unique and Unordered Collections
A set is a collection of unique and unordered elements. Unlike lists or tuples, sets automatically discard duplicate values and they don’t preserve the order of elements.
Key Features of Sets
Unique Elements: Duplicates are automatically removed.
Unordered: There’s no concept of indexing.
Mutable: You can add/remove items, like lists and dictionaries.
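To make those three features concrete, here is a minimal sketch. The `tags` set and its values are made up for illustration; the point is only to show each property in action.

```python
# Illustrative data: "etl" appears twice on purpose.
tags = {"etl", "sql", "etl", "python"}

# Unique: the duplicate "etl" is stored only once.
unique_count = len(tags)
print(unique_count)  # 3

# Unordered: sets do not support indexing, so tags[0] raises a TypeError.
try:
    tags[0]
    indexing_supported = True
except TypeError:
    indexing_supported = False
print(indexing_supported)  # False

# Mutable: items can be added and removed after creation.
tags.add("spark")
tags.discard("sql")
print(sorted(tags))  # ['etl', 'python', 'spark']
```

Sorting before printing is just a way to get a predictable display, since a set itself makes no promise about order.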
Creating sets:
# Set of unique integers
numbers = {1, 2, 3, 4, 4, 5}
print(numbers) # Output: {1, 2, 3, 4, 5}
# Empty set
empty = set()  # Note: {} creates an empty dictionary, not a set
# Creating a set from a list
set_from_list = set([1, 2, 3])
Adding and removing items:
numbers.add(6)       # Adds 6
numbers.remove(3)    # Removes 3 (raises a KeyError if 3 doesn't exist)
numbers.discard(10)  # Removes 10 if present (no error if 10 doesn't exist)
Set operations:
evens = {2, 4, 6}
odds = {1, 3, 5}
print(numbers.union(evens))        # Combines sets
print(numbers.intersection(odds))  # Common elements
print(numbers.difference(odds))    # Elements in numbers not in odds
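Python also offers operator shorthand for these same set operations. The sketch below assumes `numbers` holds {1, 2, 4, 5, 6}, which is what you would have after adding 6 and removing 3 in the earlier example; it is redefined here so the snippet runs on its own.

```python
# Assumed starting values so the example is self-contained.
numbers = {1, 2, 4, 5, 6}
evens = {2, 4, 6}
odds = {1, 3, 5}

combined = numbers | evens        # same as numbers.union(evens)
common = numbers & odds           # same as numbers.intersection(odds)
only_in_numbers = numbers - odds  # same as numbers.difference(odds)

print(sorted(combined))         # [1, 2, 4, 5, 6]
print(sorted(common))           # [1, 5]
print(sorted(only_in_numbers))  # [2, 4, 6]
```

One difference worth knowing: the methods accept any iterable (for example a list), while the operators require both sides to be sets.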
Strings: Sequence of Characters
A string is a sequence of characters enclosed in quotes. Strings are essential for handling text data in Python.
Key Properties of Strings
Immutable: You can’t modify a string in place. Methods such as replace() return a new string and leave the original unchanged.
Ordered: You can access characters by index.
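Here is a small sketch of what immutability means in practice. The variable `new_name` is just an illustrative name; the idea is that you build a new string rather than changing the existing one.

```python
name = "Python"

# Item assignment fails because strings are immutable.
try:
    name[0] = "J"
    assignment_worked = True
except TypeError:
    assignment_worked = False
print(assignment_worked)  # False

# Build a new string instead, using indexing and slicing to reuse the rest.
new_name = "J" + name[1:]
print(new_name)  # Jython
print(name)      # Python (the original is unchanged)
```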
Creating strings:
name = "Python"
sentence = 'This is also a string.'
multi_line = """This is string notation that can span
multiple lines in your code."""
Accessing characters:
print(name[0])   # P
print(name[-1])  # n
Slicing strings:
print(name[0:3])  # Pyt
print(name[::2])  # Pto (every second character)
Methods to manipulate strings:
print(name.lower())            # python
print(name.upper())            # PYTHON
print(name.replace("P", "J"))  # Jython (replaces P with J)
Checking substrings:
print("Py" in name) # Checks if Py is in name, outputs: True
Ranges: Generating Sequences of Numbers
The range() function generates a sequence of numbers. While it’s normally used in loops, ranges are also useful for managing ordered, numerical data.
Creating ranges:
r1 = range(5) # 0 to 4
r2 = range(1, 10, 2) # 1 to 9, steps of 2
Iterating through a range:
for i in range(1, 6):
    print(i)  # Outputs 1, 2, 3, 4, 5 (one number per line)
Convert to a list:
numbers = list(range(3, 9))
print(numbers)  # Output: [3, 4, 5, 6, 7, 8]
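A range doesn't have to be converted to a list to be useful. A range object works out its values on demand and still supports length, indexing and membership checks. The sketch below uses illustrative variable names and also shows a negative step for counting down.

```python
r = range(1, 10, 2)  # 1, 3, 5, 7, 9

length = len(r)
first, last = r[0], r[-1]
has_seven = 7 in r
has_eight = 8 in r

print(length)       # 5
print(first, last)  # 1 9
print(has_seven)    # True
print(has_eight)    # False

# A negative step counts down; the stop value (0) is still excluded.
countdown = list(range(10, 0, -2))
print(countdown)    # [10, 8, 6, 4, 2]
```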
Use Cases for Data Engineers
Sets: Great for identifying unique values or performing operations like union or intersection on datasets.
Strings: Used for parsing, cleaning and transforming raw text data, as well as setting configuration values such as the environment:
env = "Prod"
Ranges: Often helpful when creating test datasets or looping over numerical sequences.
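To tie the three together, here is a rough sketch of how they might show up in a small data-cleaning task. The `raw_countries` values and the variable names are invented for illustration; real data would come from a file or database.

```python
# Hypothetical raw values, as they might arrive from a messy text column.
raw_countries = [" uk", "UK ", "France", "france", " Spain "]

# Strings + sets: clean each value, then let the set drop the duplicates.
unique_countries = set()
for value in raw_countries:
    unique_countries.add(value.strip().lower())
print(sorted(unique_countries))  # ['france', 'spain', 'uk']

# Strings: a configuration value, compared case-insensitively.
env = "Prod"
is_production = env.lower() == "prod"
print(is_production)  # True

# Ranges: generate simple IDs for a test dataset.
test_ids = list(range(1, 6))
print(test_ids)  # [1, 2, 3, 4, 5]
```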
Hands-on Exercises
Create a set of numbers from 1 to 5, add the number 6, and remove 3.
Assign a sentence to a variable called sentence, convert it to uppercase, remove all spaces and check if a word is in your sentence.
Create a list of even numbers from 2 to 20 using range().
If you run into any issues, let me know in the comments, or share your solutions there so I can see how you’re all progressing!
Next Up: Day 10 - List Comprehensions
Tomorrow we’ll explore list and dictionary comprehensions to quickly and efficiently create and manipulate data structures!
Keep practicing what you’ve learnt in the meantime, and remember… Happy coding!