Day 36 of 50 Days of Python: Introduction to Machine Learning Concepts

Part of Week 6: Advanced Topics

Jun 24, 2025

Welcome to Week 6 of the 50 Days of Python journey, our Advanced Topics week!

We’ve explored foundational Python, data manipulation, visualisation and real-world tooling. Now it’s time to push into more powerful, cutting-edge capabilities, starting with Machine Learning.

What is Machine Learning?

Machine Learning (ML) is about enabling computers to learn from data and make decisions or predictions without being explicitly programmed.

Simply, machine learning is like teaching through example. - Zero to Neural: Exploring AI

Instead of writing rule-based logic, ML systems learn patterns from past data. For example:

Predicting house prices
Detecting spam emails
Recommending what movie to watch next
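To make the contrast concrete, here is a minimal sketch using scikit-learn, which is introduced later in this post. The six emails and their labels are made-up toy data. The hand-written rule only knows what we told it to look for. The Naive Bayes model (chosen here purely for illustration) learns which words tend to appear in spam from the labelled examples.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB


# Rule-based approach: we hard-code what "spammy" looks like.
def is_spam_rule(text: str) -> bool:
    return "free" in text.lower()


# ML approach: learn the pattern from labelled examples (toy data, 1 = spam).
emails = [
    "win a free prize now",
    "free money click here",
    "claim your free reward",
    "meeting moved to monday",
    "lunch tomorrow at noon",
    "project update attached",
]
labels = [1, 1, 1, 0, 0, 0]

vectorizer = CountVectorizer()          # turns each email into word counts (features)
X = vectorizer.fit_transform(emails)
model = MultinomialNB().fit(X, labels)  # learns which words go with which label

new_emails = ["claim your prize", "meeting on monday"]
predictions = model.predict(vectorizer.transform(new_emails))

print(is_spam_rule("claim your prize"))  # False: the rule never sees "free"
print(predictions.tolist())              # [1, 0]: the model learned from examples
```

The rule misses the first email because it only checks one keyword. The model flags it because "claim" and "prize" only ever appeared in spam during training.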

ML typically involves these stages:

Data Collection – Gather and clean structured or unstructured data
Feature Engineering – Convert raw data into meaningful inputs
Model Selection & Training – Choose an algorithm and teach it to learn patterns
Evaluation – Measure accuracy and performance on data the model did not see during training
Deployment – Put the model into production for real use
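As a rough illustration, the five stages can map onto scikit-learn code like this. The sketch assumes the built-in Iris dataset as stand-in data and logistic regression as the model. Pickling the trained pipeline is only a simplified stand-in for real deployment.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data collection: a small built-in dataset stands in for real data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 2. Feature engineering: scale each feature to zero mean and unit variance.
# 3. Model selection & training: logistic regression on the scaled features.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# 4. Evaluation: accuracy on data the model never saw during training.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")

# 5. Deployment (simplified): serialise the trained pipeline so another
#    process could load it and serve predictions.
artifact = pickle.dumps(model)
restored = pickle.loads(artifact)
assert (restored.predict(X_test) == model.predict(X_test)).all()
```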

Machine Learning Concepts

Here are some essential concepts to understand:

Features: These are the input variables (columns) used by a model to learn. For example, in predicting house prices, features might include square footage, number of bedrooms, or location.
Labels: The output or target value the model aims to predict. In the house price example, the label would be the actual price.
Overfitting: This occurs when a model learns the training data too well, including its noise and outliers, and then performs poorly on new, unseen data. It's like memorising answers instead of understanding concepts.
Model Evaluation: To measure how well a model performs, we use metrics such as:
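- Accuracy: The fraction of predictions that are correct, for classification tasks
- Mean Squared Error (MSE): The average squared difference between predicted and actual values, for regression tasks
- Precision/Recall/F1 Score: More informative than accuracy when dealing with imbalanced classes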
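To see how these metrics are calculated, the sketch below runs scikit-learn's metric functions on small made-up labels and prices. The comments show the arithmetic behind each value.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_squared_error,
    precision_score,
    recall_score,
)

# Classification (1 = positive class). Made-up labels for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
# Here: 2 true positives, 3 true negatives, 1 false positive, 2 false negatives.

accuracy = accuracy_score(y_true, y_pred)    # (2 + 3) / 8 = 0.625
precision = precision_score(y_true, y_pred)  # 2 / (2 + 1) ≈ 0.667
recall = recall_score(y_true, y_pred)        # 2 / (2 + 2) = 0.5
f1 = f1_score(y_true, y_pred)                # 2PR / (P + R) = 4/7 ≈ 0.571

# Regression: the average of the squared differences.
prices_true = [3.0, 5.0, 2.5, 7.0]
prices_pred = [2.5, 5.0, 3.0, 8.0]
mse = mean_squared_error(prices_true, prices_pred)  # (0.25 + 0 + 0.25 + 1) / 4 = 0.375
```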

Understanding these fundamentals sets the stage for building, tuning, and deploying reliable machine learning models.
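Overfitting is easiest to spot by comparing training error against test error. In this sketch, an unrestricted decision tree memorises noisy toy data, so its training error drops to zero while its test error stays higher. Limiting the depth is one simple way to rein it in. The sine-plus-noise data and the `max_depth=3` value are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Noisy toy data: y = sin(x) plus random noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for depth in (None, 3):  # None = keep splitting until every leaf is pure
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, tree.predict(X_train))
    test_mse = mean_squared_error(y_test, tree.predict(X_test))
    results[depth] = (train_mse, test_mse)
    print(f"max_depth={depth}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

The unrestricted tree's near-perfect training score combined with a worse test score is the "memorising answers" pattern described above.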

Machine Learning with Python and PySpark

Python has become the de facto language for machine learning because of its mature library ecosystem and ease of use. Whether you're prototyping locally or scaling out in distributed environments, the same language covers both.

Local ML: Python + Libraries

With packages like scikit-learn and TensorFlow, Python makes building and evaluating models a lot easier than you would think.

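from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)  # synthetic toy data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression()
model.fit(X_train, y_train)  # learn coefficients from the training split
predictions = model.predict(X_test)  # predict prices for held-out rows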

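The great thing about these packages is that they give you prebuilt, ready-to-train models such as LinearRegression, random forests and many more. In scikit-learn these models share the same fit/predict interface, so swapping one model for another is usually a one-line change.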
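The shared interface is easiest to see by training two different models with identical code. This is a minimal sketch on synthetic data from `make_regression`, using R² (`r2_score`) as the comparison metric. Both choices are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data stands in for a real dataset.
X, y = make_regression(
    n_samples=300, n_features=5, n_informative=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for model in (LinearRegression(), RandomForestRegressor(n_estimators=100, random_state=0)):
    model.fit(X_train, y_train)          # same method on every estimator
    predictions = model.predict(X_test)  # same method on every estimator
    scores[type(model).__name__] = r2_score(y_test, predictions)

print(scores)
```

Because the data here is generated from a linear relationship, the linear model should score higher. On real data the ranking can go either way, which is why being able to swap models cheaply is useful.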

Distributed ML: PySpark

When working with big data, PySpark (the Python API for Apache Spark) allows you to train models on large-scale datasets distributed across clusters:

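from pyspark.ml.regression import LinearRegression

# Assumes training_data and test_data are Spark DataFrames that already contain
# a vector column "features" (e.g. built with pyspark.ml.feature.VectorAssembler)
# and a numeric column "label".
lr = LinearRegression(featuresCol="features", labelCol="label")
model = lr.fit(training_data)  # training runs as distributed Spark jobs
predictions = model.transform(test_data)  # adds a "prediction" column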

Whether you're working on a laptop or a data lake cluster, Python + PySpark gives you flexibility and power. This fits with a typical testing strategy:

Make sure you can run the model locally, on a small set of data.
Once it runs locally, package it, add it to a CI/CD pipeline and deploy it to your cloud environments.
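The "run it locally on a small set of data first" step can be captured as an automated smoke test that a CI/CD pipeline runs on every change. The sketch below assumes pytest as the test runner, although any runner would work. The `train_model` function and the tiny dataset are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def train_model(X, y):
    """Hypothetical training entry point that the real pipeline would also call."""
    return LinearRegression().fit(X, y)


def test_model_trains_on_small_sample():
    # A tiny, fast dataset where y = 2x + 1 exactly, so the fit should recover it.
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = 2 * X.ravel() + 1
    model = train_model(X, y)
    predictions = model.predict(np.array([[4.0]]))
    assert predictions.shape == (1,)
    assert abs(predictions[0] - 9.0) < 1e-6
```

Running `pytest` locally, and then as a step in the CI job, gives the same quick signal before anything reaches your cloud environments.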

Top ML Libraries in Python

I’ll be going into detail on two major libraries that you’ll encounter:

Scikit-learn

A go-to for classical ML tasks: regression, classification, clustering, and more. It works best with small to medium-sized datasets that fit in memory.

Easy to use
Integrates well with pandas and NumPy
Supports model pipelines, evaluation, and preprocessing
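Here is a sketch of how those three points fit together. A pandas DataFrame feeds a scikit-learn `Pipeline` that scales the numeric columns and one-hot encodes the categorical one before fitting a model. The house data is made up, and the column names are hypothetical, chosen to echo the house price features described earlier.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy house data (made-up values): features plus the price label.
houses = pd.DataFrame({
    "square_feet": [850, 1200, 1500, 2000, 950, 1800],
    "bedrooms": [2, 3, 3, 4, 2, 4],
    "location": ["city", "suburb", "suburb", "rural", "city", "rural"],
    "price": [300_000, 350_000, 400_000, 380_000, 320_000, 360_000],
})
X = houses.drop(columns="price")  # features
y = houses["price"]               # label

# Preprocessing: scale numeric columns, one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("numeric", StandardScaler(), ["square_feet", "bedrooms"]),
    ("category", OneHotEncoder(handle_unknown="ignore"), ["location"]),
])
model = Pipeline([("preprocess", preprocess), ("regressor", LinearRegression())])
model.fit(X, y)

new_house = pd.DataFrame({"square_feet": [1300], "bedrooms": [3], "location": ["suburb"]})
print(model.predict(new_house))
```

Keeping the preprocessing inside the pipeline means the same transformations are applied automatically at prediction time. This avoids a common source of training/serving mismatches.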

TensorFlow

Ideal for deep learning and neural networks, especially with large-scale or unstructured data like images, audio, and text.

Built by Google
Scalable to GPUs/TPUs
Used in both research and production

Together, they provide a full spectrum of tools for your ML workflow.

What’s Up Next?

This week, we’ll dive deeper into advanced Python topics with a focus on real-world ML, deployment, and automation.

Here’s what’s coming up:

Day 37: Using Scikit-Learn for Data Preprocessing and Modeling
Day 38: Deep Learning Basics with TensorFlow
Day 39: Building a Simple Neural Network in TensorFlow
Day 40: Model Deployment with FastAPI
Day 41: Working with Job Scheduling Libraries like Airflow
Day 42: Setting Up CI/CD for Python Data Projects

This is where Python starts powering real-world systems: production-ready models, APIs, pipelines, and automation.

So stay tuned: this is a good week of content. As always… Happy coding!

Join Jonathon Kindred’s subscriber chat

Data Bytes & Insights
