Supervised, Unsupervised, and Reinforcement Learning: Practical Guide

You have a dataset with thousands of rows and need to make decisions. Classify customers? Predict sales? Group products without knowing the categories? Each problem requires a different approach. We at Meteora Web have seen companies waste time on wrong algorithms because they didn't understand the difference between supervised, unsupervised, and reinforcement learning. In this guide we explain how they work, with ready-to-run code examples.

Supervised Learning — When You Have the Right Answers

Supervised learning works with labeled data: for each example you know the target (e.g., house price, email category). The algorithm learns to map inputs to outputs. Two main types: regression (continuous values) and classification (discrete values).

Regression: Predicting a Number

Imagine estimating house prices based on square footage, rooms, year built. Use a linear model. With scikit-learn it's a few lines:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np

# Sample data: area (sqft) and price ($)
X = np.array([[50], [80], [120], [150], [200]])
y = np.array([100000, 160000, 240000, 300000, 400000])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
print(f'Estimated price for 100 sqft: {model.predict([[100]])[0]:.2f} $')

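Common mistake: using linear regression when the relationship is non-linear. Always check the residuals and try polynomial models if they show a pattern. Key metrics: Mean Absolute Error (MAE) and R². Don't rely solely on R²: a model can score a high R² and still be systematically wrong in part of the range, so examine the error distribution as well.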
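
The residual check described above can be sketched as follows. This is a minimal illustration on made-up synthetic data with a deliberately quadratic relationship (the data, the noise level, and the degree-2 choice are all assumptions for the demo), scored on the training data for brevity:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption): y grows with the square of x, plus noise
rng = np.random.default_rng(42)
X = np.linspace(0, 10, 60).reshape(-1, 1)
y = 3.0 * X.ravel() ** 2 + rng.normal(0, 5, size=60)

# Fit a straight line and a degree-2 polynomial model on the same data
linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

mae_linear = mean_absolute_error(y, linear.predict(X))
mae_poly = mean_absolute_error(y, poly.predict(X))
r2_linear = r2_score(y, linear.predict(X))
r2_poly = r2_score(y, poly.predict(X))
print(f'Linear:     MAE={mae_linear:.1f}  R2={r2_linear:.3f}')
print(f'Polynomial: MAE={mae_poly:.1f}  R2={r2_poly:.3f}')

# Residuals of the straight line: a curved pattern (same sign at both ends,
# opposite sign in the middle) signals a non-linear relationship
residuals = y - linear.predict(X)
print('Residuals at start, middle, end:', residuals[[0, 30, -1]].round(1))
```

If the residuals change sign in a systematic way along the input range, the linear model is missing structure, even when its R² looks respectable.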

Classification: Assigning a Category

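A typical case: classifying emails as spam or not spam. Random Forest is a robust default that needs little tuning and no feature scaling. Whether it accepts missing values directly depends on your scikit-learn version; on older releases you must impute them first. Example on a synthetic dataset: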

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
# Fix random_state so the split and the report are reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))

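Watch out for class imbalance: if 95% of your emails are 'not spam' and 5% are 'spam', a baseline that always predicts 'not spam' reaches 95% accuracy without detecting a single spam message. Always look at the confusion matrix and at metrics such as precision, recall, and F1-score.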
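
To make the imbalance trap concrete, here is a small sketch that scores an "always predict the majority class" baseline. The 95/5 split is generated synthetically with make_classification (an assumption for the demo, not real email data):

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption): roughly 95% class 0, 5% class 1
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Baseline that ignores the features and always predicts the majority class
baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)
y_base = baseline.predict(X_test)

baseline_accuracy = accuracy_score(y_test, y_base)
baseline_recall = recall_score(y_test, y_base, zero_division=0)
cm = confusion_matrix(y_test, y_base)

print(f'Baseline accuracy: {baseline_accuracy:.2%}')
print(f'Baseline recall on the minority class: {baseline_recall:.2%}')
print('Confusion matrix (rows = true class, columns = predicted class):')
print(cm)
```

Any real classifier should be compared against this baseline: accuracy alone cannot tell the two apart, while recall on the minority class can.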

What to do now: Take a small dataset from your domain (e.g., e-commerce sales) and try linear regression. Evaluate the error. Then move to a more complex model (Random Forest) and compare.

Unsupervised Learning — When You Have No Labels

The data has no predefined answer. You want to discover hidden patterns: customer segments, anomalies, product groupings. Two fundamental techniques: clustering and dimensionality reduction.

Clustering: Group Without Knowing the Groups

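K-means is the most common algorithm, but beware: you must specify the number of clusters (k) up front, and a k that is too high or too low gives misleading groups. The elbow method is a common heuristic for choosing it: plot the inertia against k and pick the point where the curve stops dropping sharply. Code: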

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
inertias = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)  # explicit n_init: the default differs across scikit-learn versions
    kmeans.fit(X)
    inertias.append(kmeans.inertia_)

plt.plot(range(1, 11), inertias, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()

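K-means assumes roughly spherical clusters of similar size; on elongated or interleaved shapes it tends to split the data badly. In those cases try DBSCAN, which doesn't require k and labels outliers as noise, although you must tune its eps and min_samples parameters instead. For high-dimensional data, reduce the dimensionality first with PCA.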
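
A quick way to see the difference is to run both algorithms on two interleaved half-moons. This is only a sketch: the dataset is synthetic, and eps=0.2 and min_samples=5 are values assumed to suit this particular toy data, not general recommendations.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-moons: clearly two groups, but not spherical ones
X, true_labels = make_moons(n_samples=300, noise=0.05, random_state=42)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
# eps and min_samples are assumptions tuned by hand for this toy dataset
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand Index: 1.0 means perfect agreement with the true grouping
ari_kmeans = adjusted_rand_score(true_labels, kmeans_labels)
ari_dbscan = adjusted_rand_score(true_labels, dbscan_labels)
print(f'K-means ARI: {ari_kmeans:.2f}')
print(f'DBSCAN ARI:  {ari_dbscan:.2f}')

# DBSCAN marks noise points with the label -1
n_clusters_dbscan = len(set(dbscan_labels.tolist()) - {-1})
print(f'Clusters found by DBSCAN: {n_clusters_dbscan}')
```

The true labels are used here only to score the result; on real unlabeled data you would inspect the clusters visually or with an internal metric instead.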

Dimensionality Reduction: Seeing the Forest

PCA (Principal Component Analysis) transforms features into components that capture maximum variance. Useful for visualizing data in 2D/3D or as preprocessing for other models.

from sklearn.decomposition import PCA
import numpy as np

X = np.random.rand(100, 10)  # 100 samples, 10 features
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print(f'Explained variance: {pca.explained_variance_ratio_.sum():.2%}')
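
When PCA is used as preprocessing, two practical details are worth a sketch: PCA is sensitive to feature scale, so features are usually standardized first, and scikit-learn lets you pass a fraction as n_components to keep just enough components to reach that share of variance. The 95% threshold and the digits dataset below are assumptions chosen for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled example dataset: 8x8 digit images flattened to 64 features
X, _ = load_digits(return_X_y=True)

# Standardize, then keep as many components as needed for 95% of the variance
reducer = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver='full'),
)
X_reduced = reducer.fit_transform(X)

pca = reducer.named_steps['pca']
print(f'Features before: {X.shape[1]}, components kept: {pca.n_components_}')
print(f'Explained variance: {pca.explained_variance_ratio_.sum():.2%}')
```

The reduced matrix X_reduced can then be fed to a clustering algorithm or a classifier in place of the original features.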

What to do now: Collect a set of unlabeled data (e.g., site navigation logs) and apply K-means with the elbow method. Interpret the clusters: do they represent user segments? If so, you can personalize the experience.

Reinforcement Learning — When the Agent Learns from Mistakes

In reinforcement learning, an agent interacts with an environment, receives rewards, and learns a policy that maximizes cumulative reward. Unlike the other two paradigms, there is no static dataset: the agent generates its own experience by trial and error. It is used in robotics, games, and process optimization.

Q-learning: The Tabular Base

For small discrete state and action spaces, Q-learning is simple and effective. Example on a 3x3 Gridworld where the agent starts top-left and must reach the goal bottom-right avoiding a trap:

import numpy as np

# Environment 3x3, states 0..8, actions 0=up,1=down,2=left,3=right
n_states = 9
n_actions = 4
Q = np.zeros((n_states, n_actions))

# Reward: -1 per step, -10 for trap (state 4), +10 for goal (state 8)
reward = np.full(n_states, -1)
reward[4] = -10
reward[8] = 10
done_states = [8]

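# Transitions (simplified): moving into a wall leaves the agent where it is.
# Note: the trap only gives a penalty; the episode ends only at the goal.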
def next_state(s, a):
    row, col = divmod(s, 3)
    if a == 0: row = max(0, row-1)
    elif a == 1: row = min(2, row+1)
    elif a == 2: col = max(0, col-1)
    elif a == 3: col = min(2, col+1)
    return row*3 + col

alpha = 0.1
gamma = 0.9
epsilon = 0.1
for episode in range(500):
    state = 0
    while state not in done_states:
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)
        else:
            action = np.argmax(Q[state])
        new_state = next_state(state, action)
        Q[state][action] += alpha * (reward[new_state] + gamma * np.max(Q[new_state]) - Q[state][action])
        state = new_state

print(Q.round(2))

Caution: a Q-table is only practical when the state and action spaces are small and discrete. For larger problems (images, continuous sensors) you need function approximation, for example deep neural networks (Deep Q-Networks) or policy gradient methods.
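
A raw Q-table is hard to read, so a useful follow-up is to turn it into a greedy policy and trace the path the agent would take. The sketch below retrains a compact version of the same 3x3 Gridworld so that it runs on its own; the helper names (step_reward, greedy_path), the fixed seed, and the arrow rendering are assumptions added for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
N_STATES, N_ACTIONS, GOAL, TRAP = 9, 4, 8, 4
ARROWS = ['^', 'v', '<', '>']  # action order: up, down, left, right

def next_state(s, a):
    """Move on the 3x3 grid; moving into a wall leaves the agent in place."""
    row, col = divmod(s, 3)
    if a == 0: row = max(0, row - 1)
    elif a == 1: row = min(2, row + 1)
    elif a == 2: col = max(0, col - 1)
    elif a == 3: col = min(2, col + 1)
    return row * 3 + col

def step_reward(s):
    """Reward for arriving in state s: +10 goal, -10 trap, -1 otherwise."""
    return 10 if s == GOAL else -10 if s == TRAP else -1

# Tabular Q-learning with alpha=0.1, gamma=0.9, epsilon=0.1
Q = np.zeros((N_STATES, N_ACTIONS))
for _ in range(500):
    s = 0
    while s != GOAL:
        a = int(rng.integers(N_ACTIONS)) if rng.random() < 0.1 else int(np.argmax(Q[s]))
        s2 = next_state(s, a)
        Q[s, a] += 0.1 * (step_reward(s2) + 0.9 * Q[s2].max() - Q[s, a])
        s = s2

# Greedy policy: the best-valued action in every state
policy = Q.argmax(axis=1)

def greedy_path(start=0, max_steps=20):
    """Follow the greedy policy from start; max_steps guards against loops."""
    path = [start]
    while path[-1] != GOAL and len(path) <= max_steps:
        path.append(next_state(path[-1], int(policy[path[-1]])))
    return path

path = greedy_path()
for r in range(3):
    print(' '.join('G' if s == GOAL else ARROWS[policy[s]]
                   for s in range(r * 3, r * 3 + 3)))
print('Greedy path from state 0:', path)
```

Printing the policy as arrows makes it easy to check by eye whether the agent has learned to route around the trap in the center of the grid.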

What to do now: No robot needed. Simulate a scheduling problem (e.g., processing orders) as an RL environment. Even simple Q-learning helps you understand the dynamics.

Summary — What to do now

We've covered three paradigms. Every problem has a path: if you have labels use supervised; if no labels try unsupervised; if the agent interacts with the environment use reinforcement learning.

Identify the problem type: What data do you have? What's the goal? Predict (supervised), discover patterns (unsupervised), or optimize sequential decisions (RL)?
Choose the right algorithm: For supervised start with linear models or decision trees. For unsupervised try K-means and PCA. For RL start with tabular Q-learning on a simple environment.
Evaluate metrics: Don't stop at accuracy. Use MAE for regression, F1 for classification, silhouette score for clustering, total reward for RL.
Check computational complexity: If the dataset is huge (millions of rows), scikit-learn may not suffice. Consider PyTorch/TensorFlow or distributed frameworks.
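
As an example of the clustering metric named in the checklist above, the silhouette score can complement the elbow plot when choosing k. This is a sketch on the same kind of synthetic blobs used earlier; the range of k values tried is an assumption:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
# It needs at least 2 clusters, so the loop starts at k=2.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(f'k={k}: silhouette={scores[k]:.3f}')

best_k = max(scores, key=scores.get)
print(f'Highest silhouette at k={best_k}')
```

Unlike inertia, which always decreases as k grows, the silhouette score can go down when clusters are split unnecessarily, so it gives a peak rather than an elbow to look for.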

Remember: the best model is the one that solves the problem at the right cost. We at Meteora Web see it every day: in our experience, many ML projects fail because the wrong paradigm was chosen at the start. Start here.

For deeper insights on advanced AI applications, see our guide on Gemini 2.5 Pro vs Flash or the comparison of Claude Opus, Sonnet and Haiku. To understand how prompting techniques apply to language models, read Zero-shot, One-shot, Few-shot Prompting.