import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
PART 1: Finding the Bottom of a Valley¶
Exercise 1.1 — Look Before You Leap¶
Consider this function:
$$f(x) = x^4 - 4x + 10$$
Your first job is simply to see it.
Use the plotting helper below to draw the function over the range $x \in [-3, 3]$.
# PLOTTING HELPER — run this cell as-is, you'll use plot_function() throughout the notebook
def plot_function(f, x_range=(-3, 3), num_points=500, title="f(x)", mark_x=None):
"""
Plots a function f over x_range.
Optionally marks a specific x value with a red dot.
Parameters:
f : a Python function that takes a number and returns a number
x_range : tuple (x_min, x_max)
title : string label for the plot
mark_x : if provided, draws a red dot at (mark_x, f(mark_x))
"""
xs = np.linspace(x_range[0], x_range[1], num_points)
ys = [f(x) for x in xs]
plt.figure(figsize=(8, 4))
plt.plot(xs, ys, 'b-', linewidth=2)
if mark_x is not None:
plt.plot(mark_x, f(mark_x), 'ro', markersize=10, label=f'x={mark_x:.4f}, f(x)={f(mark_x):.4f}')
plt.legend()
plt.title(title)
plt.xlabel('x')
plt.ylabel('f(x)')
plt.grid(True)
plt.show()
Tasks:
- Run the plotting helper cell to define
plot_function. - Define $f(x) = x^4 - 4x + 10$ as a Python function.
- Call
plot_function(f, x_range=(-3, 3), title="x^4 - 4x + 10"). - Visually estimate: where does the minimum appear to be? Write your guess as a comment.
# PLOTTING HELPER — run this cell as-is
def plot_function(f, x_range=(-3, 3), num_points=500, title="f(x)", mark_x=None):
xs = np.linspace(x_range[0], x_range[1], num_points)
ys = [f(x) for x in xs]
plt.figure(figsize=(8, 4))
plt.plot(xs, ys, 'b-', linewidth=2)
if mark_x is not None:
plt.plot(mark_x, f(mark_x), 'ro', markersize=10, label=f'x={mark_x:.4f}, f(x)={f(mark_x):.4f}')
plt.legend()
plt.title(title)
plt.xlabel('x')
plt.ylabel('f(x)')
plt.grid(True)
plt.show()
# YOUR CODE HERE
# Step 1: Define f(x) = x^4 - 4x + 10
def f(x):
pass # replace this
# Step 2: Plot it
# plot_function(...)
# Step 3: Write your visual estimate of the minimum below
# My guess: x ≈ ???
Exercise 1.2 — Hit and Trial¶
Now let's find the minimum by hand — no calculus, just trying values.
The plotting helper accepts a mark_x argument. Use it to mark your guesses on the plot.
Tasks:
- Start with your visual estimate from Exercise 1.1. Call
plot_function(f, mark_x=YOUR_GUESS). - Look at the plot. Is the marked point at the bottom? If not, which direction should you move?
- Try a new value. Re-plot.
- Repeat until you are confident you have found the minimum to 2 decimal places.
- Record every guess you made and the function value at that guess in a table (use a Python list of tuples).
# Template for recording your guesses:
guesses = [
# (x_value, f(x_value))
# e.g. (1.0, f(1.0)),
]
Reflection questions (answer in a markdown cell below):
- How many guesses did it take?
- What strategy did you use to decide the next guess?
- Can you describe your strategy in plain English, like a recipe?
# YOUR CODE HERE — try different values of mark_x
plot_function(f, mark_x=1.0, title="My guess: x=1.0")
# Record all your guesses here
guesses = [
# (x_value, f(x_value))
]
# Print them nicely
print(f"{'Guess #':<10} {'x':<15} {'f(x)':<15}")
print("-" * 40)
for i, (x, fx) in enumerate(guesses):
print(f"{i+1:<10} {x:<15.6f} {fx:<15.6f}")
Your reflection here:
- How many guesses did it take?
- What was your strategy?
- (Write in plain English)
PART 2: Making the Machine Do the Searching¶
Exercise 2.1 — The Slope Tells You Which Way to Walk¶
You had a strategy when doing hit-and-trial. Let's make it precise.
Think about this: when you are standing on the curve at some point $x$, the slope of the curve tells you:
- If slope is positive → the function is going up to the right → you should move left (decrease $x$)
- If slope is negative → the function is going up to the left → you should move right (increase $x$)
- If slope is zero → you are at the bottom!
We can approximate the slope (derivative) at any point using:
$$\text{slope at } x \approx \frac{f(x + h) - f(x - h)}{2h} \quad \text{for a very small } h$$
This is called the numerical derivative.
Tasks:
- Write a function
numerical_derivative(f, x, h=1e-5)that computes the slope at pointx. - Test it on $f(x) = x^4 - 4x + 10$ at a few points: $x = 0, 1, 1.5, 2$.
- Print both the slope value and whether it's positive, negative, or near-zero.
- Manually verify one of your answers: the true derivative is $f'(x) = 4x^3 - 4$. Does your numerical result match?
# YOUR CODE HERE
def numerical_derivative(f, x, h=1e-5):
"""
Returns the approximate derivative of f at point x.
"""
pass # replace this
# Test it
test_points = [0, 1, 1.5, 2]
for x in test_points:
slope = numerical_derivative(f, x)
# print x, slope, and direction
Exercise 2.2 — One Step at a Time¶
Now let's build the update rule.
If you are at position $x$, and the slope there is $s$, your next position should be:
$$x_{\text{new}} = x - \alpha \cdot s$$
where $\alpha$ (alpha) is called the learning rate — it controls how big a step you take.
Tasks:
- Start at $x_0 = 2.0$.
- Compute the slope at $x_0$.
- Compute $x_1 = x_0 - \alpha \cdot \text{slope}$ with $\alpha = 0.01$.
- Print $x_0$, slope, and $x_1$.
- Is $f(x_1) < f(x_0)$? It should be! Why?
- Now do this manually for 5 steps. Print each step. Is $x$ getting closer to the minimum you found in Exercise 1.2?
# YOUR CODE HERE
alpha = 0.01
x = 2.0
print(f"{'Step':<8} {'x':<15} {'f(x)':<15} {'slope':<15}")
print("-" * 55)
for step in range(5):
slope = numerical_derivative(f, x)
print(f"{step:<8} {x:<15.6f} {f(x):<15.6f} {slope:<15.6f}")
x = x - alpha * slope # the update rule
Exercise 2.3 — Build find_minima_1d¶
Now automate the process. Write a function that keeps taking steps until the slope is close enough to zero.
Stopping condition: Stop when $|\text{slope}| < \epsilon$ (a small tolerance, e.g. $10^{-6}$), or when you've taken max_steps steps.
def find_minima_1d(f, x_start, alpha=0.01, epsilon=1e-6, max_steps=10000):
"""
Finds the x that minimizes f using gradient descent.
Parameters:
f : function to minimize
x_start : starting point
alpha : learning rate
epsilon : stop when |slope| < epsilon
max_steps : maximum number of steps
Returns:
x_min : the x value at the minimum
f_min : the function value at the minimum
history : list of (step, x, f(x)) tuples
"""
pass # YOUR IMPLEMENTATION
Tasks:
- Implement
find_minima_1d. - Test it on $f(x) = x^4 - 4x + 10$ starting from $x = 2.0$.
- Print the final answer and how many steps it took.
- Does it match your manual answer from Exercise 1.2?
# YOUR CODE HERE
def find_minima_1d(f, x_start, alpha=0.01, epsilon=1e-6, max_steps=10000):
pass
# Test it
x_min, f_min, history = find_minima_1d(f, x_start=2.0)
print(f"Minimum found at x = {x_min:.6f}")
print(f"f(x_min) = {f_min:.6f}")
print(f"Steps taken: {len(history)}")
Exercise 2.4 — Watch It Learn¶
Use the plotting helper below to visualize the descent path. Run it after you have history from Exercise 2.3.
# PLOTTING HELPER — descent path
def plot_descent(f, history, x_range=(-3, 3), title="Gradient Descent"):
"""
Plots the function and shows how x evolved during descent.
history: list of (step, x, f(x)) tuples
"""
xs = np.linspace(x_range[0], x_range[1], 500)
ys = [f(x) for x in xs]
steps, xvals, fvals = zip(*history)
plt.figure(figsize=(12, 4))
# Left: function with descent path
plt.subplot(1, 2, 1)
plt.plot(xs, ys, 'b-', linewidth=2, label='f(x)')
plt.plot(xvals, fvals, 'ro-', markersize=3, alpha=0.5, label='descent path')
plt.plot(xvals[-1], fvals[-1], 'g*', markersize=15, label=f'minimum: x={xvals[-1]:.4f}')
plt.title(title)
plt.xlabel('x'); plt.ylabel('f(x)'); plt.legend(); plt.grid(True)
# Right: f(x) over steps
plt.subplot(1, 2, 2)
plt.plot(steps, fvals, 'b-')
plt.title('f(x) value over steps')
plt.xlabel('step'); plt.ylabel('f(x)'); plt.grid(True)
plt.tight_layout()
plt.show()
Tasks:
- Define and run
plot_descentwith your history. - What do you observe in the right plot? Is it always decreasing?
- Try starting from $x = -2.0$ instead. Does it converge to the same minimum? Why or why not?
- Try
alpha = 0.1. What happens? Tryalpha = 1.0. What happens? Record your observations.
# PLOTTING HELPER — run this cell as-is
def plot_descent(f, history, x_range=(-3, 3), title="Gradient Descent"):
xs = np.linspace(x_range[0], x_range[1], 500)
ys = [f(x) for x in xs]
steps, xvals, fvals = zip(*history)
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(xs, ys, 'b-', linewidth=2, label='f(x)')
plt.plot(xvals, fvals, 'ro-', markersize=3, alpha=0.5, label='descent path')
plt.plot(xvals[-1], fvals[-1], 'g*', markersize=15, label=f'minimum: x={xvals[-1]:.4f}')
plt.title(title); plt.xlabel('x'); plt.ylabel('f(x)'); plt.legend(); plt.grid(True)
plt.subplot(1, 2, 2)
plt.plot(steps, fvals, 'b-')
plt.title('f(x) over steps'); plt.xlabel('step'); plt.ylabel('f(x)'); plt.grid(True)
plt.tight_layout()
plt.show()
# YOUR CODE HERE
# Experiment 1: Default settings
x_min, f_min, history = find_minima_1d(f, x_start=2.0)
plot_descent(f, history)
# Experiment 2: Different starting point
# ...
# Experiment 3: Different learning rates
# ...
Your observations (fill in):
- Starting from x = -2.0: ...
- With alpha = 0.1: ...
- With alpha = 1.0: ...
PART 3: Stress-Testing Your Algorithm¶
Exercise 3.1 — New Functions, Same Algorithm¶
Your find_minima_1d is general — it should work on any function. Let's test it.
For each function below:
- Define it in Python.
- Plot it to visually estimate the minimum.
- Run
find_minima_1dand record the result. - Plot the descent path.
- Note any problems or surprises.
Functions to test:
| # | Function | x_range to plot | Suggested x_start |
|---|---|---|---|
| A | $f(x) = x^6 - 4x^2 + 10$ | $(-2, 2)$ | $1.5$ |
| B | $f(x) = (x - 3)^2 + 5$ | $(0, 6)$ | $0.0$ |
| C | $f(x) = x^2 + 3\|x\|$ | $(-4, 4)$ | $2.0$ |
| D | $f(x) = e^x - 5x$ | $(0, 4)$ | $3.0$ |
| E | $f(x) = x^4 - 3x^3 + 2$ | $(-1, 3)$ | $2.5$ |
Caution on Function A: Plot it first. Does it have one minimum or more? What does that mean for your algorithm?
# Function A
def fA(x):
return x**6 - 4*x**2 + 10
plot_function(fA, x_range=(-2, 2), title="fA: x^6 - 4x^2 + 10")
# YOUR CODE: run find_minima_1d and plot_descent
# What do you notice? Are there multiple minima?
# Function B — YOUR CODE
def fB(x):
pass
# plot, find_minima_1d, plot_descent
# Function C — YOUR CODE
def fC(x):
pass
# plot, find_minima_1d, plot_descent
# Function D — YOUR CODE
def fD(x):
pass
# plot, find_minima_1d, plot_descent
# Function E — YOUR CODE
def fE(x):
pass
# plot, find_minima_1d, plot_descent
Exercise 3.2 — Function A Is Sneaky¶
Plot Function A carefully. You should notice it has two local minima — one on the left and one on the right.
Tasks:
- Run
find_minima_1don Function A starting from $x = 1.5$. Where does it end up? - Run it starting from $x = -1.5$. Where does it end up?
- Run it starting from $x = 0.0$. Where does it end up?
- Which of the two minima is the global minimum (lowest overall)? Which is just a local minimum?
- Can your algorithm always find the global minimum? What would you need to do to be more confident?
Write your answers in a markdown cell.
# YOUR CODE HERE
for x_start in [1.5, -1.5, 0.0]:
x_min, f_min, history = find_minima_1d(fA, x_start=x_start, alpha=0.01)
print(f"Start: x={x_start:.1f} → Minimum at x={x_min:.6f}, f(x)={f_min:.6f}")
Your analysis:
- Global minimum: ...
- Local minimum: ...
- Can the algorithm always find the global minimum? ...
PART 4: Two Variables — A Landscape Instead of a Curve¶
Exercise 4.1 — See the Landscape¶
Now consider a function of two variables:
$$g(x, y) = x^2 + y^2$$
This is a bowl shape in 3D. The minimum is clearly at $(x, y) = (0, 0)$.
Use the 2D plotting helper below to visualize it.
# PLOTTING HELPER — 2D contour plot
def plot_function_2d(f2, x_range=(-3, 3), y_range=(-3, 3), num_points=100, title="f(x,y)", mark_xy=None):
"""
Plots a 2-variable function as a contour map.
Parameters:
f2 : function that takes (x, y) and returns a scalar
x_range : (x_min, x_max)
y_range : (y_min, y_max)
mark_xy : if provided as (x, y), draws a red dot at that point
"""
xs = np.linspace(x_range[0], x_range[1], num_points)
ys = np.linspace(y_range[0], y_range[1], num_points)
X, Y = np.meshgrid(xs, ys)
Z = np.array([[f2(x, y) for x in xs] for y in ys])
plt.figure(figsize=(6, 5))
contour = plt.contourf(X, Y, Z, levels=30, cmap='viridis')
plt.colorbar(contour)
if mark_xy is not None:
plt.plot(mark_xy[0], mark_xy[1], 'r*', markersize=15,
label=f'({mark_xy[0]:.3f}, {mark_xy[1]:.3f})')
plt.legend()
plt.title(title)
plt.xlabel('x'); plt.ylabel('y')
plt.show()
Tasks:
- Define and run the
plot_function_2dhelper. - Define $g(x, y) = x^2 + y^2$ and plot it.
- What color represents the minimum? What color represents high values?
- Now plot these functions and describe the shape of each landscape:
- $h_1(x, y) = (x - 1)^2 + (y + 2)^2$
- $h_2(x, y) = x^2 + 4y^2$
- $h_3(x, y) = x^2 + y^2 + x*y$
- Visually identify the minimum of each.
# PLOTTING HELPER — run this cell as-is
def plot_function_2d(f2, x_range=(-3, 3), y_range=(-3, 3), num_points=100, title="f(x,y)", mark_xy=None):
xs = np.linspace(x_range[0], x_range[1], num_points)
ys = np.linspace(y_range[0], y_range[1], num_points)
X, Y = np.meshgrid(xs, ys)
Z = np.array([[f2(x, y) for x in xs] for y in ys])
plt.figure(figsize=(6, 5))
contour = plt.contourf(X, Y, Z, levels=30, cmap='viridis')
plt.colorbar(contour)
if mark_xy is not None:
plt.plot(mark_xy[0], mark_xy[1], 'r*', markersize=15,
label=f'({mark_xy[0]:.3f}, {mark_xy[1]:.3f})')
plt.legend()
plt.title(title); plt.xlabel('x'); plt.ylabel('y')
plt.show()
# YOUR CODE HERE
def g(x, y):
return x**2 + y**2
plot_function_2d(g, title="g(x,y) = x^2 + y^2")
# Now define and plot h1, h2, h3
# ...
Exercise 4.2 — Partial Derivatives¶
With two variables, the slope has two components — one for each variable.
The partial derivative with respect to $x$ tells you the slope if you move only in the $x$ direction (holding $y$ fixed).
We can compute them numerically the same way:
$$\frac{\partial f}{\partial x} \approx \frac{f(x+h, y) - f(x-h, y)}{2h}$$
$$\frac{\partial f}{\partial y} \approx \frac{f(x, y+h) - f(x, y-h)}{2h}$$
Together, $(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y})$ is called the gradient — it points in the direction of steepest increase.
Tasks:
- Write a function
partial_x(f2, x, y, h=1e-5)that returns $\partial f / \partial x$. - Write a function
partial_y(f2, x, y, h=1e-5)that returns $\partial f / \partial y$. - Write a function
gradient_2d(f2, x, y, h=1e-5)that returns the tuple(df_dx, df_dy). - Test on $g(x, y) = x^2 + y^2$ at several points:
- $(1, 1)$ → expected gradient: $(2, 2)$
- $(3, 0)$ → expected gradient: $(6, 0)$
- $(0, 0)$ → expected gradient: $(0, 0)$
- For $h_2(x, y) = x^2 + 4y^2$, compute the gradient at $(1, 1)$ and $(0, 2)$. Do the answers make intuitive sense? (Which direction does the gradient point — toward or away from the minimum?)
# YOUR CODE HERE
def partial_x(f2, x, y, h=1e-5):
pass
def partial_y(f2, x, y, h=1e-5):
pass
def gradient_2d(f2, x, y, h=1e-5):
pass
# Test on g(x,y) = x^2 + y^2
test_points = [(1, 1), (3, 0), (0, 0)]
for (x, y) in test_points:
grad = gradient_2d(g, x, y)
print(f"Gradient at ({x}, {y}) = {grad}")
Exercise 4.3 — Build find_minima_2d¶
Now extend the 1D algorithm to 2D. The update rule becomes:
$$x_{\text{new}} = x - \alpha \cdot \frac{\partial f}{\partial x}$$ $$y_{\text{new}} = y - \alpha \cdot \frac{\partial f}{\partial y}$$
Both variables are updated simultaneously.
Stopping condition: Stop when $\sqrt{(\partial f/\partial x)^2 + (\partial f/\partial y)^2} < \epsilon$. This is the magnitude of the gradient (how steep the hill is overall).
Tasks:
- Implement
find_minima_2d(f2, x_start, y_start, alpha=0.1, epsilon=1e-6, max_steps=10000).- It should return
(x_min, y_min, f_min, history)wherehistoryis a list of(step, x, y, f(x,y))tuples.
- It should return
- Test it on $g(x, y) = x^2 + y^2$ starting from $(2, 3)$. It should find $(0, 0)$.
- Test it on $h_1(x, y) = (x-1)^2 + (y+2)^2$ starting from $(3, 3)$. Expected minimum: $(1, -2)$.
- Test it on $h_2(x, y) = x^2 + 4y^2$ starting from $(2, 2)$. Expected minimum: $(0, 0)$.
Use plot_function_2d with mark_xy=(x_min, y_min) to verify your answers visually.
# YOUR CODE HERE
def find_minima_2d(f2, x_start, y_start, alpha=0.1, epsilon=1e-6, max_steps=10000):
pass
# Test on g
x_min, y_min, f_min, history = find_minima_2d(g, x_start=2.0, y_start=3.0)
print(f"g: minimum at ({x_min:.4f}, {y_min:.4f}), f={f_min:.6f}, steps={len(history)}")
plot_function_2d(g, mark_xy=(x_min, y_min), title="g: descent result")
# Test on h1
def h1(x, y):
return (x - 1)**2 + (y + 2)**2
# YOUR CODE: run find_minima_2d and verify
# Test on h2
def h2(x, y):
return x**2 + 4*y**2
# YOUR CODE: run find_minima_2d and verify
# Try alpha=0.1. Does it converge? Try alpha=0.3. What happens?
Exercise 4.4 — Visualize the Path (2D)¶
Use the helper below to draw the descent path on the contour map.
# PLOTTING HELPER — 2D descent path
def plot_descent_2d(f2, history, x_range=(-4, 4), y_range=(-4, 4), title="2D Gradient Descent"):
"""
history: list of (step, x, y, f(x,y)) tuples
"""
xs = np.linspace(x_range[0], x_range[1], 100)
ys = np.linspace(y_range[0], y_range[1], 100)
X, Y = np.meshgrid(xs, ys)
Z = np.array([[f2(x, y) for x in xs] for y in ys])
steps, xvals, yvals, fvals = zip(*history)
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.contourf(X, Y, Z, levels=30, cmap='viridis')
plt.colorbar()
plt.plot(xvals, yvals, 'w-o', markersize=3, alpha=0.6, label='path')
plt.plot(xvals[0], yvals[0], 'rs', markersize=10, label='start')
plt.plot(xvals[-1], yvals[-1], 'g*', markersize=15, label='end')
plt.legend(); plt.title(title); plt.xlabel('x'); plt.ylabel('y')
plt.subplot(1, 2, 2)
plt.plot(steps, fvals, 'b-')
plt.title('f(x,y) over steps'); plt.xlabel('step'); plt.ylabel('f(x,y)'); plt.grid(True)
plt.tight_layout()
plt.show()
Tasks:
- Define and run
plot_descent_2dfor at least two of your test functions. - Does the path head straight to the minimum, or does it zigzag? Why?
- For $h_2(x, y) = x^2 + 4y^2$: try
alpha=0.3andalpha=0.5. Describe what you see. This is a famous problem in gradient descent called oscillation.
# PLOTTING HELPER — run this cell as-is
def plot_descent_2d(f2, history, x_range=(-4, 4), y_range=(-4, 4), title="2D Gradient Descent"):
xs = np.linspace(x_range[0], x_range[1], 100)
ys = np.linspace(y_range[0], y_range[1], 100)
X, Y = np.meshgrid(xs, ys)
Z = np.array([[f2(x, y) for x in xs] for y in ys])
steps, xvals, yvals, fvals = zip(*history)
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.contourf(X, Y, Z, levels=30, cmap='viridis')
plt.colorbar()
plt.plot(xvals, yvals, 'w-o', markersize=3, alpha=0.6, label='path')
plt.plot(xvals[0], yvals[0], 'rs', markersize=10, label='start')
plt.plot(xvals[-1], yvals[-1], 'g*', markersize=15, label='end')
plt.legend(); plt.title(title); plt.xlabel('x'); plt.ylabel('y')
plt.subplot(1, 2, 2)
plt.plot(steps, fvals, 'b-')
plt.title('f(x,y) over steps'); plt.xlabel('step'); plt.ylabel('f(x,y)'); plt.grid(True)
plt.tight_layout()
plt.show()
# YOUR CODE HERE
# Run descent on h2 with different alphas and visualize
PART 5: Going Generic¶
Exercise 5.1 — N Variables¶
Your 1D and 2D algorithms are very similar. Can you unify them?
Think about it this way:
- In 1D, your position is a single number $x$. The gradient is a single number.
- In 2D, your position is a pair $(x, y)$. The gradient is a pair $(\partial f/\partial x, \partial f/\partial y)$.
- In N dimensions, your position is a list of N numbers. The gradient is also a list of N numbers.
The update rule is always the same:
$$\text{variables}_{\text{new}} = \text{variables} - \alpha \cdot \text{gradient}$$
where both are lists.
The signature you are building:
def find_minima(f, variables, alpha=0.01, epsilon=1e-6, max_steps=100000):
"""
Finds the values of 'variables' that minimize f.
Parameters:
f : a function that takes a LIST of values and returns a scalar
e.g., if variables = [x, y], then f([x, y]) should work
variables : initial guess, a list of numbers [v1, v2, ..., vN]
alpha : learning rate
epsilon : stop when gradient magnitude < epsilon
max_steps : maximum number of steps
Returns:
min_vars : list of variable values at the minimum
f_min : function value at the minimum
history : list of (step, variables_copy, f_value) tuples
"""
pass
Key challenge: Computing the gradient for a list of N variables.
For variable $i$, the partial derivative is: $$\frac{\partial f}{\partial v_i} \approx \frac{f(v_1, ..., v_i + h, ..., v_N) - f(v_1, ..., v_i - h, ..., v_N)}{2h}$$
You need to perturb one variable at a time.
Tasks:
- First, write a helper
compute_gradient(f, variables, h=1e-5)that returns the gradient as a list. - Then write
find_minima(f, variables, ...). - Test by re-running your earlier examples, but now with the new interface:
- 1D:
find_minima(lambda v: v[0]**4 - 4*v[0] + 10, [2.0]) - 2D:
find_minima(lambda v: v[0]**2 + v[1]**2, [2.0, 3.0])
- 1D:
# YOUR CODE HERE
def compute_gradient(f, variables, h=1e-5):
"""
Computes numerical gradient of f at 'variables'.
variables: list of N numbers
Returns: list of N partial derivatives
"""
pass
def find_minima(f, variables, alpha=0.01, epsilon=1e-6, max_steps=100000):
pass
# Test 1D
min_vars, f_min, history = find_minima(lambda v: v[0]**4 - 4*v[0] + 10, [2.0])
print(f"1D minimum: x = {min_vars[0]:.6f}, f = {f_min:.6f}")
# Test 2D
min_vars, f_min, history = find_minima(lambda v: v[0]**2 + v[1]**2, [2.0, 3.0])
print(f"2D minimum: x = {min_vars[0]:.6f}, y = {min_vars[1]:.6f}, f = {f_min:.6f}")
Exercise 5.2 — Three Variables¶
Test your generic find_minima on 3D functions:
$$p(x, y, z) = (x - 1)^2 + (y - 2)^2 + (z + 3)^2$$
The minimum is obviously at $(1, 2, -3)$.
$$q(x, y, z) = x^2 + 2y^2 + 3z^2 - 4x + 6z$$
For $q$: what is the minimum analytically? (Hint: complete the square for each variable.)
Tasks:
- Define $p$ and $q$ as functions that take a list
[x, y, z]. - Run
find_minimaon both, starting from[0, 0, 0]. - Verify your answer for $p$ is $(1, 2, -3)$.
- For $q$: first compute the answer analytically, then verify your algorithm found it.
# YOUR CODE HERE
def p(v):
x, y, z = v
return (x - 1)**2 + (y - 2)**2 + (z + 3)**2
def q(v):
x, y, z = v
return x**2 + 2*y**2 + 3*z**2 - 4*x + 6*z
# Run find_minima on p
min_vars, f_min, history = find_minima(p, [0.0, 0.0, 0.0])
print(f"p minimum: {min_vars}, f = {f_min:.6f}")
# Run find_minima on q
# YOUR CODE
# What is the analytical answer for q?
PART 6: The Real Problem — Fitting a Line to Data¶
Exercise 6.1 — What Does It Mean to Fit a Line?¶
Suppose you have a dataset: some $x$ values and corresponding $y$ values. You want to find the straight line $\hat{y} = mx + c$ that best fits the data.
But what does "best fit" mean?
For each data point $(x_i, y_i)$, your line predicts $\hat{y}_i = m x_i + c$. The error for that point is $y_i - \hat{y}_i$.
Mean Squared Error (MSE) is defined as:
$$\text{MSE}(m, c) = \frac{1}{N} \sum_{i=1}^{N} (y_i - (m x_i + c))^2$$
A smaller MSE means the line fits the data better.
Tasks:
- Here is some sample data — run the cell below.
- Plot the data points using the helper below.
- Try drawing lines with different $(m, c)$ on the same plot. Which looks like the best fit visually?
# DATA — run this cell as-is
np.random.seed(42)
X_data = np.linspace(0, 10, 30)
y_data = 2.5 * X_data + 4.0 + np.random.randn(30) * 3 # true line: m=2.5, c=4.0 + noise
print(f"Number of data points: {len(X_data)}")
print(f"X range: [{X_data.min():.1f}, {X_data.max():.1f}]")
print(f"y range: [{y_data.min():.1f}, {y_data.max():.1f}]")
# PLOTTING HELPER — data + line overlay
def plot_data_and_line(X, y, m=None, c=None, title="Data"):
"""
Plots data points. If m and c are given, also draws the line y = mx + c.
"""
plt.figure(figsize=(8, 5))
plt.scatter(X, y, color='blue', label='data', zorder=5)
if m is not None and c is not None:
x_line = np.linspace(X.min(), X.max(), 200)
y_line = m * x_line + c
plt.plot(x_line, y_line, 'r-', linewidth=2, label=f'y = {m:.2f}x + {c:.2f}')
plt.legend()
plt.title(title)
plt.xlabel('X'); plt.ylabel('y')
plt.grid(True)
plt.show()
# YOUR CODE HERE
# 1. Plot just the data
plot_data_and_line(X_data, y_data, title="Raw Data")
# 2. Try some lines — which looks best?
plot_data_and_line(X_data, y_data, m=1.0, c=5.0, title="m=1.0, c=5.0")
# try more...
Exercise 6.2 — Write the MSE Function¶
Now write a function that quantifies how well a line fits the data.
def mse(X, y, m, c):
"""
Computes Mean Squared Error for predicting y from X using line y = mx + c.
Parameters:
X : array of input values (shape: N,)
y : array of true output values (shape: N,)
m : slope of the line
c : intercept of the line
Returns:
mse_value : a single number
"""
pass
Tasks:
- Implement
mse. Do NOT use any library function — write the formula yourself using a loop or numpy operations. - Test it:
- A perfect line should have MSE = 0. Create a tiny dataset where you know the exact answer:
X=[1,2,3],y=[3,5,7]is perfectly fit by $m=2, c=1$. Verifymse(X, y, 2, 1) == 0. - For our main dataset, compute MSE for:
m=2.5, c=4.0(the true line used to generate data)m=1.0, c=5.0m=5.0, c=0.0
- A perfect line should have MSE = 0. Create a tiny dataset where you know the exact answer:
- Which $(m, c)$ gives the lowest MSE? Does this match your visual judgment from Exercise 6.1?
# YOUR CODE HERE
def mse(X, y, m, c):
pass
# Sanity check
X_tiny = np.array([1.0, 2.0, 3.0])
y_tiny = np.array([3.0, 5.0, 7.0])
print(f"MSE for perfect line (should be 0): {mse(X_tiny, y_tiny, 2.0, 1.0):.6f}")
# Compare on main dataset
lines_to_test = [
(2.5, 4.0, "true line"),
(1.0, 5.0, "guess 1"),
(5.0, 0.0, "guess 2"),
]
for m, c, label in lines_to_test:
error = mse(X_data, y_data, m, c)
print(f"{label:15s}: m={m}, c={c} → MSE={error:.4f}")
Exercise 6.3 — The MSE Landscape¶
MSE is a function of $m$ and $c$. Let's visualize it as a 2D landscape.
Tasks:
- Use
plot_function_2d(from Part 4) to plot MSE over a grid of $(m, c)$ values.- Let $m$ range from 0 to 5, and $c$ from -5 to 15.
- You'll need to wrap
mseso it takes a form compatible withplot_function_2d.
- What shape does the MSE landscape have? Is it convex? Does it have a clear minimum?
- Mark the point $(m, c) = (2.5, 4.0)$ on the plot. Is it at or near the minimum?
Hint for wrapping:
def mse_landscape(m, c):
return mse(X_data, y_data, m, c)
plot_function_2d(mse_landscape, x_range=(0, 5), y_range=(-5, 15), title="MSE Landscape (m, c)")
# YOUR CODE HERE
def mse_landscape(m, c):
return mse(X_data, y_data, m, c)
plot_function_2d(mse_landscape, x_range=(0, 5), y_range=(-5, 15),
title="MSE Landscape",
mark_xy=(2.5, 4.0))
PART 7: Putting It All Together — Linear Regression via Gradient Descent¶
Exercise 7.1 — Minimize MSE with find_minima¶
You now have:
find_minima(f, variables)— a general minimizermse(X, y, m, c)— the function to minimize
Put them together.
The key step: To use find_minima, you need to express MSE as a function of a list [m, c]:
def mse_for_optimizer(params):
m, c = params
return mse(X_data, y_data, m, c)
Tasks:
- Define
mse_for_optimizeras above. - Call
find_minima(mse_for_optimizer, [0.0, 0.0])with an appropriate learning rate.- Warning: You may need to tune
alpha. Try0.001first, then adjust.
- Warning: You may need to tune
- Print the final
mandcfound by the algorithm. - Compare to the true values
m=2.5, c=4.0. How close did you get? - Plot the resulting line over the data using
plot_data_and_line.
# YOUR CODE HERE
def mse_for_optimizer(params):
m, c = params
return mse(X_data, y_data, m, c)
# Run the optimizer — tune alpha if needed
min_params, f_min, history = find_minima(mse_for_optimizer, [0.0, 0.0], alpha=0.001)
m_found, c_found = min_params
print(f"Found: m = {m_found:.4f}, c = {c_found:.4f}")
print(f"True: m = 2.5000, c = 4.0000")
print(f"MSE at found values: {mse(X_data, y_data, m_found, c_found):.4f}")
# Plot the result
plot_data_and_line(X_data, y_data, m=m_found, c=c_found, title=f"Best fit: m={m_found:.2f}, c={c_found:.2f}")
Exercise 7.2 — Watch the Line Learn¶
Let's visualize how the line evolves during gradient descent.
Use the plotting helper below:
# PLOTTING HELPER — animated learning
def plot_learning_progress(X, y, history, steps_to_show=[0, 5, 20, 100, 500, -1]):
"""
Shows the fitted line at different stages of training.
history: list of (step, [m, c], mse_value) tuples from find_minima
"""
n = len(steps_to_show)
fig, axes = plt.subplots(1, n, figsize=(4*n, 4))
history_len = len(history)
x_line = np.linspace(X.min(), X.max(), 100)
for ax, step_idx in zip(axes, steps_to_show):
if step_idx == -1:
step_idx = history_len - 1
step_idx = min(step_idx, history_len - 1)
step_num, params, mse_val = history[step_idx]
m, c = params
y_line = m * x_line + c
ax.scatter(X, y, color='blue', s=15, alpha=0.6)
ax.plot(x_line, y_line, 'r-', linewidth=2)
ax.set_title(f'Step {step_num}\nm={m:.2f}, c={c:.2f}\nMSE={mse_val:.1f}')
ax.set_xlabel('X'); ax.set_ylabel('y')
ax.grid(True)
plt.tight_layout()
plt.show()
Tasks:
- Run
find_minimawithalpha=0.001starting from[0.0, 0.0]. - Call
plot_learning_progresswith your history. - Describe what you see: how does the line change from step 0 to the final step?
- Also plot the MSE over steps using the 2D descent helper.
Note: For this to work, your
find_minimahistory must store[m, c]as a list (not just the step index). Make sure your implementation stores a copy of the variables at each step.
# PLOTTING HELPER — run this cell as-is
def plot_learning_progress(X, y, history, steps_to_show=[0, 5, 20, 100, 500, -1]):
n = len(steps_to_show)
fig, axes = plt.subplots(1, n, figsize=(4*n, 4))
history_len = len(history)
x_line = np.linspace(X.min(), X.max(), 100)
for ax, step_idx in zip(axes, steps_to_show):
if step_idx == -1:
step_idx = history_len - 1
step_idx = min(step_idx, history_len - 1)
step_num, params, mse_val = history[step_idx]
m, c = params
y_line = m * x_line + c
ax.scatter(X, y, color='blue', s=15, alpha=0.6)
ax.plot(x_line, y_line, 'r-', linewidth=2)
ax.set_title(f'Step {step_num}\nm={m:.2f}, c={c:.2f}\nMSE={mse_val:.1f}')
ax.set_xlabel('X'); ax.set_ylabel('y'); ax.grid(True)
plt.tight_layout()
plt.show()
# YOUR CODE HERE
min_params, f_min, history = find_minima(mse_for_optimizer, [0.0, 0.0], alpha=0.001)
# Visualize the learning process
plot_learning_progress(X_data, y_data, history, steps_to_show=[0, 5, 20, 100, 500, -1])
Exercise 7.3 — New Dataset, Same Code¶
Here are three new datasets. For each:
- Plot the raw data.
- Use your
find_minima+mseto find the best-fit line. - Plot the fitted line over the data.
- Report the final $m$, $c$, and MSE.
No new code is needed — you're reusing everything you built.
# DATASETS — run this cell as-is
np.random.seed(7)
# Dataset 1: negative slope
X1 = np.linspace(0, 10, 40)
y1 = -1.5 * X1 + 20 + np.random.randn(40) * 2
# Dataset 2: steep positive slope
X2 = np.linspace(-5, 5, 50)
y2 = 5.0 * X2 - 10 + np.random.randn(50) * 4
# Dataset 3: nearly flat
X3 = np.linspace(0, 20, 60)
y3 = 0.3 * X3 + 2.0 + np.random.randn(60) * 5
print("Datasets ready: X1/y1, X2/y2, X3/y3")
# YOUR CODE HERE — Dataset 1
plot_data_and_line(X1, y1, title="Dataset 1 — Raw")
def mse_d1(params):
m, c = params
return mse(X1, y1, m, c)
# find_minima and plot result
# YOUR CODE HERE — Dataset 2
# YOUR CODE HERE — Dataset 3
Exercise 7.4 — Sanity Check with NumPy¶
NumPy has a function np.polyfit(X, y, 1) that finds the best-fit line using an exact mathematical formula (not gradient descent).
Tasks:
- Run
np.polyfit(X_data, y_data, 1)— it returns[m, c]. - Compare the result to what your gradient descent found.
- Run the same comparison for all three datasets in Exercise 7.3.
- Are the answers the same? Should they be exactly the same, or just approximately the same? Why?
Bonus: If your answers disagree significantly (more than 0.01), investigate why. Check:
- Did gradient descent converge? (Check the final gradient magnitude.)
- Is your learning rate too large or too small?
- Did you run enough steps?
# YOUR CODE HERE
m_np, c_np = np.polyfit(X_data, y_data, 1)
print(f"NumPy: m = {m_np:.4f}, c = {c_np:.4f}")
# Compare with gradient descent result from Exercise 7.1
print(f"Gradient Desc: m = {m_found:.4f}, c = {c_found:.4f}")
# Repeat for other datasets
# ...
PART 8: Bonus Challenges¶
If you've made it this far, you've independently discovered gradient descent and used it to implement linear regression. These are the foundations of almost every machine learning algorithm.
Here are some open-ended challenges to push further:
Challenge A — What if the data isn't linear?¶
Generate this dataset:
X_quad = np.linspace(-3, 3, 50)
y_quad = 1.5 * X_quad**2 - 2 * X_quad + 1 + np.random.randn(50) * 1.5
The data was generated with a quadratic function $y = 1.5x^2 - 2x + 1$.
- Try fitting a straight line to it. What MSE do you get?
- Now fit a quadratic: $\hat{y} = ax^2 + bx + c$. You now have 3 parameters to optimize: $[a, b, c]$.
- Extend your
msefunction to handle any polynomial and minimize it withfind_minima. - Compare the fitted $a, b, c$ to the true values $1.5, -2, 1$.
# Challenge A — YOUR CODE
X_quad = np.linspace(-3, 3, 50)
y_quad = 1.5 * X_quad**2 - 2 * X_quad + 1 + np.random.randn(50) * 1.5
# Step 1: fit a straight line, report MSE
# Step 2: define mse_quadratic(params) where params = [a, b, c]
# Step 3: find_minima(mse_quadratic, [0, 0, 0])
Challenge B — Learning Rate Scheduler¶
You've seen that a large $\alpha$ causes oscillation and a small $\alpha$ is slow.
One trick: decay the learning rate over time. Start with a large $\alpha$ and make it smaller each step:
$$\alpha_t = \frac{\alpha_0}{1 + t \cdot \text{decay\_rate}}$$
Modify your find_minima to support an optional decay_rate parameter. Compare:
- Fixed
alpha=0.1on the MSE problem - Decaying alpha starting at
0.1withdecay_rate=0.01
Which converges faster? Plot both loss curves on the same graph.
# Challenge B — YOUR CODE
Challenge C — Stochastic Gradient Descent¶
In real ML, datasets have millions of points. Computing MSE over all of them every step is expensive.
Stochastic Gradient Descent (SGD): Instead of using all data to compute the gradient, use a random single point (or a small batch of points) each step.
- Modify
find_minima(or writefind_minima_sgd) to:- Each step, pick one random data point $(x_i, y_i)$
- Compute the gradient of the error for that single point
- Update $m$ and $c$
- Run it on
X_data, y_data. - Plot the loss curve. Is it smooth or noisy? Why?
- How does the final answer compare to full gradient descent?
# Challenge C — YOUR CODE
Reflection — What Did You Just Build?¶
Take a moment to answer these questions in your own words:
What is gradient descent? Explain it in 3 sentences as if you're describing it to someone who has never heard of it.
What is the learning rate? What happens if it's too large? Too small?
What is MSE? Why do we minimize it instead of just minimizing the average error?
What is linear regression? What problem does it solve?
What is the connection between minimizing MSE and fitting a line?
What's the biggest limitation of the approach you built? What would you want to improve?
Your answers:
Gradient descent is...
The learning rate is...
MSE is...
Linear regression is...
The connection is...
The biggest limitation is...
You've just invented gradient descent and linear regression from scratch. That's not nothing — it's the foundation of deep learning.