Questions about gradient descent and linear regression?
Today we have yet another notebook-from-jim™.
With an animation even.
If we get through that, let's talk about what happens when there are more features ... that is, more x columns:
\[ y = a_1 x_1 + a_2 x_2 + b \]
... and what changes would you have to make to the code?
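To make that concrete, here's a minimal sketch (my own illustration, not code from the notebooks or the book) of how the one-feature gradient descent loop extends to two features: each coefficient just gets its own partial derivative of the mean squared error, and the intercept is handled as before.

```python
import numpy as np

def gradient_descent_2feature(x1, x2, y, step=0.01, iters=1000):
    """Fit y = a1*x1 + a2*x2 + b by gradient descent on mean squared error."""
    a1, a2, b = 0.0, 0.0, 0.0
    n = len(y)
    for _ in range(iters):
        pred = a1 * x1 + a2 * x2 + b
        err = pred - y
        # partial derivatives of the MSE with respect to each parameter
        grad_a1 = (2 / n) * np.dot(err, x1)
        grad_a2 = (2 / n) * np.dot(err, x2)
        grad_b = (2 / n) * err.sum()
        # step each parameter downhill
        a1 -= step * grad_a1
        a2 -= step * grad_a2
        b -= step * grad_b
    return a1, a2, b
```

The pattern generalizes: with k features you'd keep a vector of coefficients and compute all the partials at once with a matrix product, but the loop structure stays the same.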
Your mission, should you decide to accept it: do something like this yourself, with one or two features (x columns), on any data that has a reasonable correlation (and so might reasonably be described by a line).
You're welcome to use or adapt my notebooks, or use the functions from the book. In either case, explain what you're doing and cite your sources.
The trickiest part is understanding convergence: when you're trying too hard and not getting anywhere (overfitting) versus when you haven't gone far enough (underfitting). DO NOT just iterate and hope for the best - you need to find ways to monitor what's going on.
I encourage you to invent your own stopping conditions and step sizes and see what you find.
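As one illustration of what "monitoring" might look like (this is my sketch, not a prescribed answer - the tol and step values here are just knobs to play with), you can record the training error every iteration and stop when it essentially stops improving:

```python
import numpy as np

def fit_with_monitoring(x, y, step=0.01, tol=1e-10, max_iters=50000):
    """One-feature gradient descent that records the training MSE each
    iteration and stops when the improvement gets tiny (one possible
    stopping condition among many)."""
    a, b = 0.0, 0.0
    n = len(y)
    history = []
    for i in range(max_iters):
        err = a * x + b - y
        mse = np.mean(err ** 2)
        history.append(mse)
        # stop when the error barely changed since last iteration
        if i > 0 and abs(history[-2] - mse) < tol:
            break
        a -= step * (2 / n) * np.dot(err, x)
        b -= step * (2 / n) * err.sum()
    return a, b, history
```

Inspecting `history` afterwards tells you whether you stopped because the fit really converged or because your tolerance was too loose - exactly the kind of thing worth experimenting with.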
Do make plots of how the algorithm is (or isn't) converging, including both training set and test set errors - even though using the test set before the model is final isn't quite kosher, as a learning tool it can be a good indicator of overfitting and underfitting.
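One way to get those plots (again, a sketch of my own, assuming you've split your data into train and test arrays yourself) is to track both errors at every iteration and plot the two curves afterwards:

```python
import numpy as np

def error_curves(x_train, y_train, x_test, y_test, step=0.05, iters=2000):
    """Run one-feature gradient descent, recording training and test MSE
    at every iteration so their curves can be plotted side by side."""
    a, b = 0.0, 0.0
    n = len(y_train)
    train_hist, test_hist = [], []
    for _ in range(iters):
        err = a * x_train + b - y_train
        train_hist.append(np.mean(err ** 2))
        test_hist.append(np.mean((a * x_test + b - y_test) ** 2))
        a -= step * (2 / n) * np.dot(err, x_train)
        b -= step * (2 / n) * err.sum()
    return train_hist, test_hist
```

Then something like `plt.plot(train_hist, label="train")` and `plt.plot(test_hist, label="test")` with matplotlib shows the story: if the test curve starts climbing while the training curve keeps dropping, that's the overfitting signal; if both are still falling when you stop, you haven't gone far enough.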
We went over this stuff in class, and I made a recording; check out the links over on the left if you want to see it.