Another day, another exercise in cementing my understanding of some of the algorithms covered in my machine learning course.
Today I will go over gradient descent.
$$\begin{align} \text{repeat}&\text{ until convergence:} \; \lbrace \newline \; w &= w - \alpha \frac{\partial J(w,b)}{\partial w} \; \newline b &= b - \alpha \frac{\partial J(w,b)}{\partial b} \newline \rbrace \end{align}$$
The parameters w (weight) and b (bias) are updated simultaneously.
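Before breaking the gradient itself down, here is a minimal sketch of what that repeat-until-convergence loop could look like in Python. It leans on the compute_gradient function built later in this post; the names gradient_descent, alpha, and num_iters are my own placeholders, and I stop after a fixed number of iterations rather than actually testing for convergence.

def gradient_descent(x, y, w, b, alpha, num_iters):
    # Repeat the update rule a fixed number of times
    for _ in range(num_iters):
        # Gradients of the cost with respect to w and b (compute_gradient is defined below)
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        # Update w and b simultaneously, using gradients computed from the same parameters
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b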
$$
\begin{align}
\frac{\partial J(w,b)}{\partial w} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \newline
\frac{\partial J(w,b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})
\end{align}
$$
First we set up the function and initialize the variables dj_dw (the gradient for w) and dj_db (the gradient for b):
def compute_gradient(x, y, w, b):
    # Same as len(x). Finds the number of training examples
    m = x.shape[0]
    # setting up the gradient variables
    dj_dw = 0
    dj_db = 0
Let's start by breaking down the error term, the same one we used in the cost function before:
$$(f_{w,b}(x^{(i)}) - y^{(i)})$$
Since
$$f_{w,b}(x^{(i)}) = wx^{(i)} + b$$
for gradient descent each training example contributes the terms
$$ \text{dj\_dw}_i = ((wx^{(i)} + b) - y^{(i)})x^{(i)} \newline \text{dj\_db}_i = ((wx^{(i)} + b) - y^{(i)}) \newline $$
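As a quick sanity check with made-up numbers (not from the course), suppose w = 2, b = 1, and one training example has x = 3 with y = 10. Plugging in:
$$ f_{w,b}(x^{(i)}) = 2 \cdot 3 + 1 = 7, \quad \text{dj\_dw}_i = (7 - 10) \cdot 3 = -9, \quad \text{dj\_db}_i = 7 - 10 = -3 $$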
    # we set up a for loop going through the training examples (m of them)
    for i in range(m):
        # we solve for f_wb like we did before in the cost function
        f_wb = w * x[i] + b
        # this example's contribution to the gradient for w
        dj_dw_i = (f_wb - y[i]) * x[i]
        # then this example's contribution to the gradient for b
        dj_db_i = f_wb - y[i]
$$ \text{dj\_dw} = \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \newline \text{dj\_db} = \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) \newline $$
        # then finding the running sum for each
        # dj_dw += dj_dw_i is the same as dj_dw = dj_dw + dj_dw_i
        dj_dw += dj_dw_i
        dj_db += dj_db_i
Then finally we divide by m, the number of training examples (m = x.shape[0]), to get the average.
Then we return dj_dw and dj_db from the function:
    # then finally we divide by m, the number of training examples
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    # we return these two variables from this specific function
    return dj_dw, dj_db
Altogether, we have the calculation for the gradient!
def compute_gradient(x, y, w, b):
    # Get the number of training examples
    m = x.shape[0]
    # Initialize gradients
    dj_dw = 0
    dj_db = 0
    # Iterate through each training example
    for i in range(m):
        # Compute the predicted value f_wb for the i-th example
        f_wb = w * x[i] + b
        # Compute the gradient for the weight w
        dj_dw_i = (f_wb - y[i]) * x[i]
        # Compute the gradient for the bias b
        dj_db_i = f_wb - y[i]
        # Accumulate the gradients
        dj_dw += dj_dw_i
        dj_db += dj_db_i
    # Average the gradients over all training examples
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    return dj_dw, dj_db
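To see the function in action, here is a small usage sketch with made-up toy data; the arrays, the learning rate alpha, and the numbers in the comments are my own, and I double-check the loop against a vectorized NumPy version of the same sums.

import numpy as np

# Toy training set (made-up numbers, roughly y = 2x + 1)
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([3.0, 5.2, 6.9, 9.1])

# Gradients at a starting point of w = 0, b = 0
w, b = 0.0, 0.0
dj_dw, dj_db = compute_gradient(x_train, y_train, w, b)
print(dj_dw, dj_db)  # about -17.6 and -6.05

# The same sums written with vectorized NumPy should agree
err = (w * x_train + b) - y_train
print(np.mean(err * x_train), np.mean(err))

# One step of the update rule with a made-up learning rate alpha
alpha = 0.01
w = w - alpha * dj_dw
b = b - alpha * dj_db
print(w, b)  # about 0.176 and 0.0605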