Gradient Descent: Into Python, Part 1


Another day, another exercise in cementing my understanding of some of the algorithms covered in my machine learning course.

Today I will go over gradient descent.

$$\begin{align} \text{repeat}&\text{ until convergence:} \; \lbrace \newline \; w &= w - \alpha \frac{\partial J(w,b)}{\partial w} \; \newline b &= b - \alpha \frac{\partial J(w,b)}{\partial b} \newline \rbrace \end{align}$$

The parameters w (weights) and b (bias) are updated simultaneously.
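Here's a minimal sketch of how that outer loop might look in Python, assuming the compute_gradient helper we build below, a learning rate alpha, and a fixed number of iterations standing in for a proper convergence test:

def gradient_descent(x, y, w, b, alpha, num_iters):
    for _ in range(num_iters):
        # compute both gradients first, then update both parameters:
        # this is what "simultaneous update" means
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b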

$$
\begin{align}
\frac{\partial J(w,b)}{\partial w} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \newline
\frac{\partial J(w,b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})
\end{align}
$$

First we set up the variables dj_dw (the gradient for w) and dj_db (the gradient for b):

def compute_gradient(x, y, w, b):
    # Same as len(x): the number of training examples
    m = x.shape[0]

    # setting up the gradient accumulators
    dj_dw = 0
    dj_db = 0

Let's start by breaking down the error term, the same one from the cost function:

$$(f_{w,b}(x^{(i)}) - y^{(i)}) $$

Just like in the cost function before, since

$$f_{w,b}(x^{(i)}) = wx^{(i)} + b$$

substituting this into the gradient terms for gradient descent gives the per-example contributions:

$$ \text{dj\_dw}_i = ((wx^{(i)} + b) - y^{(i)})x^{(i)} \newline \text{dj\_db}_i = ((wx^{(i)} + b) - y^{(i)}) $$
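As a quick sanity check with made-up numbers (say w = 2, b = 1, x^(i) = 3, y^(i) = 6):

$$ \text{dj\_dw}_i = ((2 \cdot 3 + 1) - 6) \cdot 3 = 3, \qquad \text{dj\_db}_i = (2 \cdot 3 + 1) - 6 = 1 $$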

# we set up a for loop going through the m training examples
for i in range(m):

    # we solve for f_wb like we did before in the cost function
    f_wb = w * x[i] + b

    # the contribution of this example to the gradient for w
    dj_dw_i = (f_wb - y[i]) * x[i]

    # and its contribution to the gradient for b
    dj_db_i = f_wb - y[i]

Summing these contributions over all m examples gives:

$$ \text{dj\_dw} = \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \newline \text{dj\_db} = \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) $$

    # then we accumulate the running sums;
    # dj_dw += dj_dw_i is the same as dj_dw = dj_dw + dj_dw_i
    dj_dw += dj_dw_i 
    dj_db += dj_db_i 

Then finally we divide by m, the number of training examples (m = x.shape[0]), to turn the sums into averages.

Then we return dj_dw and dj_db from the function:

    # then finally we divide by m, the number of training examples, to average the sums
    dj_dw = dj_dw / m 
    dj_db = dj_db / m 

    # we return these two variables from this specific function
    return dj_dw, dj_db

Altogether, we have the calculation for the gradient:

def compute_gradient(x, y, w, b):
    # Get the number of training examples
    m = x.shape[0]

    # Initialize gradients
    dj_dw = 0
    dj_db = 0

    # Iterate through each training example
    for i in range(m):
        # Compute the predicted value f_wb for the i-th example
        f_wb = w * x[i] + b

        # Compute the gradient for the weight w
        dj_dw_i = (f_wb - y[i]) * x[i]

        # Compute the gradient for the bias b
        dj_db_i = f_wb - y[i]

        # Accumulate the gradients
        dj_dw += dj_dw_i
        dj_db += dj_db_i

    # Average the gradients over all training examples
    dj_dw = dj_dw / m
    dj_db = dj_db / m

    return dj_dw, dj_db
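To see it in action, here's a quick example on a tiny made-up dataset. The points lie exactly on y = 2x, so at w = 2, b = 0 both gradients should come out to zero:

import numpy as np

# tiny made-up dataset: y = 2x exactly
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([2.0, 4.0, 6.0])

# at the true parameters the gradients should be (0.0, 0.0)
print(compute_gradient(x_train, y_train, w=2.0, b=0.0))

# away from them, the gradients are nonzero and point uphill
print(compute_gradient(x_train, y_train, w=0.0, b=0.0))

For reference, the same sums can be computed without the explicit loop using NumPy vectorization. This is just a sketch of the same math, not a replacement for the walkthrough above:

def compute_gradient_vectorized(x, y, w, b):
    err = (w * x + b) - y      # all m errors at once
    dj_dw = (err * x).mean()   # average of error * x
    dj_db = err.mean()         # average of error
    return dj_dw, dj_db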
