Deep Learning Coursera
Neural Networks and Deep Learning
FALL SEMESTER 2019
INSTRUCTOR: Dr. Andrew Ng
no_reply@example.com
26 Feb 2019
https://github.com/Kulbear/deep-learning-coursera/tree/master/Neural%20Networks%20and%20Deep%20Learning
WEEK-1
Video 1: What is a Neural Network?
ReLU: Rectified Linear Unit
Q1: Every input feature is interconnected with every hidden layer unit - True
Video 2: Supervised Learning with Neural Network
Application -> Network type
Real Estate -> Standard NN
Online Advertising -> Standard NN
Photo Tagging -> CNN
Speech Recognition -> RNN
Machine Translation -> RNN
Autonomous Driving -> Custom / Hybrid NN
Structured data: tables, databases
Unstructured data: audio, images, text
Q1: Which type of data has features like pixel values or individual words - Unstructured
Video 3: Why is DL taking off?
Neural networks need more data to reach higher performance.
With less data, even an SVM can perform better than a neural network.
The factors below made DL take off:
-Data
-Computation
-Algorithms
-Switching from sigmoid to ReLU made learning faster, because ReLU avoids the near-zero gradients of the
sigmoid's flat regions, so gradient descent converges faster (see the sketch below)
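A minimal sketch (not from the lecture) of the two activations and their gradients; the test values are just illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(relu(z))                         # [ 0.  0.  0.  1. 10.]
# The sigmoid gradient is nearly zero for large |z| (it saturates),
# while the ReLU gradient stays at 1 for every positive z, so gradient
# descent keeps making progress.
print(sigmoid(z) * (1 - sigmoid(z)))   # ~[0.00005 0.197 0.25 0.197 0.00005]
print((z > 0).astype(float))           # [0. 0. 0. 1. 1.]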
Q1: What does m stand for - the number of training examples
WEEK-2
Video 1: Binary Classification
Binary classification is a problem whose output is 1 or 0, e.g. whether an image is of a cat or not.
Example: an image of 64x64 pixels.
Steps: unroll all the RGB pixel values into one feature vector, i.e. a vector of size 64 x 64 x 3 = 12288 (H x W x channels).
Notation: a single training example is (x, y), where x is an nx-dimensional feature vector
and y is either 0 or 1, i.e.
(x, y): x ∈ R^nx, y ∈ {0, 1}
m_test = number of test examples
m_train = number of training examples
X is an nx x m matrix (each column is one training example)
Y is a 1 x m matrix
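A small numpy sketch (not from the video) of how this (nx, m) layout can be built; the random images and labels are just placeholders:

import numpy as np

m = 100                                   # placeholder number of examples
images = np.random.rand(m, 64, 64, 3)     # hypothetical batch of RGB images

# Unroll each 64x64x3 image into a column of length 12288 and stack the
# columns so that X has shape (nx, m)
X = images.reshape(m, -1).T
Y = np.random.randint(0, 2, size=(1, m))  # labels, shape (1, m)
print(X.shape, Y.shape)                   # (12288, 100) (1, 100)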
Video 2: Logistic Regression
Given x, what is the chance that y = 1? i.e. ŷ = P(y = 1 | x).
Here x is a feature vector and the parameters are w and b.
We could use the linear formula ŷ = w^T x + b, but this can be a huge negative or positive number, so we
pass it through the sigmoid function to squash its output into the range (0, 1):
z = w^T x + b, then sigmoid(z) = 1 / (1 + e^-z)
Q1: What are the parameters of logistic regression?
Ans - w, an nx-dimensional vector, and b, a real number
Video 3: Logistic Regression Cost Function
We have to compare ŷ with the actual y and calculate a loss. In the past we used squared error as the loss function, L(ŷ, y) = 1/2 (ŷ - y)^2,
but in NN we use the function below so that the optimization has a single global minimum:
Loss function: L(ŷ, y) = -( y log(ŷ) + (1 - y) log(1 - ŷ) )
Cost function: J(w, b) = (1/m) Σ (i=1..m) L(ŷ^(i), y^(i)). The loss function applies to a single observation; the cost function sums the loss over all the data and takes the average.
We select the parameters w, b for which the cost is minimum; training starts with initial values and keeps
changing them until we get the lowest cost.
Q1 : What is the difference between the cost function and the loss function
for logistic regression?
Ans - the loss function computes the error for a single training example;
the cost function is the average of the loss function over the entire training set
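A tiny numpy sketch of the loss-versus-cost distinction above; the predictions and labels are made-up placeholder values:

import numpy as np

def loss(a, y):
    # Loss for a single prediction a = ŷ against its label y
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

y_hat = np.array([0.9, 0.2, 0.7, 0.4])    # ŷ for m = 4 examples
y     = np.array([1.0, 0.0, 1.0, 1.0])    # true labels
print(loss(y_hat, y))                     # per-example losses
print(np.mean(loss(y_hat, y)))            # cost J = average of the losses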
Video 4 : Gradient Descent
In gradient descent we try to find the parameter values that give the global minimum of the cost function J(w, b).
Start with only one parameter w, initialized to some value.
Repeat the update below in a loop until it converges, i.e. reaches the minimum (alpha is the learning rate):
w := w - alpha * dJ(w)/dw
dJ(w)/dw is also written as dw (the derivative with respect to w), so the update becomes
w := w - alpha * dw
With both parameters w and b, the updates for J(w, b) are (a toy sketch follows after the notation note below):
w := w - alpha * dJ(w, b)/dw
b := b - alpha * dJ(w, b)/db
When J is a function of two or more variables, the partial derivative symbol ∂ is used,
and for a function of one variable the lowercase d is used, as shown in the image.
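A toy sketch of the update rule (not from the lecture): minimizing J(w) = (w - 3)^2, whose derivative is 2(w - 3):

alpha = 0.1          # learning rate
w = 0.0              # arbitrary starting value
for _ in range(100):
    dw = 2 * (w - 3)        # dJ/dw at the current w
    w = w - alpha * dw      # the update w := w - alpha * dw
print(w)             # converges towards 3, the minimum of J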
Q : A convex function always has multiple local optima - false
Video 5 : Derivatives
The chart is for f(a) = 3a.
If a = 2 then f(a) = 6. If we change a to 2.001 then f(a) becomes 6.003; from these two points on the line
we can see a small triangle forming in the image.
The slope (derivative) = height/width, i.e. the change in y over the change in x on the chart: 0.003/0.001 = 3.
Thus df(a)/da = 3.
The informal definition of the derivative: if a changes by a very tiny amount, how much does f(a) change?
Q: On a straight line, the function's derivative - Doesn't change
Video 6 : More Derivatives Example
1. f(a) = a^2, then df(a)/da = 2a, which at a = 2 equals 4.
If a = 2, f(a) = 4, and if a = 2.001, f(a) ≈ 4.004.
2. f(a) = a^3, then df(a)/da = 3a^2, which at a = 2 equals 3 * 2^2 = 12.
If a = 2, f(a) = 8, and if a = 2.001, f(a) ≈ 8.012.
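A quick numerical check of these two derivatives using the same tiny nudge eps as in the examples above:

eps = 0.001
a = 2.0
f_square = lambda x: x ** 2
f_cube   = lambda x: x ** 3

print((f_square(a + eps) - f_square(a)) / eps)  # ~4.001, close to 2a = 4
print((f_cube(a + eps) - f_cube(a)) / eps)      # ~12.006, close to 3a^2 = 12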
Video 7 : Computational Graph
The image above is an example of a computation graph. The forward (left-to-right) pass comes in handy when some
specific output variable, like J here, needs to be optimized.
Q: One step of backward propagation on a computation graph yields the derivative of the final output variable - True
Video 8 : Derivatives with Computational Graph
In the image above we have the formula J = 3(a + bc), shown as a computation graph (u = bc, v = a + u, J = 3v).
To calculate the slope (derivative) of J we start with some input values of a, b, c, compute forward, and then
change the values in backward order. First v is changed from 11 to 11.001,
and we see that J changes from 33 to 33.003, so the slope is dJ/dv = 3. Then we change a from 5 to 5.001,
v changes from 11 to 11.001 (so dv/da = 1), and by the chain rule dJ/da = dJ/dv * dv/da, i.e. 3 * 1 = 3.
So in backward propagation the derivative computed for each variable is "d(final output) / d(var)".
Below the derivatives for b and c are calculated as well (a numeric check follows after the Q&A):
Q: What does the coding convention dvar represent?
- The derivative of a final output variable with respect to various intermediate quantities
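A small numeric check of the same graph, J = 3(a + bc), using the example values a = 5, b = 3, c = 2:

def J_of(a, b, c):
    u = b * c        # first node
    v = a + u        # second node
    return 3 * v     # final output J

eps = 0.001
a, b, c = 5.0, 3.0, 2.0
print(J_of(a, b, c))                                 # 33
print((J_of(a + eps, b, c) - J_of(a, b, c)) / eps)   # ~3  -> dJ/da
print((J_of(a, b + eps, c) - J_of(a, b, c)) / eps)   # ~6  -> dJ/db = 3*c
print((J_of(a, b, c + eps) - J_of(a, b, c)) / eps)   # ~9  -> dJ/dc = 3*b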
Video 9 : Logistic Regression Gradient Descent
Here z = w^T x + b, ŷ = a = sigmoid(z) is the prediction, and L(a, y) is the loss function.
We are going with two features, x1 and x2.
We calculate z first,
then ŷ using the sigmoid,
and lastly the loss.
We do this repeatedly, changing w1, w2 and b, until we reach the minimum loss (the global minimum).
Below we calculate the derivatives for each input parameter by changing values:
Here we are working backwards.
First the derivative of a, i.e. da = dL(a, y)/da (how the loss changes if a changes),
then the derivative of z, i.e. dz = dL(a, y)/dz, which is also dL/da * da/dz (how the loss changes if z changes),
and lastly the derivatives of w1, w2, and b.
Q: Calculus also provides a simplified formula for the derivative of the loss function L with respect to z.
Ans: dz = a - y (a numeric sketch of this single-example backward pass follows below)
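A numeric sketch of the single-example backward pass; the feature values, label, and starting parameters are placeholders:

import numpy as np

x1, x2, y = 1.0, 2.0, 1.0          # one training example (placeholder)
w1, w2, b = 0.1, -0.2, 0.0         # starting parameters (placeholder)

z = w1 * x1 + w2 * x2 + b          # forward pass
a = 1.0 / (1.0 + np.exp(-z))       # ŷ = sigmoid(z)

dz  = a - y                        # dL/dz, the simplified formula above
dw1 = x1 * dz                      # dL/dw1
dw2 = x2 * dz                      # dL/dw2
db  = dz                           # dL/db

alpha = 0.01                       # one gradient descent step
w1, w2, b = w1 - alpha * dw1, w2 - alpha * dw2, b - alpha * db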
This was for one example; let's see how it works across all rows of the training data.
Video 10 : Gradient Descent on m examples
The first row of the image shows the cost function formula, and below it, its derivative: the average of the per-example loss derivatives.
In the image, the code implements logistic regression for m examples, with a for loop running over each
observation in the dataset. In the code we accumulate the derivatives into dw1, dw2 and db, and at the end divide
by m to get the averages (the accumulated cost J is divided by m in the same way). In this example there are only two input variables, so
we store the derivatives directly in dw1 and dw2 (for x1 and x2); with more variables we would need another for loop,
but it is better to avoid for loops for performance, and vectorization makes that possible (a runnable version follows below).
Q: In the for loop depicted in the video, why is there only one dw variable (i.e. no i superscripts in the for loop)?
Ans - The value of dw in the code is cumulative
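A runnable version of the looped computation described above, for two features; the small arrays are placeholder data:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x1 = np.array([1.0, 2.0, 0.5, 1.5])     # feature 1 for m = 4 examples
x2 = np.array([0.5, 1.0, 2.0, 0.0])     # feature 2
y  = np.array([1.0, 0.0, 1.0, 0.0])     # labels
w1, w2, b = 0.0, 0.0, 0.0
m = len(y)

J = dw1 = dw2 = db = 0.0
for i in range(m):
    z = w1 * x1[i] + w2 * x2[i] + b
    a = sigmoid(z)
    J   += -(y[i] * np.log(a) + (1 - y[i]) * np.log(1 - a))
    dz   = a - y[i]
    dw1 += x1[i] * dz                   # accumulate: hence a single dw1
    dw2 += x2[i] * dz
    db  += dz
J, dw1, dw2, db = J / m, dw1 / m, dw2 / m, db / m
print(J, dw1, dw2, db)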
Python and Vectorization
Video 11 : Vectorization
To calculate z = w^T x + b where w and x are long vectors, a for loop is time consuming; the same thing can be done in numpy as z = np.dot(w, x) + b.
Vectorization ran about 300 times faster in the video's test: the for loop took about 480 ms while numpy took about 1.5 ms (a similar timing sketch is below).
Q: Vectorization cannot be done without a GPU - False
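A sketch of the timing comparison from the video; the exact numbers will depend on the machine:

import time
import numpy as np

n = 1_000_000
w = np.random.rand(n)
x = np.random.rand(n)

tic = time.time()
z = np.dot(w, x)                 # vectorized dot product
print("numpy:", (time.time() - tic) * 1000, "ms")

tic = time.time()
z = 0.0
for i in range(n):               # explicit for loop
    z += w[i] * x[i]
print("loop :", (time.time() - tic) * 1000, "ms")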
Video 12 : More Vectorization Example
Other examples of vectorization are given below.
Whenever possible, avoid explicit for loops and use built-in vectorized functions instead.
Now we apply vectorization to logistic regression as shown below:
We remove the separate dw1 and dw2 (in this case there are only two, but there may be more if there are more features)
and replace them with a single numpy vector dw, updated with one vectorized operation.
Video 13 : Vectorizing Logistic Regression
Here we calculate z and a (the activation) for every example at once. We can do so using numpy if we arrange
X and w as matrices and use the formula below:
Z = np.dot(w.T, X) + b
Afterwards we can also calculate the sigmoid of all the z values to get all the activations (see the sketch after the Q&A below).
Q: What are the dimensions of the matrix X in this video?
Ans: (nx, m)
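A minimal sketch of this vectorized forward pass; nx, m, and the data are placeholders:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

nx, m = 3, 5
X = np.random.randn(nx, m)       # each column is one example
w = np.zeros((nx, 1))
b = 0.0

Z = np.dot(w.T, X) + b           # shape (1, m); b is broadcast to every column
A = sigmoid(Z)                   # activations ŷ for all m examples at once
print(Z.shape, A.shape)          # (1, 5) (1, 5)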
Video 14 : Vectorizing Logistic Regression's Gradient Output
Here we show how one iteration of gradient descent over the parameters w and b can be computed
without using any explicit loop (a sketch follows below):
Q: How do you compute the derivative of b in one line of Python/numpy?
Ans: db = (1/m) * np.sum(dZ)
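A sketch of one fully vectorized iteration over w and b; the data and learning rate alpha are placeholders:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

nx, m, alpha = 3, 5, 0.01
X = np.random.randn(nx, m)
Y = np.random.randint(0, 2, size=(1, m))
w = np.zeros((nx, 1))
b = 0.0

Z = np.dot(w.T, X) + b           # forward pass, shape (1, m)
A = sigmoid(Z)
dZ = A - Y                       # shape (1, m)
dw = np.dot(X, dZ.T) / m         # shape (nx, 1)
db = np.sum(dZ) / m              # scalar, the one-liner from the quiz
w = w - alpha * dw               # parameter update
b = b - alpha * db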
Video 15 : Broadcasting in python
Broadcasting is a technique that helps make Python/numpy code faster and shorter.
Here we calculate the percentage of total calories contributed by each nutrient in each food item,
which can be achieved in two lines without using a for loop (see the sketch after the Q&A below):
If two matrices have different dimensions, numpy first expands (broadcasts) the smaller one to the matching shape and then performs the
operation element-wise.
In the first case a 1x4 matrix is added to a 1x1 value, so the 1x1 is first expanded to 1x4 by repeating the 100; the same applies
to the rest of the calculations. This is the general principle of broadcasting in Python.
Q: Which numpy code would sum the values in a matrix A vertically (down the columns)?
Ans: A.sum(axis=0)
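A sketch of the calories example; the numbers are close to the ones used in the video (rows are carbs, protein, fat, columns are four foods):

import numpy as np

A = np.array([[56.0,   0.0,  4.4, 68.0],    # carbs
              [ 1.2, 104.0, 52.0,  8.0],    # protein
              [ 1.8, 135.0, 99.0,  0.9]])   # fat

cal = A.sum(axis=0)                        # total calories per food, shape (4,)
percentage = 100 * A / cal.reshape(1, 4)   # (3,4) / (1,4): broadcasting
print(percentage)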
Video 16 : A note on python numpy vectors
Always create a vector as a matrix by making the second dimension (1) explicit:
a = np.random.randn(5, 1) (this is a column vector) or a = np.random.randn(1, 5) (this is a row vector),
and not a = np.random.randn(5), which just creates a one-dimensional array (a rank-1 array) and not a vector,
so matrix operations like transpose will not work as expected.
We can also convert such arrays into a vector/matrix using a = a.reshape((5, 1)).
Q: What kind of array has dimension in this format: (10,)
Ans: A rank one Array
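A short sketch showing the difference between a rank-1 array and a proper column vector:

import numpy as np

a = np.random.randn(5)          # rank-1 array, shape (5,)
print(a.shape)                  # (5,)
print(a.T.shape)                # still (5,) - transpose does nothing useful

b = np.random.randn(5, 1)       # proper column vector, shape (5, 1)
print(np.dot(b, b.T).shape)     # (5, 5) outer product, as expected

a = a.reshape((5, 1))           # convert the rank-1 array into a column vector
assert a.shape == (5, 1)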
Video 17 : A quick tour of python/iPython notebook
Use Shift+Enter to run a cell
Video 18 : Explanation of Logistic Regression Cost Function (Optional)