8/2/2019

Linear Least Squares

k datapoints: $(x_1, y_1), \cdots, (x_k, y_k)$ s.t. $x_i \in R^n$ and $y_i \in R$

Model: (e.g., a curve)

We want to predict $y = f(x, \beta)$ where $\beta \in R^l$. Note: $l < k$ (therefore, this is an overdetermined problem).

The goal of the least squares problem is to find the $\beta$ for which $S = \sum_{i=1}^{k} r_i^2$ is minimized, where $r_i = y_i - f(x_i, \beta)$ (here $f$ is said to be a generative model).

Example: $x$ could be an IR sensor voltage, $\beta$ could be the parameters of the sensor's response, and $y$ is the distance to the object.

Linear in this section describes the function $y = f(x, \beta)$ being a linear combination of basis functions, i.e. $y = \sum_{j=1}^{l} \beta_j \phi_j(x)$; the basis functions $\phi_j$ themselves are not necessarily linear.

Example: let $x \in R^2$ and $y \in R^1$. Then for each point $i$ we have $y_i = \sum_{j=1}^{l} \beta_j \phi_j(x_i)$. We put these equations for each point in a matrix equation ($A\beta = y$, with $A_{ij} = \phi_j(x_i)$), but since the problem is overdetermined, matrix $A$ is not square, so solving for a unique exact solution is not possible.
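Written out, the stacked system looks like this:

$$\begin{pmatrix} \phi_1(x_1) & \cdots & \phi_l(x_1) \\ \vdots & \ddots & \vdots \\ \phi_1(x_k) & \cdots & \phi_l(x_k) \end{pmatrix} \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_l \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_k \end{pmatrix}$$

with $k > l$, so $A$ has more rows than columns.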

What do we do?

We multiply both sides by $A^T$: $A^T A \beta = A^T y$ (the normal equations). $A^T A$ is a symmetric $l \times l$ matrix, and as long as the columns of $A$ are linearly independent (i.e. $A$ has full column rank), $A^T A$ is full rank and therefore invertible. Then $\beta = (A^T A)^{-1} A^T y$.
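As a concrete sketch, here is the procedure in NumPy; the data, basis functions ($\phi_1(x) = 1$, $\phi_2(x) = x$), and true parameters are made up for illustration:

```python
import numpy as np

# Hypothetical data: k = 20 points from y = 2 + 3x plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 + 3.0 * x + 0.1 * rng.standard_normal(x.size)

# Design matrix A with A[i, j] = phi_j(x_i); here phi_1 = 1, phi_2 = x.
A = np.column_stack([np.ones_like(x), x])

# Solve the normal equations A^T A beta = A^T y.
beta = np.linalg.solve(A.T @ A, A.T @ y)

# np.linalg.lstsq avoids forming A^T A explicitly and is
# numerically more stable; both should agree here.
beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta, beta_lstsq)  # both approximately [2, 3]
```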

Non-Linear Least Squares

All of the above only works if $y = \sum_j \beta_j \phi_j(x)$. More generally, however, we want to minimize $S = \sum_{i=1}^{k} r_i^2$, where $r_i = y_i - f(x_i, \beta)$ and $f$ has a nonlinear dependence on $\beta$. Note: the minimum of $S$ occurs when $\frac{\partial S}{\partial \beta_j} = 2 \sum_{i} r_i \frac{\partial r_i}{\partial \beta_j} = 0$ for every $j$.

2 problems:

  1. The above partials may not have a closed form.
  2. They might depend on $\beta$ itself, so setting them to zero does not give a linear system (see the example below).
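For instance, with a hypothetical model $f(x, \beta) = \beta_1 e^{\beta_2 x}$, we get $\frac{\partial r_i}{\partial \beta_2} = -\beta_1 x_i e^{\beta_2 x_i}$, which still contains $\beta$; the stationarity equations are themselves nonlinear in $\beta$ and cannot be solved in one linear step.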

The process of finding the value of $\beta$, given values of $x$, s.t. $S$ is minimized is a root-finding problem (we look for where the gradient vanishes). Approach: guess $\beta$ parameter values and iteratively optimize.

Possible algorithms: Gauss-Newton, gradient descent, Levenberg-Marquardt, etc. (iterative optimization methods)
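As a minimal sketch of one of these, here is Gauss-Newton applied to the hypothetical model $f(x, \beta) = \beta_1 e^{\beta_2 x}$ from above (data and starting guess are made up); each step linearizes the residuals and solves a linear least squares subproblem for the update:

```python
import numpy as np

def f(x, beta):
    return beta[0] * np.exp(beta[1] * x)

def jacobian(x, beta):
    # J[i, j] = dr_i/dbeta_j = -df(x_i, beta)/dbeta_j
    e = np.exp(beta[1] * x)
    return -np.column_stack([e, beta[0] * x * e])

# Hypothetical data from true parameters (1.5, 2.0) plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = f(x, np.array([1.5, 2.0])) + 0.05 * rng.standard_normal(x.size)

beta = np.array([1.0, 1.0])               # initial guess
for _ in range(20):
    r = y - f(x, beta)                    # residuals at current beta
    J = jacobian(x, beta)
    # Linearize: solve J @ delta = -r in the least squares sense.
    delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
    beta = beta + delta
    if np.linalg.norm(delta) < 1e-10:     # converged
        break

print(beta)  # should land near the true parameters (1.5, 2.0)
```

Levenberg-Marquardt modifies this step by adding a damping term $\lambda I$ to $J^T J$, interpolating between Gauss-Newton and gradient descent.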