
Eigenvalue decomposition of a symmetric matrix

Let $A\in\mathbb R^{n\times n}$ be a symmetric matrix (i.e. $A^\top = A$). Then there exists a decomposition

$$A = U\Lambda U^\top$$

where

$$\begin{gather*} U = [u_1,u_2,\cdots,u_n]\in\mathbb R^{n\times n} ~\text{is orthogonal}\\ \Lambda = \begin{bmatrix}\lambda_1&&&\\&\lambda_2&&\\&&\ddots&\\&&&\lambda_n\end{bmatrix} \in\mathbb R^{n\times n}~\text{is a diagonal matrix} \end{gather*}$$

Here $\lambda_i,\ i=1,\cdots,n$ are the eigenvalues of $A$, and $u_i,\ i=1,\cdots,n$ are the corresponding eigenvectors.
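As a small worked example (a matrix chosen here for illustration, not taken from the notes):

$$\begin{bmatrix}2&1\\1&2\end{bmatrix} = \underbrace{\frac{1}{\sqrt 2}\begin{bmatrix}1&1\\1&-1\end{bmatrix}}_{U}\underbrace{\begin{bmatrix}3&0\\0&1\end{bmatrix}}_{\Lambda}\underbrace{\frac{1}{\sqrt 2}\begin{bmatrix}1&1\\1&-1\end{bmatrix}}_{U^\top}$$

i.e. the eigenpairs are $\left(3,\tfrac{1}{\sqrt2}(1,1)^\top\right)$ and $\left(1,\tfrac{1}{\sqrt2}(1,-1)^\top\right)$, and multiplying the three factors back out recovers the original matrix.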

proof

  1. Show that all eigenvalues of $A$ are real: if $Au=\lambda u$ with $u\in\mathbb C^n$, $u\neq 0$, then $\bar u^\top Au=\lambda\|u\|_2^2$, while conjugating $Au=\lambda u$ (using that $A$ is real) gives $\bar u^\top Au=\bar\lambda\|u\|_2^2$, so $\lambda=\bar\lambda$.

  2. Prove by induction on $n$ that a decomposition $A = U\Lambda U^\top$ exists.

    If $n=1$:

    $$A = 1\cdot A\cdot 1$$

    Suppose that every symmetric $A_k\in\mathbb R^{k\times k}$ admits a decomposition $A_k=U_k\Lambda_kU_k^\top$.

    For any symmetric $A_{k+1}\in\mathbb R^{(k+1)\times (k+1)}$, let $(\lambda,u)$ be an eigenvalue-eigenvector pair of $A_{k+1}$, so that $A_{k+1}u=\lambda u$ and $\|u\|_2 =1$.

    Since $u\in\mathbb R^{k+1}$ is a unit vector, it can be extended to an orthonormal basis of $\mathbb R^{k+1}$, collected in the orthogonal matrix $Q = [u,P]$. Then

    $$\begin{gather*} Q^\top A_{k+1}Q = \begin{bmatrix}u^\top\\P^\top\end{bmatrix}A_{k+1} \begin{bmatrix}u&P\end{bmatrix}\\ = \begin{bmatrix}u^\top A_{k+1}u&u^\top A_{k+1}P\\ P^\top A_{k+1}u&P^\top A_{k+1}P\end{bmatrix} \end{gather*}$$

    where

    $$\begin{align*} u^\top A_{k+1}u&=u^\top(\lambda u)=\lambda u^\top u = \lambda\\ P^\top A_{k+1}u &= P^\top(\lambda u)=\lambda P^\top u=0\\ u^\top A_{k+1}P&=(P^\top A_{k+1}u)^\top=0^\top=0\\ (P^\top A_{k+1}P)^\top&=P^\top A_{k+1}^\top P=P^\top A_{k+1}P\Rightarrow P^\top A_{k+1}P~\text{is symmetric} \end{align*}$$

    By the induction hypothesis, there exist an orthogonal $S\in\mathbb R^{k\times k}$ and a diagonal $D\in\mathbb R^{k\times k}$ such that

    $$P^\top A_{k+1}P=SDS^\top$$

    Therefore

    $$\begin{gather*} Q^\top A_{k+1}Q = \begin{bmatrix}\lambda&0\\0&SDS^\top\end{bmatrix}=\begin{bmatrix}1&\\&S\end{bmatrix}\begin{bmatrix}\lambda&\\&D\end{bmatrix}\begin{bmatrix}1&\\&S^\top\end{bmatrix}\\ A_{k+1} = Q\begin{bmatrix}1&\\&S\end{bmatrix}\begin{bmatrix}\lambda&\\&D\end{bmatrix}\begin{bmatrix}1&\\&S^\top\end{bmatrix}Q^\top =U_{k+1}\Lambda_{k+1}U_{k+1}^\top \end{gather*}$$

  3. Finally, the diagonal entries of $\Lambda$ are the eigenvalues of $A$ and the columns of $U$ are the corresponding eigenvectors:

    $$\begin{gather*} A = U\Lambda U^\top \Leftrightarrow AU = U\Lambda\\ \Rightarrow A[u_1,\cdots,u_n] = [u_1,\cdots,u_n] \Lambda =[\lambda_1 u_1,\cdots, \lambda_n u_n]\\ \Leftrightarrow Au_i = \lambda_iu_i \end{gather*}$$
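The statement is easy to check numerically. A minimal sketch with NumPy's `numpy.linalg.eigh`, which returns the eigendecomposition of a symmetric matrix; the random test matrix is an assumption of this sketch, not part of the notes:

```python
import numpy as np

# Random symmetric test matrix (an assumption of this sketch): A = (B + B^T) / 2
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

# eigh: eigendecomposition for symmetric matrices, eigenvalues in ascending order
eigvals, U = np.linalg.eigh(A)
Lambda = np.diag(eigvals)

print(np.allclose(A, U @ Lambda @ U.T))                  # A = U Lambda U^T
print(np.allclose(U.T @ U, np.eye(5)))                   # U is orthogonal
print(np.allclose(A @ U[:, 0], eigvals[0] * U[:, 0]))    # A u_i = lambda_i u_i
```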

Singular value decomposition

Let $A\in\mathbb R^{m\times n}$ with $m\geq n$. Then $A^\top A$ is symmetric.

AA=VΛVA^\top A = V\Lambda V^\topAAAA^\top的一个特征值分解,其中

$$\begin{gather*} V\in\mathbb{R}^{n\times n}\quad {\rm s.t.}\quad V^\top V = VV^\top = I\\ \Lambda = {\rm diag}(\lambda_1,\cdots,\lambda_n)\in\mathbb R^{n\times n}\ \text{is diagonal with}\ \lambda_1\geq\lambda_2\geq\cdots\geq \lambda_n \end{gather*}$$

  1. Since $A^\top A$ is symmetric positive semidefinite (SPSD), $\lambda_1\geq\lambda_2\geq\cdots\geq \lambda_n\geq 0$.

  2. Let $w_i$ denote the $i$-th column of $AV\in\mathbb R^{m\times n}$. Then

    $$\begin{align*} w_i^\top w_j &= (AVe_i)^\top(AVe_j)=e_i^\top V^\top A^\top AVe_j\\ &=e_i^\top (V^\top V)\Lambda (V^\top V)e_j\\ &=\left\{\begin{aligned}&\lambda_i&&i=j\\&0&&i\neq j\end{aligned}\right. \end{align*}$$

    so the columns of $AV$ are orthogonal.

  3. Define $\sigma_i = \|w_i\|_2 =\sqrt{w_i^\top w_i}= \sqrt{\lambda_i}$ and, for $\sigma_i>0$, $u_i=w_i/\|w_i\|_2$ (if $\sigma_i=0$, take $u_i$ to be any unit vector orthogonal to the others), and set

    $$\begin{gather*} U = \begin{bmatrix}u_1&u_2&\cdots&u_n\end{bmatrix}\in\mathbb R^{m\times n}\\ \Sigma = {\rm diag}(\sigma_1,\cdots,\sigma_n)\in\mathbb R^{n\times n} \end{gather*}$$

    Then $U^\top U=I$ (while $UU^\top\neq I$ unless $m=n$), and $U\Sigma=[w_1,\cdots,w_n]=AV$, hence

    $$A = U\Sigma V^\top$$

In summary, $A = U\Sigma V^\top$. Defining $p=\min\{m,n\}$, the factors of this (thin) SVD satisfy

$$\begin{gather*} U\in\mathbb R^{m\times p}~{\rm s.t.}~U^\top U =I\\ V\in\mathbb R^{n\times p}~{\rm s.t.}~V^\top V=I\\ \Sigma\in\mathbb R^{p\times p}~\text{a diagonal matrix with}~\sigma_1\geq \sigma_2\geq\cdots\geq\sigma_p\geq0 \end{gather*}$$
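The three construction steps above map directly onto NumPy. A minimal sketch, assuming a random $A$ with full column rank so that every $\sigma_i>0$ and $u_i = Av_i/\sigma_i$ is well defined; the result is compared against `numpy.linalg.svd`:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.standard_normal((m, n))           # m >= n, as in the derivation above

# Steps 1-2: eigendecomposition of A^T A, reordered so the eigenvalues descend
lam, V = np.linalg.eigh(A.T @ A)
idx = np.argsort(lam)[::-1]
lam, V = lam[idx], V[:, idx]

# Step 3: sigma_i = sqrt(lambda_i), u_i = (A v_i) / sigma_i
sigma = np.sqrt(np.clip(lam, 0.0, None))  # clip tiny negatives from round-off
U = (A @ V) / sigma                       # valid when all sigma_i > 0 (full column rank)

print(np.allclose(A, U @ np.diag(sigma) @ V.T))   # A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(n)))            # columns of U are orthonormal

# Compare with NumPy's thin SVD (singular values match; vectors agree up to sign)
_, s, _ = np.linalg.svd(A, full_matrices=False)
print(np.allclose(s, sigma))
```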

Properties of SVD

  1. $$\begin{gather*}\left\{\begin{aligned}AV &= U\Sigma\\A^\top U&=V\Sigma\end{aligned}\right.\\ \left\{\begin{aligned}Av_i &= \sigma_i u_i\\A^\top u_i&=\sigma_iv_i\end{aligned}\right. \end{gather*}$$

  2. $$\left\{\begin{aligned}A^\top Av_i &= \sigma_i^2v_i\\AA^\top u_i&=\sigma_i^2u_i\end{aligned}\right.$$

  3. The SVD is a useful tool in matrix analysis (a numerical check of several of these facts appears after this list):

    • ${\rm rank}(A)=$ the number of nonzero singular values

    • ${\rm Ran}(A)={\rm span}(u_1,\cdots,u_r)$, where $r={\rm rank}(A)$

      proof

      $$\begin{align*} {\rm Ran}(A)&=\{A\mathbf{x}\mid\mathbf{x}\in\mathbb{R}^{n}\}=\{U\Sigma V^{\top}\mathbf{x}\mid\mathbf{x}\in\mathbb{R}^{n}\}\\ &=\left\{\sum_{i=1}^{p}\sigma_{i}u_{i}(v_{i}^{\top}\mathbf{x})\,\Big|\,\mathbf{x}\in\mathbb{R}^{n}\right\}\\ &=\left\{\sum_{i:\sigma_i\neq 0}\sigma_{i}(v_{i}^{\top}\mathbf{x})\, u_{i}\,\Big|\,\mathbf{x}\in\mathbb{R}^{n}\right\}={\rm span}\{u_{1},u_{2},\ldots,u_{r}\} \end{align*}$$

    • ${\rm Ker}(A)={\rm span}\{v_1,v_2,\cdots,v_r\}^{\perp}$

    • $\|A\|_2 =\sigma_1,~\|A\|_F = \sqrt{\sum_{i=1}^p \sigma_i^2}$

      proof

      $$\begin{align*} \|A\|_{2}&=\max_{\|x\|_{2}=1}\|Ax\|_{2}=\left(\max_{\|x\|_{2}=1}\|Ax\|_{2}^{2}\right)^{1/2}=\left(\max_{\|x\|_{2}=1}x^{\top}A^{\top}Ax\right)^{1/2}\\ &=\left(\max_{\|x\|_{2}=1}x^{\top}V\Sigma^{2}V^{\top} x\right)^{1/2}=\left(\max_{\|y\|_{2}=1}y^{\top}\Sigma^{2}y\right)^{1/2}=(\sigma_{1}^{2})^{1/2}=\sigma_{1} \end{align*}$$

      where $y = V^\top x$; since $V$ is orthogonal, $\|y\|_2=\|V^\top x\|_2=\|x\|_2=1$, so $y$ ranges over the whole unit sphere.

      $$\begin{align*} \|A\|_{F}&=\left(\langle A,A\rangle\right)^{1/2}=\left(\langle U\Sigma V^{\top},U\Sigma V^{\top}\rangle\right)^{1/2}\\ &=\left({\rm tr}(V\Sigma U^{\top}U\Sigma V^{\top})\right)^{1/2}=\left({\rm tr}(V\Sigma^{2}V^{\top})\right)^{1/2}=\left({\rm tr}(\Sigma^{2}V^{\top}V)\right)^{1/2}\\ &=\left({\rm tr}(\Sigma^{2})\right)^{1/2}=(\sigma_{1}^{2}+\sigma_{2}^{2}+\cdots+\sigma_{p}^{2})^{1/2}. \end{align*}$$

    • Geometric interpretation: $A=U\Sigma V^\top$ first applies the rotation/reflection $V^\top$, then scales the coordinate axes by $\sigma_1,\cdots,\sigma_p$, then applies the rotation/reflection $U$; the unit sphere is mapped to an ellipsoid with semi-axes $\sigma_iu_i$.


  4. The SVD gives the optimal low-rank approximation of a matrix.

    Let $A\in\mathbb{R}^{m\times n}$ with SVD $A = U\Sigma V^\top$. Then $A$ can be written as

    $$A = \sum_{i=1}^p\sigma_iu_iv_i^\top$$

    Let $k$ be a positive integer with $k\leq p$, and define

    $$A_k = \sum_{i=1}^k\sigma_iu_iv_i^\top$$

    Then
    $$\begin{gather*} A_k = \operatorname*{argmin}_{B:{\rm rank}(B)\leq k}\|A-B\|_2\\ A_k = \operatorname*{argmin}_{B:{\rm rank}(B)\leq k}\|A-B\|_F \end{gather*}$$

    proof: 2-norm case

    $$\|A-A_k\|_2 = \left\|\sum_{i=1}^p\sigma_iu_iv_i^\top-\sum_{i=1}^k\sigma_iu_iv_i^\top\right\|_2=\left\|\sum_{i=k+1}^p\sigma_iu_iv_i^\top\right\|_2=\sigma_{k+1}$$

    It remains to show that, for every $B$ with ${\rm rank}(B)\leq k$,

    $$\|A-B\|_2\geq \sigma_{k+1}$$

    WLOG assume $p=n$ (i.e. $n\leq m$). Since ${\rm rank}(B)\leq k$,

    $${\rm rank}(B[v_1,\cdots,v_{k+1}])\leq {\rm rank}(B)\leq k$$

    Therefore the linear system

    $$B\underbrace{\begin{bmatrix}v_1&v_2&\cdots&v_{k+1}\end{bmatrix}\begin{bmatrix}c_1\\c_2\\\vdots\\c_{k+1}\end{bmatrix}}_{\text{define}~ w}=0$$

    has a nonzero solution $c$. WLOG normalize so that $\|w\|_2 = 1$ (i.e. $\sum_{i=1}^{k+1}c_i^2=1$, since the $v_i$ are orthonormal). Then

    $$\begin{align*} \|A-B\|^2_2 &\overset{(1)}{\geq} \|(A-B)w\|_2^2=\|Aw\|_2^2=\left\|\sum_{i=1}^{k+1}c_iAv_i\right\|_2^2\\ &\overset{(2)}{=}\left\|\sum_{i=1}^{k+1}c_i\sigma_iu_i\right\|_2^2\overset{(3)}{=}\sum_{i=1}^{k+1}c_i^2\sigma_i^2\geq \sum_{i=1}^{k+1}c_i^2\sigma_{k+1}^2=\sigma_{k+1}^2 \end{align*}$$

    where (1) uses $\|(A-B)w\|_2\leq\|A-B\|_2\|w\|_2$ with $\|w\|_2=1$, together with $Bw=0$ so that $(A-B)w=Aw$,

    (2) uses property 1 of the SVD ($Av_i=\sigma_iu_i$),

    and (3) holds because the columns of $U$ are orthonormal:

    $$\left\|\sum_{i=1}^{k+1}c_i\sigma_iu_i\right\|_2^2=\sum_{i=1}^{k+1}\sum_{j=1}^{k+1}c_ic_j\sigma_i\sigma_j\langle u_i,u_j\rangle=\sum_{i=1}^{k+1}c_i^2\sigma_i^2$$
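A minimal numerical sketch of properties 1-4 above (random matrix, NumPy only): it checks $Av_i=\sigma_iu_i$, the rank and norm identities, and that the $k$-truncated SVD attains spectral-norm error $\sigma_{k+1}$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 8, 5, 2
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD, s in descending order

# Property 1: A v_i = sigma_i u_i (checked for i = 1)
print(np.allclose(A @ Vt[0], s[0] * U[:, 0]))

# rank(A) = number of nonzero singular values (up to round-off)
print(np.linalg.matrix_rank(A) == np.sum(s > 1e-12))

# ||A||_2 = sigma_1 and ||A||_F = sqrt(sum sigma_i^2)
print(np.isclose(np.linalg.norm(A, 2), s[0]))
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))

# Eckart-Young: ||A - A_k||_2 = sigma_{k+1}
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))
```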

Applications

$$A\approx LR^\top$$

where $L\in\mathbb R^{m\times k}$ and $R\in\mathbb R^{n\times k}$, so that $LR^\top\in\mathbb R^{m\times n}$.

PCA

$$\min_{L\in\mathbb R^{m\times k},\,R\in\mathbb{R}^{n\times k}}\|A-LR^\top\|_F^2\;\Leftrightarrow\; \min_{M:{\rm rank}(M)\leq k}\|A-M\|_F^2$$

The solution is given by the $k$-truncated SVD of $A$.
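A sketch of this solution using the common (non-unique) split $L = U_k\Sigma_k$, $R = V_k$; the Frobenius residual then equals $\sqrt{\sigma_{k+1}^2+\cdots+\sigma_p^2}$:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 10, 6, 3
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# k-truncated SVD split into L (m x k) and R (n x k), so A ~= L R^T
L = U[:, :k] * s[:k]          # scale the columns of U_k by sigma_1..sigma_k
R = Vt[:k, :].T

err = np.linalg.norm(A - L @ R.T, 'fro')
print(err, np.sqrt(np.sum(s[k:]**2)))   # equal: residual = sqrt(sigma_{k+1}^2 + ... + sigma_p^2)
```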

Column Subset Selection

$L$: a set of $k$ columns of $A$; $R\in\mathbb R^{n\times k}$.

$$\min_{\begin{gathered}L:\,k~\text{columns of}~A\\R\in\mathbb{R}^{n\times k}\end{gathered}}\|A-LR^\top\|_F^2\;\Leftrightarrow\; \min_{\begin{gathered}S\subseteq\{1,\cdots,n\}\\|S|=k\end{gathered}}\|A-A_SA_S^{\dagger}A\|_F^2$$

Pivoted (rank-revealing) QR gives a good approximate solution.
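A sketch of this heuristic with SciPy's column-pivoted QR (`scipy.linalg.qr(..., pivoting=True)`): the first $k$ pivot indices serve as the selected set $S$, and the unrestricted rank-$k$ optimum from the truncated SVD is printed as a lower bound for comparison:

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(4)
m, n, k = 8, 6, 3
A = rng.standard_normal((m, n))

# Column-pivoted QR: A[:, piv] = Q @ R, pivots ordered by decreasing importance
Q, R, piv = qr(A, pivoting=True)
S = piv[:k]                     # indices of the k selected columns
A_S = A[:, S]                   # m x k

# Best fit with these columns: project A onto the column space of A_S
residual = A - A_S @ np.linalg.pinv(A_S) @ A
print(S, np.linalg.norm(residual, 'fro'))

# Unrestricted rank-k optimum (truncated SVD) is always a lower bound
s = np.linalg.svd(A, compute_uv=False)
print(np.sqrt(np.sum(s[k:]**2)))
```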

Non-negative Matrix Factorization

$$\min_{\begin{gathered}L\in\mathbb{R}^{m\times k}_+\\R\in\mathbb{R}^{n\times k}_+\end{gathered}}\|A-LR^\top\|_F^2$$
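One standard way to attack this objective is the Lee-Seung multiplicative-update rule, sketched below in the $L,R$ notation used above (not part of the notes; scikit-learn's `sklearn.decomposition.NMF` provides a production implementation):

```python
import numpy as np

def nmf_multiplicative(A, k, iters=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||A - L R^T||_F^2 with L, R >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    L = rng.random((m, k))
    R = rng.random((n, k))
    for _ in range(iters):
        # Updates keep entries non-negative when A is non-negative; eps avoids division by zero
        R *= (A.T @ L) / (R @ (L.T @ L) + eps)
        L *= (A @ R) / (L @ (R.T @ R) + eps)
    return L, R

# Non-negative test matrix (an assumption of this sketch)
rng = np.random.default_rng(5)
A = rng.random((8, 6))
L, R = nmf_multiplicative(A, k=3)
print(np.linalg.norm(A - L @ R.T, 'fro') / np.linalg.norm(A, 'fro'))  # relative error
```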

To be continued.