For review; more content will be added later.

convex set

definition

$$\forall x,y\in S,\ \forall \alpha\in[0,1]:\quad \alpha x+(1-\alpha) y \in S$$

Some examples of convex sets

${\rm epi}(f)$ is a convex set

$${\rm epi}(f):=\left\{\begin{pmatrix}x\\a\end{pmatrix}\in\mathbb R^{n+1}\,\middle|\,f(x)\leq a\right\}$$

i.e., the whole region on and above the graph of $f(x)$

The intersection of convex sets is convex, but the union need not be

If $f$ is a convex function, then $\partial f(x)$ is a convex set

proof

By the definition of $\partial f$, for any $u,v\in\partial f(x)$ we have, for all $y$,

$$\begin{gather*} f(y)\geq f(x)+\langle u,y-x\rangle\tag{1}\\ f(y)\geq f(x)+\langle v,y-x\rangle\tag{2} \end{gather*}$$

For any $\alpha \in[0,1]$,

$$\begin{gather*} \alpha\cdot(1)+(1-\alpha)\cdot(2)\\ \Rightarrow f(y)\geq f(x)+\langle\alpha u+(1-\alpha) v,\, y-x\rangle\\ \Rightarrow \alpha u+(1-\alpha) v\in\partial f(x) \end{gather*}$$
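The argument above can be checked numerically on a simple non-smooth function. The sketch below assumes $f(x)=|x|$ at $x=0$, where $\partial f(0)=[-1,1]$, and verifies that a convex combination of two subgradients still satisfies the subgradient inequality:

```python
# Sanity check of the proof above with f(x) = |x| at x = 0,
# whose subdifferential is the full interval [-1, 1].
def f(t):
    return abs(t)

x = 0.0
u, v = -1.0, 1.0                  # two subgradients of f at x = 0
alpha = 0.3
w = alpha * u + (1 - alpha) * v   # convex combination of the two

# w is a subgradient iff f(y) >= f(x) + w*(y - x) for all y;
# test the inequality on a few sample points.
for y in [-2.0, -0.5, 0.0, 0.7, 3.0]:
    assert f(y) >= f(x) + w * (y - x)
```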

Lagrange function

version 1.

$$\min_{x\in\mathbb R^n} \left(f(x)+\sum_{i=1}^p I_{-}(g_i(x))+\sum_{i=1}^q I_{0}(h_i(x))\right)$$

where

$$\begin{gather*} I_{-}(a)=\left\{\begin{aligned}&0&&a\leq 0\\&+\infty&&a>0\end{aligned}\right.\\ I_{0}(a)=\left\{\begin{aligned}&0&&a= 0\\&+\infty&&a\neq 0\end{aligned}\right. \end{gather*}$$

This ensures that when the constraints are satisfied the objective equals the original function, and otherwise its value is $+\infty$.

However, since indicator functions are hard to optimize, we replace them with the following functions:

$$\begin{gather*} I_{-}(a) = \max_{\lambda \in \mathbb R_+}~(\lambda a)\\ I_{0}(a) = \max_{\mu \in \mathbb R}~(\mu a) \end{gather*}$$

One can show that these two formulations are equivalent.
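As a rough numerical illustration (not a proof), approximating the supremum over a finite grid of $\lambda\geq 0$ reproduces the behaviour of $I_-$: the value is $0$ for $a\leq 0$ and grows without bound for $a>0$.

```python
# Approximate sup_{lambda >= 0} lambda * a over a finite grid [0, lam_max].
def sup_lambda_a(a, lam_max=1e6, steps=10**4):
    return max((lam_max * k / steps) * a for k in range(steps + 1))

assert sup_lambda_a(-2.0) == 0.0   # a <= 0: sup attained at lambda = 0
assert sup_lambda_a(0.0) == 0.0
assert sup_lambda_a(1e-3) > 1e2    # a > 0: value grows with lam_max (-> +inf)
```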

Lagrange function

$$\min_{x\in\mathbb R^n}\max_{\substack{\lambda\in\mathbb R^p_+\\\mu\in\mathbb R^q}}~ f(x)+\sum_{i=1}^p \lambda_i g_i(x) + \sum_{i=1}^q \mu_i h_i(x)$$

primal and dual

$$\begin{gather*} p^* = \min_{x\in\mathbb R^n}\max_{\substack{\lambda\in\mathbb R^p_+\\\mu\in\mathbb R^q}}~ L(x,\lambda,\mu)\\ d_* = \max_{\substack{\lambda\in\mathbb R^p_+\\\mu\in\mathbb R^q}}\min_{x\in\mathbb R^n}~ L(x,\lambda,\mu) \end{gather*}$$

$p^*\geq d_*$

proof

Let $x^*$ and $(\lambda_*,\mu_*)$ be the solutions of the primal and dual problems respectively; then

$$\begin{gather*} p^* = \min_{x\in\mathbb R^n}\max_{\substack{\lambda\in\mathbb R^p_+\\\mu\in\mathbb R^q}}~ L(x,\lambda,\mu)=\max_{\substack{\lambda\in\mathbb R^p_+\\\mu\in\mathbb R^q}}L(x^*,\lambda,\mu)\geq L(x^*,\lambda_*,\mu_*) \\ d_* = \max_{\substack{\lambda\in\mathbb R^p_+\\\mu\in\mathbb R^q}}\min_{x\in\mathbb R^n}~ L(x,\lambda,\mu) = \min_{x\in\mathbb R^n}~L(x,\lambda_*,\mu_*)\leq L(x^*,\lambda_*,\mu_*) \end{gather*}$$
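Weak duality can be observed on a tiny hypothetical example: $\min x^2$ s.t. $1-x\leq 0$, whose dual function is $d(\lambda)=\lambda-\lambda^2/4$, maximized at $\lambda=2$ with $d_*=p^*=1$.

```python
# Toy example: min x^2 s.t. g(x) = 1 - x <= 0.
# L(x, lam) = x^2 + lam*(1 - x); the inner min over x is attained at x = lam/2.
def dual(lam):
    x = lam / 2.0                     # argmin_x L(x, lam)
    return x**2 + lam * (1 - x)       # d(lam) = lam - lam^2/4

p_star = 1.0                          # primal optimum, attained at x = 1
d_star = max(dual(0.01 * k) for k in range(1000))   # grid over lambda >= 0
assert d_star <= p_star + 1e-9        # weak duality p* >= d_*
assert abs(d_star - p_star) < 1e-3    # strong duality holds here (Slater: x0 = 2)
```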

Slater’s condition and KKT condition

Slater’s condition

If the problem is convex ($f$ is a convex function and all $g_i$ are convex functions), and

$$\exists\, x_0, \quad \begin{cases} g_i(x_0) < 0, & i=1,\ldots, m \\ Cx_0 < d\\ Ax_0 = b \end{cases}$$

then strong duality holds: $p^* = d_*$.

KKT condition

  1. Primal feasibility

$$\begin{gather*} f_i(x^*) \leq 0, \quad i=1,\dots,m\\ h_j(x^*) = 0, \quad j=1,\dots,p \end{gather*}$$

  1. Dual feasibility

$$\lambda_i^* \geq 0, \quad i=1,\dots,m$$

where the $\lambda_i^*$ are the Lagrange multipliers associated with the inequality constraints.

  1. Stationarity of the Lagrangian

$$\nabla f_0(x^*) + \sum_{i=1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{j=1}^p \mu_j^* \nabla h_j(x^*) = 0$$

  1. Complementary slackness

$$\lambda_i^* f_i(x^*) = 0, \quad i=1,\dots,m$$

When the optimization problem is convex (and a constraint qualification such as Slater's condition holds), the KKT conditions are necessary and sufficient for optimality.
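The four conditions can be checked mechanically on the toy problem $\min x^2$ s.t. $1-x\leq 0$ (a hypothetical example whose solution is $x^*=1$ with multiplier $\lambda^*=2$):

```python
# Verify the KKT conditions at the known solution of min x^2 s.t. 1 - x <= 0.
x_star, lam_star = 1.0, 2.0

f1 = 1 - x_star          # inequality constraint value at x*
grad_f0 = 2 * x_star     # gradient of the objective x^2
grad_f1 = -1.0           # gradient of the constraint 1 - x

assert f1 <= 0                                      # primal feasibility
assert lam_star >= 0                                # dual feasibility
assert abs(grad_f0 + lam_star * grad_f1) < 1e-12    # stationarity
assert lam_star * f1 == 0                           # complementary slackness
```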

Projected Gradient Descent

$$\min_{x\in\mathbb R^n}~ f(x) + I_S(x)$$

Apply the proximal gradient method:

$$x^{(k+1)} = {\rm prox}_{\alpha I_S} \left(x^{(k)}-\alpha \nabla f\left(x^{(k)}\right)\right)$$

Next we compute ${\rm prox}_{\alpha I_S}$:

$$\begin{align*} {\rm prox}_{\alpha I_S}(y)&=\operatorname*{argmin}_{x\in\mathbb R^n}~\frac 12 \| x-y\|^2+\alpha I_S(x)\\ &=\operatorname*{argmin}_{x\in S}\| x-y\|\\ &=P_S(y) \end{align*}$$

$$P_S(y)=\operatorname*{argmin}_{x\in S}\| x-y\|\Leftrightarrow \begin{cases}P_S(y)\in S\\ \langle y-P_S(y),\,z-P_S(y)\rangle\leq 0,\ \forall z\in S\end{cases}$$
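The variational inequality can be sanity-checked for the box $S=[0,1]^2$, where $P_S$ is coordinate-wise clipping (`clip` below is an illustrative helper, not a library function):

```python
# For S = [0,1]^2 the projection is coordinate-wise clipping.
def clip(v, lo=0.0, hi=1.0):
    return [min(max(c, lo), hi) for c in v]

y = [1.7, -0.4]
p = clip(y)                      # P_S(y) = [1.0, 0.0]

# Check <y - P_S(y), z - P_S(y)> <= 0 for a grid of points z in S.
for z0 in [0.0, 0.25, 0.5, 1.0]:
    for z1 in [0.0, 0.5, 1.0]:
        inner = (y[0] - p[0]) * (z0 - p[0]) + (y[1] - p[1]) * (z1 - p[1])
        assert inner <= 1e-12
```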

PGD

$$x^{(k+1)} = P_S\left(x^{(k)}-\alpha \nabla f\left(x^{(k)}\right)\right)$$
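A minimal PGD sketch, assuming the feasible set $S=[0,1]^2$ and the objective $f(x)=\|x-c\|^2$ with $c=(2,-1)$ outside the box; the solution is the projection of $c$, i.e. $(1,0)$:

```python
# Projected gradient descent for min ||x - c||^2 over the box S = [0,1]^2.
def grad_f(x, c=(2.0, -1.0)):
    return [2 * (xi - ci) for xi, ci in zip(x, c)]

def proj_box(x):
    # P_S: coordinate-wise clipping onto [0,1]^2
    return [min(max(xi, 0.0), 1.0) for xi in x]

x, alpha = [0.5, 0.5], 0.1
for _ in range(100):
    g = grad_f(x)
    x = proj_box([xi - alpha * gi for xi, gi in zip(x, g)])
print(x)   # -> converges to [1.0, 0.0]
```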

Uzawa’s Algorithm

$$\boxed{ \begin{align*} & x^{(k+1)} = \operatorname*{argmin}_{x\in\mathbb R^n} L\left(x,\lambda^{(k)},\mu^{(k)}\right) \\ & \lambda_i^{(k+1)} = \Big[ \lambda_i^{(k)} + \alpha_k\, g_i\left(x^{(k+1)}\right) \Big]_+, \quad i=1,\ldots,p\\ & \mu_i^{(k+1)} = \mu_i^{(k)} + \alpha_k h_i\left(x^{(k+1)}\right),\quad i=1,\ldots,q \end{align*} }$$
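A sketch of this iteration on the toy problem $\min x^2$ s.t. $1-x\leq 0$ (a hypothetical example), where the exact $x$-update has the closed form $\operatorname{argmin}_x L(x,\lambda)=\lambda/2$:

```python
# Uzawa iteration with an exact x-update; the fixed point is x = 1, lam = 2.
lam, alpha = 0.0, 0.5
for _ in range(200):
    x = lam / 2.0                          # exact argmin_x of L(x, lam)
    lam = max(lam + alpha * (1 - x), 0.0)  # projected ascent step [.]_+
print(x, lam)   # -> approaches x = 1, lam = 2
```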

If instead the $x$-update is computed with a single gradient step:

$$\boxed{ \begin{align*} & x^{(k+1)} = x^{(k)}-\alpha_k\frac{\partial L}{\partial x}\left(x^{(k)},\lambda^{(k)},\mu^{(k)}\right) \\ & \lambda_i^{(k+1)} = \Big[ \lambda_i^{(k)} + \alpha_k\, g_i\left(x^{(k+1)}\right) \Big]_+, \quad i=1,\ldots,p\\ & \mu_i^{(k+1)} = \mu_i^{(k)} + \alpha_k h_i\left(x^{(k+1)}\right),\quad i=1,\ldots,q \\ & x^{(k+1)}\leftarrow x^{(k+1)}+\theta\left(x^{(k+1)}-x^{(k)}\right) \end{align*} }$$

Augmented Lagrangian Method

$$L_\gamma(x,\mu) = f(x) + \langle \mu,Ax-b\rangle+\frac \gamma 2 \|Ax-b\|^2$$

$$\begin{align*} x^{(k+1)} &= \operatorname*{argmin}_{x\in\mathbb R^n} ~f(x) + \frac\gamma 2\left\|Ax-\left(b-\frac{\mu^{(k)}}{\gamma}\right)\right\|^2\\ \mu^{(k+1)} &=\mu^{(k)} + \alpha_k(Ax^{(k+1)}-b) \end{align*}$$
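A closed-form ALM sketch on the hypothetical problem $\min x_1^2+x_2^2$ s.t. $x_1+x_2=2$, whose solution is $x=(1,1)$ with multiplier $\mu=-2$; by symmetry the subproblem minimizer is $x_1=x_2=s$ with $s=\gamma t/(2+2\gamma)$, $t=b-\mu/\gamma$:

```python
# Augmented Lagrangian iteration for min x1^2 + x2^2 s.t. x1 + x2 = 2.
gamma, b, mu = 1.0, 2.0, 0.0
for _ in range(100):
    t = b - mu / gamma
    s = gamma * t / (2 + 2 * gamma)        # exact argmin of the x-subproblem
    x = [s, s]
    mu = mu + gamma * (x[0] + x[1] - b)    # multiplier update, alpha_k = gamma
print(x, mu)   # -> approaches x = (1, 1), mu = -2
```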

ADMM

For the following optimization problem:

$$\min_{\substack{x\in\mathbb R^n\\y\in\mathbb R^p}}~ f(x)+g(y)\quad{\rm s.t.}\quad Dx=y$$

with augmented Lagrangian

$$L_\gamma(x,y,\mu)=f(x)+g(y)+\langle\mu,Dx-y\rangle+\frac\gamma 2\|Dx-y\|^2$$

$$\boxed{ \begin{array}{l} x^{(k+1)} =\operatorname*{argmin}_{x\in\mathbb R^n} L_\gamma(x, y^{(k)}, \mu^{(k)}) \\ y^{(k+1)} = \operatorname*{argmin}_{y\in\mathbb R^p} L_\gamma(x^{(k+1)}, y, \mu^{(k)}) \\ \mu^{(k+1)} = \mu^{(k)} + \alpha_k (D x^{(k+1)} - y^{(k+1)}) \end{array} }$$
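A scalar ADMM sketch on a hypothetical split problem with $D=I$: $\min (x-3)^2+|y|$ s.t. $x=y$, whose minimizer is $x=y=2.5$ with $\mu=1$. The $y$-update is soft-thresholding (the prox of $\tfrac1\gamma|\cdot|$), and both subproblems have closed forms:

```python
# ADMM for min (x-3)^2 + |y| s.t. x = y, with gamma = alpha_k = 1.
def soft(v, thr):
    # soft-thresholding: prox of thr * |.|
    return max(abs(v) - thr, 0.0) * (1 if v > 0 else -1)

gamma, mu, y = 1.0, 0.0, 0.0
for _ in range(200):
    x = (6 - mu + gamma * y) / (2 + gamma)   # argmin_x L_gamma: quadratic in x
    y = soft(x + mu / gamma, 1 / gamma)      # argmin_y L_gamma: soft-threshold
    mu = mu + gamma * (x - y)                # dual (multiplier) update
print(x, y, mu)   # -> approaches x = y = 2.5, mu = 1
```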