Reflections on Second-Order Derivatives of Multivariate Composite Functions and Vector Calculus
Introduction
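For a multivariate composite function of the form \(z=f(u_1,u_2,…,u_n)\), where \(u_i=g_i(x_1,x_2,…,x_m)\), computing second-order derivatives usually involves tedious, repetitive work, and repeated application of the chain rule is error-prone. This article presents a general formula for this class of problems, together with its derivation, for reference.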
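Example 1: Let \(z=f(x^2-y^2,e^{xy})\), where \(f\) has continuous second-order partial derivatives. Find \(\frac{\partial ^2z}{ \partial x \partial y}\).
Applying the chain rule, we obtain \(\frac{\partial ^2z}{ \partial x \partial y}=-4xyf''_{11}+2(x^2-y^2)e^{xy}f''_{12}+xye^{2xy}f''_{22}+e^{xy}(1+xy)f'_2\), where the subscripts 1 and 2 denote partial derivatives of \(f\) with respect to its first and second arguments.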
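The subscripts in \(f''_{11}\) and \(f''_{12}\) are reminiscent of matrix indices, which suggests looking for a compact matrix form of this expression and, more generally, a general solution for this class of problems.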
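Before moving on, the result of Example 1 can be sanity-checked numerically. The sketch below is only an illustration: it assumes one concrete outer function, \(f(u,v)=u^2v+\sin v\), which is a hypothetical choice and not part of the example, and compares the closed-form expression with a central finite-difference estimate of the mixed partial. All function names are made up for this check.

```python
import math

# Hypothetical concrete outer function, chosen only for this check:
# f(u, v) = u^2 * v + sin(v)
def f(u, v):
    return u * u * v + math.sin(v)

# Partial derivatives of this particular f, worked out by hand.
def f2(u, v):   # df/dv
    return u * u + math.cos(v)

def f11(u, v):  # d^2 f / du^2
    return 2.0 * v

def f12(u, v):  # d^2 f / du dv
    return 2.0 * u

def f22(u, v):  # d^2 f / dv^2
    return -math.sin(v)

def z(x, y):
    # z = f(x^2 - y^2, e^{xy}) as in Example 1.
    return f(x * x - y * y, math.exp(x * y))

def mixed_partial_formula(x, y):
    # Closed-form result quoted in Example 1.
    e = math.exp(x * y)
    u, v = x * x - y * y, e
    return (-4 * x * y * f11(u, v)
            + 2 * (x * x - y * y) * e * f12(u, v)
            + x * y * e * e * f22(u, v)
            + e * (1 + x * y) * f2(u, v))

def mixed_partial_fd(x, y, h=1e-4):
    # Central finite-difference approximation of d^2 z / dx dy.
    return (z(x + h, y + h) - z(x + h, y - h)
            - z(x - h, y + h) + z(x - h, y - h)) / (4 * h * h)

x0, y0 = 0.7, -0.3
exact = mixed_partial_formula(x0, y0)
approx = mixed_partial_fd(x0, y0)
print(exact, approx)
```

The two printed values should agree to several decimal places; the small gap that remains comes from the finite-difference step size, not from the formula.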
The Gradient Matrix
Following [1], for a function \(f: ℝ^n\rightarrow ℝ,\ \pmb{x} \mapsto f(\pmb{x})\) with \(\pmb{x}\in ℝ^n\), that is, \(\pmb{x}=[x_1,x_2,x_3,…,x_n]^T\), the partial derivatives are defined as:
\[\frac{\partial f}{\partial x_1}= \lim_{h \rightarrow 0}\frac{f(x_1+h,x_2,…,x_n)-f(\pmb{x})}{h}\\ \vdots \\ \frac{\partial f}{\partial x_n}= \lim_{h \rightarrow 0}\frac{f(x_1,x_2,…,x_n+h)-f(\pmb{x})}{h} \tag{2.1} \]
We collect them into a row vector, denoted:
\[∇_{\pmb{x}}f=\operatorname{grad} f=\left[\begin{matrix}\frac{\partial f(\pmb{x})}{\partial x_1} & \frac{\partial f(\pmb{x})}{\partial x_2} & … & \frac{\partial f(\pmb{x})}{\partial x_n}\\\end{matrix} \right] \inℝ^{1×n} \tag{2.2} \]
For example, for the function \(f(x,y)=(x+2y^3)^2\), we have:
\[∇f=\left[\begin{matrix}2(x+2y^3) & 12(x+2y^3)y^2\end{matrix} \right] \inℝ^{1×2} \tag{2.3} \]
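The gradient in (2.3) can be checked against the limit definition (2.1) with a quick numerical experiment. This is a minimal sketch: `grad_fd` is a hypothetical helper written for this illustration, and it uses a central difference with a small fixed step rather than the one-sided limit in (2.1).

```python
def f(x, y):
    return (x + 2 * y ** 3) ** 2

def grad_formula(x, y):
    # Row vector from (2.3).
    return [2 * (x + 2 * y ** 3), 12 * (x + 2 * y ** 3) * y ** 2]

def grad_fd(func, point, h=1e-6):
    # Perturb one coordinate at a time, as in (2.1),
    # but with a central difference for better accuracy.
    grad = []
    for k in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[k] += h
        minus[k] -= h
        grad.append((func(*plus) - func(*minus)) / (2 * h))
    return grad

point = [1.5, -0.5]
analytic = grad_formula(*point)
numeric = grad_fd(f, point)
print(analytic, numeric)
```

The analytic and numeric rows should match up to finite-difference error.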
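To arrive at the general solution to the problem posed at the start of this article, one unavoidable step is differentiating the gradient \(∇f\) itself; we analyze that step separately in the course of the derivation.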
Second-Order Derivatives of Multivariate Composite Functions and the Hessian Matrix
Let \(z=f(u_1,u_2,…,u_n)\), where \(u_i=g_i(x_1,x_2,…,x_m)\). Find \(\frac{\partial ^2z}{ \partial x_i \partial x_j}\).
\[\frac{\partial z}{ \partial x_i}=\frac{\partial z}{ \partial \pmb{u} }·\frac{\partial \pmb{u}}{ \partial x_i} =\left[\begin{matrix}\frac{\partial f}{\partial u_1} & \frac{\partial f}{\partial u_2} & … & \frac{\partial f}{\partial u_n}\end{matrix} \right] \left[\begin{matrix}\frac{\partial u_1}{\partial x_i} \\ \frac{\partial u_2}{\partial x_i} \\ … \\ \frac{\partial u_n}{\partial x_i}\end{matrix} \right] \tag{3.1} \]
To simplify the notation, let:
\[\pmb{X_i}=\left[\begin{matrix}\frac{\partial u_1}{\partial x_i} & \frac{\partial u_2}{\partial x_i} & … & \frac{\partial u_n}{\partial x_i}\end{matrix} \right]^T \tag{3.2} \]
Then:
\[\frac{\partial z}{ \partial x_i}=∇_{\pmb{u}}f·\pmb{X_i} \tag{3.3} \]
Next, we need to compute
\[\frac{\partial {}}{ \partial x_j}(∇_{\pmb{u}}f·\pmb{X_i}) \tag{3.4} \]
\[\frac{\partial {}}{ \partial x_j}(∇_{\pmb{u}}f·\pmb{X_i} )=\frac{\partial {}}{ \partial x_j}∇_{\pmb{u}}f·\pmb{X_i} + ∇_{\pmb{u}}f·\frac{\partial {}}{ \partial x_j}\pmb{X_i} \tag{3.5} \]
\(\frac{\partial {}}{ \partial x_j}\pmb{X_i}\) is easy to obtain, so we focus on \(\frac{\partial {}}{ \partial x_j}∇_{\pmb{u}}f·\pmb{X_i}\), and in particular on \(\frac{\partial {}}{ \partial x_j}∇_{\pmb{u}}f\).
Applying the chain rule through \(\pmb{u}\):
\[\frac{\partial {}}{ \partial x_j}∇_{\pmb{u}}f=\frac{\partial {}}{ \partial \pmb{u}^T}·\frac{\partial {\pmb{u}^T}}{ \partial x_j}·∇_{\pmb{u}}f=\frac{\partial {\pmb{u}^T}}{ \partial x_j}·\frac{\partial {}}{ \partial \pmb{u}^T}·∇_{\pmb{u}}f \tag{3.6} \]
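The problem thus reduces to differentiating the vector \((∇_{\pmb{u}}f)\) with respect to the vector \((\pmb{u}^T)\).
Looking at this operation more closely, it amounts to differentiating each entry of the gradient with respect to each \(u_i\) in turn. The result is an \(n×n\) square matrix (\(2×2\) in Example 1), known as the Hessian matrix and denoted \(H(f)\). For a function \(f\) of \(x_1,…,x_n\) it takes the form: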
\[H(f)= \left[\begin{matrix} \frac{\partial^2 f}{\partial x_1\partial x_1} & \frac{\partial^2 f}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1\partial x_n}\\ \frac{\partial^2 f}{\partial x_2\partial x_1} & \frac{\partial^2 f}{\partial x_2\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_2\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\partial x_1} & \frac{\partial^2 f}{\partial x_n\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n\partial x_n} \end{matrix} \right]\tag{3.7} \]
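The pattern is clear: the entry in row \(i\), column \(j\) is \(\frac{\partial^2 f}{\partial x_i\partial x_j}\).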
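As a small worked illustration (not part of the original derivation), take the function \(f(x,y)=(x+2y^3)^2\) from (2.3). Differentiating each entry of its gradient once more with respect to \(x\) and \(y\) gives:

\[H(f)=\left[\begin{matrix}2 & 12y^2\\ 12y^2 & 24xy+120y^4\end{matrix} \right]\]

The two off-diagonal entries agree, as expected for a function with continuous second-order partial derivatives.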
With \(H(f)\) introduced, we can simplify further:
\[\frac{\partial {\pmb{u}^T}}{ \partial x_j}·\frac{\partial {}}{ \partial \pmb{u}^T}·∇_{\pmb{u}}f=\frac{\partial {\pmb{u}^T}}{ \partial x_j}·\left[\begin{matrix} \frac{\partial^2 f}{\partial u_1\partial u_1} & \frac{\partial^2 f}{\partial u_1\partial u_2} & \cdots & \frac{\partial^2 f}{\partial u_1\partial u_n}\\ \frac{\partial^2 f}{\partial u_2\partial u_1} & \frac{\partial^2 f}{\partial u_2\partial u_2} & \cdots & \frac{\partial^2 f}{\partial u_2\partial u_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial u_n\partial u_1} & \frac{\partial^2 f}{\partial u_n\partial u_2} & \cdots & \frac{\partial^2 f}{\partial u_n\partial u_n} \end{matrix} \right]=\pmb{X_j}^T·H_{\pmb{u}}(f)\tag{3.8} \]
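Written component by component, this step is just the ordinary chain rule applied to each entry of the gradient. As a sketch, assuming \(f\) has continuous second-order partial derivatives so that \(H_{\pmb{u}}(f)\) is symmetric, the \(k\)-th entry reads:

\[\frac{\partial {}}{ \partial x_j}\left(\frac{\partial f}{\partial u_k}\right)=\sum_{l=1}^{n}\frac{\partial^2 f}{\partial u_k\partial u_l}·\frac{\partial u_l}{\partial x_j},\qquad k=1,2,…,n\]

which is exactly the \(k\)-th entry of the row vector \(\pmb{X_j}^T·H_{\pmb{u}}(f)\).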
Therefore
\[\frac{\partial ^2z}{ \partial x_i \partial x_j}=\pmb{X_j}^T·H_{\pmb{u}}(f)·\pmb{X_i}+∇_{\pmb{u}}f·\frac{\partial {}}{ \partial x_j}\pmb{X_i}=\pmb{X_j}^T·H_{\pmb{u}}(f)·\pmb{X_i}+∇_{\pmb{u}}f·\pmb{X_{ij}}\tag{3.9} \]
where
\[\pmb{X_{ij}}=\left[\begin{matrix}\frac{\partial^2 u_1}{\partial x_i\partial x_j} & \frac{\partial^2 u_2}{\partial x_i\partial x_j} & … & \frac{\partial^2 u_n}{\partial x_i\partial x_j}\end{matrix} \right]^T\tag{3.10} \]
In practice, of course, \(\pmb{X_i}\) has already been computed by this point, so differentiating it directly to obtain \(\frac{\partial {}}{ \partial x_j}\pmb{X_i}\) may be more convenient.
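To see the formula at work, here is Example 1 redone with (3.9), taking \(x_i=x\), \(x_j=y\), \(u_1=x^2-y^2\) and \(u_2=e^{xy}\). The names \(\pmb{X_x}\), \(\pmb{X_y}\) and \(\pmb{X_{xy}}\) are shorthand introduced only for this illustration:

\[\pmb{X_x}=\left[\begin{matrix}2x \\ ye^{xy}\end{matrix} \right],\quad \pmb{X_y}=\left[\begin{matrix}-2y \\ xe^{xy}\end{matrix} \right],\quad \pmb{X_{xy}}=\left[\begin{matrix}0 \\ (1+xy)e^{xy}\end{matrix} \right]\]

\[\pmb{X_y}^T·H_{\pmb{u}}(f)·\pmb{X_x}=\left[\begin{matrix}-2y & xe^{xy}\end{matrix} \right]\left[\begin{matrix}f''_{11} & f''_{12} \\ f''_{21} & f''_{22}\end{matrix} \right]\left[\begin{matrix}2x \\ ye^{xy}\end{matrix} \right]=-4xyf''_{11}+2(x^2-y^2)e^{xy}f''_{12}+xye^{2xy}f''_{22}\]

\[∇_{\pmb{u}}f·\pmb{X_{xy}}=e^{xy}(1+xy)f'_2\]

Adding the two pieces reproduces the result quoted in Example 1; the middle term uses \(f''_{12}=f''_{21}\), which holds because \(f\) has continuous second-order partial derivatives.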
Summary
Let \(z=f(u_1,u_2,…,u_n)\), where \(u_i=g_i(x_1,x_2,…,x_m)\). Find \(\frac{\partial ^2z}{ \partial x_i \partial x_j}\).
\[\frac{\partial z}{ \partial x_i}=∇_{\pmb{u}}f·\pmb{X_i} \\\frac{\partial ^2z}{ \partial x_i \partial x_j}=\pmb{X_j}^T·H_{\pmb{u}}(f)·\pmb{X_i}+∇_{\pmb{u}}f·\pmb{X_{ij}}\tag{end} \]
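The summary formula translates almost line for line into code. The following is a minimal sketch rather than a definitive implementation: `dot` and `second_partial` are hypothetical helper names, vectors are plain Python lists, and the outer function \(f(u,v)=u^2v+\sin v\) is an assumed concrete choice used only to produce numbers for Example 1.

```python
import math

def dot(a, b):
    # Inner product of two equal-length lists.
    return sum(p * q for p, q in zip(a, b))

def second_partial(grad_u, hess_u, X_i, X_j, X_ij):
    # Evaluates X_j^T . H_u(f) . X_i + grad_u(f) . X_ij.
    # grad_u, X_i, X_j, X_ij are lists of length n; hess_u is a list of n rows.
    H_Xi = [dot(row, X_i) for row in hess_u]
    return dot(X_j, H_Xi) + dot(grad_u, X_ij)

# Example 1 at a sample point, with the assumed f(u, v) = u^2 * v + sin(v).
x, y = 0.7, -0.3
e = math.exp(x * y)
u, v = x * x - y * y, e

grad_u = [2 * u * v, u * u + math.cos(v)]        # [f_1, f_2]
hess_u = [[2 * v, 2 * u],
          [2 * u, -math.sin(v)]]                 # [[f_11, f_12], [f_21, f_22]]
X_x = [2 * x, y * e]                             # d(u_1, u_2)/dx
X_y = [-2 * y, x * e]                            # d(u_1, u_2)/dy
X_xy = [0.0, (1 + x * y) * e]                    # d^2(u_1, u_2)/dx dy

value = second_partial(grad_u, hess_u, X_x, X_y, X_xy)

# Closed-form expression quoted in Example 1, evaluated at the same point.
closed_form = (-4 * x * y * hess_u[0][0]
               + 2 * (x * x - y * y) * e * hess_u[0][1]
               + x * y * e * e * hess_u[1][1]
               + e * (1 + x * y) * grad_u[1])
print(value, closed_form)
```

Keeping the Jacobian columns and the Hessian as separate inputs mirrors the structure of the formula, so the same helper can be reused for any pair \((x_i, x_j)\) once those pieces are available.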
References
- [1] Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, Mathematics for Machine Learning.