4.6: Convex Functions and Derivatives


    We discuss in this section a class of functions that plays an important role in optimization problems.


    Figure \(4.6\): A Convex Function.

    Definition \(\PageIndex{1}\)

    Let \(I\) be an interval of \(\mathbb{R}\) and let \(f: I \rightarrow \mathbb{R}\). We say that \(f\) is convex on \(I\) if \[f(\lambda u+(1-\lambda) v) \leq \lambda f(u)+(1-\lambda) f(v)\] for all \(u, v \in I\) and for all \(\lambda \in (0, 1)\).

    Example \(\PageIndex{1}\)

    The following functions are convex.

    1. \(f: \mathbb{R} \rightarrow \mathbb{R}\), \(f(x)=x\).
    2. \(f: \mathbb{R} \rightarrow \mathbb{R}\), \(f(x)=x^{2}\).
    3. \(f: \mathbb{R} \rightarrow \mathbb{R}\), \(f(x)=|x|\).

    Solution

    1. This is straightforward.
    2. First note that \(2 x y \leq x^{2}+y^{2}\) for all real numbers \(x\) and \(y\). Then, if \(0<\lambda<1\) and \(x, y \in \mathbb{R}\), we get \[\begin{aligned}
      f(\lambda x+(1-\lambda) y) &=(\lambda x+(1-\lambda) y)^{2} \\
      &=\lambda^{2} x^{2}+2 \lambda(1-\lambda) x y+(1-\lambda)^{2} y^{2} \\
      & \leq \lambda^{2} x^{2}+\lambda(1-\lambda)\left(x^{2}+y^{2}\right)+(1-\lambda)^{2} y^{2} \\
      &=\lambda\left(\lambda x^{2}+(1-\lambda) x^{2}\right)+(1-\lambda)\left(\lambda y^{2}+(1-\lambda) y^{2}\right) \\
      &=\lambda x^{2}+(1-\lambda) y^{2} \\
      &=\lambda f(x)+(1-\lambda) f(y) \text {.}
      \end{aligned}\]
    3. This follows from the triangle inequality and other basic properties of absolute value.
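
    As an informal supplement (not part of the original text), the defining inequality can be spot-checked numerically. The sketch below assumes Python; the sample functions, grid, and tolerance are illustrative choices only.

    ```python
    # A rough numerical check of the convexity inequality
    #   f(lambda*u + (1 - lambda)*v) <= lambda*f(u) + (1 - lambda)*f(v)
    # on a finite grid.  The sample functions, grid, and tolerance are
    # illustrative choices, not part of the text.

    def violates_convexity(f, points, lambdas, tol=1e-12):
        """Return a violating triple (u, v, lam) if one is found, else None."""
        for u in points:
            for v in points:
                for lam in lambdas:
                    lhs = f(lam * u + (1 - lam) * v)
                    rhs = lam * f(u) + (1 - lam) * f(v)
                    if lhs > rhs + tol:
                        return (u, v, lam)
        return None

    points = [k / 10 for k in range(-50, 51)]   # sample points in [-5, 5]
    lambdas = [k / 20 for k in range(1, 20)]    # lambda values in (0, 1)

    print(violates_convexity(lambda x: x * x, points, lambdas))   # None: x^2 passes
    print(violates_convexity(abs, points, lambdas))               # None: |x| passes
    print(violates_convexity(lambda x: x ** 3, points, lambdas))  # a violating triple: x^3 is not convex on R
    ```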

    Theorem \(\PageIndex{1}\)

    Let \(I\) be an interval of \(\mathbb{R}\). A function \(f: I \rightarrow \mathbb{R}\) is convex if and only if for every \(\lambda_{i} \geq 0\), \(i=1, \ldots, n\), with \(\sum_{i=1}^{n} \lambda_{i}=1\) \((n \geq 2)\) and for every \(x_{i} \in I\), \(i=1, \ldots, n\), \[f\left(\sum_{i=1}^{n} \lambda_{i} x_{i}\right) \leq \sum_{i=1}^{n} \lambda_{i} f\left(x_{i}\right) .\]

    Proof

    Since the converse holds trivially, we only need to prove the forward implication, which we do by induction on \(n\). The conclusion holds for \(n = 2\) by the definition of convexity. Let \(k\) be such that the conclusion holds for any \(n\) with \(2 \leq n \leq k\). We will show that it also holds for \(n = k + 1\). Fix \(\lambda_{i} \geq 0, i=1, \ldots, k+1\), with \(\sum_{i=1}^{k+1} \lambda_{i}=1\) and fix \(x_{i} \in I, i=1, \ldots, k+1\). Then \[\sum_{i=1}^{k} \lambda_{i}=1-\lambda_{k+1} .\] If \(\lambda_{k+1}=1\), then \(\lambda_{i}=0\) for all \(i=1, \ldots, k\), and the desired inequality holds trivially. Suppose \(0 \leq \lambda_{k+1}<1\). Then, for each \(i=1, \ldots, k\), \(\lambda_{i} /\left(1-\lambda_{k+1}\right) \geq 0\) and \[\sum_{i=1}^{k} \frac{\lambda_{i}}{1-\lambda_{k+1}}=1 .\] It follows that \[\begin{aligned}
    f\left(\sum_{i=1}^{k+1} \lambda_{i} x_{i}\right) &=f\left[\left(1-\lambda_{k+1}\right) \frac{\sum_{i=1}^{k} \lambda_{i} x_{i}}{1-\lambda_{k+1}}+\lambda_{k+1} x_{k+1}\right] \\
    & \leq\left(1-\lambda_{k+1}\right) f\left(\frac{\sum_{i=1}^{k} \lambda_{i} x_{i}}{1-\lambda_{k+1}}\right)+\lambda_{k+1} f\left(x_{k+1}\right) \\
    &=\left(1-\lambda_{k+1}\right) f\left(\sum_{i=1}^{k} \frac{\lambda_{i}}{1-\lambda_{k+1}} x_{i}\right)+\lambda_{k+1} f\left(x_{k+1}\right) \\
    & \leq\left(1-\lambda_{k+1}\right) \sum_{i=1}^{k} \frac{\lambda_{i}}{1-\lambda_{k+1}} f\left(x_{i}\right)+\lambda_{k+1} f\left(x_{k+1}\right) \\
    &=\sum_{i=1}^{k+1} \lambda_{i} f\left(x_{i}\right) ,
    \end{aligned}\] where the first inequality follows from the definition of convexity (or is trivial if \(\lambda_{k+1}=0\)) and the last inequality follows from the inductive assumption. The proof is now complete. \(\square\)
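
    As a purely numerical illustration (a sketch, not part of the proof), one can sample random convex combinations and confirm that the right-hand side dominates the left-hand side. The choice of \(f\) and the sampling below are illustrative assumptions.

    ```python
    # Numerical illustration of Theorem 4.6.1 (finite Jensen-type inequality):
    #   f(sum_i lambda_i * x_i) <= sum_i lambda_i * f(x_i)
    # for random convex combinations.  The function f and the sampling are
    # illustrative only.
    import random

    def jensen_gap(f, xs, lams):
        """Return rhs - lhs; this should be >= 0 when f is convex."""
        lhs = f(sum(l * x for l, x in zip(lams, xs)))
        rhs = sum(l * f(x) for l, x in zip(lams, xs))
        return rhs - lhs

    random.seed(0)
    f = lambda x: x * x
    for _ in range(5):
        n = random.randint(2, 6)
        xs = [random.uniform(-10, 10) for _ in range(n)]
        ws = [random.random() for _ in range(n)]
        lams = [w / sum(ws) for w in ws]          # lambda_i >= 0 with sum equal to 1
        print(round(jensen_gap(f, xs, lams), 6))  # nonnegative in each trial
    ```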


    Theorem \(\PageIndex{2}\)

    Let \(I\) be an interval and let \(f: I \rightarrow \mathbb{R}\) be a convex function. Then \(f\) has a local minimum at \(\bar{x}\) if and only if \(f\) has an absolute minimum at \(\bar{x}\).

    Proof

    Clearly if \(f\) has an absolute minimum at \(\bar{x}\), then it also has a local minimum at \(\bar{x}\).

    Conversely, suppose that \(f\) has a local minimum at \(\bar{x}\). Then there exists \(\delta > 0\) such that \[f(u) \geq f(\bar{x}) \text { for all } u \in B(\bar{x} ; \delta) \cap I .\] For any \(x \in I\), we have \(x_{n}=\left(1-\frac{1}{n}\right) \bar{x}+\frac{1}{n} x \rightarrow \bar{x}\). Thus, \(x_{n} \in B(\bar{x} ; \delta) \cap I\) when \(n\) is sufficiently large. For such \(n\), \[f(\bar{x}) \leq f\left(x_{n}\right) \leq\left(1-\frac{1}{n}\right) f(\bar{x})+\frac{1}{n} f(x) .\] This implies that for sufficiently large \(n\), \[\frac{1}{n} f(\bar{x}) \leq \frac{1}{n} f(x)\] and, hence, \(f(\bar{x}) \leq f(x)\). Since \(x\) was arbitrary, this shows that \(f\) has an absolute minimum at \(\bar{x}\). \(\square\)
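
    The equivalence can be visualized with a crude discrete experiment (a sketch only; "local minimizer" is interpreted on a uniform grid, and the function and grid are illustrative assumptions).

    ```python
    # Illustration of Theorem 4.6.2: for a convex function, every local
    # minimizer is a global minimizer.  "Local minimizer" is interpreted
    # discretely on a uniform grid; the function and grid are illustrative.

    def grid_local_minima(f, points):
        vals = [f(x) for x in points]
        return [points[i] for i in range(1, len(points) - 1)
                if vals[i] <= vals[i - 1] and vals[i] <= vals[i + 1]]

    points = [k / 100 for k in range(-300, 301)]    # grid on [-3, 3]
    f = lambda x: abs(x - 1) + (x - 1) ** 2         # convex, minimized at x = 1

    locs = grid_local_minima(f, points)
    global_min = min(f(x) for x in points)
    print(all(abs(f(x) - global_min) < 1e-9 for x in locs))   # True
    ```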

    Theorem \(\PageIndex{3}\)

    Let \(I\) be an open interval and let \(f: I \rightarrow \mathbb{R}\) be a convex function. Suppose \(f\) is differentiable at \(\bar{x}\). Then \[f^{\prime}(\bar{x})(x-\bar{x}) \leq f(x)-f(\bar{x}) \text { for all } x \in I .\]

    Proof

    For any \(x \in I\) and \(t \in (0, 1)\), we have \[\begin{aligned}
    \frac{f(\bar{x}+t(x-\bar{x}))-f(\bar{x})}{t} &=\frac{f(t x+(1-t) \bar{x})-f(\bar{x})}{t} \\
    & \leq \frac{t f(x)+(1-t) f(\bar{x})-f(\bar{x})}{t} \\
    &=f(x)-f(\bar{x}) \text {.}
    \end{aligned}\] Since \(f\) is differentiable at \(\bar{x}\), \[f^{\prime}(\bar{x})(x-\bar{x})=\lim _{t \rightarrow 0^{+}} \frac{f(\bar{x}+t(x-\bar{x}))-f(\bar{x})}{t} \leq f(x)-f(\bar{x}) ,\] which completes the proof. \(\square\)
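
    Geometrically, the theorem says the tangent line at \(\bar{x}\) lies below the graph of \(f\). Below is a minimal numerical sketch of this fact, assuming the convex function \(f(x)=\sqrt{x^{2}+1}\) from Example 4.6.2; the base point and grid are illustrative.

    ```python
    # Numerical check of Theorem 4.6.3: for a differentiable convex f,
    #   f'(xbar) * (x - xbar) <= f(x) - f(xbar)   for all x,
    # i.e. the tangent line at xbar lies below the graph.
    import math

    f = lambda x: math.sqrt(x * x + 1)           # convex (see Example 4.6.2)
    fprime = lambda x: x / math.sqrt(x * x + 1)

    xbar = 0.7
    points = [k / 10 for k in range(-50, 51)]    # sample points in [-5, 5]

    ok = all(fprime(xbar) * (x - xbar) <= f(x) - f(xbar) + 1e-12 for x in points)
    print(ok)   # True
    ```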

    Corollary \(\PageIndex{4}\)

    Let \(I\) be an open interval and let \(f: I \rightarrow \mathbb{R}\) be a convex function. Suppose \(f\) is differentiable at \(\bar{x}\). Then \(f\) has an absolute minimum at \(\bar{x}\) if and only if \(f^{\prime}(\bar{x})=0\).

    Proof

    Suppose \(f\) has an absolute minimum at \(\bar{x}\). By Theorem 4.2.1, \(f^{\prime}(\bar{x})=0\). Let us prove the converse. Suppose \(f^{\prime}(\bar{x})=0\). It follows from Theorem 4.6.3 that \[0=f^{\prime}(\bar{x})(x-\bar{x}) \leq f(x)-f(\bar{x}) \text { for all } x \in I .\] This implies \[f(x) \geq f(\bar{x}) \text { for all } x \in I .\] Thus, \(f\) has an absolute minimum at \(\bar{x}\). \(\square\)
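
    A small computational sketch of the corollary (illustrative only): locate the zero of \(f^{\prime}\) for a convex function by bisection and confirm that it minimizes \(f\). The function \(f(x)=e^{x}+e^{-x}\) and the bracketing interval are assumptions made for the example.

    ```python
    # Illustration of Corollary 4.6.4: for a differentiable convex function,
    # a point with f'(xbar) = 0 is an absolute minimizer.  The critical point
    # of f(x) = e^x + e^{-x} is found by bisection on f'.
    import math

    f = lambda x: math.exp(x) + math.exp(-x)
    fprime = lambda x: math.exp(x) - math.exp(-x)

    lo, hi = -5.0, 5.0                 # f'(lo) < 0 < f'(hi)
    for _ in range(60):                # bisection on the increasing function f'
        mid = (lo + hi) / 2
        if fprime(mid) < 0:
            lo = mid
        else:
            hi = mid

    xbar = (lo + hi) / 2
    print(xbar)                                        # approximately 0
    grid = [k / 10 for k in range(-50, 51)]
    print(all(f(x) >= f(xbar) - 1e-12 for x in grid))  # True: xbar is a global minimizer
    ```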


    Lemma \(\PageIndex{5}\)

    Let \(I\) be an open interval and suppose \(f: I \rightarrow \mathbb{R}\) is a convex function. Fix \(a, b, x \in I\) with \(a < x < b\). Then \[\frac{f(x)-f(a)}{x-a} \leq \frac{f(b)-f(a)}{b-a} \leq \frac{f(b)-f(x)}{b-x} .\]

    Proof

    Let \[t=\frac{x-a}{b-a} .\] Then \(t \in (0, 1) \) and \[f(x)=f(a+(x-a))=f\left(a+\frac{x-a}{b-a}(b-a)\right)=f(a+t(b-a))=f(t b+(1-t) a) .\] By convexity of \(f\), we obtain \[f(x) \leq t f(b)+(1-t) f(a) .\] Thus, \[f(x)-f(a) \leq t f(b)+(1-t) f(a)-f(a)=t[f(b)-f(a)]=\frac{x-a}{b-a}(f(b)-f(a)) .\] Equivalently, \[\frac{f(x)-f(a)}{x-a} \leq \frac{f(b)-f(a)}{b-a} .\] Similarly, \[f(x)-f(b) \leq t f(b)+(1-t) f(a)-f(b)=(1-t)[f(a)-f(b)]=\frac{x-b}{b-a}[f(b)-f(a)] .\] It follows that \[\frac{f(b)-f(a)}{b-a} \leq \frac{f(b)-f(x)}{b-x} .\] The proof is now complete. \(\square\)
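
    The three slope inequalities can be checked numerically for a concrete convex function; the sketch below uses \(f(t)=\sqrt{t^{2}+1}\) and a handful of test points, all of which are illustrative choices.

    ```python
    # Numerical check of the slope inequalities in Lemma 4.6.5:
    #   (f(x)-f(a))/(x-a) <= (f(b)-f(a))/(b-a) <= (f(b)-f(x))/(b-x)
    # for a < x < b, with a sample convex function.
    import math

    f = lambda t: math.sqrt(t * t + 1)              # a convex function
    slope = lambda p, q: (f(q) - f(p)) / (q - p)

    a, b = -1.0, 2.0
    for x in [-0.5, 0.0, 0.5, 1.0, 1.5]:            # points strictly between a and b
        s1, s2, s3 = slope(a, x), slope(a, b), slope(x, b)
        print(s1 <= s2 + 1e-12 and s2 <= s3 + 1e-12)   # True for every x
    ```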

    Theorem \(\PageIndex{6}\)

    Let \(I\) be an open interval and let \(f: I \rightarrow \mathbb{R}\) be a differentiable function. Then \(f\) is convex if and only if \(f^{\prime}\) is increasing on \(I\).

    Proof

    Suppose \(f\) is convex. Fix \(a < b\) with \(a, b \in I\). By Lemma 4.6.5, for any \(x \in (a, b)\), we have \[\frac{f(x)-f(a)}{x-a} \leq \frac{f(b)-f(a)}{b-a} .\] Letting \(x \rightarrow a^{+}\), we obtain \[f^{\prime}(a) \leq \frac{f(b)-f(a)}{b-a} .\] Similarly, letting \(x \rightarrow b^{-}\) in the second inequality of Lemma 4.6.5 gives \[\frac{f(b)-f(a)}{b-a} \leq f^{\prime}(b) .\] Therefore, \(f^{\prime}(a) \leq f^{\prime}(b)\), and \(f^{\prime}\) is an increasing function.

    Let us prove the converse. Suppose \(f^{\prime}\) is increasing. Fix \(x_{1}<x_{2}\) and \(t \in (0, 1)\). Then \[x_{1}<x_{t}<x_{2} ,\] where \(x_{t}=t x_{1}+(1-t) x_{2}\). By the Mean Value Theorem (Theorem 4.2.3), there exist \(c_{1}\) and \(c_{2}\) such that \[x_{1}<c_{1}<x_{t}<c_{2}<x_{2}\] with \[\begin{array}{l}
    f\left(x_{t}\right)-f\left(x_{1}\right)=f^{\prime}\left(c_{1}\right)\left(x_{t}-x_{1}\right)=f^{\prime}\left(c_{1}\right)(1-t)\left(x_{2}-x_{1}\right) \text {;} \\
    f\left(x_{t}\right)-f\left(x_{2}\right)=f^{\prime}\left(c_{2}\right)\left(x_{t}-x_{2}\right)=f^{\prime}\left(c_{2}\right) t\left(x_{1}-x_{2}\right) \text {.}
    \end{array}\] This implies \[\begin{array}{l}
    t f\left(x_{t}\right)-t f\left(x_{1}\right)=f^{\prime}\left(c_{1}\right) t(1-t)\left(x_{2}-x_{1}\right) \text {;} \\
    (1-t) f\left(x_{t}\right)-(1-t) f\left(x_{2}\right)=f^{\prime}\left(c_{2}\right) t(1-t)\left(x_{1}-x_{2}\right) \text {.}
    \end{array}\] Since \(f^{\prime}\left(c_{1}\right) \leq f^{\prime}\left(c_{2}\right)\), we have \[t f\left(x_{t}\right)-t f\left(x_{1}\right)=f^{\prime}\left(c_{1}\right) t(1-t)\left(x_{2}-x_{1}\right) \leq f^{\prime}\left(c_{2}\right) t(1-t)\left(x_{2}-x_{1}\right)=(1-t) f\left(x_{2}\right)-(1-t) f\left(x_{t}\right) .\] Rearranging terms, we get \[f\left(x_{t}\right) \leq t f\left(x_{1}\right)+(1-t) f\left(x_{2}\right) .\] Therefore, \(f\) is convex. The proof is now complete. \(\square\)
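
    The monotonicity of \(f^{\prime}\) can be observed numerically with finite differences; the sketch below is illustrative only, with an assumed convex function, grid, and step size.

    ```python
    # Illustration of Theorem 4.6.6: the derivative of a convex function is
    # increasing.  Derivatives are approximated by central differences.
    import math

    f = lambda x: math.sqrt(x * x + 1)          # convex (see Example 4.6.2)

    def approx_fprime(x, h=1e-6):
        return (f(x + h) - f(x - h)) / (2 * h)

    grid = [k / 10 for k in range(-50, 51)]
    derivs = [approx_fprime(x) for x in grid]
    print(all(d1 <= d2 + 1e-9 for d1, d2 in zip(derivs, derivs[1:])))   # True: f' is increasing
    ```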

    Corollary \(\PageIndex{7}\)

    Let \(I\) be an open interval and let \(f: I \rightarrow \mathbb{R}\) be a function. Suppose \(f\) is twice differentiable on \(I\). Then \(f\) is convex if and only if \(f^{\prime \prime}(x) \geq 0\) for all \(x \in I\).

    Proof

    It follows from Proposition 4.3.2 that \(f^{\prime \prime}(x) \geq 0\) for all \(x \in I\) if and only if the derivative function \(f^{\prime}\) is increasing on \(I\). The conclusion then follows directly from Theorem 4.6.6. \(\square\)


    Example \(\PageIndex{2}\)

    Consider the function \(f: \mathbb{R} \rightarrow \mathbb{R}\) given by \(f(x)=\sqrt{x^{2}+1}\). Show that \(f\) is convex on \(\mathbb{R}\).

    Solution

    Now, \(f^{\prime}(x)=x / \sqrt{x^{2}+1}\) and \(f^{\prime \prime}(x)=1 /\left(x^{2}+1\right)^{3 / 2}\). Since \(f^{\prime \prime}(x) \geq 0\) for all \(x\), it follows from Corollary 4.6.7 that \(f\) is convex.
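
    For readers who want to double-check the differentiation, here is a short symbolic sketch assuming the SymPy library is available; it is a convenience check, not part of the text.

    ```python
    # Symbolic verification of the derivatives used in Example 4.6.2.
    import sympy as sp

    x = sp.symbols('x', real=True)
    f = sp.sqrt(x**2 + 1)

    f1 = sp.simplify(sp.diff(f, x))       # x/sqrt(x**2 + 1)
    f2 = sp.simplify(sp.diff(f, x, 2))    # (x**2 + 1)**(-3/2)
    print(f1)
    print(f2)
    print(f2.is_positive)                 # True: since x is real, x**2 + 1 > 0, so f'' > 0
    ```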

    Theorem \(\PageIndex{8}\)

    Let \(I\) be an open interval and let \(f: I \rightarrow \mathbb{R}\) be a convex function. Then \(f\) is locally Lipschitz continuous in the sense that for any \(\bar{x} \in I\), there exist \(\ell \geq 0\) and \(\delta > 0\) such that \[|f(u)-f(v)| \leq \ell|u-v| \text { for all } u, v \in B(\bar{x} ; \delta) .\] In particular, \(f\) is continuous.

    Proof

    Fix any \(\bar{x} \in I\). Choose four numbers \(a, b, c, d\) satisfying \[a<b<\bar{x}<c<d \text { with } a, d \in I .\] Choose \(\delta > 0\) such that \(B(\bar{x} ; \delta) \subset(b, c)\). Let \(u, v \in B(\bar{x} ; \delta)\) with \(v < u\). Then by Lemma 4.6.5, we see that \[\frac{f(b)-f(a)}{b-a} \leq \frac{f(u)-f(a)}{u-a} \leq \frac{f(u)-f(v)}{u-v} \leq \frac{f(d)-f(v)}{d-v} \leq \frac{f(d)-f(c)}{d-c} .\] Using a similar approach for the case \(u < v\), we get \[\frac{f(b)-f(a)}{b-a} \leq \frac{f(u)-f(v)}{u-v} \leq \frac{f(d)-f(c)}{d-c} \text { for all } u, v \in B(\bar{x} ; \delta) .\] Choose \(\ell \geq 0\) sufficiently large so that \[-\ell \leq \frac{f(b)-f(a)}{b-a} \leq \frac{f(u)-f(v)}{u-v} \leq \frac{f(d)-f(c)}{d-c} \leq \ell \text { for all } u, v \in B(\bar{x} ; \delta) .\] Then \(|f(u)-f(v)| \leq \ell|u-v|\) for all \(u, v \in B(\bar{x} ; \delta)\), as desired. The proof is now complete. \(\square\)
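
    The proof's Lipschitz constant can be observed numerically: the difference quotients on a small ball are bounded by the two bracketing slopes. The function, the points \(a<b<\bar{x}<c<d\), and the sampling in the sketch below are illustrative assumptions.

    ```python
    # Numerical illustration of Theorem 4.6.8: on a small ball around xbar,
    # the difference quotients of a convex function are bounded, so f is
    # locally Lipschitz there.
    import math, random

    f = lambda t: math.sqrt(t * t + 1)

    xbar, delta = 0.5, 0.1
    a, b, c, d = -2.0, 0.2, 0.8, 2.0            # a < b < xbar - delta and xbar + delta < c < d
    lower = (f(b) - f(a)) / (b - a)
    upper = (f(d) - f(c)) / (d - c)
    ell = max(abs(lower), abs(upper))           # a Lipschitz constant as in the proof

    random.seed(1)
    ok = True
    for _ in range(10000):
        u = random.uniform(xbar - delta, xbar + delta)
        v = random.uniform(xbar - delta, xbar + delta)
        if u != v:
            ok = ok and abs(f(u) - f(v)) <= ell * abs(u - v) + 1e-12
    print(ok)   # True
    ```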

    Exercise \(\PageIndex{1}\)

    1. Let \(I\) be an interval and let \(f, g: I \rightarrow \mathbb{R}\) be convex functions. Prove that \(cf\), \(f + g\), and \(\max \{f, g\}\) are convex functions on \(I\), where \(c \geq 0\) is a constant.
    2. Find two convex functions \(f\) and \(g\) on an interval \(I\) such that \(f \cdot g\) is not convex.
    Answer



    Exercise \(\PageIndex{2}\)

    Let \(f: \mathbb{R} \rightarrow \mathbb{R}\) be a convex function. Given \(a, b \in \mathbb{R}\), prove that the function defined by \[g(x)=f(a x+b), \text { for } x \in \mathbb{R}\] is also a convex function on \(\mathbb{R}\).

    Answer


    Exercise \(\PageIndex{3}\)

    Let \(I\) be an interval and let \(f: I \rightarrow \mathbb{R}\) be a convex function. Suppose that \(\phi\) is a convex, increasing function on an interval \(J\) that contains \(f(I)\). Prove that \(\phi \circ f\) is convex on \(I\).

    Answer


    Exercise \(\PageIndex{4}\)

    Prove that each of the following functions is convex on the given domain:

    1. \(f(x)=e^{b x}\), \(x \in \mathbb{R}\), where \(b\) is a constant.
    2. \(f(x)=x^{k}\), \(x \in[0, \infty)\) and \(k \geq 1\) is a constant.
    3. \(f(x)=-\ln (1-x)\), \(x \in(-\infty, 1)\).
    4. \(f(x)=-\ln \left(\frac{e^{x}}{1+e^{x}}\right)\), \(x \in \mathbb{R}\).
    5. \(f(x)=x \sin x\), \(x \in\left(-\frac{\pi}{4}, \frac{\pi}{4}\right)\).
    Answer


    Exercise \(\PageIndex{5}\)

    Prove the following:

    1. If \(a, b\) are nonnegative real numbers, then \[\frac{a+b}{2} \geq \sqrt{a b} .\]
    2. If \(a_{1}, a_{2}, \ldots, a_{n}\), where \(n \geq 2\), are nonnegative real numbers, then \[\frac{a_{1}+a_{2}+\cdots+a_{n}}{n} \geq\left(a_{1} \cdot a_{2} \cdots a_{n}\right)^{1 / n} .\]
    Answer

