Understanding PAC Learnability on Axis-aligned Rectangle Concepts

Overview

In this post, we will explore the concept of PAC (Probably Approximately Correct) learnability, specifically focusing on the learnability of axis-aligned rectangles in a two-dimensional space. We will provide a simplified definition of PAC learnability, describe the rectangle concept class, and prove that it is PAC-learnable using a specific learning algorithm. This proof will illustrate key techniques commonly used in PAC learning proofs.

1. Simplified Definition of PAC Learnability (Sample Complexity Only)

1.1. Generalization Error

Definition 1. (Generalization Error) Given a hypothesis \(h \in \mathcal{H}\), a target concept \(c \in \mathcal{C}\), and an underlying distribution \(\mathcal{D}\) over the input space \(\mathcal{X}\), the generalization error (or risk) of \(h\) with respect to \(c\) is defined as:

\[Risk(h) = \mathbb{P}_{x \sim \mathcal{D}} [h(x) \neq c(x)] = \mathbb{E}_{x \sim \mathcal{D}} [\mathbb{I}(h(x) \neq c(x))]\]
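As a concrete (if informal) illustration, the short Python sketch below estimates the generalization error by Monte Carlo sampling; the hypothesis \(h\), target concept \(c\), and sampler for \(\mathcal{D}\) used here are hypothetical choices of our own, not part of the definition.

```python
import random

def estimate_risk(h, c, sample_from_D, n_samples=100_000):
    """Monte Carlo estimate of Risk(h) = P_{x~D}[h(x) != c(x)]."""
    disagreements = sum(
        h(x) != c(x) for x in (sample_from_D() for _ in range(n_samples))
    )
    return disagreements / n_samples

# Illustrative setup: D uniform on [0,1]^2, target concept an axis-aligned
# rectangle, hypothesis a slightly smaller rectangle.
c = lambda p: 0.2 <= p[0] <= 0.8 and 0.3 <= p[1] <= 0.7
h = lambda p: 0.25 <= p[0] <= 0.8 and 0.3 <= p[1] <= 0.7
print(estimate_risk(h, c, lambda: (random.random(), random.random())))  # ~0.02
```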

1.2. PAC Learnability

Definition 2. (PAC Learnability) A concept class \(\mathcal{C}\) is said to be PAC-learnable if there exists a learning algorithm \(\mathcal{A}\) and a polynomial function \(poly(\cdot,\cdot)\), such that for all distributions \(\mathcal{D}\) over the input space \(\mathcal{X}\), for any concept \(c \in \mathcal{C}\), for any \(\epsilon, \delta > 0\), the following holds for any sample size \(m \geq poly(\frac{1}{\epsilon}, \frac{1}{\delta})\):

\[\Pr_{S \sim \mathcal{D}^m} [Risk(h_S) \leq \epsilon] \geq 1 - \delta\]

where \(h_S\) is the hypothesis learned from sample \(S\) through \(\mathcal{A}\), and \(Risk(h_S)\) is the risk of the hypothesis.

To put it simply, a concept class is PAC-learnable if there exists a learning algorithm that, with high probability, approximates any concept in the class within a small error margin, using a sample size that grows at most polynomially in \(\frac{1}{\epsilon}\) and \(\frac{1}{\delta}\).
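The definition also suggests a simple empirical check. The harness below is a minimal sketch (all function names and parameters are our own, purely illustrative): it estimates \(\Pr_{S \sim \mathcal{D}^m}[Risk(h_S) \le \epsilon]\) for a given learner by repeating the draw-and-learn experiment.

```python
def estimate_success_probability(learner, draw_sample, risk, m, eps, n_trials=1000):
    """Estimate P_{S ~ D^m}[Risk(h_S) <= eps] by repeated simulation.

    learner     : maps a labeled sample S to a hypothesis h_S
    draw_sample : maps a sample size m to a fresh labeled sample S ~ D^m
    risk        : maps a hypothesis h_S to (an estimate of) Risk(h_S)
    """
    successes = sum(risk(learner(draw_sample(m))) <= eps for _ in range(n_trials))
    return successes / n_trials

# PAC learnability asks this quantity to be at least 1 - delta whenever
# m >= poly(1/eps, 1/delta), for every distribution D and every concept c in C.
```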

2. The Rectangle Concept Class Case

Figure 1. Target concept R and the learned hypothesis R'.

Example (Learning axis-aligned rectangles) As shown in Figure 1, consider the case where the instances are points in \(\mathbb{R}^2\), and the concept class \(\mathcal{C}\) is the set of all axis-aligned rectangles lying in \(\mathbb{R}^2\). Thus, each concept \(c\) is the set of points inside a particular axis-aligned rectangle. The learning problem consists of determining, with small error, a target axis-aligned rectangle using the labeled training sample. Note that we assume there exists a target rectangle \(c^*\) that discriminates the positive and negative examples perfectly.

We will show that the concept class of axis-aligned rectangles is PAC-learnable.

2.1. Learning Algorithm

First, we define the learning algorithm \(\mathcal{A}\), which takes as input a labeled sample \(S \sim \mathcal{D}^m\) and outputs a hypothesis \(h\) corresponding to the tightest axis-aligned rectangle enclosing the positive examples.

\[\begin{array}{l} \textbf{Algorithm 1: Find the Consistent Rectangle} \\ \textbf{Input:} \text{ Sample } S = \{(x_1, y_1), \ldots, (x_m, y_m)\}, x_i \in \mathbb{R}^2, y_i \in \{0,1\} \\ \textbf{Output:} \text{ Axis-aligned rectangle } R' \\[1ex] 1.\quad R' \gets \text{Smallest axis-aligned rectangle enclosing all } x_i \text{ such that } y_i = 1 \\ 2.\quad \textbf{return } R' \end{array}\]
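A minimal Python sketch of Algorithm 1 is given below; the tuple representation of a rectangle and the handling of an all-negative sample are our own implementation choices.

```python
def tightest_rectangle(sample):
    """Algorithm 1: smallest axis-aligned rectangle enclosing the positive points.

    sample : list of ((x, y), label) pairs with label in {0, 1}
    Returns (x_min, x_max, y_min, y_max), or None if there is no positive example.
    """
    positives = [point for point, label in sample if label == 1]
    if not positives:
        return None  # an empty hypothesis is consistent with an all-negative sample
    xs = [x for x, _ in positives]
    ys = [y for _, y in positives]
    return (min(xs), max(xs), min(ys), max(ys))
```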

2.2. Proving PAC Learnability

Now, we will prove that this algorithm PAC-learns the axis-aligned rectangle concept class.

Let \(R \in \mathcal{C}\) be a target concept, and let \(R'\) be the hypothesis learned from the sample \(S\) using Algorithm 1.

By the definition of Algorithm 1, we know the following statement is true:

\[R' \subseteq R\]

since every positive example lies inside the target rectangle \(R\) and \(R'\) is the tightest rectangle enclosing those examples. Consequently, \(R'\) can only err on points that fall inside \(R\) but outside \(R'\).

Figure 2. Illustration of the hypothesis R' returned by algorithm 1.

For any fixed \(\epsilon > 0\), we want to lower-bound the probability that the generalization error satisfies \(Risk(R') \le \epsilon\). First, if \(\mathbb{P}[R] \le \epsilon\), then since \(R' \subseteq R\), the error region is contained in \(R\), so the generalization error \(Risk(R')\) can never exceed \(\epsilon\); hence \(\mathbb{P}[Risk(R') \le \epsilon] = 1\) and the claim is trivially true. We therefore assume that the target rectangle \(R\) is not too small, i.e., \(\mathbb{P}[R] > \epsilon\). Under this assumption, we proceed to bound the probability of the generalization error \(Risk(R')\).

As shown in Figure 3, we define four rectangular regions \(r_1\), \(r_2\), \(r_3\), and \(r_4\), one along each side of the target rectangle \(R\). We construct them so that each region has probability at least \(\frac{\epsilon}{4}\), i.e., \(\forall i \in [4], \mathbb{P}[r_i] \ge \frac{\epsilon}{4}\), while \(\mathbb{P}[\bigcup_{i=1}^{4} r_i] < \epsilon\). It is worth noting that these regions can indeed be constructed because \(\mathbb{P}[R] > \epsilon\): each region is obtained by starting with the full rectangle \(R\) and shrinking it by moving one side inward as far as possible while keeping its probability at least \(\frac{\epsilon}{4}\).

Figure 3. Illustration of the constructed regions r_i.
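As a concrete illustration of this construction (an example of our own, assuming \(\mathcal{D}\) is the uniform distribution on \([0,1]^2\), \(R = [0.1, 0.9] \times [0.1, 0.9]\), and \(\epsilon = 0.2\), so that \(\mathbb{P}[R] = 0.64 > \epsilon\)), the strip \(r_1\) along the left side of \(R\) has width \(w\) chosen so that

\[\mathbb{P}[r_1] = w \cdot 0.8 = \frac{\epsilon}{4} = 0.05 \;\Longrightarrow\; w = 0.0625,\]

i.e., \(r_1 = [0.1, 0.1625] \times [0.1, 0.9]\). The other three strips are obtained in the same way along the remaining sides, and their union has probability strictly less than \(\epsilon\) because the strips overlap at the corners.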

Now, we know that the proposition \(P \to Q\) is true, where \(P\) is the event that each of the four regions \(r_1, r_2, r_3, r_4\) contains at least one sample point, and \(Q\) is the event that \(Risk(R') \le \epsilon\).

Indeed, if \(P\) is true, then \(R'\) reaches into each of the four regions, so the error region of the hypothesis \(R'\) is a subset of \(\bigcup_{i=1}^{4} r_i\); since \(\mathbb{P}[\bigcup_{i=1}^{4} r_i] < \epsilon\), the probability of the error region is less than \(\epsilon\), i.e., \(Risk(R') < \epsilon\), which means \(Q\) is true. Additionally, let \(\bar P\) denote the complement of \(P\): the event that at least one region among \(r_1, r_2, r_3, r_4\) contains no sample point. Let \(A_i\) be the event that region \(r_i\) contains no sample point, and let \(\delta > 0\). We then have:

\[\begin{aligned} \mathbb{P}[Risk(R') \le \epsilon] &= \mathbb{P}[Q] && \text{(by definition of } Q \text{)} \\ &\overset{(i)}{\ge} \mathbb{P}[P] \\ &\overset{(ii)}{=} 1 - \mathbb{P}[\bar P] \\ &\overset{(iii)}{=} 1 - \mathbb{P}\left [\bigcup_{i=1}^{4} A_i \right ] \\ &\overset{(iv)}{\ge} 1 - \sum_{i=1}^{4} \mathbb{P}\left [A_i \right ] \\ &\overset{(v)}{\ge} 1 - 4 \left(1 - \frac{\epsilon}{4}\right)^m \\ &\overset{(vi)}{\ge} 1 - 4 \exp\left(\frac{-m \epsilon}{4}\right) \\ \end{aligned}\]

where in (i) we use the fact that \(P \subseteq Q\), in (ii) we use the fact that \(\mathbb{P}[P] = 1 - \mathbb{P}[\bar P]\), in (iii) we rewrite the term, in (iv) we use the union bound, in (v) we use the fact that \(\mathbb{P}[A_i] = (1 - \mathbb{P}[r_i])^m \le (1 - \frac{\epsilon}{4})^m\) because the \(m\) samples are drawn i.i.d. and \(\mathbb{P}[r_i] \ge \frac{\epsilon}{4}\), and in (vi) we use the fact that \(1 - x \le \exp(-x)\) for any \(x\).

We now impose that \(\mathbb{P}[Risk(R') \le \epsilon] \ge 1 - 4 \exp(\frac{-m \epsilon}{4}) \ge 1 - \delta\), which will be satisfied if:

\[4 \exp\left(\frac{-m \epsilon}{4}\right) \le \delta \;\Longleftrightarrow\; m \ge \frac{4}{\epsilon} \log\left(\frac{4}{\delta}\right)\]
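For instance (a worked example with parameter values of our own choosing), taking \(\epsilon = 0.1\) and \(\delta = 0.05\) gives

\[m \ge \frac{4}{0.1} \log\left(\frac{4}{0.05}\right) = 40 \log(80) \approx 175.3,\]

so \(m = 176\) labeled examples suffice for this accuracy and confidence level.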

Therefore, we can conclude that the concept class of axis-aligned rectangles is PAC-learnable with sample complexity \(m \ge \frac{4}{\epsilon} \log(\frac{4}{\delta})\).

Q.E.D.
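To see the bound in action, here is a short, self-contained simulation; the setup (uniform \(\mathcal{D}\) on \([0,1]^2\), target rectangle \([0.2, 0.8]^2\)) and all names are our own illustrative choices. Each trial draws \(m \ge \frac{4}{\epsilon}\log(\frac{4}{\delta})\) points, runs Algorithm 1, and checks whether the risk exceeds \(\epsilon\).

```python
import math
import random

def pac_rectangle_demo(eps=0.1, delta=0.05, n_trials=2000, seed=0):
    """Empirically check the bound m >= (4/eps) * ln(4/delta) for Algorithm 1.

    Illustrative setup: D is uniform on [0, 1]^2 and the target rectangle is
    R = [0.2, 0.8] x [0.2, 0.8], so probabilities are simply areas.
    """
    rng = random.Random(seed)
    m = math.ceil(4 / eps * math.log(4 / delta))  # sample size from the bound
    a, b = 0.2, 0.8                               # target rectangle R
    failures = 0
    for _ in range(n_trials):
        points = [(rng.random(), rng.random()) for _ in range(m)]
        positives = [(x, y) for (x, y) in points if a <= x <= b and a <= y <= b]
        if not positives:
            failures += 1  # an empty hypothesis errs on all of R, and P[R] > eps
            continue
        # Algorithm 1: tightest rectangle R' around the positive points.
        x_lo, x_hi = min(x for x, _ in positives), max(x for x, _ in positives)
        y_lo, y_hi = min(y for _, y in positives), max(y for _, y in positives)
        # Since R' is contained in R, Risk(R') is the area of R minus the area of R'.
        risk = (b - a) ** 2 - (x_hi - x_lo) * (y_hi - y_lo)
        failures += risk > eps
    return m, failures / n_trials

m, failure_rate = pac_rectangle_demo()
print(f"m = {m}, observed P[Risk(R') > eps] = {failure_rate:.4f} (bound: delta = 0.05)")
```

In this setting the observed failure rate should come out far below \(\delta\): the union-bound argument is conservative, so the guarantee it certifies is loose but valid.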

2.3. Tricks for PAC Learning Proofs

The proof above illustrates several techniques that recur in PAC learning proofs: (1) dispose of the trivial case first by assuming the target concept has probability mass greater than \(\epsilon\); (2) translate the event that the risk is small into a geometric condition on the sample (here, every region \(r_i\) contains a sample point), so that its failure decomposes into a small number of bad events \(A_i\); (3) control the probability of the bad events with the union bound; and (4) apply the inequality \(1 - x \le \exp(-x)\) to obtain an exponential bound, then solve for the sample size \(m\) that drives it below \(\delta\).

3. Acknowledgements

This post is written with reference to: