Review
Lemma:
Let \(\left\{ X_i, i=1,\ldots,n \right\}\) be a monotone non-decreasing sequence1 \(X_i \leq X_j\) for \(i < j\). Then, \[ \begin{align*} \max_{\mathbf{S} + \mathbf{T} = \mathbf{N}} \overline{X}_{\mathbf{S}} + \overline{X}_{\mathbf{T}} = \overline{X}_{\mathbf{N}-n} + X_n, \end{align*} \] where \(\mathbf{N} = [n]\).
Proof:
It suffices to prove that \[ \begin{align} X_n + \overline{X}_{\mathbf{N}-n} \geq \overline{X}_{\mathbf{S}} + \overline{X}_{\mathbf{T}} \label{eqn:ineq} \end{align} \] for all \(\mathbf{S} + \mathbf{T} = \mathbf{N}\). Rewrite \(\eqref{eqn:ineq}\) as \[ \begin{align*} X_n + \frac{X_{\mathbf{S}-n} + X_{\mathbf{T}}}{n-1} &\geq \frac{X_n}{s} + \frac{X_{\mathbf{S}-n}}{s} + \frac{X_{\mathbf{T}}}{t} \\ st X_n + \frac{st}{n-1} X_{\mathbf{S}-n} + \frac{st}{n-1} X_{\mathbf{T}} &\geq t X_n + t X_{\mathbf{S}-n} + s X_{\mathbf{T}} \\ (s-1)t X_n &\geq t\left( 1 - \frac{s}{n-1} \right) X_{\mathbf{S}-n} + s\left( 1 - \frac{t}{n-1} \right) X_{\mathbf{T}} \\ X_n &\geq \left( 1 - \frac{s}{n-1} \right) \overline{X}_{\mathbf{S}-n} + \frac{s}{s-1}\left( 1 - \frac{t}{n-1} \right) \overline{X}_{\mathbf{T}} \end{align*} \] To show this inequality holds, it suffices to show that the sum of the coefficients on the RHS add up to at most 1, since then it’s simply an average of averages of numbers less than or equal to \(X_n\). But \[ \begin{align*} 1 - \frac{s}{n-1} + \frac{s}{s-1}\left( 1 - \frac{t}{n-1} \right) &= 1 + \left( \frac{s}{s-1} - \frac{s}{n-1} - \frac{st}{(s-1)(n-1)} \right) \\ &= 1 + \frac{s}{s-1}\left( 1 - \frac{s-1}{n-1} - \frac{t}{(n-1)} \right)\\ &= 1 + 0 = 1 \end{align*} \]
Computational Efficiencies
This is inspired by Theorem 1 in the Appendix of Su et al. (2009Su, Xiaogang, Chih-Ling Tsai, Hansheng Wang, David M Nickerson, and Bogong Li. 2009. “Subgroup Analysis via Recursive Partitioning.” Journal of Machine Learning Research 10 (December): 141–58.), which in turn was inspired by Theorem 9.6 of Breiman et al. (1984Breiman, Leo, Jerome Friedman, Charles J Stone, and Richard A Olshen. 1984. Classification and regression trees. CRC press.).
Recall that our optimization function is \[ \begin{align} |L|\max(\overline{y}_{L^{A}}, \overline{y}_{L^{B}}) + |R|\max(\overline{y}_{R^{A}}, \overline{y}_{R^{B}}) \label{eqn:opt_fun} \end{align} \] For now, we’re going to work with a different optimization function, without the size of the sets: \[ \begin{align} \max(\overline{y}_{L^{A}}, \overline{y}_{L^{B}}) + \max(\overline{y}_{R^{A}}, \overline{y}_{R^{B}}) \label{eqn:opt_fun_simp} \end{align} \] Case 1: Suppose \(y_{i^{A}}- y_{i^{B}} > 0\) for all \(i\). Then, the max functions will all return the \(A\) side. Thus, \(\eqref{eqn:opt_fun_simp}\) reduces to \[ \begin{align*} \overline{y}_{L^{A}} + \overline{y}_{R^{A}} \end{align*} \] Thus, the optimization problem is reduced to finding the 2-partitions that maximises the sum of the averages in the \(A\) treatment. We will show that this optimization problem is equivalent to sorting the \(y^{A}\) and finding a split (like a continuous split).
Let \(L\), \(R\) be the optimal split, and without loss of generality, let \(R\) be the bigger side. That is, \[ \begin{align} \overline{y}_{L} + \overline{y}_{R} \geq \overline{y}_{S} + \overline{y}_{T} \label{eqn:simple_ineq} \end{align} \] for any other 2-partitions \((S,T)\). Note that we have dropped the \(A\) for notational brevity. We want to prove that \[ \begin{align*} y_l \leq y_r \end{align*} \] for all \(l \in L\) and \(r \in R\), which is equivalent to our claim. It suffices to prove \[ \begin{align*} y_{l*} = \max_{l \in L} y_l \leq \min_{r \in R} y_r = y_{r*} \end{align*} \] Here’s another notational simplicity. From \(\eqref{eqn:simple_ineq}\), \[ \begin{align*} \overline{y}_{12} + \overline{y}_{34} &\geq \overline{y}_{1} + \overline{y}_{234} \\ &= \frac{\overline{y}_{12}n_{12}-\overline{y}_{2}n_{2}}{n_{1}} + \frac{\overline{y}_{34}n_{34}+\overline{y}_{2}n_{2}}{n_{234}} \end{align*} \] which simplifies to \[ \begin{align*} \overline{y}_{12} (n_{234}n_{1} - n_{12}n_{234} ) + \overline{y}_{34} (n_{234}n_{1} - n_{34}n_{1} ) &\geq \overline{y}_{2}n_2(n_1 - n_{234}) \\ -\overline{y}_{12} n_{234} + \overline{y}_{34} n_{1} &\geq \overline{y}_{2}(n_1 - n_{234}) \end{align*} \] Similary, we have \[ \begin{align*} -\overline{y}_{12} n_{123} + \overline{y}_{34} n_{4} &\geq \overline{y}_{3}(n_4 - n_{123}) \end{align*} \] Now, we know that \(\overline{y}_{34} > \overline{y}_{12}\)