Simple Calculations
Expectations
Previously, we showed that under certain conditions and assumptions (sampling with replacement, ignoring second-order effects), the expectations from the two methods are given by
\[ \begin{align} p\sum_{i \in E} \varepsilon_i \label{eqn:stan} \end{align} \]
for the standard approach, and by
\[ \begin{align} \sum_{i \in E} \varepsilon_{i} p_{\varepsilon_i} \label{eqn:strat} \end{align} \]
for the stratified approach, where \(\varepsilon_{i}\) is the embeddedness level of edge \(i\).

What’s shocking is that \(\eqref{eqn:strat}\) simplifies to
\[ \begin{align*} \sum_{i \in E_{n}} \varepsilon_{i}, \end{align*} \]
where \(E_{n}\) is the set of negative edges in the original graph. To see this, rewrite the sum over embeddedness levels instead of edges, giving
\[ \begin{align} \sum_{k} n_k \cdot p_k \cdot k, \label{eqn:strat2} \end{align} \]
where \(n_k = |\left\{ i: \varepsilon_i = k \right\}|\) is the number of edges of embeddedness \(k\), and \(p_k = \frac{b_k}{n_k}\), where \(b_k\) is the number of negative edges of embeddedness \(k\) in the original graph (\(b\) is for bad). The \(n_k\) cancels, so the sum simplifies further to
\[ \begin{align*} \sum_{k} b_k \cdot k. \end{align*} \]
But this is just the sum of the embeddedness of all negative ties.

So, writing \(p = \frac{|E_n|}{|E|}\) for the overall proportion of negative edges, the two expectations are actually
\[ \begin{align*} p \sum_{i \in E} \varepsilon_{i} &= \frac{|E_n|}{|E|} \sum_{i \in E} \varepsilon_{i} \\ &= \frac{|E_n|}{|E|} \left[ \sum_{i \in E_n} \varepsilon_{i} + \sum_{i \in E_n^{c}} \varepsilon_{i} \right] \end{align*} \]
and
\[ \begin{align*} \sum_{i \in E_{n}} \varepsilon_{i}.
\end{align*} \]

Consider the difference, \(\eqref{eqn:stan} - \eqref{eqn:strat}\). Using \(\frac{|E_n|}{|E|} - 1 = -\frac{|E_n^{c}|}{|E|}\), it equals
\[ \begin{align*} \frac{|E_n|}{|E|} \sum_{i \in E_n^{c}} \varepsilon_{i} - \frac{|E_n^{c}|}{|E|} \sum_{i \in E_{n}} \varepsilon_{i}, \end{align*} \]
which reduces to
\[ \begin{align*} \frac{|E_n||E_n^{c}|}{|E|}\left( \frac{ \sum_{i \in E_n^{c}} \varepsilon_{i} }{ |E_{n}^{c}| } - \frac{ \sum_{i \in E_n} \varepsilon_{i} }{ |E_{n}| } \right). \end{align*} \]
But the term inside the brackets is simply the difference between the average embeddedness of positive edges and that of negative edges:
\[ \begin{align*} \frac{|E_n||E_n^{c}|}{|E|} \left( \overline{\mathcal{E}_{p}} - \overline{\mathcal{E}_{n}} \right). \end{align*} \]
Since the prefactor is positive whenever the graph has both positive and negative edges, the expectation under the standard approach is larger than that under the stratified approach exactly when the average embeddedness of positive ties is larger than that of negative ties.
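
As a sanity check, the algebra above can be verified numerically on a small example. The sketch below is only illustrative: the edge list `edges` is made-up toy data (each entry is a hypothetical pair of embeddedness level and a negative-edge flag), and the function names are my own. It computes the standard expectation with \(p = |E_n|/|E|\), the stratified expectation via \(\sum_k n_k \cdot p_k \cdot k\), and the factored form \(\frac{|E_n||E_n^{c}|}{|E|}\) times the gap in average embeddedness, using exact fractions so the comparison is not blurred by rounding.

```python
from fractions import Fraction

# Hypothetical toy data: one (embeddedness, is_negative) pair per edge.
edges = [(0, True), (1, True), (1, False), (2, False),
         (3, False), (3, True), (4, False), (2, False)]


def standard_expectation(edges):
    """p * (sum of embeddedness over all edges), with p = |E_n| / |E|."""
    p = Fraction(sum(1 for _, is_neg in edges if is_neg), len(edges))
    return p * sum(eps for eps, _ in edges)


def stratified_expectation(edges):
    """sum_k n_k * p_k * k, with p_k = b_k / n_k computed per level k."""
    levels = {}  # k -> (n_k, b_k)
    for eps, is_neg in edges:
        n_k, b_k = levels.get(eps, (0, 0))
        levels[eps] = (n_k + 1, b_k + (1 if is_neg else 0))
    return sum(n_k * Fraction(b_k, n_k) * k for k, (n_k, b_k) in levels.items())


def mean_gap_form(edges):
    """|E_n||E_n^c| / |E| * (mean positive embeddedness - mean negative embeddedness).

    Assumes the graph has at least one positive and one negative edge.
    """
    neg = [eps for eps, is_neg in edges if is_neg]
    pos = [eps for eps, is_neg in edges if not is_neg]
    scale = Fraction(len(neg) * len(pos), len(edges))
    return scale * (Fraction(sum(pos), len(pos)) - Fraction(sum(neg), len(neg)))


negative_total = sum(eps for eps, is_neg in edges if is_neg)
difference = standard_expectation(edges) - stratified_expectation(edges)

# The stratified expectation collapses to the embeddedness sum over negative edges.
assert stratified_expectation(edges) == negative_total
# The difference matches the factored form in terms of average embeddedness.
assert difference == mean_gap_form(edges)
print(standard_expectation(edges), stratified_expectation(edges), difference)
```

In this toy graph the positive edges are more embedded on average than the negative ones, so the standard expectation comes out larger, as the derivation predicts.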