Please note that it is \textit{assumed} that the original call to \onepass consists of a call on an input circuit \circuit such that the values of members \prt, \lwght and \rwght have been initialized to Null across all gates.
Let $\etree$ encode the expression $(X + Y)(X - Y)+ Y^2$. After one pass, \Cref{alg:one-pass-iter} would have computed the following weight distribution. For the two inputs of the sink gate $\circuit$, $\circuit.\lwght=\frac{4}{5}$ and $\circuit.\rwght=\frac{1}{5}$. Similarly, for $\stree$ denoting the left input $\circuit_{\lchild}$ of \circuit, $\stree.\lwght=\stree.\rwght=\frac{1}{2}$. This is depicted in \Cref{fig:expr-tree-T-wght}.
We prove the correct computation of \prt, \lwght, \rwght values on \circuit by induction over the number of iterations in the topological order \topord (line~\ref{alg:one-pass-loop}) of the input circuit \circuit. \topord follows the standard definition of a topological ordering over the DAG structure of \circuit.
For the base case, we have only one gate, which by definition is a source gate and must be either \var or \tnum. In this case, as per \cref{eq:T-all-ones}, lines~\ref{alg:one-pass-var} and~\ref{alg:one-pass-num} correctly compute \circuit.\prt as $1$.
For the inductive hypothesis, assume that \onepass correctly computes \subcircuit.\prt, \subcircuit.\lwght, and \subcircuit.\rwght for all gates \gate in \circuit with $k \geq0$ iterations over \topord.
We now prove for $k +1$ iterations that \onepass correctly computes the \prt, \lwght, and \rwght values for each gate $\gate_\vari{i}$ in \circuit for $i \in[k +1]$.
The $\gate_\vari{k +1}$ must be in the last ordering of all gates $\gate_\vari{i}$. When \size(\circuit) > 1, if $\gate_{k+1}$ is a leaf node, we are back to the base case. Otherwise $\gate_{k +1}$ is an internal node
When $\gate_{k+1}.\type=\circplus$, then by line~\ref{alg:one-pass-plus}$\gate_{k+1}$.\prt$=\gate_{{k+1}_\lchild}$.\prt$+\gate_{{k+1}_\rchild}$.\prt, a correct computation, as per \cref{eq:T-all-ones}. Further, lines~\ref{alg:one-pass-lwght} and~\ref{alg:one-pass-rwght} compute $\gate_{{k+1}}.\lwght=\frac{\gate_{{k+1}_\lchild}.\prt}{\gate_{{k+1}}.\prt}$ and analogously for $\gate_{{k+1}}.\rwght$. All values needed for each computation have been correctly computed by the inductive hypothesis.
When $\gate_{k+1}.\type=\circmult$, then line~\ref{alg:one-pass-mult} computes $\gate_{k+1}.\prt=\gate_{{k+1}_\lchild.\prt}\circmult\gate_{{k+1}_\rchild}.\prt$, which indeed by \cref{eq:T-all-ones} is correct. This concludes the proof of correctness.
It is known that $\topord(G)$ is computable in linear time. There are $\size(\circuit)$ iterations. Each iteration has runtime $O\left(\multc{\log\left(\abs{\circuit(1\ldots, 1)}\right)}{\log\inparen{\size(\circuit)}}\right)$ time. This can be seen since each of all the numbers which the algorithm computes is at most $\abs{\circuit}(1,\dots,1)$. Hence, by definition each such operation takes $\multc{\log\left(\abs{\circuit(1\ldots, 1)}\right)}{\log{\size(\circuit)}}$ time, which proves the claimed runtime.
For our runtime results to be relevant, it must be the case that the sum of the coefficients computed by \onepass is indeed size $O(N)$ since there are $O(\log{N})$ bits in the RAM model where $N$ is the size of the input. The size of the input here is \size(\circuit). We show that when \size$(\circuit_\linput)= N_\linput$, \size$(\circuit_\rinput)= N_\rinput$, where $N_\linput+ N_\rinput\leq N$, this is indeed the case.
To prove this result, we start by proving that $\abs{\circuit}(1,\ldots, 1)\leq N^{2^k }$ for \degree(\circuit) $= k$.
For the base case, we have that \depth(\circuit) $=0$, and there can only be one node which must contain a coefficient (or constant) of $1$. In this case, $\abs{\circuit}(1,\ldots, 1)=1$, and \size(\circuit) $=1$, and it is true that $\abs{\circuit}(1,\ldots, 1)=1\leq N^{2^k}=1^{2^0}=1$.
For the inductive step we consider a circuit \circuit such that $\depth(\circuit)\leq\ell+1$. The sink can only be either a $\circmult$ or $\circplus$ gate. Consider when sink node is $\circmult$. Let $k_\linput, k_\rinput$ denote \degree($\circuit_\linput$) and \degree($\circuit_\rinput$) respectively. Note that this case does not require the constraint on $N_\linput$ or $N_\rinput$.
We derive the upperbound of \Cref{eq:sumcoeff-times-upper} by noting that the maximum value of the LHS occurs when both the base and exponent are maximized.
Similar to the $\circmult$ case, \Cref{eq:sumcoeff-plus-upper} upperbounds its LHS by the fact that the maximum base and exponent combination is always greater than or equal to the sum of lower base/exponent combinations. The final equality is true given the constraint over the inputs.
Since $\abs{\circuit}(1,\ldots, 1)\leq N^{2^k}$ for all circuits such that all $\circplus$ gates share at most one gate with their sibling (across their respective subcircuits), then $\log{N^{2^k}}=2^k \cdot\log{N}$ which for fixed $k$ yields the desired $O(\log{N})$ bits for $O(1)$ arithmetic operations.