---
layout: global
title: Statistics Functionality - MLlib
displayTitle: MLlib - Statistics Functionality
---

* Table of contents
{:toc}

`\[
\newcommand{\R}{\mathbb{R}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\x}{\mathbf{x}}
\newcommand{\y}{\mathbf{y}}
\newcommand{\wv}{\mathbf{w}}
\newcommand{\av}{\mathbf{\alpha}}
\newcommand{\bv}{\mathbf{b}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\id}{\mathbf{I}}
\newcommand{\ind}{\mathbf{1}}
\newcommand{\0}{\mathbf{0}}
\newcommand{\unit}{\mathbf{e}}
\newcommand{\one}{\mathbf{1}}
\newcommand{\zero}{\mathbf{0}}
\]`

## Data Generators

## Stratified Sampling

## Summary Statistics

### Multivariate summary statistics

We provide column summary statistics for `RowMatrix` (note: this functionality is not currently supported in `IndexedRowMatrix` or `CoordinateMatrix`).
If the number of columns is not large, e.g., on the order of thousands, then the covariance matrix can also be computed as a local matrix, which requires $\mathcal{O}(n^2)$ storage where $n$ is the number of columns.
The total CPU time is $\mathcal{O}(m n^2)$, where $m$ is the number of rows, and is faster if the rows are sparse.
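
To make the cost figures above concrete, the following is a minimal single-machine sketch in plain Python. It is not the MLlib API or its implementation: the function name `column_summary_and_covariance` and the `SparseRow` type are hypothetical, and rows are assumed to be given as `{column index: nonzero value}` dictionaries. The sketch accumulates column sums and the $n \times n$ Gram matrix in one pass, so storage is $\mathcal{O}(n^2)$, and each row only touches pairs of its nonzero entries, which is why sparse rows are cheaper than the dense $\mathcal{O}(n^2)$ per-row bound.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: one row as {column index: nonzero value}.
SparseRow = Dict[int, float]


def column_summary_and_covariance(
    rows: Iterable[SparseRow], n: int
) -> Tuple[List[float], List[List[float]]]:
    """Return (column means, sample covariance matrix) for an m x n matrix.

    Storage is O(n^2) for the Gram matrix. Work per row is proportional to
    the square of its number of nonzeros, at most O(n^2) for a dense row.
    """
    m = 0
    col_sum = [0.0] * n
    gram = [[0.0] * n for _ in range(n)]  # running sum of x * x^T over rows

    for row in rows:
        m += 1
        items = list(row.items())
        for j, v in items:
            col_sum[j] += v
        # Only pairs of nonzero entries contribute to the Gram matrix.
        for j, vj in items:
            for k, vk in items:
                gram[j][k] += vj * vk

    if m < 2:
        raise ValueError("need at least two rows for a sample covariance")

    mean = [s / m for s in col_sum]
    # cov = (sum of x x^T - m * mean mean^T) / (m - 1)
    cov = [
        [(gram[j][k] - m * mean[j] * mean[k]) / (m - 1) for k in range(n)]
        for j in range(n)
    ]
    return mean, cov


# Example: a 3 x 3 matrix with rows [1, 2, 0], [3, 0, 0], [0, 4, 6].
example_rows = [{0: 1.0, 1: 2.0}, {0: 3.0}, {1: 4.0, 2: 6.0}]
example_mean, example_cov = column_summary_and_covariance(example_rows, 3)
```

This one-pass formula is chosen for brevity. It can lose precision when column means are large relative to the variances, so a production implementation may prefer a more numerically careful update.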