[minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API
Author: Reynold Xin <rxin@databricks.com> Closes #6569 from rxin/freqItemsWarning and squashes the following commits: 7eec145 [Reynold Xin] [minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API.
This commit is contained in:
parent
cae9306c4f
commit
4c868b9943
|
@ -1170,6 +1170,9 @@ class DataFrame(object):
|
|||
"http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou".
|
||||
:func:`DataFrame.freqItems` and :func:`DataFrameStatFunctions.freqItems` are aliases.
|
||||
|
||||
This function is meant for exploratory data analysis, as we make no guarantee about the
|
||||
backward compatibility of the schema of the resulting DataFrame.
|
||||
|
||||
:param cols: Names of the columns to calculate frequent items for as a list or tuple of
|
||||
strings.
|
||||
:param support: The frequency with which to consider an item 'frequent'. Default is 1%.
|
||||
|
|
|
@ -97,6 +97,9 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
|
|||
* [[http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou]].
|
||||
* The `support` should be greater than 1e-4.
|
||||
*
|
||||
* This function is meant for exploratory data analysis, as we make no guarantee about the
|
||||
* backward compatibility of the schema of the resulting [[DataFrame]].
|
||||
*
|
||||
* @param cols the names of the columns to search frequent items in.
|
||||
* @param support The minimum frequency for an item to be considered `frequent`. Should be greater
|
||||
* than 1e-4.
|
||||
|
@ -114,6 +117,9 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
|
|||
* [[http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou]].
|
||||
* Uses a `default` support of 1%.
|
||||
*
|
||||
* This function is meant for exploratory data analysis, as we make no guarantee about the
|
||||
* backward compatibility of the schema of the resulting [[DataFrame]].
|
||||
*
|
||||
* @param cols the names of the columns to search frequent items in.
|
||||
* @return A Local DataFrame with the Array of frequent items for each column.
|
||||
*
|
||||
|
@ -128,6 +134,9 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
|
|||
* frequent element count algorithm described in
|
||||
* [[http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou]].
|
||||
*
|
||||
* This function is meant for exploratory data analysis, as we make no guarantee about the
|
||||
* backward compatibility of the schema of the resulting [[DataFrame]].
|
||||
*
|
||||
* @param cols the names of the columns to search frequent items in.
|
||||
* @return A Local DataFrame with the Array of frequent items for each column.
|
||||
*
|
||||
|
@ -143,6 +152,9 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
|
|||
* [[http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou]].
|
||||
* Uses a `default` support of 1%.
|
||||
*
|
||||
* This function is meant for exploratory data analysis, as we make no guarantee about the
|
||||
* backward compatibility of the schema of the resulting [[DataFrame]].
|
||||
*
|
||||
* @param cols the names of the columns to search frequent items in.
|
||||
* @return A Local DataFrame with the Array of frequent items for each column.
|
||||
*
|
||||
|
|
Loading…
Reference in a new issue