From c22f17c573ba676a9aa81db0c772a91d47e618be Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Carlos=20Pe=C3=B1a?=
Date: Fri, 25 Jun 2021 11:19:39 +0900
Subject: [PATCH] [DOCS][MINOR] Update sql-performance-tuning.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

### What changes were proposed in this pull request?

Update the "Caching Data in Memory" section: add a suggestion to call the DataFrame `unpersist` method, making it consistent with the previous suggestion of using the `persist` method.

### Why are the changes needed?

Keeps the documentation consistent.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes the user-facing docs.

### How was this patch tested?

Manually.

Closes #33069 from Silverlight42/caching-data-doc.

Authored-by: Carlos Peña
Signed-off-by: Hyukjin Kwon
---
 docs/sql-performance-tuning.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sql-performance-tuning.md b/docs/sql-performance-tuning.md
index 6ccc0405f8..401e8b971a 100644
--- a/docs/sql-performance-tuning.md
+++ b/docs/sql-performance-tuning.md
@@ -29,7 +29,7 @@ turning on some experimental options.
 
 Spark SQL can cache tables using an in-memory columnar format by calling `spark.catalog.cacheTable("tableName")` or `dataFrame.cache()`.
 Then Spark SQL will scan only required columns and will automatically tune compression to minimize
-memory usage and GC pressure. You can call `spark.catalog.uncacheTable("tableName")` to remove the table from memory.
+memory usage and GC pressure. You can call `spark.catalog.uncacheTable("tableName")` or `dataFrame.unpersist()` to remove the table from memory.
 
 Configuration of in-memory caching can be done using the `setConf` method on `SparkSession` or by running
 `SET key=value` commands using SQL.
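
For illustration, a minimal sketch of the cache/uncache pairing the updated line documents. The `spark.catalog.cacheTable`/`uncacheTable` and `DataFrame.cache()`/`unpersist()` calls are the real Spark SQL APIs named in the doc; the app name, local master, and the `people` view are assumptions made up for this sketch, not part of the patch.

```scala
import org.apache.spark.sql.SparkSession

object CachingExample {
  def main(args: Array[String]): Unit = {
    // Local-mode session for the sketch (assumed setup, not from the patch).
    val spark = SparkSession.builder()
      .appName("CachingExample")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical data registered as a temp view named "people".
    val df = spark.range(1000).toDF("id")
    df.createOrReplaceTempView("people")

    // Cache the table in Spark's in-memory columnar format...
    spark.catalog.cacheTable("people")
    spark.table("people").count() // an action materializes the cache

    // ...and remove it from memory, per the doc line this patch updates.
    spark.catalog.uncacheTable("people")

    // Equivalent DataFrame-level pair: cache() / unpersist().
    df.cache()
    df.count()
    df.unpersist()

    spark.stop()
  }
}
```

Note that `cache()` is shorthand for `persist()` with the default storage level, which is why pairing the docs' `persist` suggestion with `unpersist` keeps the section symmetric.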