[MINOR][DOCS] Fix typo in documents

### What changes were proposed in this pull request?
Fixed typo in `docs` directory and in `project/MimaExcludes.scala`

### Why are the changes needed?
Better readability of documents

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No test needed

Closes #28447 from kiszk/typo_20200504.

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Kazuaki Ishizaki 2020-05-04 16:53:50 +09:00 committed by HyukjinKwon
parent f72220b8ab
commit 35fcc8d5c5
7 changed files with 12 additions and 12 deletions

@@ -567,7 +567,7 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.classificat
Refer to the [R API docs](api/R/spark.fmClassifier.html) for more details.
-Note: At the moment SparkR doesn't suport feature scaling.
+Note: At the moment SparkR doesn't support feature scaling.
{% include_example r/ml/fmClassifier.R %}
</div>
@@ -1105,7 +1105,7 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.regression.
Refer to the [R API documentation](api/R/spark.fmRegressor.html) for more details.
-Note: At the moment SparkR doesn't suport feature scaling.
+Note: At the moment SparkR doesn't support feature scaling.
{% include_example r/ml/fmRegressor.R %}
</div>

@@ -335,7 +335,7 @@ SPARK_WORKER_OPTS supports the following system properties:
overlap with `spark.worker.cleanup.enabled`, as this enables cleanup of non-shuffle files in
local directories of a dead executor, while `spark.worker.cleanup.enabled` enables cleanup of
all files/subdirectories of a stopped and timeout application.
-This only affects Standalone mode, support of other cluster manangers can be added in the future.
+This only affects Standalone mode, support of other cluster managers can be added in the future.
</td>
<td>2.4.0</td>
</tr>

@@ -42,7 +42,7 @@ license: |
- In Spark 3.0, `CREATE TABLE` without a specific provider uses the value of `spark.sql.sources.default` as its provider. In Spark version 2.4 and below, it was Hive. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.createHiveTableByDefault.enabled` to `true`.
-- In Spark 3.0, when inserting a value into a table column with a different data type, the type coercion is performed as per ANSI SQL standard. Certain unreasonable type conversions such as converting `string` to `int` and `double` to `boolean` are disallowed. A runtime exception is thrown if the value is out-of-range for the data type of the column. In Spark version 2.4 and below, type conversions during table insertion are allowed as long as they are valid `Cast`. When inserting an out-of-range value to a integral field, the low-order bits of the value is inserted(the same as Java/Scala numeric type casting). For example, if 257 is inserted to a field of byte type, the result is 1. The behavior is controlled by the option `spark.sql.storeAssignmentPolicy`, with a default value as "ANSI". Setting the option as "Legacy" restores the previous behavior.
+- In Spark 3.0, when inserting a value into a table column with a different data type, the type coercion is performed as per ANSI SQL standard. Certain unreasonable type conversions such as converting `string` to `int` and `double` to `boolean` are disallowed. A runtime exception is thrown if the value is out-of-range for the data type of the column. In Spark version 2.4 and below, type conversions during table insertion are allowed as long as they are valid `Cast`. When inserting an out-of-range value to an integral field, the low-order bits of the value is inserted(the same as Java/Scala numeric type casting). For example, if 257 is inserted to a field of byte type, the result is 1. The behavior is controlled by the option `spark.sql.storeAssignmentPolicy`, with a default value as "ANSI". Setting the option as "Legacy" restores the previous behavior.
- The `ADD JAR` command previously returned a result set with the single value 0. It now returns an empty result set.
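To make the ANSI store-assignment note above concrete, here is a minimal Scala sketch (hypothetical table and column names, local session); it only illustrates the behavior the migration guide describes:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Hypothetical table name; a single BYTE (TINYINT) column.
spark.sql("CREATE TABLE bytes_t (b BYTE) USING parquet")

// Default policy in Spark 3.0 is ANSI: an out-of-range insert throws a
// runtime exception instead of silently truncating.
// spark.sql("INSERT INTO bytes_t VALUES (257)")

// The legacy policy keeps only the low-order bits, so 257 becomes 1.
spark.conf.set("spark.sql.storeAssignmentPolicy", "Legacy")
spark.sql("INSERT INTO bytes_t VALUES (257)")
spark.sql("SELECT b FROM bytes_t").show()   // shows 1
```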
@@ -50,7 +50,7 @@ license: |
- Refreshing a cached table would trigger a table uncache operation and then a table cache (lazily) operation. In Spark version 2.4 and below, the cache name and storage level are not preserved before the uncache operation. Therefore, the cache name and storage level could be changed unexpectedly. In Spark 3.0, cache name and storage level are first preserved for cache recreation. It helps to maintain a consistent cache behavior upon table refreshing.
-- In Spark 3.0, the properties listing below become reserved; commands fail if you specify reserved properties in places like `CREATE DATABASE ... WITH DBPROPERTIES` and `ALTER TABLE ... SET TBLPROPERTIES`. You need their specific clauses to specify them, for example, `CREATE DATABASE test COMMENT 'any comment' LOCATION 'some path'`. You can set `spark.sql.legacy.notReserveProperties` to `true` to ignore the `ParseException`, in this case, these properties will be silently removed, for example: `SET DBPROTERTIES('location'='/tmp')` will have no effect. In Spark version 2.4 and below, these properties are neither reserved nor have side effects, for example, `SET DBPROTERTIES('location'='/tmp')` do not change the location of the database but only create a headless property just like `'a'='b'`.
+- In Spark 3.0, the properties listing below become reserved; commands fail if you specify reserved properties in places like `CREATE DATABASE ... WITH DBPROPERTIES` and `ALTER TABLE ... SET TBLPROPERTIES`. You need their specific clauses to specify them, for example, `CREATE DATABASE test COMMENT 'any comment' LOCATION 'some path'`. You can set `spark.sql.legacy.notReserveProperties` to `true` to ignore the `ParseException`, in this case, these properties will be silently removed, for example: `SET DBPROPERTIES('location'='/tmp')` will have no effect. In Spark version 2.4 and below, these properties are neither reserved nor have side effects, for example, `SET DBPROPERTIES('location'='/tmp')` do not change the location of the database but only create a headless property just like `'a'='b'`.
| Property (case sensitive) | Database Reserved | Table Reserved | Remarks |
| ------------------------- | ----------------- | -------------- | ------- |
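Similarly, a small sketch of the reserved-property rules (hypothetical database names, reusing the `spark` session from the sketch above):

```scala
// demo_db and demo_db2 are hypothetical database names.
spark.sql("CREATE DATABASE demo_db COMMENT 'any comment' LOCATION '/tmp/demo_db'")

// Passing a reserved key through DBPROPERTIES fails with an exception by default.
// spark.sql("CREATE DATABASE demo_db2 WITH DBPROPERTIES ('location'='/tmp/demo_db2')")

// With the legacy flag on, the reserved key is accepted but silently dropped,
// so it does not change the database location.
spark.conf.set("spark.sql.legacy.notReserveProperties", "true")
spark.sql("CREATE DATABASE demo_db2 WITH DBPROPERTIES ('location'='/tmp/demo_db2')")
```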
@@ -130,7 +130,7 @@ license: |
- In Spark 3.0, negative scale of decimal is not allowed by default, for example, data type of literal like `1E10BD` is `DecimalType(11, 0)`. In Spark version 2.4 and below, it was `DecimalType(2, -9)`. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.allowNegativeScaleOfDecimal` to `true`.
-- In Spark 3.0, the unary arithmetic operator plus(`+`) only accepts string, numeric and interval type values as inputs. Besides, `+` with a integral string representation is coerced to a double value, for example, `+'1'` returns `1.0`. In Spark version 2.4 and below, this operator is ignored. There is no type checking for it, thus, all type values with a `+` prefix are valid, for example, `+ array(1, 2)` is valid and results `[1, 2]`. Besides, there is no type coercion for it at all, for example, in Spark 2.4, the result of `+'1'` is string `1`.
+- In Spark 3.0, the unary arithmetic operator plus(`+`) only accepts string, numeric and interval type values as inputs. Besides, `+` with an integral string representation is coerced to a double value, for example, `+'1'` returns `1.0`. In Spark version 2.4 and below, this operator is ignored. There is no type checking for it, thus, all type values with a `+` prefix are valid, for example, `+ array(1, 2)` is valid and results `[1, 2]`. Besides, there is no type coercion for it at all, for example, in Spark 2.4, the result of `+'1'` is string `1`.
- In Spark 3.0, Dataset query fails if it contains ambiguous column reference that is caused by self join. A typical example: `val df1 = ...; val df2 = df1.filter(...);`, then `df1.join(df2, df1("a") > df2("a"))` returns an empty result which is quite confusing. This is because Spark cannot resolve Dataset column references that point to tables being self joined, and `df1("a")` is exactly the same as `df2("a")` in Spark. To restore the behavior before Spark 3.0, you can set `spark.sql.analyzer.failAmbiguousSelfJoin` to `false`.
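And a sketch expanding the `val df1 = ...; val df2 = df1.filter(...)` outline from the self-join note above (toy data, same `spark` session):

```scala
import spark.implicits._

val df1 = Seq(1, 2, 3).toDF("a")
val df2 = df1.filter($"a" > 1)

// In Spark 3.0 the analyzer rejects this join as ambiguous, because df1("a")
// and df2("a") resolve to the same underlying column.
// df1.join(df2, df1("a") > df2("a"))

// Opting back into the Spark 2.4 behavior, which quietly returns an empty result:
spark.conf.set("spark.sql.analyzer.failAmbiguousSelfJoin", "false")
df1.join(df2, df1("a") > df2("a")).show()
```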

@@ -30,7 +30,7 @@ An example below uses [GenericUDFAbs](https://github.com/apache/hive/blob/master
{% highlight sql %}
-- Register `GenericUDFAbs` and use it in Spark SQL.
--- Note that, if you use your own programmed one, you need to add a JAR containig it
+-- Note that, if you use your own programmed one, you need to add a JAR containing it
-- into a classpath,
-- e.g., ADD JAR yourHiveUDF.jar;
CREATE TEMPORARY FUNCTION testUDF AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs';
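The registration above can also be driven from Scala via `spark.sql`; a sketch assuming a Spark build with Hive support (the bundled `GenericUDFAbs` needs no extra JAR; for your own UDF you would add its JAR first, as noted in the snippet):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()

// For a user-provided UDF, add its JAR first, e.g.:
// spark.sql("ADD JAR yourHiveUDF.jar")

spark.sql(
  "CREATE TEMPORARY FUNCTION testUDF AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs'")
spark.sql("SELECT testUDF(-3) AS abs_value").show()   // returns 3
```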

@@ -24,7 +24,7 @@ Built-in functions are commonly used routines that Spark SQL predefines and a co
### Built-in Functions
-Spark SQL has some categories of frequently-used built-in functions for aggregtion, arrays/maps, date/timestamp, and JSON data.
+Spark SQL has some categories of frequently-used built-in functions for aggregation, arrays/maps, date/timestamp, and JSON data.
This subsection presents the usages and descriptions of these functions.
#### Scalar Functions
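As a quick illustration of the categories named above, one built-in function from each, runnable in spark-shell (assumes the `spark` session it provides; literals are arbitrary):

```scala
spark.sql("SELECT sum(col1) AS total FROM VALUES (1), (2), (3)").show()      // aggregation
spark.sql("SELECT array_contains(array(1, 2, 3), 2) AS has_two").show()      // arrays/maps
spark.sql("SELECT date_add(DATE '2020-05-04', 7) AS next_week").show()       // date/timestamp
spark.sql("SELECT get_json_object('{\"a\": 1}', '$.a') AS a_value").show()   // JSON
```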

@@ -75,7 +75,7 @@ DESCRIBE QUERY WITH all_names_cte
| name| string| null|
+--------+---------+-------+
--- Returns column metadata information for a inline table.
+-- Returns column metadata information for an inline table.
DESC QUERY VALUES(100, 'John', 10000.20D) AS employee(id, name, salary);
+--------+---------+-------+
|col_name|data_type|comment|

@@ -99,7 +99,7 @@ This page displays the details of a specific job identified by its job ID.
The Stages tab displays a summary page that shows the current state of all stages of all jobs in
the Spark application.
-At the beginning of the page is the summary with the count of all stages by status (active, pending, completed, sikipped, and failed)
+At the beginning of the page is the summary with the count of all stages by status (active, pending, completed, skipped, and failed)
<p style="text-align: center;">
<img src="img/AllStagesPageDetail1.png" title="Stages header" alt="Stages header" width="30%">
@@ -136,7 +136,7 @@ Summary metrics for all task are represented in a table and in a timeline.
* **[Tasks deserialization time](configuration.html#compression-and-serialization)**
* **Duration of tasks**.
* **GC time** is the total JVM garbage collection time.
-* **Result serialization time** is the time spent serializing the task result on a executor before sending it back to the driver.
+* **Result serialization time** is the time spent serializing the task result on an executor before sending it back to the driver.
* **Getting result time** is the time that the driver spends fetching task results from workers.
* **Scheduler delay** is the time the task waits to be scheduled for execution.
* **Peak execution memory** is the maximum memory used by the internal data structures created during shuffles, aggregations and joins.