364b0db753
## What changes were proposed in this pull request?

It seems several non-breaking spaces were inserted into several `.md` files, and they appear to break the rendering of those markdown files.

A non-breaking space looks identical to a regular space but is a different character. For example, this can be checked via `python` as below:

```python
>>> " "
'\xc2\xa0'
>>> " "
' '
```

_Note that it seems this PR description automatically replaces non-breaking spaces with normal spaces. Please open `vi`, then copy and paste the text into `python` to verify this (do not copy the characters here)._

I checked the output below in Safari and Chrome on Mac OS, and in Internet Explorer on Windows 10.

**Before**

![2017-04-03 12 37 17](https://cloud.githubusercontent.com/assets/6477701/24594655/50aaba02-186a-11e7-80bb-d34b17a3398a.png)

![2017-04-03 12 36 57](https://cloud.githubusercontent.com/assets/6477701/24594654/50a855e6-186a-11e7-94e2-661e56544b0f.png)

**After**

![2017-04-03 12 36 46](https://cloud.githubusercontent.com/assets/6477701/24594657/53c2545c-186a-11e7-9a73-00529afbfd75.png)

![2017-04-03 12 36 31](https://cloud.githubusercontent.com/assets/6477701/24594658/53c286c0-186a-11e7-99c9-e66b1f510fe7.png)

## How was this patch tested?

Manually checking. These instances were found via

```
grep --include=*.scala --include=*.python --include=*.java --include=*.r --include=*.R --include=*.md --include=*.r -r -I " " .
```

on Mac OS. (A Python sketch of an equivalent search appears at the end of this description.)

It seems there are several more instances, as below:

```
./docs/sql-programming-guide.md: │ ├── ...
./docs/sql-programming-guide.md: │ │
./docs/sql-programming-guide.md: │ ├── country=US
./docs/sql-programming-guide.md: │ │ └── data.parquet
./docs/sql-programming-guide.md: │ ├── country=CN
./docs/sql-programming-guide.md: │ │ └── data.parquet
./docs/sql-programming-guide.md: │ └── ...
./docs/sql-programming-guide.md: ├── ...
./docs/sql-programming-guide.md: │
./docs/sql-programming-guide.md: ├── country=US
./docs/sql-programming-guide.md: │ └── data.parquet
./docs/sql-programming-guide.md: ├── country=CN
./docs/sql-programming-guide.md: │ └── data.parquet
./docs/sql-programming-guide.md: └── ...
./sql/core/src/test/README.md:│ ├── *.avdl # Testing Avro IDL(s)
./sql/core/src/test/README.md:│ └── *.avpr # !! NO TOUCH !! Protocol files generated from Avro IDL(s)
./sql/core/src/test/README.md:│ ├── gen-avro.sh # Script used to generate Java code for Avro
./sql/core/src/test/README.md:│ └── gen-thrift.sh # Script used to generate Java code for Thrift
```

These seem to have been generated via the `tree` command, which inserts non-breaking spaces. They do not appear to cause any rendering problem within code blocks, so I did not fix them, to avoid the overhead of manually replacing them whenever the output is regenerated via `tree` in the future.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #17517 from HyukjinKwon/non-breaking-space.
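
For reference, here is a minimal Python sketch of the search described under "How was this patch tested?" above. It is an illustration, not part of the patch; it assumes it is run from the repository root and that the files are UTF-8 encoded, where U+00A0 is the byte pair `0xc2 0xa0` matched by the grep invocation:

```python
import os

# Extensions roughly matching the grep invocation above.
EXTENSIONS = (".md", ".scala", ".java", ".py", ".r", ".R")

for root, _, files in os.walk("."):
    for name in files:
        if name.endswith(EXTENSIONS):
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                for lineno, line in enumerate(f, 1):
                    # 0xc2 0xa0 is U+00A0 (non-breaking space) in UTF-8.
                    if b"\xc2\xa0" in line:
                        print("%s:%d" % (path, lineno))
```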
# Apache Spark

Spark is a fast and general cluster computing system for Big Data. It provides
high-level APIs in Scala, Java, Python, and R, and an optimized engine that
supports general computation graphs for data analysis. It also supports a
rich set of higher-level tools including Spark SQL for SQL and DataFrames,
MLlib for machine learning, GraphX for graph processing,
and Spark Streaming for stream processing.

<http://spark.apache.org/>


## Online Documentation

You can find the latest Spark documentation, including a programming
guide, on the [project web page](http://spark.apache.org/documentation.html).
This README file only contains basic setup instructions.

## Building Spark

Spark is built using [Apache Maven](http://maven.apache.org/).
To build Spark and its example programs, run:

    build/mvn -DskipTests clean package

(You do not need to do this if you downloaded a pre-built package.)

You can build Spark using more than one thread by using the -T option with Maven, see ["Parallel builds in Maven 3"](https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3).
More detailed documentation is available from the project site, at
["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).

For general development tips, including info on developing Spark using an IDE, see ["Useful Developer Tools"](http://spark.apache.org/developer-tools.html).

## Interactive Scala Shell

The easiest way to start using Spark is through the Scala shell:

    ./bin/spark-shell

Try the following command, which should return 1000:

    scala> sc.parallelize(1 to 1000).count()

## Interactive Python Shell

Alternatively, if you prefer Python, you can use the Python shell:

    ./bin/pyspark

And run the following command, which should also return 1000:

    >>> sc.parallelize(range(1000)).count()
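
As a further, purely illustrative experiment in the same shell, transformations can be chained before the action; this one should return 500:

    >>> sc.parallelize(range(1000)).filter(lambda x: x % 2 == 0).count()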

## Example Programs

Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> [params]`. For example:

    ./bin/run-example SparkPi

will run the Pi example locally.

You can set the MASTER environment variable when running examples to submit
examples to a cluster. This can be a mesos:// or spark:// URL,
"yarn" to run on YARN, and "local" to run
locally with one thread, or "local[N]" to run locally with N threads. You
can also use an abbreviated class name if the class is in the `examples`
package. For instance:

    MASTER=spark://host:7077 ./bin/run-example SparkPi
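
or, using the "local[N]" form described above to run the same example locally with four threads:

    MASTER=local[4] ./bin/run-example SparkPi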

Many of the example programs print usage help if no params are given.

## Running Tests

Testing first requires [building Spark](#building-spark). Once Spark is built, tests
can be run using:

    ./dev/run-tests

Please see the guidance on how to
[run tests for a module, or individual tests](http://spark.apache.org/developer-tools.html#individual-tests).
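
For instance, running a single suite through SBT (one of the forms covered on that page; the suite name here is only an illustration) might look like:

    ./build/sbt "core/testOnly *DAGSchedulerSuite"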

## A Note About Hadoop Versions

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.

Please refer to the build documentation at
["Specifying the Hadoop Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
for detailed guidance on building for a particular distribution of Hadoop, including
building for particular Hive and Hive Thriftserver distributions.
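
As a sketch of what such a build looks like (treat the exact profile and version number as assumptions; the linked page has the authoritative flags):

    ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package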

## Configuration

Please refer to the [Configuration Guide](http://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.
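
As one illustration (a minimal sketch with placeholder values; the guide covers the full set of properties), configuration can also be set programmatically, for example from PySpark:

    from pyspark import SparkConf, SparkContext

    # Placeholder app name, master, and memory setting, for illustration only.
    conf = (SparkConf()
            .setAppName("MyApp")
            .setMaster("local[2]")
            .set("spark.executor.memory", "1g"))
    sc = SparkContext(conf=conf)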

## Contributing

Please review the [Contribution to Spark guide](http://spark.apache.org/contributing.html)
for information on how to get started contributing to the project.