Matei Zaharia
0456384939
Merge pull request #911 from pwendell/ganglia-sink
...
Adding Maven dependency for Ganglia
2013-09-09 09:57:54 -07:00
Stephen Haberman
59003d387d
Use a set since shuffle could change order.
2013-09-09 11:45:03 -05:00
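The commit above compares results as a set because a shuffle does not guarantee element order. A minimal Python sketch of that testing idea follows; the data and the helper name `same_elements` are made up for illustration and are not taken from the Spark test suite.

```python
def same_elements(actual, expected):
    """Hypothetical helper: order-insensitive comparison.

    Converting to sets ignores ordering, but also ignores duplicates.
    """
    return set(actual) == set(expected)


# Pretend this list came back from a shuffled collection in arbitrary order.
shuffled = [3, 1, 2]
expected = [1, 2, 3]

assert shuffled != expected               # a list comparison is order-sensitive
assert same_elements(shuffled, expected)  # a set comparison is not
```

If duplicate elements matter, comparing `sorted(actual)` with `sorted(expected)` keeps the counts while still ignoring order.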
Stephen Haberman
6471bfec73
Reword 'evenly distributed' to 'distributed with a hash partitioner'.
2013-09-09 11:44:15 -05:00
Patrick Wendell
528fdbae97
Adding Maven dependency
2013-09-09 09:32:18 -07:00
Matei Zaharia
bf984e2745
Merge pull request #890 from mridulm/master
...
Fix hash bug
2013-09-08 23:50:24 -07:00
Reynold Xin
e9d4f44a7a
Merge pull request #909 from mateiz/exec-id-fix
...
Fix an instance where full standalone mode executor IDs were passed to
2013-09-08 23:36:48 -07:00
Matei Zaharia
2447b1c4e6
Merge pull request #910 from mateiz/ml-doc-tweaks
...
Small tweaks to MLlib docs
2013-09-08 22:27:49 -07:00
Matei Zaharia
7a5c4b647b
Small tweaks to MLlib docs
2013-09-08 21:47:24 -07:00
Matei Zaharia
7d3204b056
Merge pull request #905 from mateiz/docs2
...
Job scheduling and cluster mode docs
2013-09-08 21:39:12 -07:00
Matei Zaharia
f1f83712f4
Merge pull request #896 from atalwalkar/master
...
updated content
2013-09-08 21:26:11 -07:00
Matei Zaharia
b458854977
Fix some review comments
2013-09-08 21:25:49 -07:00
Ameet Talwalkar
81a8bd46ac
response to PR comments
2013-09-08 19:21:30 -07:00
Ameet Talwalkar
bf280c8b0f
Merge remote-tracking branch 'upstream/master'
2013-09-08 18:41:38 -07:00
Patrick Wendell
f68848d95d
Merge pull request #906 from pwendell/ganglia-sink
...
Clean-up of Metrics Code/Docs and Add Ganglia Sink
2013-09-08 18:32:16 -07:00
Matei Zaharia
f9b7f58de2
Fix an instance where full standalone mode executor IDs were passed to
...
StandaloneSchedulerBackend instead of the smaller IDs used within Spark
(that lack the application name).
This was reported by ClearStory in
https://github.com/clearstorydata/spark/pull/9 .
Also fixed some messages that said slave instead of executor.
2013-09-08 18:27:50 -07:00
Matei Zaharia
170b3869ee
Fix unit test failure due to changed default
2013-09-08 17:51:27 -07:00
Ameet Talwalkar
5ac62dbbd0
updates based on comments to PR
2013-09-08 17:39:08 -07:00
Patrick Wendell
b4e382c210
Adding sc name in metrics source
2013-09-08 16:06:49 -07:00
Patrick Wendell
8026537597
Fixing package name in template conf
2013-09-08 16:06:32 -07:00
Matei Zaharia
0b957997ad
Merge pull request #908 from pwendell/master
...
Fix target JVM version in scala build
2013-09-08 15:30:16 -07:00
Patrick Wendell
27bd74c8ad
Fix target JVM version in scala build
2013-09-08 14:37:45 -07:00
Matei Zaharia
5a587fb98d
Updated cluster diagram to show caches
2013-09-08 13:51:57 -07:00
Patrick Wendell
c190b48bf5
Adding more docs and some code cleanup
2013-09-08 13:46:28 -07:00
Stephen Haberman
df5fd35273
Add better docs for coalesce.
...
Include the useful tip that if shuffle=true, coalesce can actually
increase the number of partitions.
This makes coalesce more like a generic `RDD.repartition` operation.
(Ideally this `RDD.repartition` could automatically choose either a coalesce or
a shuffle if numPartitions was either less than or greater than, respectively,
the current number of partitions.)
2013-09-08 15:39:04 -05:00
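The coalesce commit above suggests that an ideal `RDD.repartition` could pick between a plain coalesce and a shuffle depending on whether the target partition count is lower or higher than the current one. A rough sketch of that decision follows, in plain Python with no Spark dependency. The function name `choose_repartition_strategy` and its return values are hypothetical and only illustrate the rule described in the message; this is not Spark's implementation.

```python
def choose_repartition_strategy(current_partitions, num_partitions):
    """Hypothetical helper, not part of Spark's API.

    Returns which mechanism the commit message's idealized repartition
    would use to move from current_partitions to num_partitions.
    """
    if current_partitions <= 0 or num_partitions <= 0:
        raise ValueError("partition counts must be positive")
    if num_partitions < current_partitions:
        # Fewer partitions: existing partitions can be merged without a shuffle.
        return "coalesce"
    if num_partitions > current_partitions:
        # More partitions: per the commit message, this needs shuffle=true.
        return "shuffle"
    # Same count: nothing to do.
    return "no-op"


print(choose_repartition_strategy(8, 2))
print(choose_repartition_strategy(2, 8))
```

Under these assumptions the first call picks the coalesce path and the second picks the shuffle path.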
Matei Zaharia
af8ffdb73c
Review comments
2013-09-08 13:36:50 -07:00
Matei Zaharia
04cfb3aa9d
Merge pull request #898 from ilikerps/660
...
SPARK-660: Add StorageLevel support in Python
2013-09-08 10:33:20 -07:00
Patrick Wendell
8de8ee5d3c
Ganglia sink
2013-09-08 10:08:18 -07:00
Matei Zaharia
c0d375107f
Some tweaks to CDH/HDP doc
2013-09-08 00:44:41 -07:00
Aaron Davidson
a3868544be
Whoopsy daisy
2013-09-08 00:30:47 -07:00
Matei Zaharia
f261d2a60f
Added cluster overview doc, made logo higher-resolution, and added more
...
details on monitoring
2013-09-08 00:29:11 -07:00
Matei Zaharia
651a96adf7
More fair scheduler docs and property names.
...
Also changed uses of "job" terminology to "application" when they
referred to an entire Spark program, to avoid confusion.
2013-09-08 00:29:11 -07:00
Matei Zaharia
98fb69822c
Work in progress:
...
- Add job scheduling docs
- Rename some fair scheduler properties
- Organize intro page better
- Link to Apache wiki for "contributing to Spark"
2013-09-08 00:29:11 -07:00
Matei Zaharia
38488aca8a
Merge pull request #900 from pwendell/cdh-docs
...
Provide docs to describe running on CDH/HDP cluster.
2013-09-08 00:28:53 -07:00
Patrick Wendell
a8e376ec0f
Merge pull request #904 from pwendell/master
...
Adding Apache license to two files
2013-09-07 21:16:01 -07:00
Patrick Wendell
6d2198643c
Adding Apache license to two files
2013-09-07 20:46:58 -07:00
Aaron Davidson
c1cc8c4da2
Export StorageLevel and refactor
2013-09-07 14:41:31 -07:00
Patrick Wendell
22b982d2bc
File rename
2013-09-07 14:38:54 -07:00
Matei Zaharia
cfde85e395
Merge pull request #901 from ooyala/2013-09/0.8-doc-changes
...
0.8 Doc changes for make-distribution.sh
2013-09-07 13:53:08 -07:00
Matei Zaharia
4a7813a247
Merge pull request #903 from rxin/resulttask
...
Fixed the bug that ResultTask was not properly deserializing outputId.
2013-09-07 13:52:24 -07:00
Patrick Wendell
61c4762d45
Changes based on feedback
2013-09-07 11:55:10 -07:00
Aaron Davidson
8001687af5
Remove reflection, hard-code StorageLevels
...
The sc.StorageLevel -> StorageLevel pathway is a bit janky, but otherwise
the shell would have to call a private method of SparkContext. Having
StorageLevel available in sc also doesn't seem like the end of the world.
There may be a better solution, though.
As for creating the StorageLevel object itself, this seems to be the best
way in Python 2 for creating singleton, enum-like objects:
http://stackoverflow.com/questions/36932/how-can-i-represent-an-enum-in-python
2013-09-07 09:34:07 -07:00
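The commit above mentions building singleton, enum-like objects for StorageLevel in Python 2. One common pattern for that is a plain class whose named instances are attached as class attributes, sketched below. The constructor fields, the flag values, and the three level names shown are assumptions chosen for illustration; they are not copied from the commit.

```python
class StorageLevel(object):
    """Illustrative enum-like class; the fields here are assumed, not authoritative."""

    def __init__(self, use_disk, use_memory, deserialized, replication=1):
        self.use_disk = use_disk
        self.use_memory = use_memory
        self.deserialized = deserialized
        self.replication = replication

    def __repr__(self):
        return "StorageLevel(%s, %s, %s, %d)" % (
            self.use_disk, self.use_memory, self.deserialized, self.replication)


# Hard-coded named instances, created once at import time so that
# identity checks such as `level is StorageLevel.DISK_ONLY` work.
StorageLevel.DISK_ONLY = StorageLevel(True, False, False)
StorageLevel.MEMORY_ONLY = StorageLevel(False, True, True)
StorageLevel.MEMORY_AND_DISK = StorageLevel(True, True, True)

level = StorageLevel.MEMORY_ONLY
print(level is StorageLevel.MEMORY_ONLY)
```

Attaching the instances after the class body avoids referring to the class before it is defined, and it works the same way in Python 2 and Python 3.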
Evan Chan
be1ee28ca6
CR feedback from Matei
2013-09-07 08:56:24 -07:00
Matei Zaharia
afe46ba36e
Merge pull request #892 from jey/fix-yarn-assembly
...
YARN build fixes
2013-09-07 07:28:51 -07:00
Reynold Xin
210eae26f4
Fixed the bug that ResultTask was not properly deserializing outputId.
2013-09-07 21:59:47 +08:00
Aaron Davidson
b8a0b6ea5e
Memoize StorageLevels read from JVM
2013-09-06 15:36:04 -07:00
Patrick Wendell
2eebeff5eb
Merge pull request #897 from pwendell/master
...
Docs describing Spark monitoring and instrumentation
2013-09-06 15:25:22 -07:00
Evan Chan
ff1dbf2106
Add references to make-distribution.sh
2013-09-06 14:20:44 -07:00
Evan Chan
88d53f0dff
"launch" scripts is more accurate terminology
2013-09-06 14:03:44 -07:00
Evan Chan
5a18b854a7
Easier way to start the master
2013-09-06 13:59:43 -07:00
Evan Chan
76d5d2d3c5
Add notes about starting spark-shell
2013-09-06 13:53:00 -07:00