Reynold Xin
3bd2890d2b
Fixed the deadlock situation in multi-job actions and added more unit tests.
2013-10-10 12:07:09 -07:00
Prashant Sharma
bfbd7e5d9f
Fixed some scala warnings in core.
2013-10-10 15:22:31 +05:30
Prashant Sharma
34da58ae50
Changed message-frame-size to maximum-frame-size as property.
...
Removed a test accidentally added during merge.
2013-10-10 15:13:44 +05:30
Aaron Davidson
42d8b8efe6
Address Matei's comments on documentation
...
Updates to the documentation and changing some logError()s to logWarning()s.
2013-10-10 00:33:47 -07:00
Reynold Xin
0353f74a9a
Put the job cancellation handling into the dagscheduler's main event loop.
2013-10-10 00:28:00 -07:00
Reynold Xin
dbae7795ba
Merge branch 'master' of github.com:apache/incubator-spark into kill
...
Conflicts:
core/src/main/scala/org/apache/spark/CacheManager.scala
core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala
core/src/main/scala/org/apache/spark/scheduler/DAGSchedulerSource.scala
2013-10-09 22:57:35 -07:00
Reynold Xin
53895f9cde
Implemented FutureAction, FutureJob, CancellablePromise.
...
Implemented more unit tests for async actions.
2013-10-09 22:43:06 -07:00
Prashant Sharma
026ab75661
Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10
2013-10-10 09:42:55 +05:30
Prashant Sharma
26860639c5
Merge branch 'scala-2.10' of github.com:ScrapCodes/spark into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
project/SparkBuild.scala
2013-10-10 09:42:23 +05:30
Reynold Xin
320418f7c8
Merge pull request #49 from mateiz/kryo-fix-2
...
Fix Chill serialization of Range objects
It used to write out each element one by one, creating very large objects.
2013-10-09 16:55:30 -07:00
Reynold Xin
215238cb39
Merge pull request #50 from kayousterhout/SPARK-908
...
Fix race condition in SparkListenerSuite (fixes SPARK-908).
2013-10-09 16:49:44 -07:00
Matei Zaharia
c84c205289
Fix Chill serialization of Range objects, which used to write out each
...
element, and register user and Spark classes before Chill's serializers
to let them override Chill's behavior in general.
2013-10-09 16:23:40 -07:00
Kay Ousterhout
36966f65df
Style fixes
2013-10-09 15:36:34 -07:00
Kay Ousterhout
3f7e9b265c
Fixed comment to use javadoc style
2013-10-09 15:23:04 -07:00
Kay Ousterhout
a34a4e8174
Fix race condition in SparkListenerSuite (fixes SPARK-908).
2013-10-09 15:07:53 -07:00
Patrick Wendell
bd3bcc5f8e
Use standard abbreviations in metrics labels
2013-10-09 11:16:24 -07:00
Patrick Wendell
19d445d37c
Merge pull request #22 from GraceH/metrics-naming
...
SPARK-900 Use coarser grained naming for metrics
see SPARK-900 Use coarser grained naming for metrics.
Now the new metric name is formatted as {XXX.YYY.ZZZ.COUNTER_UNIT}, XXX.YYY.ZZZ represents the group name, which can group several metrics under the same Ganglia view.
2013-10-09 11:08:34 -07:00
Matei Zaharia
12d593129d
Create fewer function objects in uses of AppendOnlyMap.changeValue
2013-10-08 23:16:51 -07:00
Matei Zaharia
0b35051f19
Address some comments on code clarity
2013-10-08 23:16:17 -07:00
Matei Zaharia
4acbc5afdd
Moved files that were in the wrong directory after package rename
2013-10-08 23:16:17 -07:00
Matei Zaharia
0e40cfabf8
Fix some review comments
2013-10-08 23:16:16 -07:00
Matei Zaharia
b535db7d89
Added a fast and low-memory append-only map implementation for cogroup
...
and parallel reduce operations
2013-10-08 23:14:38 -07:00
Reynold Xin
e67d5b962a
Merge pull request #43 from mateiz/kryo-fix
...
Don't allocate Kryo buffers unless needed
I noticed that the Kryo serializer could be slower than the Java one by 2-3x on small shuffles because it spends a lot of time initializing Kryo Input and Output objects. This is because our default buffer size for them is very large. Since the serializer is often used on streams, I made the initialization lazy for that, and used a smaller buffer (auto-managed by Kryo) for input.
2013-10-08 22:57:38 -07:00
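The lazy initialization described in this pull request can be illustrated with a minimal sketch. It does not use Kryo itself: `SerializerInstanceSketch`, `allocations`, and the byte-array buffer are hypothetical stand-ins, chosen only to show that a `lazy val` defers the expensive allocation until the first call that needs it.

```scala
object LazyBufferSketch {
  // Counts how many buffers were really allocated (for illustration only).
  var allocations = 0

  // Hypothetical stand-in for a serializer instance; not the Kryo API.
  class SerializerInstanceSketch(bufferSize: Int) {
    // The large buffer is created on first use, not at construction time.
    private lazy val buffer: Array[Byte] = {
      allocations += 1
      new Array[Byte](bufferSize)
    }

    // Copies the UTF-8 bytes through the buffer; input longer than the
    // buffer is truncated, which is acceptable for a sketch.
    def serialize(value: String): Array[Byte] = {
      val bytes = value.getBytes("UTF-8")
      val n = math.min(bytes.length, buffer.length)
      System.arraycopy(bytes, 0, buffer, 0, n)
      java.util.Arrays.copyOf(buffer, n)
    }
  }
}
```

An instance that is created but never asked to serialize anything pays nothing for the buffer, which is the case the message describes for small shuffles.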
Grace Huang
f7628e4033
remove those futile suffixes like number/count
2013-10-09 08:36:41 +08:00
Aaron Davidson
749233b869
Revert change to spark-class
...
Also adds comment about how to configure for FaultToleranceTest.
2013-10-08 11:41:52 -07:00
Grace Huang
22bed59d2d
create metrics name manually.
2013-10-08 18:01:11 +08:00
Grace Huang
188abbf8f1
Revert "SPARK-900 Use coarser grained naming for metrics"
...
This reverts commit 4b68be5f3c.
2013-10-08 17:45:14 +08:00
Grace Huang
a2af6b543a
Revert "remedy the line-wrap while exceeding 100 chars"
...
This reverts commit 892fb8ffa8.
2013-10-08 17:44:56 +08:00
Prashant Sharma
7be75682b9
Merge branch 'master' into wip-merge-master
...
Conflicts:
bagel/pom.xml
core/pom.xml
core/src/test/scala/org/apache/spark/ui/UISuite.scala
examples/pom.xml
mllib/pom.xml
pom.xml
project/SparkBuild.scala
repl/pom.xml
streaming/pom.xml
tools/pom.xml
In scala 2.10, a shorter representation is used for naming artifacts
so changed to shorter scala version for artifacts and made it a property in pom.
2013-10-08 11:29:40 +05:30
Patrick Wendell
9e9e9e1b42
Making the timing block more narrow for the sync
2013-10-07 21:28:12 -07:00
Patrick Wendell
8b377718b8
Responses to review
2013-10-07 20:03:35 -07:00
Matei Zaharia
a8725bf8f8
Don't allocate Kryo buffers unless needed
2013-10-07 19:16:35 -07:00
Patrick Wendell
391133f66a
Fix inconsistent and incorrect log messages in shuffle read path
2013-10-07 17:24:18 -07:00
Patrick Wendell
b08306c5cf
Minor cleanup
2013-10-07 16:30:25 -07:00
Patrick Wendell
524d01ea31
Perf benchmark
2013-10-07 15:15:42 -07:00
Patrick Wendell
d15acd6457
Trying new approach with writes
2013-10-07 15:15:42 -07:00
Patrick Wendell
a224c8c9b8
Adding option to force sync to the filesystem
2013-10-07 15:15:42 -07:00
Patrick Wendell
3478ca6762
Track and report write throughput for shuffle tasks.
2013-10-07 15:15:41 -07:00
Reynold Xin
213b70a2db
Merge pull request #31 from sundeepn/branch-0.8
...
Resolving package conflicts with hadoop 0.23.9
Hadoop 0.23.9 is having a package conflict with easymock's dependencies.
(cherry picked from commit 023e3fdf00)
Signed-off-by: Reynold Xin <rxin@apache.org>
2013-10-07 10:54:22 -07:00
Kay Ousterhout
fdc52b2f8b
Added back fully qualified class name
2013-10-06 18:45:43 -07:00
Aaron Davidson
718e8c2052
Change url format to spark://host1:port1,host2:port2
...
This replaces the format of spark://host1:port1,spark://host2:port2 and is more
consistent with ZooKeeper's zk:// urls.
2013-10-06 00:02:08 -07:00
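As a rough illustration of the new URL format, the sketch below splits a comma-delimited `spark://host1:port1,host2:port2` string into one URL per master. The object and method names (`MasterUrlSketch`, `parseMasters`) are hypothetical and are not taken from the Spark source.

```scala
object MasterUrlSketch {
  private val Prefix = "spark://"

  // Splits "spark://host1:port1,host2:port2" into one spark:// URL per master.
  def parseMasters(url: String): Seq[String] = {
    require(url.startsWith(Prefix), "Master URL must start with " + Prefix)
    url.stripPrefix(Prefix).split(",").toSeq.map(hostPort => Prefix + hostPort)
  }
}
```

A single-master URL passes through unchanged, so the same parsing path can serve both cases.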
Aaron Davidson
e1190229e1
Add end-to-end test for standalone scheduler fault tolerance
...
Docker files drawn mostly from Matt Masse. Some updates from Andre Schumacher.
2013-10-05 23:20:31 -07:00
Patrick Wendell
aa9fb84994
Merging build changes in from 0.8
2013-10-05 22:07:00 -07:00
Matei Zaharia
4a25b116d4
Merge pull request #20 from harveyfeng/hadoop-config-cache
...
Allow users to pass broadcasted Configurations and cache InputFormats across Hadoop file reads.
Note: originally from https://github.com/mesos/spark/pull/942
Currently motivated by Shark queries on Hive-partitioned tables, where there's a JobConf broadcast for every Hive-partition (i.e., every subdirectory read). The only thing different about those JobConfs is the input path - the Hadoop Configuration that the JobConfs are constructed from remains the same.
This PR only modifies the old Hadoop API RDDs, but similar additions to the new API might reduce computation latencies a little bit for high-frequency FileInputDStreams (which only uses the new API right now).
As a small bonus, added InputFormats caching, to avoid reflection calls for every RDD#compute().
Few other notes:
Added a general soft-reference hashmap in SparkHadoopUtil because I wanted to avoid adding another class to SparkEnv.
SparkContext default hadoopConfiguration isn't cached. There's no equals() method for Configuration, so there isn't a good way to determine when configuration properties have changed.
2013-10-05 19:28:55 -07:00
Harvey Feng
6a2bbec5e3
Some comments regarding JobConf and InputFormat caching for HadoopRDDs.
2013-10-05 17:53:58 -07:00
Harvey Feng
96929f28bb
Make HadoopRDD object Spark private.
2013-10-05 17:14:19 -07:00
Harvey Feng
b5e93c1227
Fix API changes; lines > 100 chars.
2013-10-05 16:57:08 -07:00
Aaron Davidson
0f070279e7
Address Matei's comments
2013-10-05 15:15:29 -07:00
Martin Weindel
e09f4a9601
fixed some warnings
2013-10-05 23:08:23 +02:00
Matei Zaharia
100222b048
Merge pull request #27 from davidmccauley/master
...
SPARK-920/921 - JSON endpoint updates
920 - Removal of duplicate scheme part of Spark URI, it was appearing as spark://spark//host:port in the JSON field.
JSON now delivered as:
url:spark://127.0.0.1:7077
921 - Adding the URL of the Main Application UI will allow custom interfaces (that use the JSON output) to redirect from the standalone UI.
2013-10-05 13:38:59 -07:00
Mridul Muralidharan
b5025d90bb
- Allow for finer control of cleaner
...
- Address review comments, move to incubator spark
- Also includes a change to speculation - including preventing exceptions in rare cases.
2013-10-06 00:35:51 +05:30
Prashant Sharma
3e41495288
Fixed tests, changed property akka.remote.netty.x to akka.remote.netty.tcp.x
2013-10-05 16:39:25 +05:30
Prashant Sharma
c810ee0690
Merge branch 'master' into scala-2.10
...
Conflicts:
core/src/test/scala/org/apache/spark/DistributedSuite.scala
project/SparkBuild.scala
2013-10-05 15:52:57 +05:30
Aaron Davidson
db6f154940
Fix race conditions during recovery
...
One major change was the use of messages instead of raw functions as the
parameter of Akka scheduled timers. Since messages are serialized, unlike
raw functions, the behavior is easier to think about and doesn't cause
race conditions when exceptions are thrown.
Another change is to avoid using global pointers that might change without
a lock.
2013-10-04 19:54:33 -07:00
Kay Ousterhout
7b5ae23a37
Renamed StandaloneX to CoarseGrainedX.
...
The previous names were confusing because the components weren't just
used in Standalone mode -- in fact, the scheduler used for Standalone
mode is called SparkDeploySchedulerBackend. So, the previous names
were misleading.
2013-10-04 13:56:43 -07:00
Andre Schumacher
c84946fe21
Fixing SPARK-602: PythonPartitioner
...
Currently PythonPartitioner determines partition ID by hashing a
byte-array representation of PySpark's key. This PR lets
PythonPartitioner use the actual partition ID, which is required e.g.
for sorting via PySpark.
2013-10-04 11:56:47 -07:00
Reynold Xin
d29e8035a0
Added countAsync and various unit tests for async actions.
2013-10-03 15:13:44 -07:00
tgravescs
0fff4ee852
Adding in the --addJars option to make SparkContext.addJar work on yarn and cleanup
...
the classpaths
2013-10-03 11:52:16 -05:00
Reynold Xin
802bfb870d
- Created AsyncRDDActions.
...
- Make FutureJob a Scala Future instead of Java Future.
2013-10-03 01:22:28 -07:00
Reynold Xin
e8e917f209
Merge branch 'master' into kill
...
Conflicts:
core/src/main/scala/org/apache/spark/TaskEndReason.scala
core/src/main/scala/org/apache/spark/executor/Executor.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
2013-10-02 23:01:34 -07:00
Reynold Xin
1c48ba0d9f
Merge remote-tracking branch 'origin' into kill
...
Conflicts:
core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
2013-10-02 16:40:44 -07:00
David McCauley
1577b373a9
SPARK-921 - Add Application UI URL to ApplicationInfo Json output
2013-10-02 15:03:41 +01:00
David McCauley
351da54676
SPARK-920 - JSON endpoint URI scheme part (spark://) duplicated
2013-10-02 13:23:38 +01:00
Prashant Sharma
5829692885
Merge branch 'master' into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
docs/_config.yml
project/SparkBuild.scala
repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2013-10-01 11:57:24 +05:30
Kay Ousterhout
0dcad2edcb
Added additional unit test for repeated task failures
2013-09-30 23:26:15 -07:00
Kay Ousterhout
dea4677c88
Fixed compilation errors and broken test.
2013-09-30 22:07:01 -07:00
Kay Ousterhout
8deda427bc
Merge remote-tracking branch 'upstream/master' into results_through-bm
...
Conflicts:
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterScheduler.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/main/scala/org/apache/spark/scheduler/local/LocalTaskSetManager.scala
2013-09-30 10:16:58 -07:00
Kay Ousterhout
58b764b7c6
Addressed Matei's code review comments
2013-09-30 10:11:59 -07:00
Prashant Sharma
9865fd6aa0
Fixed non termination of Executor backend, when sc.stop is not called.
2013-09-30 18:09:12 +05:30
Grace Huang
892fb8ffa8
remedy the line-wrap while exceeding 100 chars
2013-09-30 20:12:55 +08:00
Harvey Feng
7d06bdde1d
Merge HadoopDatasetRDD into HadoopRDD.
2013-09-29 20:08:03 -07:00
Grace Huang
4b68be5f3c
SPARK-900 Use coarser grained naming for metrics
2013-09-27 14:47:38 +08:00
Harvey Feng
417085716a
Merge remote-tracking branch 'oldsparkme/hadoopRDD-broadcast-change' into hadoop-config-cache
2013-09-26 15:49:42 -07:00
Aaron Davidson
42d72308fb
Add license notices
2013-09-26 15:45:20 -07:00
Aaron Davidson
f549ea33d3
Standalone Scheduler fault tolerance using ZooKeeper
...
This patch implements full distributed fault tolerance for standalone scheduler Masters.
There is only one master Leader at a time, which is actively serving scheduling
requests. If this Leader crashes, another master will eventually be elected, reconstruct
the state from the first Master, and continue serving scheduling requests.
Leader election is performed using the ZooKeeper leader election pattern. We try to minimize
the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of
retries and session monitoring on top of the ZooKeeper client.
Master failover follows directly from the single-node Master recovery via the file
system (patch 194ba4b8), save that the Master state is stored in ZooKeeper instead.
Configuration:
By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url
to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled.
By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory
to an appropriate directory accessible by the Master, we will keep the behavior from 194ba4b8.
Additionally, places where a Master could be specified by a spark:// url can now take
comma-delimited lists to specify backup masters. Note that this is only used for registration
of NEW Workers and application Clients. Once a Worker or Client has registered with the
Master Leader, it is "in the system" and will never need to register again.
Forthcoming:
Documentation, tests (! - only ad hoc testing has been performed so far)
I do not intend for this commit to be merged until tests are added, but this patch should
still be mostly reviewable until then.
2013-09-26 15:04:23 -07:00
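The configuration rules in this message can be summarized in a small sketch. The three property names and the mode values come from the message itself; the object and helper names (`RecoveryConfigSketch`, `requiredProperty`, `missingProperty`) are hypothetical and only show which companion setting each mode expects.

```scala
object RecoveryConfigSketch {
  // Property names are taken from the commit message; the helpers are hypothetical.
  val ModeKey = "spark.deploy.recoveryMode"

  // Which additional property each recovery mode needs, if any.
  def requiredProperty(mode: String): Option[String] = mode match {
    case "NONE"       => None
    case "ZOOKEEPER"  => Some("spark.deploy.zookeeper.url")
    case "FILESYSTEM" => Some("spark.deploy.recoveryDirectory")
    case other        => throw new IllegalArgumentException("Unknown recovery mode: " + other)
  }

  // Returns the name of a required property that is absent, or None if the
  // configuration is complete. The mode defaults to NONE, as in the message.
  def missingProperty(props: Map[String, String]): Option[String] = {
    val mode = props.getOrElse(ModeKey, "NONE")
    requiredProperty(mode).filterNot(props.contains)
  }
}
```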
Aaron Davidson
d5a96feccb
Standalone Scheduler fault recovery
...
Implements a basic form of Standalone Scheduler fault recovery. In particular,
this allows faults to be manually recovered from by means of restarting the
Master process on the same machine. This is the majority of the code necessary
for general fault tolerance, which will first elect a leader and then recover
the Master state.
In order to enable fault recovery, the Master will persist a small amount of state related
to the registration of Workers and Applications to disk. If the Master is started and
sees that this state is still around, it will enter Recovery mode, during which time it
will not schedule any new Executors on Workers (but it does accept the registration of
new Clients and Workers).
At this point, the Master attempts to reconnect to all Workers and Client applications
that were registered at the time of failure. After confirming either the existence
or nonexistence of all such nodes (within a certain timeout), the Master will exit
Recovery mode and resume normal scheduling.
2013-09-26 14:59:35 -07:00
Reynold Xin
70a0b993d4
Merge pull request #14 from kayousterhout/untangle_scheduler
...
Improved organization of scheduling packages.
This commit does not change any code -- only file organization.
Please let me know if there was some masterminded strategy behind
the existing organization that I failed to understand!
There are two components of this change:
(1) Moving files out of the cluster package, and down
a level to the scheduling package. These files are all used by
the local scheduler in addition to the cluster scheduler(s), so
should not be in the cluster package. As a result of this change,
none of the files in the local package reference files in the
cluster package.
(2) Moving the mesos package to within the cluster package.
The mesos scheduling code is for a cluster, and represents a
specific case of cluster scheduling (the Mesos-related classes
often subclass cluster scheduling classes). Thus, the most logical
place for it seems to be within the cluster package.
The one thing about the scheduling code that seems a little funny to me
is the naming of the SchedulerBackends. The StandaloneSchedulerBackend
is not just for Standalone mode, but instead is used by Mesos coarse grained
mode and Yarn, and the backend that *is* just for Standalone mode is instead called SparkDeploySchedulerBackend. I didn't change this because I wasn't sure if there
was a reason for this naming that I'm just not aware of.
2013-09-26 14:11:54 -07:00
Reynold Xin
c514cd1587
Merge pull request #930 from holdenk/master
...
Add mapPartitionsWithIndex
2013-09-26 13:48:20 -07:00
Reynold Xin
560ee5c9bb
Merge pull request #7 from wannabeast/memorystore-fixes
...
some minor fixes to MemoryStore
This is a repeat of #5 , moved to its own branch in my repo.
This makes all updates to currentMemory synchronize on entries; it skips synchronizing the reads where it can get away with it.
2013-09-26 11:27:34 -07:00
Patrick Wendell
6566a19b38
Merge pull request #9 from rxin/limit
...
Smarter take/limit implementation.
2013-09-26 08:01:04 -07:00
Prashant Sharma
42f30b5590
Fixed UISuite, for case when port 4040 is already bound on machine running the test.
2013-09-26 14:38:42 +05:30
Prashant Sharma
604dc40996
Sync with master and some build fixes
2013-09-26 11:40:02 +05:30
Prashant Sharma
7ff4c2d399
fixed maven build for scala 2.10
2013-09-26 10:48:24 +05:30
Kay Ousterhout
d85fe41b2b
Improved organization of scheduling packages.
...
This commit does not change any code -- only file organization.
There are two components of this change:
(1) Moving files out of the cluster package, and down
a level to the scheduling package. These files are all used by
the local scheduler in addition to the cluster scheduler(s), so
should not be in the cluster package. As a result of this change,
none of the files in the local package reference files in the
cluster package.
(2) Moving the mesos package to within the cluster package.
The mesos scheduling code is for a cluster, and represents a
specific case of cluster scheduling (the Mesos-related classes
often subclass cluster scheduling classes). Thus, the most logical
place for it is within the cluster package.
2013-09-25 12:45:46 -07:00
Patrick Wendell
6079721fa1
Update build version in master
2013-09-24 11:41:51 -07:00
Holden Karau
0cef683553
Fix formatting :)
2013-09-23 19:39:42 -07:00
Reynold Xin
ff540a015b
Merge branch 'master' of github.com:markhamstra/incubator-spark
2013-09-23 11:55:02 -07:00
Kay Ousterhout
c75eb14fe5
Send Task results through the block manager when larger than Akka frame size.
...
This change requires adding an extra failure mode: tasks can complete
successfully, but the result gets lost or flushed from the block manager
before it's been fetched.
2013-09-22 21:20:48 -07:00
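The idea in this commit, including the extra failure mode it mentions, can be sketched as follows. All names here (`DirectResult`, `IndirectResult`, `wrap`, `fetch`, and the mutable map standing in for the block manager) are hypothetical; the sketch only assumes that a result larger than the frame size is stored elsewhere and referenced by an ID.

```scala
import scala.collection.mutable

object TaskResultSketch {
  sealed trait TaskResult
  // Small results travel inside the message itself.
  case class DirectResult(bytes: Array[Byte]) extends TaskResult
  // Large results are stored separately and referenced by block ID.
  case class IndirectResult(blockId: String) extends TaskResult

  // `store` is a stand-in for the block manager.
  def wrap(taskId: Long, bytes: Array[Byte], frameSize: Int,
           store: mutable.Map[String, Array[Byte]]): TaskResult = {
    if (bytes.length > frameSize) {
      val blockId = "taskresult_" + taskId
      store(blockId) = bytes
      IndirectResult(blockId)
    } else {
      DirectResult(bytes)
    }
  }

  // None models the new failure mode: the task succeeded, but its result
  // was lost or flushed from the store before it was fetched.
  def fetch(result: TaskResult,
            store: mutable.Map[String, Array[Byte]]): Option[Array[Byte]] = result match {
    case DirectResult(bytes)     => Some(bytes)
    case IndirectResult(blockId) => store.get(blockId)
  }
}
```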
Holden Karau
7fe0b0ff56
Switch indent from 2 to 4 spaces
2013-09-22 19:44:51 -07:00
Harvey
ef34cfb26c
Move Configuration broadcasts to SparkContext.
2013-09-22 14:43:58 -07:00
Harvey
a6eeb5ffd5
Add a cache for HadoopRDD metadata needed during computation.
...
Currently, the cache is in SparkHadoopUtils, since it's conveniently a member of the SparkEnv.
2013-09-22 03:09:17 -07:00
jerryshao
77e9da1f34
Change Exception to NoSuchElementException and minor style fix
2013-09-22 16:50:08 +08:00
jerryshao
85024acd2e
Remove infix style and others
2013-09-22 14:20:55 +08:00
jerryshao
5850f599dd
Refactor FairSchedulableBuilder:
...
1. Configuration can be read from classpath if not set explicitly.
2. Add missing close handler.
2013-09-22 14:20:55 +08:00
Reynold Xin
a2ea069a5f
Merge pull request #937 from jerryshao/localProperties-fix
...
Fix PR926 local properties issues in Spark Streaming like scenarios
2013-09-21 23:04:42 -07:00
Harvey
be0fc7246f
Split HadoopRDD into one for general Hadoop datasets and one tailored to Hadoop files, which is a common case.
...
This is the first step to avoiding unnecessary Configuration broadcasts per HadoopRDD instantiation.
2013-09-21 21:14:14 -07:00
Prashant Sharma
276c37a51c
Akka 2.2 migration
2013-09-22 08:20:12 +05:30
jerryshao
aa0c29f747
Add barrier for local properties unit test and fix some styles
2013-09-22 09:53:11 +08:00
Reynold Xin
42571d30d0
Smarter take/limit implementation.
2013-09-20 17:09:53 -07:00
Reynold Xin
1d87616b61
Made output of CoGroup and aggregations interruptible.
2013-09-19 23:31:36 -07:00
Mike
9524b943a4
Synchronize on "entries" the remaining update to "currentMemory".
...
Make "currentMemory" @volatile, so that its reads in ensureFreeSpace() are atomic and up-to-date--i.e., currentMemory can't increase while putLock is held (though it could decrease, which would only help ensureFreeSpace()).
2013-09-19 23:31:35 -07:00
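A minimal sketch of the locking discipline described here, assuming a simplified store that tracks only entry sizes: every update to `currentMemory` happens while holding the lock on `entries`, and the field is `@volatile` so that an unsynchronized read still sees a whole, current value. The class name and its methods are hypothetical and much simpler than the real MemoryStore.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a memory store's size accounting.
class MemoryAccountingSketch(maxMemory: Long) {
  private val entries = mutable.LinkedHashMap.empty[String, Long]

  // @volatile makes reads of this Long atomic and up to date without a lock.
  @volatile private var currentMemory = 0L

  // All writes to currentMemory are made while synchronized on entries.
  def put(id: String, size: Long): Unit = entries.synchronized {
    val previous = entries.put(id, size)
    currentMemory += size - previous.getOrElse(0L)
  }

  def remove(id: String): Unit = entries.synchronized {
    entries.remove(id).foreach(size => currentMemory -= size)
  }

  def clear(): Unit = entries.synchronized {
    entries.clear()
    currentMemory = 0L
  }

  // Unsynchronized read; relies on @volatile for visibility.
  def freeMemory: Long = maxMemory - currentMemory
}
```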
Ankur Dave
026dba6aba
After unit tests, clear port properties unconditionally
...
In MapOutputTrackerSuite, the "remote fetch" test sets spark.driver.port
and spark.hostPort, assuming that they will be cleared by
LocalSparkContext. However, the test never sets sc, so it remains null,
causing LocalSparkContext to skip clearing these properties. Subsequent
tests therefore fail with java.net.BindException: "Address already in
use".
This commit makes LocalSparkContext clear the properties even if sc is
null.
2013-09-19 22:05:23 -07:00
Reynold Xin
c5e40954eb
Wrap around cached data to InterruptibleIterator.
2013-09-19 18:44:38 -07:00
Reynold Xin
c68e72be59
Added comment to InterruptibleIterator.
2013-09-19 18:40:06 -07:00
Reynold Xin
70953810b4
Added task killing iterator to RDDs that take inputs.
2013-09-19 18:33:16 -07:00
Reynold Xin
f19984dafe
More logging changes (task killing for local cluster doesn't work yet).
2013-09-19 18:14:51 -07:00
Reynold Xin
85a0dffe0f
Made task killing work for standalone cluster schedulers.
2013-09-19 16:41:29 -07:00
Reynold Xin
9f8190c17d
Fixed a bug for zero partition in JobWaiter.
2013-09-18 22:42:35 -07:00
Reynold Xin
9332246bd0
Added a hack to kill all active jobs in SparkContext.
2013-09-18 04:38:24 -07:00
Reynold Xin
bf515688e7
Allow SparkContext.submitJob to submit a job for only a subset of the partitions.
2013-09-18 04:16:18 -07:00
jerryshao
ffa5f8e11d
Fix issue when local properties pass from parent to child thread
2013-09-18 17:33:24 +08:00
Reynold Xin
37d8f37a8e
Added a submitJob interface that returns a Future of the result.
2013-09-17 21:13:59 -07:00
Reynold Xin
1cb42e6b2d
Properly handle job failure when the job gets killed.
2013-09-16 22:10:45 -07:00
Reynold Xin
cbc48be13b
Initial commit for job killing.
2013-09-16 18:54:06 -07:00
Prashant Sharma
a90e0eff59
version changed 2.9.3 -> 2.10 in shell script.
2013-09-15 12:47:20 +05:30
Prashant Sharma
383e151fd7
Merge branch 'master' of git://github.com/mesos/spark into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
project/SparkBuild.scala
2013-09-15 10:55:12 +05:30
Holden Karau
bfcddf4700
Make mapPartitionsWithIndex work with JavaRDD's
2013-09-14 15:53:42 -07:00
Holden Karau
74f710f6cd
Start of working on SPARK-615
2013-09-11 22:35:58 -07:00
Mike
d34672f668
Set currentMemory to 0 in clear().
...
Remove unnecessary entries.get() call.
2013-09-11 18:01:19 -07:00
Kay Ousterhout
93c4253275
Changed localProperties to use ThreadLocal (not DynamicVariable).
...
The fact that DynamicVariable uses an InheritableThreadLocal
can cause problems where the properties end up being shared
across threads in certain circumstances.
2013-09-11 13:01:39 -07:00
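The sharing problem described in this commit can be reproduced with plain JDK classes. In the sketch below, a child thread writes to whatever `Properties` object it finds in the thread-local; with an `InheritableThreadLocal` the child receives the parent's object, so the parent sees the write, while with a plain `ThreadLocal` the child sees nothing. The object and method names are hypothetical.

```scala
import java.util.Properties

object LocalPropertiesSketch {
  // Returns true if a write made by a child thread is visible to the parent.
  def parentSeesChildWrite(local: ThreadLocal[Properties]): Boolean = {
    local.set(new Properties())
    val child = new Thread(new Runnable {
      def run(): Unit = {
        // With InheritableThreadLocal this is the parent's own object;
        // with a plain ThreadLocal it is null in a new thread.
        val inherited = local.get()
        if (inherited != null) inherited.setProperty("pool", "child")
      }
    })
    child.start()
    child.join()
    local.get().getProperty("pool") == "child"
  }
}
```

Calling `parentSeesChildWrite(new InheritableThreadLocal[Properties])` returns true, and calling it with `new ThreadLocal[Properties]` returns false.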
Patrick Wendell
91a59e6b10
Merge pull request #919 from mateiz/jets3t
...
Add explicit jets3t dependency, which is excluded in hadoop-client
2013-09-11 10:21:48 -07:00
Patrick Wendell
b9128d34bf
Merge pull request #922 from pwendell/port-change
...
Change default port number from 3030 to 4040.
2013-09-11 10:03:06 -07:00
Patrick Wendell
bddf135670
Change port from 3030 to 4040
2013-09-11 10:01:38 -07:00
David McCauley
5dd875c5b5
SPARK-894 - Not all WebUI fields delivered VIA JSON
2013-09-11 10:46:37 +01:00
Mike
293c758cc0
Remove MemoryStore$Entry.dropPending, unused as of 42e0a68082.
2013-09-10 00:24:35 -07:00
Matei Zaharia
f117dc6d0d
Add explicit jets3t dependency, which is excluded in hadoop-client
2013-09-10 06:39:25 +00:00
Matei Zaharia
c81377b9ed
Merge pull request #915 from ooyala/master
...
Get rid of / improve ugly NPE when Utils.deleteRecursively() fails
2013-09-09 20:16:19 -07:00
Evan Chan
fdb8b0eec3
Style fix: put body of if within curly braces
2013-09-09 14:29:32 -07:00
Matei Zaharia
a85758c200
Merge pull request #907 from stephenh/document_coalesce_shuffle
...
Add better docs for coalesce.
2013-09-09 13:45:40 -07:00
Evan Chan
27726079e4
Print out more friendly error if listFiles() fails
...
listFiles() could return null if the I/O fails, and this currently results in an ugly NPE which is hard to diagnose.
2013-09-09 12:58:12 -07:00
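A minimal sketch of the kind of check this commit describes: `java.io.File.listFiles()` returns null when the path is not a directory or an I/O error occurs, so the result is tested before use and turned into a descriptive exception. The helper name `listFilesSafely` and the wording of the message are assumptions, not the actual change.

```scala
import java.io.{File, IOException}

object ListFilesSketch {
  // Lists a directory, failing with a readable error instead of a later NPE.
  def listFilesSafely(dir: File): Seq[File] = {
    val files = dir.listFiles()
    if (files == null) {
      throw new IOException("Failed to list files for dir: " + dir)
    }
    files.toSeq
  }
}
```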
Y.CORP.YAHOO.COM\tgraves
2186d93285
Add metrics-ganglia to core pom file
2013-09-09 12:37:33 -05:00
Stephen Haberman
59003d387d
Use a set since shuffle could change order.
2013-09-09 11:45:03 -05:00
Stephen Haberman
6471bfec73
Reword 'evenly distributed' to 'distributed with a hash partitioner'.
2013-09-09 11:44:15 -05:00
Matei Zaharia
bf984e2745
Merge pull request #890 from mridulm/master
...
Fix hash bug
2013-09-08 23:50:24 -07:00
Reynold Xin
e9d4f44a7a
Merge pull request #909 from mateiz/exec-id-fix
...
Fix an instance where full standalone mode executor IDs were passed to
2013-09-08 23:36:48 -07:00
Matei Zaharia
7d3204b056
Merge pull request #905 from mateiz/docs2
...
Job scheduling and cluster mode docs
2013-09-08 21:39:12 -07:00
Patrick Wendell
f68848d95d
Merge pull request #906 from pwendell/ganglia-sink
...
Clean-up of Metrics Code/Docs and Add Ganglia Sink
2013-09-08 18:32:16 -07:00
Matei Zaharia
f9b7f58de2
Fix an instance where full standalone mode executor IDs were passed to
...
StandaloneSchedulerBackend instead of the smaller IDs used within Spark
(that lack the application name).
This was reported by ClearStory in
https://github.com/clearstorydata/spark/pull/9 .
Also fixed some messages that said slave instead of executor.
2013-09-08 18:27:50 -07:00
Matei Zaharia
170b3869ee
Fix unit test failure due to changed default
2013-09-08 17:51:27 -07:00
Patrick Wendell
b4e382c210
Adding sc name in metrics source
2013-09-08 16:06:49 -07:00
Patrick Wendell
c190b48bf5
Adding more docs and some code cleanup
2013-09-08 13:46:28 -07:00
Stephen Haberman
df5fd35273
Add better docs for coalesce.
...
Include the useful tip that if shuffle=true, coalesce can actually
increase the number of partitions.
This makes coalesce more like a generic `RDD.repartition` operation.
(Ideally this `RDD.repartition` could automatically choose either a coalesce or
a shuffle if numPartitions was either less than or greater than, respectively,
the current number of partitions.)
2013-09-08 15:39:04 -05:00
Matei Zaharia
04cfb3aa9d
Merge pull request #898 from ilikerps/660
...
SPARK-660: Add StorageLevel support in Python
2013-09-08 10:33:20 -07:00
Patrick Wendell
8de8ee5d3c
Ganglia sink
2013-09-08 10:08:18 -07:00
Matei Zaharia
651a96adf7
More fair scheduler docs and property names.
...
Also changed uses of "job" terminology to "application" when they
referred to an entire Spark program, to avoid confusion.
2013-09-08 00:29:11 -07:00
Matei Zaharia
98fb69822c
Work in progress:
...
- Add job scheduling docs
- Rename some fair scheduler properties
- Organize intro page better
- Link to Apache wiki for "contributing to Spark"
2013-09-08 00:29:11 -07:00
Aaron Davidson
c1cc8c4da2
Export StorageLevel and refactor
2013-09-07 14:41:31 -07:00
Aaron Davidson
8001687af5
Remove reflection, hard-code StorageLevels
...
The sc.StorageLevel -> StorageLevel pathway is a bit janky, but otherwise
the shell would have to call a private method of SparkContext. Having
StorageLevel available in sc also doesn't seem like the end of the world.
There may be a better solution, though.
As for creating the StorageLevel object itself, this seems to be the best
way in Python 2 for creating singleton, enum-like objects:
http://stackoverflow.com/questions/36932/how-can-i-represent-an-enum-in-python
2013-09-07 09:34:07 -07:00
Reynold Xin
210eae26f4
Fixed the bug that ResultTask was not properly deserializing outputId.
2013-09-07 21:59:47 +08:00
Aaron Davidson
b8a0b6ea5e
Memoize StorageLevels read from JVM
2013-09-06 15:36:04 -07:00
Reynold Xin
1e15feb5a3
Hot fix to resolve the compilation error caused by SPARK-821.
2013-09-06 22:44:05 +08:00
Prashant Sharma
4106ae9fbf
Merged with master
2013-09-06 17:53:01 +05:30
Patrick Wendell
ddcb9d310a
Merge pull request #895 from ilikerps/821
...
SPARK-821: Don't cache results when action run locally on driver
2013-09-05 23:54:09 -07:00
Aaron Davidson
a63d4c7dc2
SPARK-660: Add StorageLevel support in Python
...
It uses reflection... I am not proud of that fact, but it at least ensures
compatibility (sans refactoring of the StorageLevel stuff).
2013-09-05 23:36:27 -07:00
Aaron Davidson
3a04e76c89
Reynold's second round of comments
2013-09-05 21:43:26 -07:00
Matei Zaharia
699c331f2f
Merge pull request #891 from xiajunluan/SPARK-864
...
[SPARK-864]DAGScheduler Exception if we delete Worker and StandaloneExecutorBackend then add Worker
2013-09-05 20:21:53 -07:00
Aaron Davidson
4f2236a1c5
Add unit test and address comments
2013-09-05 18:06:30 -07:00
Aaron Davidson
1418d18af4
SPARK-821: Don't cache results when action run locally on driver
...
Caching the results of local actions (e.g., rdd.first()) causes the driver to
store entire partitions in its own memory, which may be highly constrained.
This patch simply makes the CacheManager avoid caching the result of all locally-run computations.
2013-09-05 15:34:42 -07:00
Andrew xia
7c15e3c5de
Fix bug SPARK-864
2013-09-05 15:56:11 +08:00
Patrick Wendell
5c7494d7c1
Merge pull request #893 from ilikerps/master
...
SPARK-884: Add unit test to validate Spark JSON output
2013-09-04 22:47:03 -07:00
Aaron Davidson
714e7f9e32
Fix line over 100 chars
2013-09-04 22:40:08 -07:00
Aaron Davidson
37db141aef
Address Patrick's comments
2013-09-04 21:34:20 -07:00
Aaron Davidson
9e6f2b6822
SPARK-884: Add unit test to validate Spark JSON output
...
This unit test simply validates that the outputs of
the JsonProtocol methods are syntactically valid JSON.
2013-09-04 15:26:46 -07:00
Mridul Muralidharan
1e2474b814
Address review comments - rename toHash to nonNegativeHash
2013-09-04 07:46:46 +05:30
Mridul Muralidharan
b3a82b7df3
Fix hash bug - caused failure after 35k stages, sigh
2013-09-04 07:02:25 +05:30
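The log does not say what the hash bug was. One well-known pitfall that a helper named nonNegativeHash would guard against is that `math.abs(Int.MinValue)` is still negative, so a hash of that value can yield a negative index. The sketch below shows that pitfall and one way to avoid it; it is an assumption about the class of bug, not the actual fix, and `partitionFor` is a hypothetical caller.

```scala
object HashSketch {
  // math.abs(Int.MinValue) overflows and stays negative, so map it to 0.
  def nonNegativeHash(hash: Int): Int =
    if (hash == Int.MinValue) 0 else math.abs(hash)

  // Hypothetical caller: derives a bucket index that is never negative.
  def partitionFor(hash: Int, numPartitions: Int): Int =
    nonNegativeHash(hash) % numPartitions
}
```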
Mark Hamstra
c9bc8af3d1
Removed repetitive import; fixes hidden definition compiler warning.
2013-09-03 15:25:20 -07:00
Patrick Wendell
c592a3c9b9
Minor spacing fix
2013-09-03 14:39:11 -07:00
Patrick Wendell
19f70273d2
Merge pull request #878 from tgravescs/yarnUILink
...
Link the Spark UI up to the Yarn UI
2013-09-03 14:29:10 -07:00
Matei Zaharia
68df2464d1
Merge pull request #889 from alig/master
...
Return the port the WebUI is bound to (useful if port 0 was used)
2013-09-03 13:01:17 -07:00
Y.CORP.YAHOO.COM\tgraves
41c1b5b9a0
Update based on review comments. Change function to prependBaseUri and fix formatting.
2013-09-03 14:46:51 -05:00
Y.CORP.YAHOO.COM\tgraves
c8cc276110
Review comment changes and update to org.apache packaging
2013-09-03 10:50:21 -05:00
Y.CORP.YAHOO.COM\tgraves
547fc4a412
Merge remote-tracking branch 'mesos/master' into yarnUILink
...
Conflicts:
core/src/main/scala/org/apache/spark/ui/UIUtils.scala
core/src/main/scala/org/apache/spark/ui/jobs/PoolTable.scala
core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala
docs/running-on-yarn.md
2013-09-03 08:36:59 -05:00
Ali Ghodsi
b25918d841
Merge branch 'master' of https://github.com/alig/spark
...
Conflicts:
core/src/main/scala/org/apache/spark/deploy/master/Master.scala
2013-09-03 00:56:12 -07:00
Ali Ghodsi
bd0788505f
Using configured akka timeouts
2013-09-03 00:50:35 -07:00
Ali Ghodsi
cbfef9b3ff
Sort order of imports to match project guidelines
2013-09-02 19:33:55 -07:00
Ali Ghodsi
36d8fca2cc
Reynold's comment fixed
2013-09-02 19:31:09 -07:00
Ali Ghodsi
e452bd6d77
Brushing the code up slightly
2013-09-02 19:04:08 -07:00
Ali Ghodsi
cf7b115496
Enabling getting the actual WEBUI port
2013-09-02 18:21:21 -07:00
Matei Zaharia
12b2f1f9c9
Add missing license headers found with RAT
2013-09-02 12:23:03 -07:00
Matei Zaharia
246bf67f58
Fix test
2013-09-02 10:57:34 -07:00
Matei Zaharia
9329a7d4cd
Fix spark.io.compression.codec and change default codec to LZF
2013-09-02 10:15:22 -07:00
Matei Zaharia
6550e5e60c
Allow PySpark to launch worker.py directly on Windows
2013-09-01 18:06:15 -07:00
Matei Zaharia
3db404a43a
Run script fixes for Windows after package & assembly change
2013-09-01 23:45:57 +00:00
Matei Zaharia
0a8cc30921
Move some classes to more appropriate packages:
...
* RDD, *RDDFunctions -> org.apache.spark.rdd
* Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
* JavaSerializer, KryoSerializer -> org.apache.spark.serializer
2013-09-01 14:13:16 -07:00
Matei Zaharia
5701eb92c7
Fix some URLs
2013-09-01 14:13:16 -07:00
Matei Zaharia
12495ec63a
Remove shutdown hook to stop jetty; this is unnecessary for releasing
...
ports and creates noisy log messages
2013-09-01 14:13:15 -07:00
Matei Zaharia
46eecd110a
Initial work to rename package to org.apache.spark
2013-09-01 14:13:13 -07:00
Matei Zaharia
a30fac16ca
Merge pull request #883 from alig/master
...
Don't require the spark home environment variable to be set for standalone mode (change needed by SIMR)
2013-09-01 12:27:50 -07:00
Matei Zaharia
e34bc3a8ee
Small tweak
2013-08-31 17:47:15 -07:00
Matei Zaharia
2ee6a7e32a
Print output from spark-daemon only when it fails to launch
2013-08-31 17:31:07 -07:00
Ali Ghodsi
250bddc255
Don't require spark home to be set for standalone mode
2013-08-31 17:29:05 -07:00
Matei Zaharia
25ac50668b
Various web UI improvements:
...
- Use "fluid" layout that can expand to wide browser windows, instead of
the old one's limit of 1200 px
- Remove unnecessary <hr> elements
- Switch back to Bootstrap's default theme and tweak progress bar colors
- Make headers more consistent between deploy and app UIs
- Replace some inline CSS with stylesheets
2013-08-31 16:55:40 -07:00
Y.CORP.YAHOO.COM\tgraves
96452eea56
fix up minor things
2013-08-30 16:04:31 -05:00
Y.CORP.YAHOO.COM\tgraves
bac46266a9
Link the Spark UI to the Yarn UI
2013-08-30 15:55:32 -05:00
Mikhail Bautin
35090958b3
Also add getConf to NewHadoopRDD
2013-08-30 11:03:57 -07:00
Mikhail Bautin
5e30172f70
Make HadoopRDD's configuration accessible
2013-08-30 11:01:06 -07:00
Matei Zaharia
ca71620950
Merge pull request #857 from mateiz/assembly
...
Change build and run instructions to use assemblies
2013-08-29 21:51:14 -07:00
Matei Zaharia
666d93c294
Update Maven build to create assemblies expected by new scripts
...
This includes the following changes:
- The "assembly" package now builds in Maven by default, and creates an
assembly containing both hadoop-client and Spark, unlike the old
BigTop distribution assembly that skipped hadoop-client
- There is now a bigtop-dist package to build the old BigTop assembly
- The repl-bin package is no longer built by default since the scripts
don't rely on it; instead it can be enabled with -Prepl-bin
- Py4J is now included in the assembly/lib folder as a local Maven repo,
so that the Maven package can link to it
- run-example now adds the original Spark classpath as well because the
Maven examples assembly lists spark-core and such as provided
- The various Maven projects add a spark-yarn dependency correctly
2013-08-29 21:19:06 -07:00
Matei Zaharia
aab345c463
Fix finding of assembly JAR, as well as some pointers to ./run
2013-08-29 21:19:06 -07:00
Matei Zaharia
ab0e625d9e
Fix PySpark for assembly run and include it in dist
2013-08-29 21:19:06 -07:00
Matei Zaharia
53cd50c069
Change build and run instructions to use assemblies
...
This commit makes Spark invocation saner by using an assembly JAR to
find all of Spark's dependencies instead of adding all the JARs in
lib_managed. It also packages the examples into an assembly and uses
that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
with two better-named scripts: "run-examples" for examples, and
"spark-class" for Spark internal classes (e.g. REPL, master, etc). This
is also designed to minimize the confusion people have in trying to use
"run" to run their own classes; it's not meant to do that, but now at
least if they look at it, they can modify run-examples to do a decent
job for them.
As part of this, Bagel's examples are also now properly moved to the
examples package instead of bagel.
2013-08-29 21:19:04 -07:00
jerryshao
f3dbe6b215
Fix removed block zero size log reporting
2013-08-30 09:39:01 +08:00
Patrick Wendell
abdbacf252
Merge pull request #871 from pwendell/expose-local
...
Expose `isLocal` in SparkContext.
2013-08-28 21:11:31 -07:00
Patrick Wendell
30d2421112
Make local variable public
2013-08-28 19:53:31 -07:00
Matei Zaharia
baa84e7e4c
Merge pull request #865 from tgravescs/fixtmpdir
...
Spark on Yarn should use yarn approved directories for spark.local.dir and tmp
2013-08-28 12:44:46 -07:00
Y.CORP.YAHOO.COM\tgraves
aac1214ee4
Change Executor to only look at the env variable SPARK_YARN_MODE
2013-08-28 13:26:26 -05:00
Y.CORP.YAHOO.COM\tgraves
3f206bf0b5
Updated based on review comments.
2013-08-27 14:34:27 -05:00
Y.CORP.YAHOO.COM\tgraves
cf52a3cba6
Allow for Executors to have different directories than the Spark Master for Yarn
2013-08-27 11:00:21 -05:00
Reynold Xin
a77e0abb96
Added worker state to the cluster master JSON ui.
2013-08-26 11:21:03 -07:00
Reynold Xin
9db1e50344
Revert "Merge pull request #841 from rxin/json"
...
This reverts commit 1fb1b09928, reversing changes made to c69c48947d.
2013-08-26 11:05:14 -07:00
Matei Zaharia
8a36fd09dd
Merge pull request #854 from markhamstra/pomUpdate
...
Synced sbt and maven builds to use the same dependencies, etc.
2013-08-22 10:13:35 -07:00
Matei Zaharia
c2d00f12e2
Merge pull request #832 from alig/coalesce
...
Coalesced RDD with locality
2013-08-22 10:13:03 -07:00
Mark Hamstra
ff6f1b0500
Synced sbt and maven builds
2013-08-21 13:50:24 -07:00
Mark Hamstra
5eea613ec0
Removed meaningless types
2013-08-20 16:49:18 -07:00
Ali Ghodsi
f20ed14e87
Merged in from upstream to use TaskLocation instead of strings
2013-08-20 16:21:43 -07:00
Ali Ghodsi
5cd21c4195
added curly braces to make the code more consistent
2013-08-20 16:16:05 -07:00
Ali Ghodsi
db4bc55bef
indent
2013-08-20 16:16:05 -07:00
Ali Ghodsi
c0942a710f
Bug in test fixed
2013-08-20 16:16:05 -07:00
Ali Ghodsi
5db41919b5
Added a test to make sure no locality preferences are ignored
2013-08-20 16:16:05 -07:00
Ali Ghodsi
7b123b3126
Simpler code
2013-08-20 16:16:05 -07:00
Ali Ghodsi
9192c358e4
simpler code
2013-08-20 16:16:05 -07:00
Ali Ghodsi
a75a64eade
Fixed almost all of Matei's feedback
2013-08-20 16:16:05 -07:00
Ali Ghodsi
f1c853d76d
fixed Matei's comments
2013-08-20 16:16:04 -07:00
Ali Ghodsi
890ea6ba79
making CoalescedRDDPartition public
2013-08-20 16:16:04 -07:00
Ali Ghodsi
d6b6c680be
comment in the test to make it more understandable
2013-08-20 16:16:04 -07:00
Ali Ghodsi
b69e7166ba
Coalescer now uses current preferred locations for derived RDDs. Made run() in DAGScheduler thread safe and added a method to be able to ask it for preferred locations. Added a similar method that wraps the former inside SparkContext.
2013-08-20 16:16:04 -07:00
Ali Ghodsi
3b5bb8a4ae
added one test that will test a future functionality
2013-08-20 16:13:37 -07:00
Ali Ghodsi
33a0f59354
Added error messages to the tests to make failed tests less cryptic
2013-08-20 16:13:37 -07:00
Ali Ghodsi
abcefb3858
fixed matei's comments
2013-08-20 16:13:37 -07:00
Ali Ghodsi
35537e6341
Made a function object that returns the coalesced groups
2013-08-20 16:13:37 -07:00
Ali Ghodsi
339598c080
several of Reynold's suggestions implemented
2013-08-20 16:13:37 -07:00
Ali Ghodsi
02d6464f2f
space removed
2013-08-20 16:13:37 -07:00
Ali Ghodsi
4f99be1ffd
use count rather than foreach
2013-08-20 16:13:37 -07:00
Ali Ghodsi
f67753cdfc
made preferredLocation a val of the surrounding case class
2013-08-20 16:13:37 -07:00
Ali Ghodsi
f24861b60a
Fix bug in tests
2013-08-20 16:13:36 -07:00
Ali Ghodsi
f6e47e8b51
Renamed split to partition
2013-08-20 16:13:36 -07:00
Ali Ghodsi
937f72feb8
word wrap before 100 chars per line
2013-08-20 16:13:36 -07:00
Ali Ghodsi
c4d59910b1
added goals inline as comment
2013-08-20 16:13:36 -07:00
Ali Ghodsi
7a2a33e32d
Large scale load and locality tests for the coalesced partitions added
2013-08-20 16:13:36 -07:00
Ali Ghodsi
66edf854aa
Bug, should compute slack wrt parent partition size, not number of bins
2013-08-20 16:13:36 -07:00
Ali Ghodsi
1ede102ba5
load balancing coalescer
2013-08-20 16:13:36 -07:00
Matei Zaharia
aa2b89d98d
Merge remote-tracking branch 'jey/hadoop-agnostic'
...
Conflicts:
core/src/main/scala/spark/PairRDDFunctions.scala
2013-08-20 10:14:15 -07:00
Mark Hamstra
1630fbf838
changeGeneration --> changeEpoch renaming
2013-08-20 00:17:16 -07:00
Mark Hamstra
ad18410427
Renamed 'priority' to 'jobId' and assorted minor changes
2013-08-20 00:07:04 -07:00
Matei Zaharia
8cae72e94e
Merge pull request #828 from mateiz/sched-improvements
...
Scheduler fixes and improvements
2013-08-19 23:40:04 -07:00
Matei Zaharia
efeb142981
Merge pull request #849 from mateiz/web-fixes
...
Small fixes to web UI
2013-08-19 19:23:50 -07:00
Matei Zaharia
793a722f8e
Allow some wiggle room in UISuite port test and in EC2 ports
2013-08-19 18:51:00 -07:00
Matei Zaharia
abdc1f8bbb
Merge pull request #847 from rxin/rdd
...
Allow subclasses of Product2 in all key-value related classes
2013-08-19 18:30:56 -07:00
Matei Zaharia
498a26189b
Small fixes to web UI:
...
- Use SPARK_PUBLIC_DNS environment variable if set (for EC2)
- Use a non-ephemeral port (3030 instead of 33000) by default
- Updated test to use non-ephemeral port too
2013-08-19 18:17:49 -07:00
Reynold Xin
5054abd41b
Code review feedback. (added tests for cogroup and subtract; added more documentation on MutablePair)
2013-08-19 12:58:02 -07:00
Reynold Xin
acc4aa1f47
Added a test for sorting using MutablePair's.
2013-08-19 11:02:10 -07:00
Reynold Xin
71d705a66e
Made PairRDDFunctions take only Tuple2, but made the rest of the shuffle code path work with general Product2.
2013-08-19 00:40:43 -07:00
Reynold Xin
2a7b99c08b
Added the missing RDD files and cleaned up SparkContext.
2013-08-18 20:39:29 -07:00
Reynold Xin
82bf4c0339
Allow subclasses of Product2 in all key-value related classes (ShuffleDependency, PairRDDFunctions, etc).
2013-08-18 20:25:45 -07:00
Matei Zaharia
8ac3d1e263
Added unit tests for ClusterTaskSetManager, and fix a bug found with
...
resetting locality level after a non-local launch
2013-08-18 19:51:07 -07:00
Matei Zaharia
4004cf775d
Added some comments on threading in scheduler code
2013-08-18 19:51:07 -07:00
Matei Zaharia
2a4ed10210
Address some review comments:
...
- When a resourceOffers() call has multiple offers, force the TaskSets
to consider them in increasing order of locality levels so that they
get a chance to launch stuff locally across all offers
- Simplify ClusterScheduler.prioritizeContainers
- Add docs on the new configuration options
2013-08-18 19:51:07 -07:00
Matei Zaharia
222c897128
Comment cleanup (via Kay) and some debug messages
2013-08-18 19:51:07 -07:00
Matei Zaharia
cf39d45d14
More scheduling fixes:
...
- Added periodic revival of offers in StandaloneSchedulerBackend
- Replaced task scheduling aggression with multi-level delay scheduling
in ClusterTaskSetManager
- Fixed ZippedRDD preferred locations because they can't currently be
process-local
- Fixed some uses of hostPort
2013-08-18 19:51:07 -07:00
Matei Zaharia
90a04dab8d
Initial work towards scheduler refactoring:
...
- Replace use of hostPort vs host in Task.preferredLocations with a
TaskLocation class that contains either an executorId and a host or
just a host. This is part of a bigger effort to eliminate hostPort
based data structures and just use executorID, since the hostPort vs
host stuff is confusing (and not checkable with static typing, leading
to ugly debug code), and hostPorts are not provided by Mesos.
- Replaced most hostPort-based data structures and fields as above.
- Simplified ClusterTaskSetManager to deal with preferred locations in a
more concise way and generally be more concise.
- Updated the way ClusterTaskSetManager handles racks: instead of
enqueueing a task to a separate queue for all the hosts in the rack,
which would create lots of large queues, have one queue per rack name.
- Removed non-local fallback stuff in ClusterScheduler that tried to
launch less-local tasks on a node once the local ones were all
assigned. This change didn't work because many cluster schedulers send
offers for just one node at a time (even the standalone and YARN ones
do so as nodes join the cluster one by one). Thus, lots of non-local
tasks would be assigned even though a node with locality for them
would be able to receive tasks just a short time later.
- Renamed MapOutputTracker "generations" to "epochs".
2013-08-18 19:51:06 -07:00
Jey Kottalam
bdd861c6c3
Fix Maven build with Hadoop 0.23.9
2013-08-18 18:28:57 -07:00
Matei Zaharia
8fa0747978
Merge pull request #840 from AndreSchumacher/zipegg
...
Implementing SPARK-878 for PySpark: adding zip and egg files to context ...
2013-08-18 17:02:54 -07:00
Reynold Xin
2c00ea3efc
Moved shuffle serializer setting from a constructor parameter to a setSerializer method in various RDDs that involve shuffle operations.
2013-08-17 21:43:29 -07:00
Reynold Xin
0e84fee76b
Removed the mapSideCombine option in partitionBy.
2013-08-17 21:13:41 -07:00
Reynold Xin
10af952a3d
Removed the mapSideCombine option in CoGroupedRDD.
2013-08-17 21:07:34 -07:00
Reynold Xin
5d050a3e1f
Removed the unused shuffleId in ShuffleDependency's constructor.
2013-08-16 23:23:16 -07:00
Matei Zaharia
e89ffc7b3c
Merge pull request #839 from jegonzal/zip_partitions
...
Currying RDD.zipPartitions
2013-08-16 14:02:34 -07:00
Jey Kottalam
ad580b94d5
Maven build now also works with YARN
2013-08-16 13:50:12 -07:00
Jey Kottalam
9dd15fe700
Don't mark hadoop-client as 'provided'
2013-08-16 13:50:12 -07:00
Jey Kottalam
11b42a84db
Maven build now works with CDH hadoop-2.0.0-mr1
2013-08-16 13:50:12 -07:00
Jey Kottalam
353fab2440
Initial changes to make Maven build agnostic of hadoop version
2013-08-16 13:50:12 -07:00
Joseph E. Gonzalez
53b2639a1e
Reversing the argument order in zipPartitions to enable stronger type inference.
2013-08-16 12:38:59 -07:00
Andre Schumacher
c7e348faec
Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path
2013-08-16 11:58:20 -07:00
Reynold Xin
c961c19b7b
Use the JSON formatter from Scala library and removed dependency on lift-json.
...
This makes the JSON creation slightly more complicated, but removes one external dependency. The Scala library also properly escapes "/" (which lift-json doesn't).
2013-08-15 18:23:01 -07:00
Reynold Xin
eddbf43b54
Revert "Merge pull request #834 from Daemoen/master"
...
This reverts commit 230ab2722e, reversing changes made to 659553b21d.
2013-08-15 17:49:37 -07:00
Reynold Xin
230ab2722e
Merge pull request #834 from Daemoen/master
...
Updated json output to allow for display of worker state
2013-08-15 17:45:17 -07:00
Patrick Wendell
659553b21d
Merge pull request #836 from pwendell/rename
...
Rename `memoryBytesToString` and `memoryMegabytesToString`
2013-08-15 16:56:31 -07:00
Jey Kottalam
a06a9d5c5f
Rename HadoopWriter to SparkHadoopWriter since it's outside of our package
2013-08-15 16:50:37 -07:00
Jey Kottalam
8f979edef5
Fix newTaskAttemptID to work under YARN
2013-08-15 16:50:37 -07:00
Jey Kottalam
e2d7656ca3
re-enable YARN support
2013-08-15 16:50:37 -07:00
Jey Kottalam
bd0bab47c9
SparkEnv isn't available this early, and not needed anyway
2013-08-15 16:50:37 -07:00
Jey Kottalam
4f43fd791a
make SparkHadoopUtil a member of SparkEnv
2013-08-15 16:50:37 -07:00
Jey Kottalam
43ebcb8484
rename HadoopMapRedUtil => SparkHadoopMapRedUtil, HadoopMapReduceUtil => SparkHadoopMapReduceUtil
2013-08-15 16:50:37 -07:00
Jey Kottalam
8b1c1520fc
add comment
2013-08-15 16:50:37 -07:00
Jey Kottalam
69c3bbf688
dynamically detect hadoop version
2013-08-15 16:50:37 -07:00
Jey Kottalam
f67b94ad4f
remove core/src/hadoop{1,2} dirs
2013-08-15 16:50:36 -07:00
Jey Kottalam
b877e20a33
move yarn to its own directory
2013-08-15 16:50:36 -07:00
Patrick Wendell
4c6ade1ad5
Rename memoryBytesToString and memoryMegabytesToString
...
These are used all over the place now and they are not specific to memory at all.
memoryBytesToString --> bytesToString
memoryMegabytesToString --> megabytesToString
2013-08-15 15:58:07 -07:00
Reynold Xin
1a51deae8a
More minor UI changes including code review feedback.
2013-08-15 14:34:07 -07:00
Daemoen
ad2e8b5126
Updated json output to allow for display of worker state
...
Ops teams need to ensure that the cluster is functional and performant. Having to scrape the html source for worker state won't work reliably, and will be slow. By exposing the state in the json output, ops teams are able to ensure a fully functional environment by querying for the json output and parsing for dead nodes.
2013-08-15 12:19:14 -07:00
Reynold Xin
2d2a556bdf
Various UI improvements.
2013-08-14 23:23:09 -07:00
Reynold Xin
290e3e6e65
Renamed setCurrentJobDescription to setJobDescription.
2013-08-14 18:40:53 -07:00
Reynold Xin
3886b54933
A few small scheduler / job description changes.
...
1. Renamed SparkContext.addLocalProperty to setLocalProperty. And allow this function to unset a property.
2. Renamed SparkContext.setDescription to setCurrentJobDescription.
3. Throw an exception if the fair scheduler allocation file is invalid.
2013-08-14 17:19:42 -07:00
Matei Zaharia
839f2d4f3f
Merge pull request #822 from pwendell/ui-features
...
Adding GC Stats to TaskMetrics (and three small fixes)
2013-08-14 16:17:23 -07:00
Patrick Wendell
04ad78b09d
Style cleanup based on Matei feedback
2013-08-14 14:57:21 -07:00
Kay Ousterhout
a88aa5e6ed
Fixed 2 bugs in executor UI.
...
1) UI crashed if the executor UI was loaded before any tasks started.
2) The total tasks was incorrectly reported due to using string (rather
than int) arithmetic.
2013-08-13 23:44:58 -07:00
Patrick Wendell
c223176388
Small style clean-up
2013-08-13 16:56:37 -07:00
Patrick Wendell
fab5cee111
Correcting terminology in RDD page
2013-08-13 16:25:55 -07:00
Patrick Wendell
024e5c5ce1
Correct sorting order for stages
2013-08-13 16:25:55 -07:00
Patrick Wendell
4e9f0c2df6
Capturing GC details in TaskMetrics
2013-08-13 16:25:55 -07:00
Patrick Wendell
f0382007dc
Bug fix for display of shuffle read/write metrics.
...
This fixes an error where empty cells are missing if a given task
has no shuffle read/write.
2013-08-13 16:25:55 -07:00
Matei Zaharia
d316af9c84
Merge pull request #821 from pwendell/print-launch-command
...
Print run command to stderr rather than stdout
2013-08-13 15:31:01 -07:00
Patrick Wendell
a7feb69ae8
Print run command to stderr rather than stdout
2013-08-13 15:07:03 -07:00
Kay Ousterhout
1beb843a6f
Reuse the set of failed states rather than creating a new object each time
2013-08-13 14:27:40 -07:00
Kay Ousterhout
c92dd627ca
Properly account for killed tasks.
...
The TaskState class's isFinished() method didn't return true for
KILLED tasks, which means some resources are never reclaimed
for tasks that are killed. This also made it inconsistent with the
isFinished() method used by CoarseMesosSchedulerBackend.
2013-08-13 12:40:15 -07:00
Patrick Wendell
ed6a1646e6
Slight change to pr-784
2013-08-13 09:29:40 -07:00
Patrick Wendell
a0133bfbad
Merge pull request #784 from jerryshao/dev-metrics-servlet
...
Add MetricsServlet for Spark metrics system
2013-08-13 09:28:18 -07:00
Matei Zaharia
65d0d91fba
Merge pull request #807 from JoshRosen/guava-optional
...
Change scala.Option to Guava Optional in Java APIs
2013-08-12 19:00:57 -07:00
Josh Rosen
cf08bb7a3e
Fix import organization.
2013-08-12 18:55:02 -07:00
jerryshao
09c7179e81
MetricsServlet code refactor according to comments
2013-08-12 13:23:23 +08:00
jerryshao
320e87e7ab
Add MetricsServlet for Spark metrics system
2013-08-12 13:23:23 +08:00
Reynold Xin
e5b9ed2833
Merge pull request #808 from pwendell/ui_compressed_bytes
...
Report compressed bytes read when calculating TaskMetrics
2013-08-11 17:22:47 -07:00
Patrick Wendell
3d8f281604
Report compressed bytes read when calculating TaskMetrics
2013-08-11 16:25:57 -07:00
Matei Zaharia
379648630b
Merge pull request #805 from woggle/hadoop-rdd-jobconf
...
Use new Configuration() instead of slower new JobConf() in SerializableWritable
2013-08-11 14:51:47 -07:00
Josh Rosen
d7f78b443b
Change scala.Option to Guava Optional in Java APIs.
2013-08-11 12:05:09 -07:00
Charles Reiss
6402b539d0
Use new Configuration() instead of new JobConf() for ObjectWritable.
...
JobConf's constructor loads default config files in some versions of
Hadoop, which is quite slow, and we only need the Configuration object
to pass the correct ClassLoader.
2013-08-10 21:31:05 -07:00
Matei Zaharia
71c63de22f
Merge pull request #795 from mridulm/master
...
Fix bug reported in PR 791 : a race condition in ConnectionManager and Connection
2013-08-10 10:21:20 -07:00
Matei Zaharia
d3277a0daf
Merge remote-tracking branch 'origin/pr/792'
...
Conflicts:
core/src/main/scala/spark/ui/jobs/IndexPage.scala
core/src/main/scala/spark/ui/jobs/StagePage.scala
2013-08-10 10:18:50 -07:00
Patrick Wendell
d17eeb997d
Merge pull request #785 from anfeng/master
...
expose HDFS file system stats via Executor metrics
2013-08-10 09:02:27 -07:00
Kay Ousterhout
14d14f451a
Shortened names, as per Matei's suggestion
2013-08-10 07:50:27 -07:00
Matei Zaharia
cd247ba5bb
Merge pull request #786 from shivaram/mllib-java
...
Java fixes, tests and examples for ALS, KMeans
2013-08-09 20:41:13 -07:00
Kay Ousterhout
7810a76512
Only print event queue full error message once
2013-08-09 18:20:48 -07:00
Kay Ousterhout
44ca8629d8
Style fix: removing unnecessary return type
2013-08-09 17:22:50 -07:00
Kay Ousterhout
29b79714f9
Style fixes based on code review
2013-08-09 16:46:34 -07:00
Kay Ousterhout
81e1d4a7d1
Refactored SparkListener to process all events asynchronously.
...
This commit fixes issues where SparkListeners that take a while to
process events slow the DAGScheduler.
This commit also fixes a bug in the UI where if a user goes to a
web page of a stage that does not exist, they can create a memory
leak (granted, this is not an issue at small scale -- probably only
an issue if someone actively tried to DOS the UI).
2013-08-09 13:27:41 -07:00
Matei Zaharia
b09d4b79e8
Merge pull request #799 from woggle/sync-fix
...
Remove extra synchronization in ResultTask
2013-08-09 13:17:08 -07:00
Patrick Wendell
cc6b92e80e
Merge pull request #775 from pwendell/print-launch-command
...
Log the launch command for Spark daemons
2013-08-09 13:00:33 -07:00
Patrick Wendell
3970b580c2
Using quotes when printing out command
2013-08-09 11:53:32 -07:00
Charles Reiss
9dfc280f74
Remove extra synchronization in ResultTask
2013-08-09 11:09:02 -07:00
Matei Zaharia
f94fc75c3f
Merge pull request #788 from shane-huang/sparkjavaopts
...
For standalone mode, add worker local env setting of SPARK_JAVA_OPTS as ...
2013-08-09 10:04:03 -07:00
Matei Zaharia
d1e1c1b24d
Add test for Kryo with WrappedArray (which was failing in Chill 0.3.0)
2013-08-08 13:34:11 -07:00
Matei Zaharia
5a4003c1ac
Update to Chill 0.3.1
2013-08-08 13:30:27 -07:00
Mridul Muralidharan
c230ca3b4e
Change line size
2013-08-08 22:28:40 +05:30
Mridul Muralidharan
dc47084f4e
Attempt to fix bug reported in PR 791 : a race condition in ConnectionManager and Connection
2013-08-08 22:19:27 +05:30
Kay Ousterhout
88049a214d
Fixed 3 bugs that caused UI to crash (including SPARK-810).
...
One bug caused the UI to crash if you try to look at a job's status
before any of the tasks have finished.
The second bug was a concurrency issue where two different threads
(the scheduling thread and a UI thread) could be reading/updating
the data structures in JobProgressListener concurrently.
The third bug mis-used an Option, also causing the UI to crash
under certain conditions.
2013-08-07 23:09:25 -07:00
Patrick Wendell
b4321edf68
Reverting bootstrap change
2013-08-07 22:18:18 -07:00
Patrick Wendell
21392f2a73
Change I forgot to merge in
2013-08-07 21:45:32 -07:00
Patrick Wendell
706394b370
Bumping font size to 14px and fixing style issue in progress bars
2013-08-07 21:27:04 -07:00
Patrick Wendell
8c0d668468
Merge branch 'master' into bootstrap-design
...
Conflicts:
core/src/main/scala/spark/ui/UIUtils.scala
core/src/main/scala/spark/ui/jobs/IndexPage.scala
core/src/main/scala/spark/ui/storage/RDDPage.scala
2013-08-07 21:06:03 -07:00
Kay Ousterhout
b88e26248e
Fixed issue in UI that limited scheduler throughput.
...
Removal of items from ArrayBuffers in the UI code was slow and
significantly impacted scheduler throughput. This commit
improves scheduler throughput by 5x.
2013-08-07 14:42:05 -07:00
shane-huang
cbc5107e36
For standalone mode, add worker local env setting of SPARK_JAVA_OPTS as default and let application env override default options if applicable
...
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-08-07 14:36:48 +08:00
Matei Zaharia
6b043a6f11
Merge pull request #724 from dlyubimov/SPARK-826
...
SPARK-826: fold(), reduce(), collect() always attempt to use java serialization
2013-08-06 22:31:02 -07:00
Matei Zaharia
7c4b7a53b1
Merge remote-tracking branch 'origin/pr/781'
...
Conflicts:
core/src/main/resources/spark/ui/static/webui.css
2013-08-06 17:19:49 -07:00
Karen Feng
908032e79b
Used saturated colors for progress bars
2013-08-06 16:52:21 -07:00
Karen Feng
8bc497fa10
Lightened color of progress bars
2013-08-06 16:33:05 -07:00
Karen Feng
ca1903ea63
Overlays progress text on top of bar
2013-08-06 15:45:42 -07:00
Matei Zaharia
df4d10d630
Merge pull request #779 from adatao/adatao-global-SparkEnv
...
[HOTFIX] Extend thread safety for SparkEnv.get()
2013-08-06 15:44:05 -07:00
Shivaram Venkataraman
471fbadd0c
Java examples, tests for KMeans and ALS
...
- Changes ALS to accept RDD[Rating] instead of (Int, Int, Double) making it
easier to call from Java
- Renames class methods from `train` to `run` to enable static methods to be
called from Java.
- Add unit tests which check if both static / class methods can be called.
- Also add examples which port the main() function in ALS, KMeans to the
examples project.
Couple of minor changes to existing code:
- Add a toJavaRDD method in RDD to convert scala RDD to java RDD easily
- Work around a bug where using double[] from Java leads to a class cast exception in
KMeans init
2013-08-06 15:43:46 -07:00
anfeng
dda2ac8b5d
reformat registerFileSystemStat()
2013-08-06 15:22:25 -07:00
Karen Feng
099528b6c4
Pre-sorts stage/env tables, changes text/link of stage summaries
2013-08-06 14:52:12 -07:00
Karen Feng
254a930730
Reverse sorts StageTable by submitted time
2013-08-06 14:18:38 -07:00
Karen Feng
5ed5b73026
Sorts first column of env tables
2013-08-06 13:59:53 -07:00
anfeng
0748c60817
expose HDFS file system stats via Executor metrics
2013-08-06 11:47:06 -07:00
Reynold Xin
d031f73679
Merge pull request #782 from WANdisco/master
...
SHARK-94 Log the files computed by HadoopRDD and NewHadoopRDD
2013-08-05 22:33:00 -07:00
Matei Zaharia
1b63dea816
Merge pull request #769 from markhamstra/NegativeCores
...
SPARK-847 + SPARK-845: Zombie workers and negative cores
2013-08-05 22:21:26 -07:00
Alexander Pivovarov
a30866438b
SHARK-94 Log the files computed by HadoopRDD and NewHadoopRDD
2013-08-05 21:48:43 -07:00
Matei Zaharia
8b277892c9
Merge pull request #774 from pwendell/job-description
...
Show user-defined job name in UI
2013-08-05 19:14:52 -07:00
Christopher Nguyen
b1bbbe699c
[HOTFIX] Mark lastSetSparkEnv @volatile in case it gets HotSpot-cached
...
On branch adatao-global-SparkEnv
Changes to be committed:
modified: core/src/main/scala/spark/SparkEnv.scala
2013-08-05 17:22:27 -07:00
Mark Hamstra
35d8f5ee52
Moved handling of timed out workers within the Master actor
2013-08-05 13:13:56 -07:00
Mark Hamstra
37ccf9301a
milliseconds -> seconds in timeOutDeadWorkers logging
2013-08-05 13:13:56 -07:00
Mark Hamstra
cdd1af562e
Timeout zombie workers
2013-08-05 13:13:56 -07:00
Mikhail Bautin
e8bec8365f
Only reduce the number of cores once when removing an executor
2013-08-05 13:13:56 -07:00
Karen Feng
95025afdec
Made most small fixes for SPARK-849 except for table sort, task progress overlay
2013-08-05 13:04:56 -07:00
Bill Zhao
87134b3648
SPARK-850: give better console message
2013-08-05 11:55:35 -07:00
Christopher Nguyen
39e4fda76f
[HOTFIX] Extend thread safety for SparkEnv.get()
...
A ThreadLocal SparkEnv.env is facing various situations leading to
NullPointerExceptions, where SparkEnv.env set in one thread is not
gettable in another thread, but often assumed to be available.
See, e.g., https://groups.google.com/forum/#!topic/spark-developers/GLx8yunSj0A
This hotfixes SparkEnv.env to return either (a) the ThreadLocal
value if non-null, or (b) the previously set value in any thread.
This approach preserves SparkEnv.set() thread safety needed by
RDD.compute() and possibly other places. A refactoring that
parameterizes SparkEnv should be addressed subsequently.
On branch adatao-global-SparkEnv
Changes to be committed:
modified: core/src/main/scala/spark/SparkEnv.scala
2013-08-05 02:09:54 -07:00
Patrick Wendell
f3660d5ab8
Make output formatting consistent between bash/scala
2013-08-03 21:30:15 -07:00
Patrick Wendell
ad94fbb322
Log the launch command for Spark executors
2013-08-03 09:19:46 -07:00
Matei Zaharia
22abbc10d6
Merge pull request #772 from karenfeng/ui-843
...
Show app duration
2013-08-02 16:37:59 -07:00
Patrick Wendell
5b3784a79c
Show user-defined job name in UI
2013-08-02 15:47:41 -07:00
Karen Feng
b3ae5b25d5
Shows time the app has been running
2013-08-02 13:25:14 -07:00
Patrick Wendell
9d7dfd2d5a
Merge pull request #743 from pwendell/app-metrics
...
Add application metrics to standalone master
2013-08-01 17:41:58 -07:00
Patrick Wendell
f1d2ad550e
under_scores --> camelCase for config options
2013-08-01 15:26:26 -07:00
Patrick Wendell
12d9c82c9b
Small style fix
2013-08-01 15:25:52 -07:00
Patrick Wendell
37bc64a205
Adding application-level metrics.
...
This adds metrics for applications in the deploy Master.
2013-08-01 15:25:52 -07:00
Karen Feng
73692f3cb9
Unify, reduce body font size
2013-08-01 15:10:30 -07:00
Patrick Wendell
87fd321a5a
Minor refactoring and code cleanup
2013-08-01 15:02:31 -07:00
Patrick Wendell
b10199413a
Slight refactoring to SparkContext functions
2013-08-01 15:00:42 -07:00
Patrick Wendell
cfcd77b5da
Increasing inter job arrival
2013-08-01 15:00:42 -07:00
Patrick Wendell
5faac7f4f3
Minor style fixes
2013-08-01 15:00:42 -07:00
Patrick Wendell
5e7b38fbb3
Merge pull request #695 from xiajunluan/pool_ui
...
Enhance job ui in spark ui system with adding pool information
2013-08-01 14:59:33 -07:00
Karen Feng
47600e9579
Removed hr margin
2013-08-01 14:57:04 -07:00
Karen Feng
e648a62fc8
Inserted needed line break for log paging
2013-08-01 14:46:19 -07:00
Karen Feng
686d6266c4
Use nav pills instead of default
2013-08-01 14:41:49 -07:00
Karen Feng
86d372d17f
Removed line breaks
2013-08-01 14:37:21 -07:00
Karen Feng
99803d88b9
Reduced all header sizes
2013-08-01 14:18:33 -07:00
Karen Feng
d216d687ef
Reduced size of table text to compact
2013-08-01 13:27:23 -07:00
Karen Feng
5dae283996
Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update
2013-08-01 11:28:28 -07:00
Matei Zaharia
0a96493ac6
Merge pull request #760 from karenfeng/heading-update
...
Clean up web UI page headers
2013-08-01 11:27:17 -07:00
Patrick Wendell
9177bea2b4
Removing extra imports
2013-08-01 10:42:50 -07:00
Patrick Wendell
3e4d5e5f8b
Merge branch 'master' into master-json
...
Conflicts:
core/src/main/scala/spark/deploy/master/ui/IndexPage.scala
2013-08-01 10:42:07 -07:00
Patrick Wendell
ffc034e4fb
Import cleanup
2013-08-01 10:39:56 -07:00
Andrew xia
d58502a156
fix bug of spark "SubmitStage" listener as unit test error
2013-08-01 23:21:41 +08:00
Andrew xia
3b5a11e765
change function name "setName" to "setProperties" as "setName" is also member of Thread class
2013-08-01 19:37:15 +08:00
Dmitriy Lyubimov
d29ee3689b
Merge fixes merge commit hasn't picked
2013-08-01 00:21:26 -07:00
Dmitriy Lyubimov
cb6be5bd7e
Merge remote-tracking branch 'mesos/master' into SPARK-826
...
Conflicts:
core/src/main/scala/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/main/scala/spark/scheduler/local/LocalTaskSetManager.scala
core/src/test/scala/spark/KryoSerializerSuite.scala
2013-07-31 22:09:22 -07:00
Dmitriy Lyubimov
28f1550f01
More elegant rewrite of the same.
2013-07-31 21:41:00 -07:00
Dmitriy Lyubimov
7c52ecc6a4
(1) added reduce test case.
...
(2) added nested streaming in ParallelCollectionRDD
(3) added kryo with fold test which still doesn't work
2013-07-31 19:27:30 -07:00
Matei Zaharia
3097d75d6f
Merge remote-tracking branch 'dlyubimov/SPARK-827'
...
Conflicts:
docs/configuration.md
2013-07-31 18:36:43 -07:00
Karen Feng
7c9c5ef6c6
Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update
2013-07-31 16:39:26 -07:00
Karen Feng
02cde8efdf
Replaces theme with Bootswatch Spacelab theme
2013-07-31 16:34:07 -07:00
Karen Feng
09cd67bf98
Changed bootstrap colors, fixed logpaging buttons
2013-07-31 16:18:53 -07:00
Matei Zaharia
39c75f3033
Merge pull request #757 from BlackNiuza/result_task_generation
...
Bug fix: SPARK-837
2013-07-31 15:52:36 -07:00
Matei Zaharia
14bf2fe039
Merge pull request #749 from benh/spark-executor-uri
...
Added property 'spark.executor.uri' for launching on Mesos.
2013-07-31 14:18:16 -07:00
Benjamin Hindman
4692ea4892
Used 'uri.split('/').last' instead of 'new File(uri).getName()'.
2013-07-31 12:29:44 -07:00
Karen Feng
c453967f9a
Reduced size of heading
2013-07-31 11:57:50 -07:00
Matei Zaharia
a386ced2c6
Merge pull request #754 from rxin/compression
...
Compression codec change
2013-07-31 11:22:50 -07:00
Karen Feng
49e6344142
Removed master URL from job UI, reduced heading size of basic spark pages
2013-07-31 11:17:59 -07:00
Reynold Xin
c61843a69f
Changed other LZF uses to use the compression codec interface.
2013-07-31 10:32:13 -07:00
Patrick Wendell
89da9d94b3
Add JSON path to master index page
2013-07-31 09:47:53 -07:00
BlackNiuza
9a815de4bf
write and read generation in ResultTask
2013-08-01 00:36:47 +08:00
Roman Tkalenko
0c6553714a
Refactored Vector.apply(length, initializer) replacing excessive code with library method
...
(also removed unused variable ```ans``` as minor change)
2013-07-31 19:05:46 +03:00
Matei Zaharia
12553e5c55
Simplified nonNegativeMod to match previous version
2013-07-31 08:50:28 -07:00
Matei Zaharia
d4556f4207
Merge pull request #751 from cdshines/master
...
Cleaned Partitioner & PythonPartitioner source by taking out non-related logic to Utils
2013-07-31 08:48:14 -07:00
Andrew xia
5670c96f29
Merge branch 'master' into Pool_UI
...
Conflicts:
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/scheduler/DAGScheduler.scala
core/src/main/scala/spark/scheduler/SparkListener.scala
core/src/main/scala/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala
core/src/main/scala/spark/scheduler/local/LocalTaskSetManager.scala
core/src/main/scala/spark/ui/jobs/IndexPage.scala
core/src/main/scala/spark/ui/jobs/JobProgressUI.scala
2013-07-31 19:36:36 +08:00
cdshines
fefb03cbd7
Eliminated code duplication, refactored to pattern-matching style Partitioner and PythonPartitioner
2013-07-31 13:19:42 +03:00
Dmitriy Lyubimov
96664431cb
IDEA flipped JavaSerialized import at some point to a wrong class.
2013-07-30 23:10:09 -07:00
Dmitriy Lyubimov
c219fc94fd
Minor, style
2013-07-30 22:08:39 -07:00
Dmitriy Lyubimov
f4b4b8836e
reverting back to one-by-one serialization for parallelize()
2013-07-30 19:00:58 -07:00
jerryshao
bf9318091a
Add Apache license header to metrics system
2013-07-31 09:42:16 +08:00
Reynold Xin
98024eadc3
Renamed compressionOutputStream and compressionInputStream to compressedOutputStream and compressedInputStream.
2013-07-30 18:28:46 -07:00
Dmitriy Lyubimov
abada94ebf
removing default constructor (not Externalizable any more)
2013-07-30 18:04:02 -07:00
Dmitriy Lyubimov
943c6590c9
realigning "extends" back manually
2013-07-30 18:01:35 -07:00
Dmitriy Lyubimov
ca33b12e98
resetting wrap and continuation indent = 4
2013-07-30 17:51:44 -07:00
Reynold Xin
dae12fef9e
Updated the configuration option for Snappy block size to be consistent with the documentation.
2013-07-30 17:49:31 -07:00
Dmitriy Lyubimov
984b56155a
changing approaches for parallelize(): java serialization needs to avoid writing headers!
2013-07-30 17:36:59 -07:00
Reynold Xin
311aae76a2
Added Snappy dependency to Maven build files.
2013-07-30 17:25:42 -07:00
Reynold Xin
56774b176e
Added unit test for compression codecs.
2013-07-30 17:12:33 -07:00
Reynold Xin
ad7e9d0d64
CompressionCodec cleanup. Moved it to spark.io package.
2013-07-30 17:11:54 -07:00
Dmitriy Lyubimov
ef9529a943
refactoring using writeByteBuffer() from Utils.
2013-07-30 16:24:23 -07:00
Dmitriy Lyubimov
43394b9a6d
fixing formatting
2013-07-30 16:13:41 -07:00
Dmitriy Lyubimov
13a9d66645
adding ===
2013-07-30 16:10:55 -07:00
Reynold Xin
368c58eac5
Merge branch 'lazy_file_open' of github.com:lyogavin/spark into compression
...
Conflicts:
project/SparkBuild.scala
2013-07-30 16:04:18 -07:00
Patrick Wendell
e87de037d6
Merge pull request #744 from karenfeng/bootstrap-update
...
Use Bootstrap progress bars in web UI
2013-07-30 15:00:08 -07:00
Karen Feng
26144c400f
Fixed wrap style
2013-07-30 12:40:41 -07:00
Karen Feng
218d7c4ed8
Fixed style, lowered height of progress bars
2013-07-30 12:39:17 -07:00
Karen Feng
f1cab31b73
Removed intermediate set for activeTasks, removed progress bar margin
2013-07-30 11:06:47 -07:00
Dmitriy Lyubimov
1bca91633e
+ bug fixes;
...
test added
Conflicts:
core/src/test/scala/spark/KryoSerializerSuite.scala
2013-07-30 11:04:11 -07:00
Benjamin Hindman
f6f46455eb
Added property 'spark.executor.uri' for launching on Mesos without
...
requiring Spark to be installed. Using 'make_distribution.sh' a user
can put a Spark distribution at a URI supported by Mesos (e.g.,
'hdfs://...') and then set that when launching their job. Also added
SPARK_EXECUTOR_URI for the REPL.
2013-07-29 23:32:52 -07:00
Josh Rosen
49be084ed3
Use File.pathSeparator instead of hardcoding ':'.
2013-07-29 22:08:57 -07:00
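Note on the File.pathSeparator change above: a hardcoded ':' breaks path lists on platforms where the separator differs (';' on Windows). A minimal Java sketch of the idea, assuming a hypothetical helper `PathJoin.joinPaths` that is not part of Spark:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; joins path entries with the platform separator. */
final class PathJoin {
    static String joinPaths(List<String> entries) {
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // Example entries are placeholders, not real Spark paths.
        System.out.println(joinPaths(Arrays.asList("python", "python/lib")));
    }
}
```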
Josh Rosen
b95732632b
Do not inherit master's PYTHONPATH on workers.
...
This fixes SPARK-832, an issue where PySpark
would not work when the master and workers used
different SPARK_HOME paths.
This change may potentially break code that relied
on the master's PYTHONPATH being used on workers.
To have custom PYTHONPATH additions used on the
workers, users should set a custom PYTHONPATH in
spark-env.sh rather than setting it in the shell.
2013-07-29 22:08:57 -07:00
Andrew xia
5406013997
refactor code to fewer than 100 characters per line
2013-07-30 11:41:38 +08:00
Andrew xia
614ee16cc4
refactor job ui with pool information
2013-07-30 10:57:26 +08:00
Dmitriy Lyubimov
8e5cd041bb
initial externalization of ParallelCollectionRDD's split
2013-07-29 19:02:53 -07:00
Reynold Xin
81720e13fc
Moved all StandaloneClusterMessage's into StandaloneClusterMessages object.
2013-07-29 17:53:01 -07:00
Reynold Xin
23b5da14ed
Moved block manager messages into BlockManagerMessages object.
2013-07-29 17:42:05 -07:00
Reynold Xin
105f4d22e9
Removed Cache and SoftReferenceCache since they are no longer used.
2013-07-29 17:30:38 -07:00
Reynold Xin
17e62113d4
Moved DeployMessage's into its own DeployMessages object.
...
Also renamed MasterState to MasterStateResponse and WorkerState to WorkerStateResponse for clarity.
2013-07-29 17:14:44 -07:00
Karen Feng
87b821dc39
Fixed continuity of executorToTasksActive, changed color of progress bars
2013-07-29 16:50:51 -07:00
Karen Feng
c7b2788948
Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update
...
Conflicts:
core/src/main/scala/spark/ui/jobs/IndexPage.scala
2013-07-29 16:36:07 -07:00
Patrick Wendell
c99b674405
Merge pull request #735 from karenfeng/ui-807
...
Totals for shuffle data and CPU time
2013-07-29 16:32:55 -07:00
Karen Feng
2d6da9195a
Alphabetized imports
2013-07-29 15:50:52 -07:00
Karen Feng
478a2886d9
Added started tasks to progress bar
2013-07-29 14:51:07 -07:00
Karen Feng
e04a37a332
Merge branch 'master' of https://github.com/mesos/spark into bootstrap-update
2013-07-29 14:32:48 -07:00
Reynold Xin
fe7298b587
Merge pull request #741 from pwendell/usability
...
Fix two small usability issues
2013-07-29 14:01:00 -07:00
Karen Feng
43a2cc15c0
Use Bootstrap progress bars in web UI
2013-07-29 13:37:24 -07:00
Matei Zaharia
b9d6783f36
Optimize Python take() to not compute entire first partition
2013-07-29 02:51:43 -04:00
Dmitriy Lyubimov
f5067abe85
changes per comments.
2013-07-27 23:08:00 -07:00
Karen Feng
077f2dad22
Fixed outdated bugs
2013-07-27 16:39:36 -07:00
Patrick Wendell
bcafb36c1e
Slight wording change
2013-07-27 16:03:50 -07:00
Patrick Wendell
8177165ac4
Log executor on finish
2013-07-27 16:02:06 -07:00
Patrick Wendell
c2223e6801
Improve catch scope and logging for client stop()
...
This does two things:
1. Catches the more general `TimeoutException`, since those can be thrown.
2. Logs at info level when a timeout is detected.
2013-07-27 16:02:06 -07:00
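Note on the client stop() change above: the pattern it describes (catch the general `TimeoutException` and log at info level) might look roughly like the Java sketch below. It assumes the stop acknowledgement is exposed as a `Future`; the class `StopWithTimeout`, the method `awaitStop`, and the log messages are hypothetical and are not taken from the Spark client code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.logging.Logger;

/** Hypothetical sketch of waiting for a stop acknowledgement with a timeout. */
final class StopWithTimeout {
    private static final Logger LOG = Logger.getLogger("StopWithTimeout");

    /** Returns true if the acknowledgement arrived in time, false otherwise. */
    static boolean awaitStop(Future<?> ack, long timeoutMillis) {
        try {
            ack.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // A timeout during shutdown is expected sometimes, so log at info level.
            LOG.info("Stop request timed out; continuing shutdown");
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } catch (ExecutionException e) {
            LOG.warning("Stop request failed: " + e.getCause());
            return false;
        }
    }

    public static void main(String[] args) {
        // A future that never completes triggers the timeout path.
        System.out.println(awaitStop(new CompletableFuture<Void>(), 50));
        // An already completed future returns immediately.
        System.out.println(awaitStop(CompletableFuture.completedFuture("ok"), 50));
    }
}
```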
Karen Feng
5a93e3c58c
Cleaned up code based on pwendell's suggestions
2013-07-27 15:55:26 -07:00
Karen Feng
dcc4743a95
Moved val now to render
2013-07-27 12:52:53 -07:00
Karen Feng
1714693324
Current time called once with value now
2013-07-27 12:24:41 -07:00
Dmitriy Lyubimov
6a47cee721
style
2013-07-26 22:35:13 -07:00
Dmitriy Lyubimov
0c391feb73
Maximum task failures configurable
2013-07-26 22:34:43 -07:00
Dmitriy Lyubimov
23f3e0f117
mixing in SharedSparkContext for the kryo-collect test
2013-07-26 19:15:11 -07:00
Karen Feng
bd4cc52e30
Made metrics Option instead of Some, fixed NullPointerException
2013-07-26 17:23:18 -07:00
Reynold Xin
cb366774c8
Merge pull request #738 from harsha2010/pruning
...
Fix bug in Partition Pruning.
2013-07-26 16:59:30 -07:00
harshars
392d7474fd
Code review
2013-07-26 15:23:15 -07:00
harshars
72cf7ec0e5
Indentation
2013-07-26 15:16:41 -07:00
harshars
822aac8f5a
Indentation
2013-07-26 15:10:32 -07:00
harshars
743fc4e7aa
Fix Bug in Partition Pruning, index of Pruned Partitions should inherit from parent
2013-07-26 14:35:17 -07:00
Karen Feng
3fbe9eaac0
Displays shuffle read/write only if exists, wraps if statements, trims old vals, grabs current time once
2013-07-26 11:51:38 -07:00
Karen Feng
22faeab261
Split Shuffle Activity overview column for read/write
2013-07-25 17:14:18 -07:00
Karen Feng
d4bbc8bd25
Shows totals for shuffle data and CPU time in Stage, homepage overviews including active time
2013-07-25 15:59:52 -07:00
Charles Reiss
a6de90c927
For standalone mode, get JAVA_HOME, SPARK_JAVA_OPTS, SPARK_LIBRARY_PATH from application env, not worker env
2013-07-25 12:42:30 -07:00
Matei Zaharia
8eb8b52997
Fix Chill version in Maven
2013-07-25 08:58:02 -07:00
Matei Zaharia
e2421c1311
Update Chill reference in pom.xml too
2013-07-25 00:05:43 -07:00
ryanlecompte
e56aa75de0
fix wrapping
2013-07-24 22:08:09 -07:00
ryanlecompte
fc4b025314
add test
2013-07-24 20:53:15 -07:00
ryanlecompte
a1c515fb02
add copyright back in
2013-07-24 20:50:32 -07:00
ryanlecompte
8e0939f5a9
refactor Kryo serializer support to use chill/chill-java
2013-07-24 20:43:57 -07:00
Karen Feng
57009eef90
Fixed consistency of "success" status string
2013-07-24 13:43:09 -07:00
Karen Feng
4280e1768d
Removed finished status for task info, changed name of success case
2013-07-24 12:48:48 -07:00
Karen Feng
bd3931c874
Changed ifs with returns to if/else
2013-07-24 11:27:17 -07:00
Karen Feng
93c6015f82
Shows task status and running tasks on Stage Page: fixes SPARK-804 and 811
2013-07-24 10:53:02 -07:00
jerryshao
31ec72b243
Code refactor according to comments
2013-07-24 14:57:47 +08:00
jerryshao
8d1ef7f2df
Code style changes
2013-07-24 14:57:47 +08:00
Andrew xia
05637de842
Change class xxxInstrumentation to class xxxSource
2013-07-24 14:57:47 +08:00
Andrew xia
ed1a3bc206
continue to refactor code style and functions
2013-07-24 14:57:47 +08:00
jerryshao
5730193e0c
Fix some typos
2013-07-24 14:57:47 +08:00
jerryshao
a79f6077f0
Add Maven metrics library dependency and code changes
2013-07-24 14:57:47 +08:00
jerryshao
1daff54b2e
Change Executor MetricsSystem initialize code to SparkEnv
2013-07-24 14:57:47 +08:00
Andrew xia
5f8802c1fb
Register and init metricsSystem in SparkContext
...
Conflicts:
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/SparkEnv.scala
2013-07-24 14:57:47 +08:00
Andrew xia
9cea0c2818
Refactor metricsSystem unit test, add resource files.
2013-07-24 14:57:47 +08:00
Andrew xia
7d2eada451
Add metrics source of DAGScheduler and blockManager
...
Conflicts:
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/SparkEnv.scala
2013-07-24 14:57:47 +08:00
jerryshao
e9ac88754d
Remove twice add Source bug and code clean
2013-07-24 14:57:47 +08:00
jerryshao
e080588f73
Add metrics system unit test
2013-07-24 14:57:47 +08:00
jerryshao
5ce5dc9fcd
Add default properties to deal with no configure file situation
2013-07-24 14:57:47 +08:00