Mark Hamstra
903f3518df
fall back to filter-map-collect when calling lookup() on an RDD without a partitioner
2012-12-24 13:18:45 -08:00
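The lookup() fallback in 903f3518df can be pictured with the minimal sketch below. It is not the Spark implementation: partitions are modelled as plain in-memory lists, and the `hashPartitioned` flag, the class name and the helper names are assumptions made for illustration. With a known partitioner only one partition needs scanning; without one, every partition is filtered for the key, mapped to its values, and the results collected.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class LookupSketch {
    // Collect the values stored under `key` in a single partition.
    static <K, V> List<V> matching(List<Map.Entry<K, V>> partition, K key) {
        List<V> values = new ArrayList<>();
        for (Map.Entry<K, V> pair : partition) {
            if (pair.getKey().equals(key)) {
                values.add(pair.getValue());
            }
        }
        return values;
    }

    // `hashPartitioned` is a hypothetical stand-in for "the dataset has a partitioner".
    static <K, V> List<V> lookup(List<List<Map.Entry<K, V>>> partitions,
                                 boolean hashPartitioned, K key) {
        if (partitions.isEmpty()) {
            return new ArrayList<>();
        }
        if (hashPartitioned) {
            // Fast path: only the partition the key hashes to can hold it.
            int index = Math.floorMod(key.hashCode(), partitions.size());
            return matching(partitions.get(index), key);
        }
        // Fallback: filter every partition, keep the values, collect them.
        List<V> values = new ArrayList<>();
        for (List<Map.Entry<K, V>> partition : partitions) {
            values.addAll(matching(partition, key));
        }
        return values;
    }

    // Two partitions with arbitrary key placement, so only the fallback is safe.
    static List<Integer> example() {
        List<Map.Entry<String, Integer>> p0 = Arrays.asList(
                new SimpleEntry<String, Integer>("a", 1),
                new SimpleEntry<String, Integer>("b", 2));
        List<Map.Entry<String, Integer>> p1 = Arrays.asList(
                new SimpleEntry<String, Integer>("a", 3));
        List<List<Map.Entry<String, Integer>>> partitions = Arrays.asList(p0, p1);
        return lookup(partitions, false, "a");
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```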
Mark Hamstra
61be8566e2
Allow distinct() to be called without parentheses when using the default number of splits.
2012-12-24 02:36:47 -08:00
Reynold Xin
60f7338092
Remove the call to close input stream in Kryo serializer.
2012-12-21 15:49:33 -08:00
Matei Zaharia
3334b7c6b5
Merge pull request #341 from rxin/4a3fb06ac2d11125feb08acbbd4df76d1e91b677
Kryo2 update against Spark master
2012-12-21 15:31:23 -08:00
Matei Zaharia
5e51b889fe
Merge pull request #327 from rxin/spark-633
Added the ability in block manager to remove blocks.
2012-12-20 11:33:38 -08:00
Reynold Xin
9397c5014e
Let the slave notify the master block removal.
2012-12-20 01:37:09 -08:00
Reynold Xin
68c52d80ec
Moved BlockManager's IdGenerator into BlockManager object. Removed some
excessive debug messages.
2012-12-19 15:27:23 -08:00
Patrick Wendell
bfac06e1f6
SPARK-616: Logging dead workers in Web UI.
This patch keeps track of which workers have died and marks them
as such in the master web UI. It also handles workers which die and
re-register using different actor IDs.
2012-12-17 23:09:05 -08:00
Matei Zaharia
b82a6dd2c7
Merge pull request #332 from JoshRosen/spark-607
Add try-finally to handle MapOutputTracker timeouts
2012-12-14 11:41:16 -08:00
Reynold Xin
06f855c24d
Merge branch 'spark-633' of github.com:rxin/spark into spark-633
2012-12-14 00:27:24 -08:00
Reynold Xin
8c01295b85
Fixed conflicts from merging Charles' and TD's block manager changes.
2012-12-14 00:26:36 -08:00
Charles Reiss
c528932a41
Code review cleanup.
2012-12-13 22:37:16 -08:00
Charles Reiss
0aad42b5e7
Have standalone cluster report exit codes to clients. Addresses SPARK-639.
2012-12-13 22:37:16 -08:00
Reynold Xin
97434f49b8
Merged TD's block manager refactoring.
2012-12-13 22:32:19 -08:00
Reynold Xin
41e58a519a
Merge branch 'master' of github.com:mesos/spark into spark-633
2012-12-13 22:06:47 -08:00
Josh Rosen
cf52d9cade
Add try-finally to handle MapOutputTracker timeouts.
2012-12-13 21:53:30 -08:00
Matei Zaharia
05e225f988
Merge pull request #329 from woggling/executor-status-codes
Executor exit status codes
2012-12-13 20:14:10 -08:00
Charles Reiss
b054d3b222
ExecutorLostReason -> ExecutorLossReason
2012-12-13 18:44:07 -08:00
Charles Reiss
24d7aa2d15
Extra whitespace in ExecutorExitCode
2012-12-13 18:39:23 -08:00
Reynold Xin
dc7d7fc286
Merge branch 'master' of github.com:mesos/spark into spark-633
2012-12-13 16:48:34 -08:00
Reynold Xin
4f076e105e
SPARK-635: Pass a TaskContext object to compute() interface and use
that to close Hadoop input stream. Incorporated Matei's comment.
2012-12-13 16:41:15 -08:00
Charles Reiss
829206f1a7
Explain slaveLost calls made by StandaloneSchedulerBackend
2012-12-13 16:23:36 -08:00
Charles Reiss
a4041dd87f
Log duplicate slaveLost() calls in ClusterScheduler.
2012-12-13 16:23:36 -08:00
Charles Reiss
fa9df4a45d
Normalize executor exit statuses and report them to the user.
2012-12-13 16:23:31 -08:00
Reynold Xin
eacb98e900
SPARK-635: Pass a TaskContext object to compute() interface and use that
to close Hadoop input stream.
2012-12-13 15:41:53 -08:00
Josh Rosen
7c9e3d1c21
Return success or failure in BlockStore.remove().
2012-12-13 15:22:27 -08:00
Reynold Xin
1b7a0451ed
Added the ability in block manager to remove blocks.
2012-12-13 00:04:42 -08:00
Charles Reiss
1d8e2e6cff
Call slaveLost on executor death for standalone clusters.
2012-12-12 21:15:34 -08:00
Reynold Xin
21b271f5bd
Suppress shuffle block updates when a slave node comes back.
2012-12-10 20:36:03 -08:00
Matei Zaharia
a1a2daa7ef
Merge pull request #317 from woggling/block-manager-heartbeat
Implement block manager heartbeat
2012-12-10 11:03:55 -08:00
Charles Reiss
b6b62d774f
Decrease BlockManagerMaster logging verbosity
2012-12-10 00:31:55 -08:00
Charles Reiss
5d3e917d09
Use Akka scheduler for BlockManager heart beats.
Adds required ActorSystem argument to BlockManager constructors.
2012-12-10 00:31:50 -08:00
Charles Reiss
b53dd28c90
Changed default block manager heartbeat interval to 5 s
2012-12-09 23:03:34 -08:00
Matei Zaharia
e1d7cd2276
Search for a non-loopback address in Utils.getLocalIpAddress
2012-12-08 00:33:11 -08:00
Charles Reiss
714c8d32d5
Don't divide milliseconds by 1000 anymore.
2012-12-06 18:38:34 -08:00
Charles Reiss
8f0819520c
map -> foreach
2012-12-06 18:29:50 -08:00
Charles Reiss
7a033fd795
Make LocalSparkCluster use distinct IPs
2012-12-06 00:03:08 -08:00
Charles Reiss
d21ca010ac
Add block manager heart beats.
Renames old message called 'HeartBeat' to 'BlockUpdate'.
The BlockManager periodically sends a heart beat message to the master.
The master responds to the
heart beat by indicating whether the BlockManager is currently registered
with the master. Additionally, the master now also responds to block
updates by indicating whether the BlockManager in question is registered.
When the BlockManager detects (by heart beat or failed block update)
that it stopped being registered, it reregisters and sends block
updates for all its blocks.
2012-12-05 23:35:20 -08:00
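The heart beat protocol described in d21ca010ac can be sketched roughly as follows. This is a single-threaded illustration under stated assumptions, not the BlockManager code: the class and method names are hypothetical, messages are modelled as direct method calls, and the periodic scheduling of heart beats is left out. What it shows is the reply carrying a "registered?" flag and the manager re-registering and resending all block updates when that flag is false.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class HeartbeatSketch {
    // Hypothetical master: tracks which blocks each registered manager holds.
    static final class Master {
        private final Map<String, Set<String>> blocksByManager = new HashMap<>();

        void register(String managerId) {
            blocksByManager.putIfAbsent(managerId, new HashSet<String>());
        }

        void remove(String managerId) {
            blocksByManager.remove(managerId);
        }

        // Both replies tell the sender whether it is currently registered.
        boolean heartBeat(String managerId) {
            return blocksByManager.containsKey(managerId);
        }

        boolean blockUpdate(String managerId, String blockId) {
            Set<String> blocks = blocksByManager.get(managerId);
            if (blocks == null) {
                return false;
            }
            blocks.add(blockId);
            return true;
        }

        int blockCount(String managerId) {
            Set<String> blocks = blocksByManager.get(managerId);
            return blocks == null ? 0 : blocks.size();
        }
    }

    // Hypothetical manager: remembers its own blocks so it can resend them.
    static final class Manager {
        private final String id;
        private final Master master;
        private final Set<String> localBlocks = new LinkedHashSet<>();

        Manager(String id, Master master) {
            this.id = id;
            this.master = master;
            master.register(id);
        }

        void put(String blockId) {
            localBlocks.add(blockId);
            if (!master.blockUpdate(id, blockId)) {
                reregister(); // failed block update: the master no longer knows us
            }
        }

        void sendHeartBeat() {
            if (!master.heartBeat(id)) {
                reregister();
            }
        }

        private void reregister() {
            master.register(id);
            for (String blockId : localBlocks) {
                master.blockUpdate(id, blockId);
            }
        }
    }

    // The master forgets the manager; the next heart beat restores its blocks.
    static int blocksKnownAfterRecovery() {
        Master master = new Master();
        Manager manager = new Manager("host-a", master);
        manager.put("block-0");
        manager.put("block-1");
        master.remove("host-a");
        manager.sendHeartBeat();
        return master.blockCount("host-a");
    }

    public static void main(String[] args) {
        System.out.println("blocks known to master: " + blocksKnownAfterRecovery());
    }
}
```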
Charles Reiss
c9e54a6755
Track block managers by hostname; handle manager removal.
2012-12-05 23:35:20 -08:00
Charles Reiss
5afa2ee9e9
Actually put millis in _lastSeenMs
2012-12-05 23:35:20 -08:00
Charles Reiss
813ac71459
Don't use bogus port number in notifyADeadHost().
2012-12-05 23:35:20 -08:00
Josh Rosen
cdaa0fad51
Use external addresses in standalone WebUI on EC2.
2012-12-01 18:19:13 -08:00
Matei Zaharia
f86960cba9
Merge pull request #313 from rxin/pde_size_compress
Added a partition preserving flag to MapPartitionsWithSplitRDD.
2012-11-27 22:39:25 -08:00
Matei Zaharia
3ebd8e1885
Added zip to Java API
2012-11-27 22:38:09 -08:00
Matei Zaharia
27e43abd19
Added a zip() operation for RDDs with the same shape (number of
partitions and number of elements in each partition)
2012-11-27 22:27:47 -08:00
Matei Zaharia
f410a111ad
Merge branch 'master' of github.com:mesos/spark
2012-11-27 20:51:58 -08:00
Josh Rosen
7d71b9a56a
Fix NullPointerException caused by unregistered map outputs.
2012-11-27 20:51:51 -08:00
Matei Zaharia
935c468b71
Merge pull request #311 from woggling/map-output-npe
Fix NullPointerException when map output unregistered from MapOutputTracker twice
2012-11-27 20:50:48 -08:00
Reynold Xin
bd6dd1a3a6
Added a partition preserving flag to MapPartitionsWithSplitRDD.
2012-11-27 19:43:30 -08:00
Reynold Xin
f24bfd2dd1
For size compression, compress non zero values into non zero values.
2012-11-27 19:20:45 -08:00
Charles Reiss
cf79de425d
Fix NullPointerException when unregistering a map output twice.
2012-11-27 16:12:05 -08:00
Matei Zaharia
3ff6f4bdee
Merge pull request #304 from mbautin/configurable_local_ip
SPARK-624: make the default local IP customizable
2012-11-19 13:23:39 -08:00
mbautin
00f4e3ff9c
Addressing Matei's comment: SPARK_LOCAL_IP environment variable
2012-11-19 11:52:10 -08:00
Charles Reiss
12c24e786c
Set default uncaught exception handler to exit.
Among other things, should prevent OutOfMemoryErrors in some daemon threads
(such as the network manager) from causing a spark executor to enter a state
where it cannot make progress but does not report an error.
2012-11-16 20:12:31 -08:00
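The idea in 12c24e786c, a default uncaught exception handler that terminates the process, might look roughly like the sketch below. The standard call is Thread.setDefaultUncaughtExceptionHandler; everything else is an assumption for illustration: the class name, the injected `exit` action (so the sketch can run without killing the JVM) and the two exit-code values, which are arbitrary placeholders.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

class ExitOnUncaught {
    // Placeholder exit codes chosen for this sketch only.
    static final int UNCAUGHT_EXCEPTION = 50;
    static final int OUT_OF_MEMORY = 52;

    // Install a JVM-wide handler; a real daemon would pass System::exit here.
    static void install(IntConsumer exit) {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            System.err.println("Uncaught exception in thread "
                    + thread.getName() + ": " + error);
            exit.accept(error instanceof OutOfMemoryError
                    ? OUT_OF_MEMORY
                    : UNCAUGHT_EXCEPTION);
        });
    }

    // Run a daemon thread that fails and report the code the handler chose.
    static int codeForFailingThread() {
        AtomicInteger recorded = new AtomicInteger(-1);
        install(recorded::set); // record the code instead of exiting
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("simulated failure");
        }, "daemon-like-thread");
        worker.setDaemon(true);
        worker.start();
        try {
            worker.join(); // the handler runs on the dying thread before join returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return recorded.get();
    }

    public static void main(String[] args) {
        System.out.println("exit code that would be used: " + codeForFailingThread());
    }
}
```

Without such a handler, an exception in a background thread ends only that thread, which matches the "cannot make progress but does not report an error" state the commit message describes.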
mbautin
1f5a7e0e64
SPARK-624: make the default local IP customizable
2012-11-15 13:57:47 -08:00
Matei Zaharia
c23a74df0a
Use DNS names instead of IP addresses in standalone mode, to allow
matching with data locality hints from storage systems.
2012-11-15 00:10:52 -08:00
Matei Zaharia
173e0354c0
Detect correctly when one has disconnected from a standalone cluster.
SPARK-617 #resolve
2012-11-11 21:06:57 -08:00
root
acf8272324
Fix K-means example a little
2012-11-10 23:07:21 -08:00
Tathagata Das
9915989bfa
Incorporated Matei's suggestions. Tested with 5 producer(consumer) threads each doing 50k puts (gets), took 15 minutes to run, no errors or deadlocks.
2012-11-09 15:46:15 -08:00
Tathagata Das
de00bc63db
Fixed deadlock in BlockManager.
1. Changed the lock structure of BlockManager by replacing the 337 coarse-grained locks with BlockInfo objects used as per-block fine-grained locks.
2. Changed the MemoryStore lock structure by making the block-putting threads lock on a different object (not the memory store), so that putting threads minimally block the getting threads.
3. Added spark.storage.ThreadingTest to stress test the BlockManager using 5 block producer and 5 block consumer threads.
2012-11-09 14:09:37 -08:00
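The per-block locking described in de00bc63db can be illustrated with the minimal sketch below. It is an assumption-laden stand-in rather than the BlockManager code: the `BlockInfo` here is a hypothetical two-line class, the store is a plain ConcurrentHashMap, and the `stress` helper only loosely mirrors the producer/consumer shape of the stress test mentioned in the commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

class PerBlockLocks {
    // Hypothetical per-block record; the object itself is the lock for its block.
    static final class BlockInfo {
        byte[] data; // guarded by synchronized (this BlockInfo)
    }

    private final ConcurrentHashMap<String, BlockInfo> blocks = new ConcurrentHashMap<>();

    void put(String blockId, byte[] data) {
        BlockInfo info = blocks.computeIfAbsent(blockId, id -> new BlockInfo());
        synchronized (info) { // contends only with threads touching this block
            info.data = data.clone();
        }
    }

    byte[] get(String blockId) {
        BlockInfo info = blocks.get(blockId);
        if (info == null) {
            return null;
        }
        synchronized (info) {
            return info.data == null ? null : info.data.clone();
        }
    }

    // One producer and one consumer thread per id; returns the number of stored blocks.
    static int stress(int producers, int putsPerProducer) {
        PerBlockLocks store = new PerBlockLocks();
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int id = p;
            threads.add(new Thread(() -> {
                for (int i = 0; i < putsPerProducer; i++) {
                    store.put("block-" + id + "-" + i, new byte[] {(byte) i});
                }
            }));
            threads.add(new Thread(() -> {
                for (int i = 0; i < putsPerProducer; i++) {
                    store.get("block-" + id + "-" + i); // may be null if not written yet
                }
            }));
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return store.blocks.size();
    }

    public static void main(String[] args) {
        System.out.println("blocks stored: " + stress(5, 1000));
    }
}
```

Because no thread ever holds more than one BlockInfo monitor at a time, this layout cannot deadlock on lock ordering.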
Matei Zaharia
6607f546cc
Added an option to spread out jobs in the standalone mode.
2012-11-08 23:13:12 -08:00
Matei Zaharia
66cbdee941
Fix for connections not being reused (from Josh Rosen)
2012-11-08 09:53:40 -08:00
Imran Rashid
809b2bb1fe
fix bug in getting slave id out of mesos
2012-11-08 00:34:28 -08:00
Matei Zaharia
bb1bce7924
Various fixes to standalone mode and web UI:
- Don't report a job as finishing multiple times
- Don't show state of workers as LOADING when they're running
- Show start and finish times in web UI
- Sort web UI tables by ID and time by default
2012-11-07 16:49:53 -08:00
Matei Zaharia
e2b8477487
Made Akka timeout and message frame size configurable, and upped the defaults
2012-11-06 15:58:05 -08:00
Shivaram Venkataraman
a7d967a1ca
Remove unnecessary hash-map put in MemoryStore
2012-11-01 10:46:38 -07:00
Josh Rosen
2ccf3b6652
Fix PySpark hash partitioning bug.
A Java array's hashCode is based on its object
identity, not its elements, so this was causing
serialized keys to be hashed incorrectly.
This commit adds a PySpark-specific workaround
and adds more tests.
2012-10-28 22:30:28 -07:00
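The hashing problem described in 2ccf3b6652 is easy to reproduce in plain Java. The sketch below is not the PySpark workaround itself; `partitionFor` and the class name are hypothetical, and content-based hashing via java.util.Arrays.hashCode is just one way to get a hash that depends on the elements rather than on object identity.

```java
import java.util.Arrays;

class ArrayHashDemo {
    // Content-based partition index: hashes the bytes, not the array object.
    static int partitionFor(byte[] key, int numPartitions) {
        return Math.floorMod(Arrays.hashCode(key), numPartitions);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};

        // Arrays inherit Object.hashCode(), which is identity-based.
        System.out.println("same contents: " + Arrays.equals(a, b));
        System.out.println("identity hash used: "
                + (a.hashCode() == System.identityHashCode(a)));

        // Content-based hashing sends equal keys to the same partition.
        System.out.println("same partition: "
                + (partitionFor(a, 8) == partitionFor(b, 8)));
    }
}
```

Two serialized keys with identical bytes are distinct array objects, so partitioning on `a.hashCode()` directly can scatter equal keys across partitions.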
root
e782187b4a
Don't throw an error in the block manager when a block is cached on the master due to
a locally computed operation
Conflicts:
core/src/main/scala/spark/storage/BlockManagerMaster.scala
2012-10-26 00:33:45 -07:00
Matei Zaharia
f63a40fd99
Strip leading mesos:// in URLs passed to Mesos
2012-10-24 21:52:13 -07:00
Matei Zaharia
d290e964ea
Merge pull request #281 from rxin/memreport
Added a method to report slave memory status; force serialize accumulator update in local mode.
2012-10-23 22:04:35 -07:00
Matei Zaharia
0bd20c63e2
Merge remote-tracking branch 'JoshRosen/shuffle_refactoring' into dev
Conflicts:
core/src/main/scala/spark/Dependency.scala
core/src/main/scala/spark/rdd/CoGroupedRDD.scala
core/src/main/scala/spark/rdd/ShuffledRDD.scala
2012-10-23 22:01:45 -07:00
Josh Rosen
d4f2e5b0ef
Remove PYTHONPATH from SparkContext's executorEnvs.
It makes more sense to pass it in the dictionary
of environment variables that is used to construct
PythonRDD.
2012-10-22 10:28:59 -07:00
Josh Rosen
c23bf1aff4
Add PySpark README and run scripts.
2012-10-20 00:22:27 +00:00
Josh Rosen
52989c8a2c
Update Python API for v0.6.0 compatibility.
2012-10-19 10:24:49 -07:00
Josh Rosen
e21eb6e00d
Merge tag 'v0.6.0' into python-api
2012-10-19 09:44:32 -07:00
Thomas Dudziak
d9c2a89c57
Support for Hadoop 2 distributions such as cdh4
2012-10-18 16:08:54 -07:00
Reynold Xin
4a3fb06ac2
Updated Kryo to 2.20.
2012-10-16 01:10:01 -07:00
Reynold Xin
63fae9bc23
Serialize accumulator updates in TaskResult for local mode.
2012-10-15 21:38:28 -07:00
Reynold Xin
42d20fa8da
Added a method to report slave memory status.
2012-10-14 22:30:53 -07:00
Matei Zaharia
64dbf8d372
Made ShuffleDependency automatically find a shuffle ID for itself
2012-10-14 10:00:22 -07:00
Matei Zaharia
8815aeba0c
Take executor environment vars as an argument to SparkContext
2012-10-13 15:31:11 -07:00
Josh Rosen
33cd3a0c12
Remove map-side combining from ShuffleMapTask.
This separation of concerns simplifies the
ShuffleDependency and ShuffledRDD interfaces.
Map-side combining can be performed in a
mapPartitions() call prior to shuffling the RDD.
I don't anticipate this having much of a
performance impact: in both approaches, each tuple
is hashed twice: once in the bucket partitioning
and once in the combiner's hashtable. The same
steps are being performed, but in a different
order and through one extra Iterator.
2012-10-13 14:59:20 -07:00
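The ordering argued for in 33cd3a0c12, combining within each partition first and shuffling afterwards, can be sketched as below. This is a toy word count over in-memory lists, not Spark code: `combinePartition` plays the role of a mapPartitions()-style pre-aggregation, `shuffle` is a hypothetical stand-in for bucket partitioning plus reduce-side merging, and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapSideCombine {
    // Pre-aggregate one partition: the first hash per tuple (combiner table).
    static Map<String, Integer> combinePartition(List<String> words) {
        Map<String, Integer> combined = new HashMap<>();
        for (String word : words) {
            combined.merge(word, 1, Integer::sum);
        }
        return combined;
    }

    // Route pre-combined pairs to buckets by key hash and merge per bucket.
    static List<Map<String, Integer>> shuffle(List<Map<String, Integer>> mapOutputs,
                                              int numBuckets) {
        List<Map<String, Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new HashMap<String, Integer>());
        }
        for (Map<String, Integer> output : mapOutputs) {
            for (Map.Entry<String, Integer> pair : output.entrySet()) {
                int bucket = Math.floorMod(pair.getKey().hashCode(), numBuckets);
                buckets.get(bucket).merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return buckets;
    }

    // End-to-end count of one word across all partitions.
    static int countOf(String word, List<List<String>> partitions, int numBuckets) {
        List<Map<String, Integer>> mapOutputs = new ArrayList<>();
        for (List<String> partition : partitions) {
            mapOutputs.add(combinePartition(partition));
        }
        List<Map<String, Integer>> buckets = shuffle(mapOutputs, numBuckets);
        int bucket = Math.floorMod(word.hashCode(), numBuckets);
        return buckets.get(bucket).getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("a", "c"));
        System.out.println("count of a: " + countOf("a", partitions, 4));
    }
}
```

Only one pair per distinct key leaves each partition, which is the usual reason to combine before the shuffle.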
Josh Rosen
10bcd217d2
Remove mapSideCombine field from Aggregator.
Instead, the presence or absence of a ShuffleDependency's aggregator
will control whether map-side combining is performed.
2012-10-13 14:59:20 -07:00
Josh Rosen
4775c55641
Change ShuffleFetcher to return an Iterator.
2012-10-13 14:59:20 -07:00
Josh Rosen
110832e88f
Add helper methods to Aggregator.
2012-10-13 14:57:56 -07:00
Denny
0700d1920a
Protect from null env variables in mesos.
2012-10-13 13:57:59 -07:00
Denny
21047d923e
Protect from setting null environment variables.
2012-10-13 13:44:24 -07:00
Denny
fa41d50f7d
Don't use system envs for Mesos.
2012-10-13 13:15:50 -07:00
Denny
67c42a41d0
Let the user specify environment variables to be passed to the Executors.
Also removed unused variables in the ExecutorRunner.
2012-10-13 13:08:44 -07:00
Matei Zaharia
b4067cbad4
More doc updates, and moved Serializer to a subpackage.
2012-10-12 18:19:21 -07:00
Matei Zaharia
8d7b77bcb5
Some doc and usability improvements:
- Added a StorageLevels class for easy access to StorageLevel constants
in Java
- Added doc comments on Function classes in Java
- Updated Accumulator and HadoopWriter docs slightly
2012-10-12 17:53:20 -07:00
Matei Zaharia
dca496bb77
Document cartesian() operation
2012-10-12 14:46:41 -07:00
Matei Zaharia
23015ccac0
Merge pull request #271 from shivaram/block-manager-npe-fix
Change block manager to accept an ArrayBuffer
2012-10-12 14:36:28 -07:00
Patrick Wendell
dc8adbd359
Adding Java documentation
2012-10-11 00:49:03 -07:00
Shivaram Venkataraman
2cf40c5fd5
Change block manager to accept an ArrayBuffer instead of an iterator to ensure
that the computation can proceed even if we run out of memory to cache the
block. Update CacheTracker to use this new interface
2012-10-11 00:42:46 -07:00
Denny
d3f095f904
Fixed bug when fetching Jar dependencies.
Instead of checking currentFiles check currentJars.
2012-10-10 16:09:53 -07:00
Matei Zaharia
ee2fcb2ce6
Added documentation to all the *RDDFunction classes, and moved them into
the spark package to make them more visible. Also documented various
other miscellaneous things in the API.
2012-10-09 18:38:36 -07:00
Matei Zaharia
bc0bc672d0
Updates to documentation:
- Edited quick start and tuning guide to simplify them a little
- Simplified top menu bar
- Made private a SparkContext constructor parameter that was left as
public
- Various small fixes
2012-10-09 14:30:23 -07:00
Andy Konwinski
1d79ff6028
Fixes a typo, adds scaladoc comments to SparkContext constructors.
2012-10-08 22:49:17 -07:00
Patrick Wendell
ac310098ef
More docs in RDD class
2012-10-08 22:25:11 -07:00