Commit graph

247 commits

Author SHA1 Message Date
Matei Zaharia 6ae2746d1e Handle arrays that contain the same element many times better in
SizeEstimator. Also added a test for SizeEstimator. Fixes #136.
2012-06-06 16:13:02 -07:00
Matei Zaharia 0a617958d1 Some refactoring to make BoundedMemoryCache test similar to others 2012-06-06 16:12:08 -07:00
Matei Zaharia e141f644ca Merge pull request #132 from Benky/rb-first-iteration
Little refactoring and unit tests for CacheTrackerActor
2012-05-26 13:15:06 -07:00
Richard Benkovsky ae64920337 MesosScheduler refactoring 2012-05-22 11:04:54 +02:00
Richard Benkovsky 3a1bcd4028 Added tests for CacheTrackerActor 2012-05-22 11:04:54 +02:00
Richard Benkovsky 8f2f736d53 Little refactoring 2012-05-22 11:04:54 +02:00
Richard Benkovsky 518506a7c5 Added tests for Utils.copyStream 2012-05-22 11:04:51 +02:00
Richard Benkovsky f162fc2beb Formating fixed 2012-05-22 09:45:38 +02:00
Richard Benkovsky 565245871f BoundedMemoryCache.put fails when estimated size of 'value' is larger than cache capacity 2012-05-20 22:13:35 +02:00
Richard Benkovsky 822a4be37d Utils.memoryBytesToString fixed 2012-05-19 15:13:20 +02:00
Reynold Xin d0c6e9f639 Made some RDD dependencies transient to reduce the amount of data needed
to be serialized in closure serialization. This can significantly reduce
the task setup time in Shark when the query involves a large number of
(Hive) partitions.
2012-05-16 14:16:55 -07:00
Reynold Xin 16461e2eda Updated Cache's put method to use a case class for response. Previously
it was pretty ugly that put() should return -1 for failures.
2012-05-15 00:31:52 -07:00
Reynold Xin 019e48833f Added the capacity to report cache usage status back to the cache
trackor. This is essential for building a dashboard to see the status of
caches on all slaves.
2012-05-14 18:39:04 -07:00
Matei Zaharia f48742683a Made caches dataset-aware so that they won't cyclically evict partitions
from the same dataset.
2012-05-06 20:14:40 -07:00
Matei Zaharia 32a4f4623c Merge pull request #129 from mesos/rxin
Force serialize/deserialize task results in local execution mode.
2012-04-24 16:18:39 -07:00
Reynold Xin 761ea65a98 Added a test for the previous commit (failing to serialize task results
would throw an exception for local tasks).
2012-04-24 15:14:35 -07:00
Reynold Xin 9821cd4d42 Force serialize/deserialize task results in local execution mode. 2012-04-24 14:55:28 -07:00
Antonio 3e48818993 Removed commented-out System.exit call 2012-04-23 11:42:58 -07:00
Antonio 39d99168dc Added exception handling instead of just exiting in LocalScheduler for tasks that throw exceptions 2012-04-20 14:46:43 -07:00
Reynold Xin e601b3b9e5 Added the ability to set environmental variables in piped rdd. 2012-04-17 16:40:56 -07:00
Matei Zaharia 3b745176e0 Bug fix to pluggable closure serialization change 2012-04-12 17:53:02 +00:00
Matei Zaharia 112655f032 Merge pull request #121 from rxin/kryo-closure
Added an option (spark.closure.serializer) to specify the serializer for closures.
2012-04-10 14:21:02 -07:00
Reynold Xin d295ccb43c Added a closureSerializer field in SparkEnv and use it to serialize
tasks.
2012-04-10 13:29:46 -07:00
Reynold Xin 968f75f6af Added an option (spark.closure.serializer) to specify the serializer for
closures. This enables using Kryo as the closure serializer.
2012-04-09 21:59:56 -07:00
Matei Zaharia a633974143 Merge branch 'master' of github.com:mesos/spark 2012-04-08 23:41:25 -07:00
Matei Zaharia d401e1b3e8 Fix a possible deadlock in MesosScheduler 2012-04-08 23:38:49 -07:00
Ankur Dave 7be1c7b331 Report entry dropping in BoundedMemoryCache 2012-04-06 15:49:32 -07:00
Matei Zaharia 816d4e5840 Pass local IP address instead of hostname in spark.master.host. Fixes #117. 2012-04-05 14:53:17 -07:00
Matei Zaharia 335a6036ad Converted some tabs to spaces 2012-04-05 11:58:01 -07:00
Matei Zaharia 8c95a85438 Use Runtime.maxMemory instead of Runtime.totalMemory in
BoundedMemoryCache, in case the JVM was not started with its initial
heap size equaling its maximum one (-Xms == -Xmx).
2012-03-30 13:39:35 -04:00
Reynold Xin 42dcdbcb2f Removed the extra spaces in OrderedRDDFunctions and SortedRDD. 2012-03-29 15:21:57 -07:00
Matei Zaharia c7af538ac1 Some fixes to sorting for when the RDD has fewer elements than the
number of partitions we ask to partition it into. Also, removed a test
that was taking way too long to run.
2012-03-17 13:08:36 -07:00
Matei Zaharia a5e2b6a6bd Merge pull request #112 from cengle/master
Changed HadoopRDD to get key and value containers from the RecordReader instead of through reflection
2012-03-06 13:38:32 -08:00
Matei Zaharia 97eee50825 Fixes a nasty bug that could happen when tasks fail, because calling
wait() with a timeout of 0 on a Java object means "wait forever".
2012-03-01 13:43:17 -08:00
Cliff Engle dd68cb6099 Get key and value container from RecordReader 2012-02-29 16:33:23 -08:00
Matei Zaharia 1e10df0a46 Merge pull request #111 from alupher/master
Adding sorting to RDDs
2012-02-24 15:50:14 -08:00
Antonio 0d93d95bcf Removed unnecessary import 2012-02-21 19:57:12 -08:00
Antonio 2990298f71 Added sorting testing suite 2012-02-21 19:54:21 -08:00
Matei Zaharia aa04f87cd2 Added support for parallel execution of jobs in DAGScheduler. 2012-02-19 22:50:23 -08:00
Antonio 620798161b Added fixes to sorting 2012-02-13 00:07:39 -08:00
Matei Zaharia 2587ce1690 Fixed a deadlock that occured with MesosScheduler due to an earlier
synchronization change
2012-02-11 21:22:45 -08:00
Antonio e93f622665 Added sorting by key for pair RDDs 2012-02-11 00:56:28 -08:00
Matei Zaharia 98f008b721 Formatting fixes 2012-02-10 10:52:03 -08:00
Matei Zaharia 7660a8b12f Merge branch 'formatting'
Conflicts:
	core/src/main/scala/spark/DAGScheduler.scala
	core/src/main/scala/spark/SimpleShuffleFetcher.scala
	core/src/main/scala/spark/SparkContext.scala
2012-02-10 10:42:14 -08:00
haoyuan 194c42ab79 Code format. 2012-02-10 08:19:53 -08:00
Matei Zaharia 8f5ed51234 Delete Spark's temporary directories when the JVM exits. 2012-02-09 22:58:24 -08:00
Matei Zaharia c0a0df3285 Made the default cache BoundedMemoryCache, and reduced its default size 2012-02-09 22:32:02 -08:00
Matei Zaharia a766780f4c Added some tests for multithreaded access to Spark. 2012-02-09 22:27:53 -08:00
Matei Zaharia 0e93891d3d Replaced LocalFileShuffle with a non-singleton ShuffleManager class
and made DAGScheduler automatically set SparkEnv.
2012-02-09 22:14:56 -08:00
haoyuan 445e0bb1b5 Format the code a bit mroe. 2012-02-09 15:50:26 -08:00