Denny
18a1faedf6
Stylistic changes and Public Accumulable and Broadcast
2012-10-02 19:28:37 -07:00
Denny
b7a913e1fa
Make dependency classes public - used by spark
2012-10-02 19:04:23 -07:00
Denny
4d9f4b01af
Make classes package private
2012-10-02 19:00:19 -07:00
Matei Zaharia
22684653a5
Revert "Place Spray repo ahead of Cloudera in Maven search path"
This reverts commit 42e0a68082.
2012-10-02 12:01:32 -07:00
Matei Zaharia
42e0a68082
Place Spray repo ahead of Cloudera in Maven search path
2012-10-02 11:37:19 -07:00
Matei Zaharia
b9fb8d6463
Include date in folder name for Spark local dir.
2012-10-01 15:55:16 -07:00
Matei Zaharia
bc881e4798
Merge branch 'dev' of github.com:mesos/spark into dev
2012-10-01 15:21:56 -07:00
Matei Zaharia
802aa8aef9
Some bug fixes and logging fixes for broadcast.
2012-10-01 15:20:42 -07:00
Reynold Xin
f264153162
Fixed #232: DirectBuffer's cleaner was empty and Spark tried to invoke
clean on it.
2012-10-01 14:07:34 -07:00
Matei Zaharia
3b348f909d
Improve log messages from BlockManager
2012-10-01 12:01:38 -07:00
Matei Zaharia
53f90d0f0e
Use underscores instead of colons in RDD IDs
2012-10-01 10:48:53 -07:00
Matei Zaharia
2314132d57
Added a (failing) test for LRU with MEMORY_AND_DISK.
2012-09-30 22:52:16 -07:00
Matei Zaharia
3128c57f90
Simplified Class / ClassLoader test
2012-09-30 21:48:27 -07:00
Matei Zaharia
83143f9a5f
Fixed several bugs that caused weird behavior with files in spark-shell:
- SizeEstimator was following through a ClassLoader field of Hadoop
JobConfs, which referenced the whole interpreter, Scala compiler, etc.
Chaos ensued, giving an estimated size in the tens of gigabytes.
- Broadcast variables in local mode were only stored as MEMORY_ONLY and
never made accessible over a server, so they fell out of the cache when
they were deemed too large and couldn't be reloaded.
2012-09-30 21:19:39 -07:00
Matei Zaharia
fd0374b9de
Comment
2012-09-29 21:43:06 -07:00
Matei Zaharia
5718cef2a4
Removed Logging trait from CoalescedRDD since we don't log anything
2012-09-29 21:40:43 -07:00
Matei Zaharia
143ef4f90d
Added a CoalescedRDD class for reducing the number of partitions in an RDD.
2012-09-29 21:30:52 -07:00
Matei Zaharia
ebd52347b5
Merge branch 'dev' of github.com:mesos/spark into dev
2012-09-29 20:22:31 -07:00
Matei Zaharia
9b326d01e9
Made BlockManager unmap memory-mapped files when necessary to reduce the
number of open files. Also optimized sending of disk-based blocks.
2012-09-29 20:21:54 -07:00
Matei Zaharia
2f11e3c285
Merge pull request #227 from JoshRosen/fix/distinct_numsplits
Allow controlling number of splits in distinct().
2012-09-28 23:57:24 -07:00
Josh Rosen
8654165e69
Use null as dummy value in distinct().
2012-09-28 23:55:17 -07:00
Josh Rosen
37c199bbb0
Allow controlling number of splits in distinct().
2012-09-28 23:44:19 -07:00
Matei Zaharia
56dcad5936
Don't create a Cache in SparkEnv because we don't use it
2012-09-28 23:40:56 -07:00
Matei Zaharia
1d44644f4f
Logging tweaks
2012-09-28 23:28:16 -07:00
Matei Zaharia
815d6bd69a
Renamed subdirs option
2012-09-28 19:02:41 -07:00
Matei Zaharia
e54e1d7043
Made subdirs per local dir configurable, and reduced lock usage a bit
2012-09-28 19:00:50 -07:00
Matei Zaharia
ae8c7d6cfa
Made disk store use multiple directories, deleted ShuffleManager
2012-09-28 18:28:13 -07:00
Matei Zaharia
3d7267999d
Print and track user call sites in more places in Spark
2012-09-28 17:42:00 -07:00
Matei Zaharia
9f6efbf06a
Merge pull request #225 from pwendell/dev
Log message which records RDD origin
2012-09-28 16:28:07 -07:00
Matei Zaharia
0121a26bd1
Changed the way tasks' dependency files are sent to workers so that
custom serializers or Kryo registrators can be loaded.
2012-09-28 16:14:05 -07:00
Patrick Wendell
9fc78f8f29
Fixing some whitespace issues
2012-09-28 16:05:50 -07:00
Patrick Wendell
bc909c2903
Changes based on Matei's comments
2012-09-28 16:04:36 -07:00
Patrick Wendell
c387e40fb1
Log message which records RDD origin
This adds tracking to determine the "origin" of an RDD. Origin is defined by
the boundary between the user's code and the Spark code, during an RDD's
instantiation. It is meant to help users understand where a Spark RDD is
coming from in their code.
This patch also logs origin data when stages are submitted to the scheduler.
Finally, it adds a new log message to fix an inconsistency in the way that
dependent stages (those missing parents) and independent stages (those
without) are logged during submission.
2012-09-28 15:51:46 -07:00
Matei Zaharia
2a8bfbca00
Fixed a bug where isLocal was set to false when using local[K]
2012-09-28 14:50:54 -07:00
Matei Zaharia
4a138403ef
Fix a bug in JAR fetcher that made it always fetch the JAR
2012-09-27 21:32:06 -07:00
Matei Zaharia
009b0e37e7
Added an option to compress blocks in the block store
2012-09-27 18:45:44 -07:00
Matei Zaharia
7bcb08cef5
Renamed storage levels to something cleaner; fixes #223.
2012-09-27 17:50:59 -07:00
Matei Zaharia
920fab23c3
Merge pull request #222 from rxin/dev
Added MapPartitionsWithSplitRDD.
2012-09-26 23:16:45 -07:00
Matei Zaharia
ea05fc130b
Updates to standalone cluster, web UI and deploy docs.
2012-09-26 22:54:39 -07:00
Matei Zaharia
1ef4f0fbd2
Allow controlling number of splits in sortByKey.
2012-09-26 19:18:47 -07:00
Reynold Xin
1ad1331a34
Added MapPartitionsWithSplitRDD.
2012-09-26 17:11:28 -07:00
Matei Zaharia
ee71fa49c1
Look for Kryo registrator using context class loader
2012-09-26 14:15:16 -07:00
Matei Zaharia
d71a358c46
Fixed a test that was getting extremely lucky before, and increased the
number of samples used for sorting
2012-09-26 00:25:34 -07:00
Matei Zaharia
051785c7e6
Several fixes to sampling issues pointed out by Henry Milner:
- takeSample was biased towards earlier partitions
- There were some range errors in takeSample
- SampledRDDs with replacement didn't produce appropriate counts
across partitions (we took exactly frac of each one)
2012-09-25 21:46:58 -07:00
Matei Zaharia
4d3339a3ec
Merge pull request #217 from rxin/dev
Added a method to RDD to expose the ClassManifest.
2012-09-24 23:52:32 -07:00
Reynold Xin
7a4cd92861
Renamed RDD.manifest to RDD.elementClassManifest
2012-09-24 23:42:33 -07:00
Matei Zaharia
296e24b440
Merge pull request #218 from rnpandya/dev
Scripts to start Spark under Windows
2012-09-24 21:10:31 -07:00
Reynold Xin
348bcbca1f
Added a method to RDD to expose the ClassManifest.
2012-09-24 16:56:27 -07:00
Ravi Pandya
39215357af
Windows command scripts for sbt and run
2012-09-24 15:43:19 -07:00
Matei Zaharia
6eeb379cf8
Fix some test issues
2012-09-24 15:39:58 -07:00