Matei Zaharia
6cf5dffc72
Make more stuff private[spark]
2012-10-02 22:28:55 -07:00
Matei Zaharia
626f701931
Merge pull request #240 from dennybritz/private_classes
Package-Private Classes
2012-10-02 21:24:32 -07:00
Denny
0361353a70
Make Java API abstract wrapped functions private
2012-10-02 20:02:53 -07:00
Denny
b9badcd5bd
accidentally removed trait
2012-10-02 19:35:07 -07:00
Denny
18a1faedf6
Stylistic changes and Public Accumulable and Broadcast
2012-10-02 19:28:37 -07:00
Denny
b7a913e1fa
Make dependency classes public - used by spark
2012-10-02 19:04:23 -07:00
Denny
4d9f4b01af
Make classes package private
2012-10-02 19:00:19 -07:00
Matei Zaharia
97cbd699d7
Merge branch 'dev' of github.com:mesos/spark into dev
2012-10-02 17:31:01 -07:00
Matei Zaharia
6098f7e87a
Fixed cache replacement behavior of BlockManager:
- Partitions that get dropped to disk will now be loaded back into RAM
after they're accessed again
- Same-RDD rule for cache replacement is now implemented (don't drop
partitions from an RDD to make room for other partitions from itself)
- Items stored as MEMORY_AND_DISK go into memory only first, instead of
being eagerly written out to disk
- MemoryStore.ensureFreeSpace is called within a lock on the writer
thread to prevent race conditions (this can still be optimized to
allow multiple concurrent calls to it but it's a start)
- MemoryStore does not accept blocks larger than its limit
2012-10-02 17:25:38 -07:00
Reynold Xin
7997585616
Added a check to make sure SPARK_MEM <= memoryPerSlave for local cluster
mode.
2012-10-02 15:45:25 -07:00
Reynold Xin
0898a21b95
Merge branch 'dev' of https://github.com/mesos/spark into dev
2012-10-02 13:08:01 -07:00
Matei Zaharia
22684653a5
Revert "Place Spray repo ahead of Cloudera in Maven search path"
This reverts commit 42e0a68082.
2012-10-02 12:01:32 -07:00
Reynold Xin
b8cd681169
Allow whitespaces in cluster URL configuration for local cluster.
2012-10-02 11:52:12 -07:00
Matei Zaharia
42e0a68082
Place Spray repo ahead of Cloudera in Maven search path
2012-10-02 11:37:19 -07:00
Matei Zaharia
b9fb8d6463
Include date in folder name for Spark local dir.
2012-10-01 15:55:16 -07:00
Matei Zaharia
bc881e4798
Merge branch 'dev' of github.com:mesos/spark into dev
2012-10-01 15:21:56 -07:00
Matei Zaharia
802aa8aef9
Some bug fixes and logging fixes for broadcast.
2012-10-01 15:20:42 -07:00
Reynold Xin
f264153162
Fixed #232: DirectBuffer's cleaner was empty and Spark tried to invoke
clean on it.
2012-10-01 14:07:34 -07:00
Matei Zaharia
3b348f909d
Improve log messages from BlockManager
2012-10-01 12:01:38 -07:00
Matei Zaharia
53f90d0f0e
Use underscores instead of colons in RDD IDs
2012-10-01 10:48:53 -07:00
Matei Zaharia
2314132d57
Added a (failing) test for LRU with MEMORY_AND_DISK.
2012-09-30 22:52:16 -07:00
Matei Zaharia
3128c57f90
Simplified Class / ClassLoader test
2012-09-30 21:48:27 -07:00
Matei Zaharia
83143f9a5f
Fixed several bugs that caused weird behavior with files in spark-shell:
- SizeEstimator was following through a ClassLoader field of Hadoop
JobConfs, which referenced the whole interpreter, Scala compiler, etc.
Chaos ensued, giving an estimated size in the tens of gigabytes.
- Broadcast variables in local mode were only stored as MEMORY_ONLY and
never made accessible over a server, so they fell out of the cache when
they were deemed too large and couldn't be reloaded.
2012-09-30 21:19:39 -07:00
Matei Zaharia
fd0374b9de
Comment
2012-09-29 21:43:06 -07:00
Matei Zaharia
5718cef2a4
Removed Logging trait from CoalescedRDD since we don't log anything
2012-09-29 21:40:43 -07:00
Matei Zaharia
143ef4f90d
Added a CoalescedRDD class for reducing the number of partitions in an RDD.
2012-09-29 21:30:52 -07:00
Matei Zaharia
ebd52347b5
Merge branch 'dev' of github.com:mesos/spark into dev
2012-09-29 20:22:31 -07:00
Matei Zaharia
9b326d01e9
Made BlockManager unmap memory-mapped files when necessary to reduce the
number of open files. Also optimized sending of disk-based blocks.
2012-09-29 20:21:54 -07:00
Matei Zaharia
2f11e3c285
Merge pull request #227 from JoshRosen/fix/distinct_numsplits
Allow controlling number of splits in distinct().
2012-09-28 23:57:24 -07:00
Josh Rosen
8654165e69
Use null as dummy value in distinct().
2012-09-28 23:55:17 -07:00
Josh Rosen
37c199bbb0
Allow controlling number of splits in distinct().
2012-09-28 23:44:19 -07:00
Matei Zaharia
56dcad5936
Don't create a Cache in SparkEnv because we don't use it
2012-09-28 23:40:56 -07:00
Matei Zaharia
1d44644f4f
Logging tweaks
2012-09-28 23:28:16 -07:00
Matei Zaharia
815d6bd69a
Renamed subdirs option
2012-09-28 19:02:41 -07:00
Matei Zaharia
e54e1d7043
Made subdirs per local dir configurable, and reduced lock usage a bit
2012-09-28 19:00:50 -07:00
Matei Zaharia
ae8c7d6cfa
Made disk store use multiple directories, deleted ShuffleManager
2012-09-28 18:28:13 -07:00
Matei Zaharia
3d7267999d
Print and track user call sites in more places in Spark
2012-09-28 17:42:00 -07:00
Matei Zaharia
9f6efbf06a
Merge pull request #225 from pwendell/dev
Log message which records RDD origin
2012-09-28 16:28:07 -07:00
Matei Zaharia
0121a26bd1
Changed the way tasks' dependency files are sent to workers so that
custom serializers or Kryo registrators can be loaded.
2012-09-28 16:14:05 -07:00
Patrick Wendell
9fc78f8f29
Fixing some whitespace issues
2012-09-28 16:05:50 -07:00
Patrick Wendell
bc909c2903
Changes based on Matei's comments
2012-09-28 16:04:36 -07:00
Patrick Wendell
c387e40fb1
Log message which records RDD origin
This adds tracking to determine the "origin" of an RDD. Origin is defined by
the boundary between the user's code and the Spark code, during an RDD's
instantiation. It is meant to help users understand where a Spark RDD is
coming from in their code.
This patch also logs origin data when stages are submitted to the scheduler.
Finally, it adds a new log message to fix an inconsistency in the way that
dependent stages (those missing parents) and independent stages (those
without) are logged during submission.
2012-09-28 15:51:46 -07:00
Matei Zaharia
2a8bfbca00
Fixed a bug where isLocal was set to false when using local[K]
2012-09-28 14:50:54 -07:00
Matei Zaharia
4a138403ef
Fix a bug in JAR fetcher that made it always fetch the JAR
2012-09-27 21:32:06 -07:00
Matei Zaharia
009b0e37e7
Added an option to compress blocks in the block store
2012-09-27 18:45:44 -07:00
Matei Zaharia
7bcb08cef5
Renamed storage levels to something cleaner; fixes #223.
2012-09-27 17:50:59 -07:00
Matei Zaharia
920fab23c3
Merge pull request #222 from rxin/dev
Added MapPartitionsWithSplitRDD.
2012-09-26 23:16:45 -07:00
Matei Zaharia
ea05fc130b
Updates to standalone cluster, web UI and deploy docs.
2012-09-26 22:54:39 -07:00
Matei Zaharia
1ef4f0fbd2
Allow controlling number of splits in sortByKey.
2012-09-26 19:18:47 -07:00
Reynold Xin
1ad1331a34
Added MapPartitionsWithSplitRDD.
2012-09-26 17:11:28 -07:00