Justin Ma
156bccbe23
HdfsFile.scala: added a try/catch block to exit gracefully for corrupted gzip files
MesosScheduler.scala: formatted the slaveOffer() output to include the serialized task size
RDD.scala: added support for aggregating RDDs on a per-split basis
(aggregateSplit()) as well as for sampling without replacement (sample())
2010-08-18 15:25:57 -07:00
Matei Zaharia
75b2ca10c3
Removed HOD from included Hadoop because it was making the project count
as Python on GitHub :|.
2010-08-16 23:16:35 -07:00
Matei Zaharia
1cbffaae6f
Modified Scala interpreter to have it avoid computing string versions of
all results when :silent is enabled, so that it is easier to work with
large arrays in Spark. (The string version of an array of numbers might
not fit in memory even though the array itself does.)
2010-08-15 18:33:27 -07:00
Matei Zaharia
1600c31554
Added latest mesos.jar
2010-08-13 19:03:46 -07:00
Matei Zaharia
0b195927b6
Improved README and added blank templates for config files.
2010-08-13 18:54:32 -07:00
Matei Zaharia
3d8d7fd557
Bug fix from Justin
2010-08-13 11:29:19 -07:00
root
a9481c3514
Update to work with latest Mesos API changes
2010-08-13 07:39:36 +00:00
Matei Zaharia
4488b3bc8a
Fixed a bug where we would incorrectly decide we've finished a parallel operation if Mesos tells us twice that the same task is finished
2010-08-09 16:46:14 -07:00
Matei Zaharia
f415b071af
Change shell framework's name to "Spark shell"
2010-08-06 12:07:26 -07:00
Matei Zaharia
0e6e577fdf
Add Mesos native library to .gitignore
2010-07-25 23:54:56 -04:00
Matei Zaharia
b56ed67553
Updated code to work with Nexus->Mesos name change
2010-07-25 23:53:46 -04:00
Matei Zaharia
4239f76997
Removed Matei's old start on broadcast code
2010-07-25 23:46:44 -04:00
Matei Zaharia
e240e38ee9
Updated a bunch of libraries, and increased the default memory in run so
that unit tests can run successfully.
2010-07-25 21:10:03 -04:00
Matei Zaharia
0435de9e87
Made it possible to set various Spark options and environment variables
in general through a conf/spark-env.sh script.
2010-07-19 18:00:30 -07:00
Justin Ma
edad598684
Updated Spark to run with latest Mesos build and Scala-2.8.0.final.
2010-07-19 15:03:49 -07:00
Matei Zaharia
0da5b00d6e
Merge branch 'master' into multi-tracker
Conflicts:
Makefile
run
src/scala/spark/Broadcast.scala
src/scala/spark/HdfsFile.scala
src/scala/spark/NexusScheduler.scala
src/scala/spark/SparkContext.scala
src/test/spark/repl/ReplSuite.scala
third_party/nexus.jar
2010-06-27 22:25:56 -07:00
Matei Zaharia
7d0eae17e3
Merge branch 'dev'
Conflicts:
src/scala/spark/HdfsFile.scala
src/scala/spark/NexusScheduler.scala
src/test/spark/repl/ReplSuite.scala
2010-06-27 15:21:54 -07:00
root
6aacaa6870
Made Spark shell class directory configurable.
2010-06-18 23:24:18 +00:00
Matei Zaharia
323571a177
Initial work on union operation.
2010-06-18 12:54:33 -07:00
Matei Zaharia
b54198819e
Added appropriate hashCode, equals and toString to ParallelArraySplit.
2010-06-17 13:19:02 -07:00
Matei Zaharia
cd247b7d86
Created common RDD superclass for distributed files and parallel arrays.
This also means that parallel arrays now get all the functionality files
used to have (filter, map, reduce, cache, etc).
2010-06-17 12:49:42 -07:00
Matei Zaharia
77103eab2a
Fixed README
2010-06-11 14:55:23 -07:00
Matei Zaharia
0d9c51d473
Added back REPL tests
2010-06-11 10:03:01 -07:00
Matei Zaharia
e58fba279d
Fix junk stripper
2010-06-11 01:18:43 -07:00
Matei Zaharia
396f48e5a4
New interpreter port for Scala 2.8 interpreter
2010-06-11 01:10:03 -07:00
Matei Zaharia
4eb39e0c8a
New nexus.jar
2010-06-10 22:41:23 -07:00
Matei Zaharia
1473987fb7
Fixed classpath for tests
2010-06-10 22:36:45 -07:00
Matei Zaharia
359e84c585
Use new Nexus API
2010-06-10 22:09:13 -07:00
Matei Zaharia
92246c843b
Initial work on 2.8 port
2010-06-10 21:50:55 -07:00
Matei Zaharia
c177a546a5
Ignore .DS_Store
2010-06-10 18:08:59 -07:00
Mosharaf Chowdhury
7ab703117a
Added timers around BroadcastTest and sendBroadcast.
Turned OFF saving to HDFS for now for stress tests.
pqOfSources is ordered by least leechers again.
2010-05-18 20:49:59 -07:00
Mosharaf Chowdhury
2d381c974e
Added flush calls after all writeObject calls as well as after creating every
ObjectOutputStream object.
2010-05-16 17:06:05 -07:00
Mosharaf Chowdhury
e85bb3f04d
All ObjectOutputStream objects are now created before ObjectInputStream objects.
2010-05-16 16:24:48 -07:00
Mosharaf Chowdhury
520b594bdf
"if (!local) { sendBroadcast }" must be called after all the variables have
been created/initialized
2010-05-16 16:20:00 -07:00
Mosharaf Chowdhury
4f0b7eb02d
SplitStream was not working in EC2.
We have turned OFF SSB for now.
2010-05-16 13:00:47 -07:00
Mosharaf Chowdhury
53a2367c9c
- SplitStream working on a local machine for single-variable broadcast
- Removed delays before publishing and receiving.
- Commented out some prints.
2010-05-04 15:32:30 -07:00
root
1c90a32621
Fix native build to use build directory
2010-04-30 22:41:21 +00:00
Mosharaf Chowdhury
d0a92571dd
- Should work, but not tested yet.
- Right now, each variable has to come one after another. Within a single
variable, blocks can come out-of-order.
2010-04-21 19:48:25 -07:00
Mosharaf Chowdhury
c0117f9473
Added flesh to publish/deliver functions.
2010-04-21 02:01:34 -07:00
Mosharaf Chowdhury
e2f21279be
Moved SplitStreamClient inside the BroadcastSS object with the decision that
there should be only a single SSClient for the whole Spark program instead of
one for each broadcasted variable.
It's still working well though.
2010-04-20 19:16:27 -07:00
Mosharaf Chowdhury
d2f1d0151a
SplitStream integration in progress.
2010-04-20 02:08:48 -07:00
Mosharaf Chowdhury
1c1ac3161d
More porting of SplitStream code.
2010-04-19 20:32:17 -07:00
Mosharaf Chowdhury
dc2c69e659
SplitStream implementation in progress.
2010-04-19 00:14:53 -07:00
Mosharaf Chowdhury
bb0178d1e4
- Receiving retry now starts from where the last try left off, not from the very
beginning.
- Some refactoring.
2010-04-15 22:41:44 -07:00
Mosharaf Chowdhury
ee6c524fdf
Fixed some bugs in speed-based PQ-ing.
2010-04-15 21:39:57 -07:00
Mosharaf Chowdhury
c6962f516e
Several things, but the most important one is that now we are using node speed
to select source instead of leecher count.
2010-04-14 21:19:32 -07:00
Mosharaf Chowdhury
e0db4e0482
- HDFS storing is in separate thread.
- Receivers now ask for a range instead of expecting the whole variable. But,
they are still asking for the whole range from a single source.
- Next step: make receivers ask for different parts from different sources.
Also, make sure that Master sends back a list of sources instead of a single
one.
2010-04-04 13:50:25 -07:00
Matei Zaharia
10cf3828ad
Imported Mosharaf's multi-tracker branch
2010-04-03 23:50:04 -07:00
Matei Zaharia
06aac8a889
Imported changes from old repository (mostly Mosharaf's work,
plus some fault tolerance code).
2010-04-03 23:44:55 -07:00
Matei Zaharia
df29d0ea4c
Initial commit
2010-03-29 16:17:55 -07:00