Josh Rosen
e61729113d
Remove unnecessary doctest __main__ methods.
2013-02-03 21:29:40 -08:00
Josh Rosen
8fbd5380b7
Fetch fewer objects in PySpark's take() method.
2013-02-03 06:44:49 +00:00
Josh Rosen
2415c18f48
Fix reporting of PySpark doctest failures.
2013-02-03 06:44:11 +00:00
Matei Zaharia
ae26911ec0
Add back test for distinct without parens
2013-02-01 21:07:24 -08:00
Matei Zaharia
7ae4b6a23d
Merge pull request #441 from stephenh/lessnoisyakka
...
Reduce the amount of duplicate logging Akka does to stdout.
2013-02-01 21:03:37 -08:00
Stephen Haberman
12c1eb4756
Reduce the amount of duplicate logging Akka does to stdout.
...
Given we have Akka logging go through SLF4j to log4j, we don't need
all the extra noise of Akka's stdout logger that is supposedly only
used during Akka init time but seems to continue logging lots of
noisy network events that we either don't care about or are in the
log4j logs anyway.
See:
http://doc.akka.io/docs/akka/2.0/general/configuration.html
# Log level for the very basic logger activated during AkkaApplication startup
# Options: ERROR, WARNING, INFO, DEBUG
# stdout-loglevel = "WARNING"
2013-02-01 21:21:44 -06:00
Matei Zaharia
8b3041c723
Reduced the memory usage of reduce and similar operations
...
These operations used to wait for all the results to be available in an
array on the driver program before merging them. They now merge values
incrementally as they arrive.
2013-02-01 15:38:42 -08:00
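The difference described in 8b3041c723 can be illustrated with a small sketch. This is not Spark's implementation: `reduce_buffered`, `reduce_incremental`, and `partial_results` are hypothetical names, and a plain Python generator stands in for per-partition results arriving at the driver.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def reduce_buffered(partials: Iterable[T], op: Callable[[T, T], T]) -> T:
    """Sketch of the old behaviour: hold every partial result, then merge."""
    buffered = list(partials)  # memory grows with the number of partial results
    return reduce(op, buffered)


def reduce_incremental(partials: Iterable[T], op: Callable[[T, T], T]) -> T:
    """Sketch of the new behaviour: merge each partial result as it arrives."""
    it = iter(partials)
    try:
        acc = next(it)  # the first result seeds the accumulator
    except StopIteration:
        raise ValueError("no partial results to reduce")
    for value in it:
        acc = op(acc, value)  # only one accumulated value is kept alive
    return acc


def partial_results() -> Iterator[int]:
    """Hypothetical stand-in for results arriving one at a time."""
    for value in [3, 1, 4, 1, 5]:
        yield value
```

Both functions return the same value for an associative operator; the incremental version simply never holds more than the accumulator and the most recent result. If results can arrive in any order, the operator also needs to be commutative.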
Matei Zaharia
4529876db0
Merge branch 'master' of github.com:mesos/spark
2013-02-01 14:07:38 -08:00
Matei Zaharia
9970926ede
formatting
2013-02-01 14:07:34 -08:00
Matei Zaharia
79c24abe4c
Merge pull request #432 from stephenh/moreprivacy
...
Add more private declarations.
2013-02-01 14:06:55 -08:00
Matei Zaharia
de340ddf0b
Merge pull request #437 from stephenh/cancelmetacleaner
...
Stop BlockManagers metadataCleaner.
2013-02-01 12:59:25 -08:00
Matei Zaharia
0455650713
Merge pull request #439 from JoshRosen/spark-580
...
Use spark.local.dir for PySpark temp files (SPARK-580).
2013-02-01 12:07:42 -08:00
Josh Rosen
e211f405bc
Use spark.local.dir for PySpark temp files (SPARK-580).
2013-02-01 11:50:27 -08:00
Matei Zaharia
b6a6092177
Merge pull request #438 from JoshRosen/spark-674
...
Do not launch JavaGateways on workers (SPARK-674).
2013-02-01 11:29:47 -08:00
Josh Rosen
9cc6ff9c4e
Do not launch JavaGateways on workers (SPARK-674).
...
The problem was that the gateway was being initialized whenever the
pyspark.context module was loaded. The fix uses lazy initialization
that occurs only when SparkContext instances are actually constructed.
I also made the gateway and jvm variables private.
This change results in ~3-4x performance improvement when running the
PySpark unit tests.
2013-02-01 11:13:10 -08:00
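The lazy-initialization pattern described in 9cc6ff9c4e can be sketched as follows. The names `FakeGateway` and `Context` are hypothetical stand-ins, not the actual PySpark classes; the point is only that importing the module creates nothing, and the shared gateway is created when the first context is constructed.

```python
class FakeGateway:
    """Hypothetical stand-in for an expensive-to-launch Java gateway."""

    launches = 0  # counts how many gateways have been started

    def __init__(self) -> None:
        FakeGateway.launches += 1


class Context:
    """Hypothetical context class that creates its gateway lazily."""

    _gateway = None  # private and shared; deliberately NOT created at import time

    def __init__(self) -> None:
        # Launch the gateway only when a context is actually constructed,
        # and reuse it for every later context in the same process.
        if Context._gateway is None:
            Context._gateway = FakeGateway()
        self._jvm_handle = Context._gateway
```

Under this sketch, a worker process that imports the module but never constructs a `Context` never pays the gateway startup cost.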
Stephen Haberman
59c57e48df
Stop BlockManagers metadataCleaner.
2013-02-01 10:34:02 -06:00
Matei Zaharia
571af31304
Merge pull request #433 from rxin/master
...
Changed PartitionPruningRDD's split to make sure it returns the correct split index.
2013-02-01 00:32:41 -08:00
Matei Zaharia
5ce5efec10
Merge pull request #435 from JoshRosen/pyspark_stdout_fix
...
Fix stdout redirection in PySpark.
2013-02-01 00:32:07 -08:00
Josh Rosen
57b64d0d19
Fix stdout redirection in PySpark.
2013-02-01 00:25:19 -08:00
Reynold Xin
f9af9cee6f
Moved PruneDependency into PartitionPruningRDD.scala.
2013-02-01 00:02:46 -08:00
Matei Zaharia
7e2e046e37
Merge pull request #434 from pwendell/python-exceptions
...
SPARK-673: Capture and re-throw Python exceptions
2013-01-31 21:58:26 -08:00
Patrick Wendell
39ab83e957
Small fix from last commit
2013-01-31 21:52:52 -08:00
Patrick Wendell
c33f0ef41a
Some style cleanup
2013-01-31 21:50:02 -08:00
Matei Zaharia
95e14fbc38
Merge pull request #431 from mbautin/revert_default_profile
...
Remove activation of profiles by default
2013-01-31 21:34:59 -08:00
Patrick Wendell
3446d5c8d6
SPARK-673: Capture and re-throw Python exceptions
...
This patch alters the Python <-> executor protocol to pass on
exception data when they occur in user Python code.
2013-01-31 18:06:11 -08:00
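One way to pass exception data across a worker-to-executor stream, as 3446d5c8d6 describes, is a length-prefixed frame with a reserved marker for errors. The sketch below is an assumption for illustration only and does not reproduce the actual PySpark wire format; `ERROR_MARKER`, `write_result`, `read_result`, and `UserCodeError` are hypothetical names.

```python
import io
import struct
import traceback

ERROR_MARKER = -1  # hypothetical sentinel: a negative length signals an error frame


class UserCodeError(Exception):
    """Raised on the reading side when the worker reported an exception."""


def write_result(stream, func, arg) -> None:
    """Worker side: run user code and write either a result or an error frame."""
    try:
        payload = repr(func(arg)).encode("utf-8")
    except Exception:
        # Send the marker, then the length-prefixed traceback text.
        tb = traceback.format_exc().encode("utf-8")
        stream.write(struct.pack("!i", ERROR_MARKER))
        stream.write(struct.pack("!i", len(tb)))
        stream.write(tb)
        return
    stream.write(struct.pack("!i", len(payload)))
    stream.write(payload)


def read_result(stream) -> str:
    """Reading side: return the payload, or re-raise the worker's error."""
    (length,) = struct.unpack("!i", stream.read(4))
    if length == ERROR_MARKER:
        (tb_len,) = struct.unpack("!i", stream.read(4))
        raise UserCodeError(stream.read(tb_len).decode("utf-8"))
    return stream.read(length).decode("utf-8")
```

A quick check with an in-memory stream: write with `write_result(buf, lambda x: 1 / x, 0)` on an `io.BytesIO`, rewind it, and `read_result(buf)` raises `UserCodeError` carrying the original traceback text.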
Reynold Xin
6289d9654e
Removed the TODO comment from PartitionPruningRDD.
2013-01-31 17:49:36 -08:00
Reynold Xin
5b0fc265c2
Changed PartitionPruningRDD's split to make sure it returns the correct
...
split index.
2013-01-31 17:48:39 -08:00
Stephen Haberman
418e36caa8
Add more private declarations.
2013-01-31 17:18:33 -06:00
Mikhail Bautin
fe3eceab57
Remove activation of profiles by default
...
See the discussion at https://github.com/mesos/spark/pull/355 for why
default profile activation is a problem.
2013-01-31 13:30:41 -08:00
Matei Zaharia
55327a283e
Merge pull request #430 from pwendell/pyspark-guide
...
Minor improvements to PySpark docs
2013-01-30 15:35:29 -08:00
Patrick Wendell
3f945e3b83
Make module help available in python shell.
...
Also, adds a line in doc explaining how to use.
2013-01-30 15:04:06 -08:00
Patrick Wendell
58a7d320d7
Include packaging and launching pyspark in guide.
...
It's nicer if all the commands you need are made explicit.
2013-01-30 15:04:02 -08:00
Matei Zaharia
d12330bd2c
Merge pull request #426 from woggling/conn-manager-ips
...
Remember ConnectionManagerId used to initiate SendingConnections
2013-01-30 15:02:53 -08:00
Matei Zaharia
612a9fee71
Merge pull request #428 from woggling/mesos-exec-id
...
Make ExecutorIDs include SlaveIDs when running Mesos
2013-01-30 15:01:46 -08:00
Matei Zaharia
dfb721b970
Merge pull request #429 from stephenh/includemessage
...
Include message and exitStatus if available.
2013-01-30 15:01:24 -08:00
Stephen Haberman
871476d506
Include message and exitStatus if available.
2013-01-30 16:56:46 -06:00
Charles Reiss
252845d304
Remove remnants of attempt to use slaveId-executorId in MesosExecutorBackend
2013-01-30 10:38:06 -08:00
Charles Reiss
f7de6978c1
Use Mesos ExecutorIDs to hold SlaveIDs. Then we can safely use
...
the Mesos ExecutorID as a Spark ExecutorID.
2013-01-30 09:38:57 -08:00
Charles Reiss
16a0789e10
Remember ConnectionManagerId used to initiate SendingConnections.
...
This prevents ConnectionManager from getting confused if a machine
has multiple host names and the one getHostName() finds happens
not to be the one that was passed from, e.g., the BlockManagerMaster.
2013-01-29 18:13:59 -08:00
Matei Zaharia
d54b10b6ad
Merge remote-tracking branch 'stephenh/removefailedjob'
...
Conflicts:
core/src/main/scala/spark/deploy/master/Master.scala
2013-01-29 18:12:29 -08:00
Matei Zaharia
ccb67ff2ca
Merge pull request #425 from stephenh/toDebugString
...
Add RDD.toDebugString.
2013-01-29 10:44:18 -08:00
Matei Zaharia
9ae11603b4
Merge pull request #415 from stephenh/driver
...
Replace old 'master' term with 'driver'.
2013-01-29 10:41:42 -08:00
Matei Zaharia
64ba6a8c2c
Simplify checkpointing code and RDD class a little:
...
- RDD's getDependencies and getSplits methods are now guaranteed to be
called only once, so subclasses can safely do computation in there
without worrying about caching the results.
- The management of a "splits_" variable that is cleared out when we
checkpoint an RDD is now done in the RDD class.
- A few of the RDD subclasses are simpler.
- CheckpointRDD's compute() method no longer assumes that it is given a
CheckpointRDDSplit -- it can work just as well on a split from the
original RDD, because it only looks at its index. This is important
because things like UnionRDD and ZippedRDD remember the parent's
splits as part of their own and wouldn't work on checkpointed parents.
- RDD.iterator can now reuse cached data if an RDD is computed before it
is checkpointed. It seems like it wouldn't do this before (it always
called iterator() on the CheckpointRDD, which read from HDFS).
2013-01-28 22:30:12 -08:00
Matei Zaharia
b29599e5cf
Fix code that depended on metadata cleaner interval being in minutes
2013-01-28 22:24:47 -08:00
Stephen Haberman
cbf72bffa5
Include name, if set, in RDD.toString().
2013-01-29 00:20:36 -06:00
Stephen Haberman
3cda14af3f
Add number of splits.
2013-01-29 00:12:31 -06:00
Matei Zaharia
a1ecec8d79
Merge branch 'master' of github.com:mesos/spark
2013-01-28 22:08:44 -08:00
Stephen Haberman
951cfd9ba2
Add JavaRDDLike.toDebugString().
2013-01-29 00:02:17 -06:00
Matei Zaharia
f6eb1f0825
Merge pull request #413 from pwendell/stage-logging
...
SPARK-658: Adding logging of stage duration
2013-01-28 22:01:52 -08:00
Stephen Haberman
b45857c965
Add RDD.toDebugString.
...
Original idea by Nathan Kronenfeld.
2013-01-28 23:56:56 -06:00