Commit graph

6105 commits

Author SHA1 Message Date
Henry Saputra 93a65e5fde Remove simple redundant return statement for Scala methods/functions:
- Only change simple return statements at the end of a method
- Ignore the complex if-else checks
- Ignore the ones inside synchronized blocks
2014-01-12 10:30:04 -08:00
Tathagata Das c5921e5c61 Fixed bugs. 2014-01-12 01:12:08 -08:00
Matei Zaharia 224f1a754a Update Python required version to 2.7, and mention MLlib support 2014-01-12 00:15:34 -08:00
Matei Zaharia 5741078c46 Log Python exceptions to stderr as well
This helps in case the exception happened while serializing a record to
be sent to Java, leaving the stream to Java in an inconsistent state
where PythonRDD won't be able to read the error.
2014-01-12 00:10:41 -08:00
Tathagata Das 18f4889d96 Merge remote-tracking branch 'apache/master' into error-handling 2014-01-11 23:40:57 -08:00
Tathagata Das 4d9b0ab420 Added waitForStop and stop to JavaStreamingContext. 2014-01-11 23:35:51 -08:00
Tathagata Das f5108ffc24 Converted JobScheduler to use actors for event handling. Changed protected[streaming] to private[streaming] in StreamingContext and DStream. Added waitForStop to StreamingContext, and StreamingContextSuite. 2014-01-11 23:15:09 -08:00
Matei Zaharia f00e949f84 Added Java unit test, data, and main method for Naive Bayes
Also fixes mains of a few other algorithms to print the final model
2014-01-11 22:30:48 -08:00
Matei Zaharia 4c28a2bad8 Update some Python MLlib parameters to use camelCase, and tweak docs
We've used camel case in other Spark methods so it felt reasonable to
keep using it here and make the code match Scala/Java as much as
possible. Note that parameter names matter in Python because Python allows
passing optional parameters by name.
2014-01-11 22:30:48 -08:00
Matei Zaharia 9a0dfdf868 Add Naive Bayes to Python MLlib, and some API fixes
- Added a Python wrapper for Naive Bayes
- Updated the Scala Naive Bayes to match the style of our other
  algorithms better and in particular make it easier to call from Java
  (added builder pattern, removed default value in train method)
- Updated Python MLlib functions to not require a SparkContext; we can
  get that from the RDD the user gives
- Added a toString method in LabeledPoint
- Made the Python MLlib tests run as part of run-tests as well (before
  they could only be run individually through each file)
2014-01-11 22:30:48 -08:00
Reynold Xin 288a878999 Merge pull request #389 from rxin/clone-writables
Minor update for clone writables and more documentation.
2014-01-11 21:53:19 -08:00
Reynold Xin dbc11df411 Merge pull request #388 from pwendell/master
Fix UI bug introduced in #244.

The 'duration' field was incorrectly renamed to 'task time' in the table that
lists stages.
2014-01-11 18:07:13 -08:00
Reynold Xin 362cda18bc Renamed cloneKeyValues to cloneRecords; updated docs. 2014-01-11 18:01:29 -08:00
Joseph E. Gonzalez cf57b1b055 Correcting typos in documentation. 2014-01-11 17:13:10 -08:00
Patrick Wendell 409866b351 Merge pull request #393 from pwendell/revert-381
Revert PR 381

This PR missed a bunch of test cases that require "spark.cleaner.ttl". I think this is what is causing the test failures on Jenkins right now (though it's a bit hard to tell because the DNS for cs.berkeley.edu is down).

I'm submitting this to see if it fixes Jenkins. I did try patching the individual tests, but there are a lot of them and it was taking a really long time, so for now I'm just checking whether a revert works.
2014-01-11 17:12:06 -08:00
Patrick Wendell 07b952e1d1 Revert "Fix default TTL for metadata cleaner"
This reverts commit 669ba4caa9.
2014-01-11 16:07:10 -08:00
Patrick Wendell 22d4d62420 Revert "Fix one unit test that was not setting spark.cleaner.ttl"
This reverts commit 942c80b34c.
2014-01-11 16:07:03 -08:00
Joseph E. Gonzalez 64c4593586 Finished documenting join operators and revised some of the initial presentation. 2014-01-11 13:48:35 -08:00
Reynold Xin 2180c87188 Stop SparkListenerBus daemon thread when DAGScheduler is stopped. 2014-01-11 13:36:37 -08:00
Ankur Dave 02771aa087 Make EdgeDirection val instead of case object for Java compat. 2014-01-11 13:15:46 -08:00
Reynold Xin 6510f04e4d Merge pull request #387 from jerryshao/conf-fix
Fix small problem where configuration didn't take effect in ALS
2014-01-11 12:48:26 -08:00
Ankur Dave 574c0d28c2 Use SparkConf in GraphX tests (via LocalSparkContext) 2014-01-11 12:39:30 -08:00
Ankur Dave 55101f5821 One-line Scaladoc comments in Edge and EdgeDirection 2014-01-11 12:35:41 -08:00
Reynold Xin b0fbfccadc Minor update for clone writables and more documentation. 2014-01-11 12:35:10 -08:00
Ankur Dave 64f73f73a0 Fix indent and use SparkConf in Analytics 2014-01-11 12:33:06 -08:00
Reynold Xin ee6e7f9b8c Merge pull request #359 from ScrapCodes/clone-writables
We clone Hadoop keys and values by default and reuse objects if asked to.

We clone the most common Writable types directly and call WritableUtils.clone for everything else. The intention is to optimize. For example, NullWritable needs no copy at all, and for Long, Int, and String, creating a new object with the value set should be faster than copying the object.

Another way to do this PR would be to ask separately whether to clone keys and whether to clone values. However, I could not think of a use case for that except when one of them is a NullWritable, which I have already worked around, so I thought it would be unnecessary.
2014-01-11 12:07:55 -08:00
Ankur Dave 732333d78e Remove GraphLab 2014-01-11 11:49:35 -08:00
Ankur Dave 0b5c49ebad Make nullValue and VertexSet package-private 2014-01-11 11:49:35 -08:00
Joseph E. Gonzalez fac44bbe2c Finished documenting structural operators and starting join operators. 2014-01-11 11:28:01 -08:00
Patrick Wendell b313e15616 Fix UI bug introduced in #244.
The 'duration' field was incorrectly renamed to 'task time' in the table that
lists stages.
2014-01-11 10:52:57 -08:00
Patrick Wendell 4216178d5e Merge pull request #373 from jerryshao/kafka-upgrade
Upgrade Kafka dependency to 0.8.0 release version
2014-01-11 09:46:48 -08:00
Joseph E. Gonzalez 1f45e4e572 starting structural operator discussion. 2014-01-11 09:27:00 -08:00
Ankur Dave feaa078022 algorithms -> lib 2014-01-11 00:30:10 -08:00
jerryshao cbfbc01938 Fix small problem where configuration didn't take effect in ALS 2014-01-11 16:22:45 +08:00
Joseph E. Gonzalez 56a245c6bc Addressing comment about Graph Processing in docs. 2014-01-11 00:21:17 -08:00
Ankur Dave 4f7ddf40fc Optimize Edge.lexicographicOrdering 2014-01-11 00:15:01 -08:00
Joseph E. Gonzalez 0c9d39bbaa More organizational changes and dropping the benchmark plot. 2014-01-11 00:09:08 -08:00
Ankur Dave 34496d6a9f Move Analytics to algorithms and fix doc 2014-01-11 00:08:36 -08:00
Joseph E. Gonzalez b8a44f12a5 More edits. 2014-01-10 23:52:24 -08:00
Ankur Dave 362b9422e4 Soften wording about GraphX superseding Bagel 2014-01-10 23:48:32 -08:00
Ankur Dave 2d7e8d8c48 Add GC note to GraphLab 2014-01-10 23:46:02 -08:00
Reynold Xin 92ad18b00e Merge pull request #376 from prabeesh/master
Change clientId to random clientId

The client identifier should be unique across all clients connecting to the same server. A convenience method, generateClientId(), is provided to generate a random client ID that should satisfy this criterion. It returns a randomly generated client identifier based on the current user's login name and the system time. Because the server uses the client identifier to identify a client when it reconnects, the client must use the same identifier between connections if durable subscriptions are to be used.
2014-01-10 23:25:15 -08:00
Reynold Xin 0b5ce7af17 Merge pull request #386 from pwendell/typo-fix
Small typo fix
2014-01-10 23:23:21 -08:00
Andrew Or bb8098f203 Add number of bytes spilled to Web UI 2014-01-10 21:40:55 -08:00
Ankur Dave a696be1e01 Finish d1d2b6d9b6 2014-01-10 21:18:34 -08:00
Ankur Dave d1d2b6d9b6 Remove blank lines added to Spark core 2014-01-10 21:17:32 -08:00
Matei Zaharia 1d7bef0c91 Merge pull request #381 from mateiz/default-ttl
Fix default TTL for metadata cleaner

It seems to have been set to 3500 in a previous commit for debugging, but it should be off by default.
2014-01-10 18:53:03 -08:00
Ankur Dave c4fb6a87d3 Fix scaladoc warnings 2014-01-10 18:36:42 -08:00
Andrew Or e6447152b3 Induce spilling in ExternalAppendOnlyMapSuite 2014-01-10 18:33:48 -08:00
Ankur Dave 0ca18b8b07 Revert GraphX changes to SparkILoopInit
The changes were to support a custom banner in spark-shell for use by
graphx-shell, but once GraphX is merged into Spark, a separate shell
will be unnecessary.
2014-01-10 18:05:11 -08:00