ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reza Zadeh	d28bf41827	changes from PR	2014-01-17 13:39:40 -08:00
Mridul Muralidharan	b690e11d9c	Address review comment	2014-01-17 18:28:55 +05:30
Patrick Wendell	d749d472b3	Merge pull request #451 from Qiuzhuang/master Fixed Windows spark shell launch script error. JIRA SPARK-1029:https://spark-project.atlassian.net/browse/SPARK-1029	2014-01-16 23:18:15 -08:00
Patrick Wendell	d4fd89e3c8	Merge pull request #438 from ScrapCodes/clone-records-java-api Clone records java api	2014-01-16 23:17:30 -08:00
Prashant Sharma	fcb4fc653d	adding clone records field to equivalent java apis	2014-01-17 11:16:03 +05:30
Tathagata Das	11e6534d92	Updated java API docs for streaming, along with very minor changes in the code examples.	2014-01-16 14:44:02 -08:00
Mridul Muralidharan	edd82c58a2	Use method, not variable	2014-01-16 17:26:42 +05:30
Mridul Muralidharan	1a0da89277	Address review comments	2014-01-16 17:23:25 +05:30
Qiuzhuang Lian	4e510b0b0c	Fixed Windows spark shell launch script error. JIRA SPARK-1029:https://spark-project.atlassian.net/browse/SPARK-1029	2014-01-16 16:09:10 +08:00
Reynold Xin	c06a307ca2	Merge pull request #445 from kayousterhout/exec_lost Fail rather than hanging if a task crashes the JVM. Prior to this commit, if a task crashes the JVM, the task (and all other tasks running on that executor) is marked as KILLED rather than FAILED. As a result, the TaskSetManager will retry the task indefinitely rather than failing the job after maxFailures. Eventually, this makes the job hang, because the Standalone Scheduler removes the application after 10 workers have failed, and then the app is left in a state where it's disconnected from the master and waiting to reconnect. This commit fixes that problem by marking tasks as FAILED rather than KILLED when an executor is lost. The downside of this commit is that if task A fails because another task running on the same executor caused the VM to crash, the failure will incorrectly be counted as a failure of task A. This should not be an issue because we typically set maxFailures to 3, and it is unlikely that a task will be co-located with a JVM-crashing task multiple times.	2014-01-15 23:47:25 -08:00
Kay Ousterhout	718a13c179	Updated unit test comment	2014-01-15 23:46:14 -08:00
Reynold Xin	84595ea3e2	Merge pull request #414 from soulmachine/code-style Code clean up for mllib * Removed unnecessary parentheses * Removed unused imports * Simplified `filter...size()` to `count ...` * Removed obsoleted parameters' comments	2014-01-15 20:15:29 -08:00
CrazyJvm	8400536456	fix some format problem.	2014-01-16 11:57:46 +08:00
CrazyJvm	7a0c5b5a23	fix "set MASTER automatically fails" bug. spark-shell is meant to set MASTER automatically when the option is not provided at shell startup, but there is a problem. The condition is "if [[ "x" != "x$SPARK_MASTER_IP" && "y" != "y$SPARK_MASTER_PORT" ]];". We will certainly set SPARK_MASTER_IP explicitly, but we will probably leave SPARK_MASTER_PORT unset and rely on the Spark default port 7077. So if SPARK_MASTER_PORT is not set, the condition will never be true. I think we should just use the default port if users do not set the port explicitly.	2014-01-16 11:45:02 +08:00
Reynold Xin	0675ca50f3	Merge pull request #439 from CrazyJvm/master SPARK-1024 Remove "-XX:+UseCompressedStrings" option from tuning guide remove "-XX:+UseCompressedStrings" option from tuning guide since jdk7 no longer supports this.	2014-01-15 16:09:03 -08:00
Kay Ousterhout	a268d63411	Fail rather than hanging if a task crashes the JVM. Prior to this commit, if a task crashes the JVM, the task (and all other tasks running on that executor) is marked as KILLED rather than FAILED. As a result, the TaskSetManager will retry the task indefinitely rather than failing the job after maxFailures. This commit fixes that problem by marking tasks as FAILED rather than KILLED when an executor is lost. The downside of this commit is that if task A fails because another task running on the same executor caused the VM to crash, the failure will incorrectly be counted as a failure of task A. This should not be an issue because we typically set maxFailures to 3, and it is unlikely that a task will be co-located with a JVM-crashing task multiple times.	2014-01-15 16:03:40 -08:00
Patrick Wendell	4f0c361b0e	Merge pull request #444 from mateiz/py-version Clarify that Python 2.7 is only needed for MLlib	2014-01-15 14:25:45 -08:00
Matei Zaharia	2ffdaefbcb	Clarify that Python 2.7 is only needed for MLlib	2014-01-15 14:20:39 -08:00
Patrick Wendell	59f475c79f	Merge pull request #442 from pwendell/standalone Workers should use working directory as spark home if it's not specified If users don't set SPARK_HOME in their environment file when launching an application, the standalone cluster should default to the spark home of the worker.	2014-01-15 13:55:14 -08:00
Patrick Wendell	2a05403a7c	Merge pull request #443 from tdas/filestream-fix Made some classes private[streaming] and deprecated a method in JavaStreamingContext. Classes `RawTextHelper`, `RawTextSender` and `RateLimitedOutputStream` are not useful in the streaming API. They are not used by the core functionality and were there as support classes for an obscure example. One of the classes, RawTextSender, has a main function which can be executed using bin/spark-class even if it is made private[streaming]. In future, I will probably completely remove these classes. For the time being, I am just converting them to private[streaming]. Accessing the underlying JavaSparkContext in JavaStreamingContext was through `JavaStreamingContext.sc`. This is deprecated and the preferred method is `JavaStreamingContext.sparkContext`, to keep it consistent with `StreamingContext.sparkContext`.	2014-01-15 13:54:45 -08:00
Tathagata Das	9e6375349e	Made some classes private[streaming] and deprecated a method in JavaStreamingContext.	2014-01-15 12:15:46 -08:00
Patrick Wendell	5fecd2516d	Merge pull request #441 from pwendell/graphx-build GraphX shouldn't list Spark as provided. I noticed this when building an application against GraphX to audit the released artifacts.	2014-01-15 11:15:07 -08:00
Patrick Wendell	00a3f7eec5	Workers should use working directory as spark home if it's not specified	2014-01-15 11:05:36 -08:00
Patrick Wendell	9259d706be	GraphX shouldn't list Spark as provided	2014-01-15 10:46:37 -08:00
Patrick Wendell	494d3c0774	Merge pull request #433 from markhamstra/debFix Updated Debian packaging	2014-01-15 10:00:50 -08:00
Thomas Graves	cef2af9c7d	Merge pull request #366 from colorant/yarn-dev More yarn code refactor Try to retrieve common code in yarn alpha/stable for client and workerRunnable to reduce duplicated code, by putting it into a trait in the common dir and extending it. The same could be done for the remaining files in alpha/stable, but the remaining files have much more overlapping code with different API calls here and there within functions and would need a much closer review; it might also divide functions into pieces that are too small, and thus might not be worth doing this way. So just make it work for these two files first.	2014-01-15 10:06:17 -06:00
CrazyJvm	263933da97	remove "-XX:+UseCompressedStrings" option remove "-XX:+UseCompressedStrings" option from tuning guide since jdk7 no longer supports this.	2014-01-15 22:26:15 +08:00
Reynold Xin	3d9e66d92a	Merge pull request #436 from ankurdave/VertexId-case Rename VertexID -> VertexId in GraphX	2014-01-14 23:17:05 -08:00
Mridul Muralidharan	0aea33d39e	Expose method and class - so that we can use it from user code (particularly since checkpoint directory is autogenerated now)	2014-01-15 12:44:44 +05:30
Patrick Wendell	139c24ef08	Merge pull request #435 from tdas/filestream-fix Fixed the flaky tests by making SparkConf not serializable SparkConf was being serialized with CoGroupedRDD and Aggregator, which somehow caused OptionalJavaException while being deserialized as part of a ShuffleMapTask. SparkConf should not even be serializable (according to conversation with Matei). This change fixes that. @mateiz @pwendell	2014-01-14 23:07:55 -08:00
Patrick Wendell	087487e90e	Merge pull request #434 from rxin/graphxmaven Fixed SVDPlusPlusSuite in Maven build. This should go into 0.9.0 also.	2014-01-14 22:50:36 -08:00
Tathagata Das	0e15bd7827	Merge remote-tracking branch 'apache/master' into filestream-fix	2014-01-14 22:21:20 -08:00
Tathagata Das	1f4718c480	Changed SparkConf to not be serializable. And also fixed unit-test log paths in log4j.properties of external modules.	2014-01-14 22:20:14 -08:00
Reynold Xin	dfb152446d	Fixed SVDPlusPlusSuite in Maven build.	2014-01-14 22:18:43 -08:00
Mark Hamstra	147a943df0	Removed repl-bin and updated maven build doc.	2014-01-14 22:17:24 -08:00
Ankur Dave	f4d9019aa8	VertexID -> VertexId	2014-01-14 22:17:18 -08:00
Mark Hamstra	148757e88c	Add deb profile to assembly/pom.xml	2014-01-14 22:05:42 -08:00
Reynold Xin	3a386e2389	Merge pull request #424 from jegonzal/GraphXProgrammingGuide Additional edits for clarity in the graphx programming guide. Added an overview of the Graph and GraphOps functions and fixed numerous typos.	2014-01-14 21:52:50 -08:00
Reynold Xin	ad294db326	Merge pull request #431 from ankurdave/graphx-caching-doc Describe caching and uncaching in GraphX programming guide	2014-01-14 21:51:06 -08:00
Ankur Dave	1210ec2945	Describe GraphX caching and uncaching in guide	2014-01-14 17:25:38 -08:00
Reynold Xin	74b46acdc5	Merge pull request #428 from pwendell/writeable-objects Don't clone records for text files	2014-01-14 14:59:13 -08:00
Reynold Xin	193a0757c8	Merge pull request #429 from ankurdave/graphx-examples-pom.xml Add GraphX dependency to examples/pom.xml	2014-01-14 14:53:24 -08:00
Reynold Xin	d601a76d1f	Merge pull request #427 from pwendell/deprecate-aggregator Deprecate rather than remove old combineValuesByKey function	2014-01-14 14:52:24 -08:00
Ankur Dave	8ea056d721	Add GraphX dependency to examples/pom.xml	2014-01-14 13:58:48 -08:00
Patrick Wendell	b1b22b7a13	Style fix	2014-01-14 13:56:27 -08:00
Patrick Wendell	8ea2cd56e4	Adding fix covering combineCombinersByKey as well	2014-01-14 13:52:23 -08:00
Reynold Xin	2ce23a55a3	Merge pull request #425 from rxin/scaladoc API doc update & make Broadcast public In #413 Broadcast was mistakenly made private[spark]. I changed it to public again. Also exposing id in public given the R frontend requires that. Copied some of the documentation from the programming guide to API Doc for Broadcast and Accumulator. This should be cherry picked into branch-0.9 as well for 0.9.0 release.	2014-01-14 13:28:44 -08:00
Matei Zaharia	5b3a3e28d7	Complain if Python and NumPy versions are too old for MLlib	2014-01-14 12:27:58 -08:00
Patrick Wendell	b683608c9f	Deprecate rather than remove old combineValuesByKey function	2014-01-14 12:15:10 -08:00
Matei Zaharia	938e4a0e16	Re-enable Python MLlib tests (require Python 2.7 and NumPy 1.7+)	2014-01-14 12:14:48 -08:00

7527 commits