Commit graph

3428 commits

Author SHA1 Message Date
Patrick Wendell e5514790d7 HOTFIX: SPARK-2208 local metrics tests can fail on fast machines
Author: Patrick Wendell <pwendell@gmail.com>

Closes #1141 from pwendell/hotfix and squashes the following commits:

83e4c79 [Patrick Wendell] HOTFIX: SPARK-2208 local metrics tests can fail on fast machines
2014-06-19 21:06:28 -07:00
nravi f14b00a9c6 [SPARK-2151] Recognize memory format for spark-submit
An int format was expected for the input memory parameter when spark-submit is invoked in standalone cluster mode. Make it consistent with the rest of Spark.
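
A minimal sketch of the suffix-aware parsing this implies; Spark's real helper lives in Utils, so the name and details here are illustrative:

```scala
// Hypothetical parser: accept "30g"/"512M"-style strings and normalize to MB.
def memoryStringToMb(str: String): Int = {
  val lower = str.toLowerCase
  if (lower.endsWith("k")) (lower.dropRight(1).toLong / 1024).toInt
  else if (lower.endsWith("m")) lower.dropRight(1).toInt
  else if (lower.endsWith("g")) lower.dropRight(1).toInt * 1024
  else if (lower.endsWith("t")) lower.dropRight(1).toInt * 1024 * 1024
  else lower.toInt // a bare number is assumed to already be in MB
}

memoryStringToMb("30g")  // 30720
memoryStringToMb("512M") // 512
```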

Author: nravi <nravi@c1704.halxg.cloudera.com>

Closes #1095 from nishkamravi2/master and squashes the following commits:

2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark
3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
2014-06-19 17:11:06 -07:00
WangTao 67fca189c9 Minor fix
The value "env" is never used in SparkContext.scala.
Add detailed comment for method setDelaySeconds in MetadataCleaner.scala instead of the unsure one.

Author: WangTao <barneystinson@aliyun.com>

Closes #1105 from WangTaoTheTonic/master and squashes the following commits:

688358e [WangTao] Minor fix
2014-06-18 23:24:57 -07:00
Doris Xin 45a95f82ca Remove unicode operator from RDD.scala
Some IDEs don’t support unicode characters in source code. Check if this breaks binary compatibility.

Author: Doris Xin <doris.s.xin@gmail.com>

Closes #1119 from dorx/unicode and squashes the following commits:

05618c3 [Doris Xin] Remove unicode operator from RDD.scala
2014-06-18 15:01:29 -07:00
Mark Hamstra 4cbeea83e0 SPARK-2158 Clean up core/stdout file from FileAppenderSuite
@tdas

Author: Mark Hamstra <markhamstra@gmail.com>

Closes #1100 from markhamstra/SPARK-2158 and squashes the following commits:

ae8e069 [Mark Hamstra] Response to TD's review
2f1e201 [Mark Hamstra] Cleanup 'stdout' file within FileAppenderSuite
2014-06-18 14:56:41 -07:00
Reynold Xin dd96fcda01 Updated the comment for SPARK-2162.
A follow up on #1103

@andrewor14

Author: Reynold Xin <rxin@apache.org>

Closes #1117 from rxin/SPARK-2162 and squashes the following commits:

a4231de [Reynold Xin] Updated the comment for SPARK-2162.
2014-06-18 12:48:58 -07:00
Raymond Liu 5ad5e3486a [SPARK-2162] Double check in doGetLocal to avoid read on removed block.
Otherwise, it will either read in vain in the memory-level case, or throw an exception in the disk-level case when it believes the block is there while it has actually been removed.
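
A sketch of the double-check pattern described here, with stand-in types (`BlockInfo`, `blockInfo`, and `readBlock` are illustrative, not the real BlockManager internals):

```scala
import scala.collection.concurrent.TrieMap

case class BlockInfo(data: String)
val blockInfo = new TrieMap[String, BlockInfo]()
def readBlock(id: String, info: BlockInfo): Option[String] = Some(info.data)

def doGetLocal(blockId: String): Option[String] =
  blockInfo.get(blockId).flatMap { info =>
    info.synchronized {
      // Double check: another thread may have removed the block while we
      // waited on the lock; without this we read in vain (memory level)
      // or hit a missing file (disk level).
      if (!blockInfo.contains(blockId)) None
      else readBlock(blockId, info)
    }
  }
```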

Author: Raymond Liu <raymond.liu@intel.com>

Closes #1103 from colorant/bm and squashes the following commits:

daac114 [Raymond Liu] Address comments
d1ea287 [Raymond Liu] Double check in doGetLocal to avoid read on removed block.
2014-06-18 10:57:45 -07:00
Patrick Wendell 9e4b4bd083 Revert "SPARK-2038: rename "conf" parameters in the saveAsHadoop functions"
This reverts commit 443f5e1bbc.

This commit unfortunately would break source compatibility if users have named
the hadoopConf parameter.
2014-06-17 19:34:17 -07:00
Andrew Or a14807e84c [SPARK-2147 / 2161] Show removed executors on the UI
This PR includes two changes
- **[SPARK-2147]** When an application finishes cleanly (i.e. `sc.stop()` is called), all of its executors used to disappear from the Master UI. This no longer happens.
- **[SPARK-2161]** This adds a "Removed Executors" table to Master UI, so the user can find out why their executors died from the logs, for instance. The equivalent table already existed in the Worker UI, but was hidden because of a bug (the comment `//scalastyle:off` disconnected the `Seq[Node]` that represents the HTML for the table).

This should go into 1.0.1 if possible.

Author: Andrew Or <andrewor14@gmail.com>

Closes #1102 from andrewor14/remember-removed-executors and squashes the following commits:

2e2298f [Andrew Or] Add hash code method to ExecutorInfo (minor)
abd72e0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into remember-removed-executors
792f992 [Andrew Or] Add missing equals method in ExecutorInfo
3390b49 [Andrew Or] Add executor state column to WorkerPage
161f8a2 [Andrew Or] Display finished executors table (fix bug)
fbb65b8 [Andrew Or] Removed unused method
c89bb6e [Andrew Or] Add table for removed executors in MasterWebUI
fe47402 [Andrew Or] Show exited executors on the Master UI
2014-06-17 12:25:55 -07:00
CodingCat 443f5e1bbc SPARK-2038: rename "conf" parameters in the saveAsHadoop functions
to distinguish them from the SparkConf object

https://issues.apache.org/jira/browse/SPARK-2038

Author: CodingCat <zhunansjtu@gmail.com>

Closes #1087 from CodingCat/SPARK-2038 and squashes the following commits:

763975f [CodingCat] style fix
d91288d [CodingCat] rename "conf" parameters in the saveAsHadoop functions
2014-06-17 12:17:48 -07:00
Sandy Ryza 2794990e9e SPARK-2146. Fix takeOrdered doc
Removes Python syntax in Scaladoc, corrects result in Scaladoc, and removes irrelevant cache() call in Python doc.
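
For reference, the behavior the corrected Scaladoc should describe, assuming an existing SparkContext `sc`:

```scala
val rdd = sc.parallelize(Seq(10, 4, 2, 12, 3))
rdd.takeOrdered(3) // Array(2, 3, 4): the k smallest elements, ascending
rdd.top(3)         // Array(12, 10, 4): the k largest elements
```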

Author: Sandy Ryza <sandy@cloudera.com>

Closes #1086 from sryza/sandy-spark-2146 and squashes the following commits:

185ff18 [Sandy Ryza] Use Seq instead of Array
c996120 [Sandy Ryza] SPARK-2146.  Fix takeOrdered doc
2014-06-17 12:03:22 -07:00
Andrew Ash b92d16b114 SPARK-1063 Add .sortBy(f) method on RDD
This never got merged from the apache/incubator-spark repo (which is now deleted) but there had been several rounds of code review on this PR there.

I think this is ready for merging.
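
A usage sketch of the method, assuming an existing SparkContext `sc` (the `ascending` and `numPartitions` parameters come from the commits below):

```scala
val rdd = sc.parallelize(Seq(("b", 2), ("a", 3), ("c", 1)))
rdd.sortBy(_._2).collect()                    // by value:  (c,1), (b,2), (a,3)
rdd.sortBy(_._1, ascending = false).collect() // by key, descending
```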

Author: Andrew Ash <andrew@andrewash.com>

This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@apache.org>

Closes #369 from ash211/sortby and squashes the following commits:

d09147a [Andrew Ash] Fix Ordering import
43d0a53 [Andrew Ash] Fix missing .collect()
29a54ed [Andrew Ash] Re-enable test by converting to a closure
5a95348 [Andrew Ash] Add license for RDDSuiteUtils
64ed6e3 [Andrew Ash] Remove leaked diff
d4de69a [Andrew Ash] Remove scar tissue
63638b5 [Andrew Ash] Add Python version of .sortBy()
45e0fde [Andrew Ash] Add Java version of .sortBy()
adf84c5 [Andrew Ash] Re-indent to keep line lengths under 100 chars
9d9b9d8 [Andrew Ash] Use parentheses on .collect() calls
0457b69 [Andrew Ash] Ignore failing test
99f0baf [Andrew Ash] Merge branch 'master' into sortby
222ae97 [Andrew Ash] Try moving Ordering objects out to a different class
3fd0dd3 [Andrew Ash] Add (failing) test for sortByKey with explicit Ordering
b8b5bbc [Andrew Ash] Align remove extra spaces that were used to align ='s in test code
8c53298 [Andrew Ash] Actually use ascending and numPartitions parameters
381eef2 [Andrew Ash] Correct silly typo
7db3e84 [Andrew Ash] Support ascending and numPartitions params in sortBy()
0f685fd [Andrew Ash] Merge remote-tracking branch 'origin/master' into sortby
ca4490d [Andrew Ash] Add .sortBy(f) method on RDD
2014-06-17 11:47:48 -07:00
Andrew Or 09deb3eee0 [SPARK-2144] ExecutorsPage reports incorrect # of RDD blocks
This is reproducible whenever we drop a block because of memory pressure.

This is because StorageStatusListener actually never removes anything from the block maps of its StorageStatuses. Instead, when a block is dropped, it sets the block's storage level to `StorageLevel.NONE`, when it should just remove it from the map.

This PR includes this simple fix.
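
A sketch of the fix's shape, with stand-in names for the StorageStatus internals:

```scala
import scala.collection.mutable

case class BlockStatus(storageLevel: String, memSize: Long) // illustrative
val blocks = mutable.Map[String, BlockStatus]()

def updateBlock(blockId: String, status: BlockStatus): Unit = {
  // A dropped block arrives with storage level NONE: delete its entry
  // instead of keeping a stale one, so block counts stay correct.
  if (status.storageLevel == "NONE") blocks.remove(blockId)
  else blocks(blockId) = status
}
```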

Author: Andrew Or <andrewor14@gmail.com>

Closes #1080 from andrewor14/ui-blocks and squashes the following commits:

fcf9f1a [Andrew Or] Remove BlockStatus if it is no longer cached
2014-06-17 01:28:22 -07:00
Daniel Darabos 23a12ce20c SPARK-2035: Store call stack for stages, display it on the UI.
I'm not sure about the test -- I get a lot of unrelated failures for some reason. I'll try to sort it out. But hopefully the automation will test this for me if I send a pull request :).

I'll attach a demo HTML in [Jira](https://issues.apache.org/jira/browse/SPARK-2035).

Author: Daniel Darabos <darabos.daniel@gmail.com>
Author: Patrick Wendell <pwendell@gmail.com>

Closes #981 from darabos/darabos-call-stack and squashes the following commits:

f7c6bfa [Daniel Darabos] Fix bad merge. I undid 83c226d454 by Doris.
3d0a48d [Daniel Darabos] Merge remote-tracking branch 'upstream/master' into darabos-call-stack
b857849 [Daniel Darabos] Style: Break long line.
ecb5690 [Daniel Darabos] Include the last Spark method in the full stack trace. Otherwise it is not visible if the stage name is overridden.
d00a85b [Patrick Wendell] Make call sites for stages non-optional and well defined
b9eba24 [Daniel Darabos] Make StageInfo.details non-optional. Add JSON serialization code for the new field. Verify JSON backward compatibility.
4312828 [Daniel Darabos] Remove Mima excludes for CallSite. They should be unnecessary now, with SPARK-2070 fixed.
0920750 [Daniel Darabos] Merge remote-tracking branch 'upstream/master' into darabos-call-stack
a4b1faf [Daniel Darabos] Add Mima exclusions for the CallSite changes it has picked up. They are private methods/classes, so we ought to be safe.
932f810 [Daniel Darabos] Use empty CallSite instead of null in DAGSchedulerSuite. Outside of testing, this parameter always originates in SparkContext.scala, and will never be null.
ccd89d1 [Daniel Darabos] Fix long lines.
ac173e4 [Daniel Darabos] Hide "show details" if there are no details to show.
6182da6 [Daniel Darabos] Set a configurable limit on maximum call stack depth. It can be useful in memory-constrained situations with large numbers of stages.
8fe2e34 [Daniel Darabos] Store call stack for stages, display it on the UI.
2014-06-17 00:08:05 -07:00
CodingCat 716c88aa14 SPARK-2039: apply output dir existence checking for all output formats
https://issues.apache.org/jira/browse/SPARK-2039

apply output dir existence checking for all output formats

Author: CodingCat <zhunansjtu@gmail.com>

Closes #1088 from CodingCat/SPARK-2039 and squashes the following commits:

c52747a [CodingCat] apply output dir existence checking for all output formats
2014-06-15 23:47:58 -07:00
CrazyJvm a63aa1adb2 SPARK-1999: StorageLevel in storage tab and RDD Storage Info never changes
StorageLevel in 'storage tab' and 'RDD Storage Info' never changes even if you call rdd.unpersist() and then give the RDD a different storage level.

Author: CrazyJvm <crazyjvm@gmail.com>

Closes #968 from CrazyJvm/ui-storagelevel and squashes the following commits:

62555fa [CrazyJvm] change RDDInfo constructor param 'storageLevel' to var, so there's no need to add another variable _storageLevel.
9f1571e [CrazyJvm] JIRA https://issues.apache.org/jira/browse/SPARK-1999 UI : StorageLevel in storage tab and RDD Storage Info never changes
2014-06-15 23:23:26 -07:00
Kan Zhang ca5d9d43b9 [SPARK-937] adding EXITED executor state and not relaunching cleanly exited executors
There seems to be 2 issues.

1. When a job is done, the driver asks the executor to shut down. However, this clean exit was assigned the FAILED executor state by the Worker. I introduced an EXITED executor state for executors that exit voluntarily (both normal and abnormal exit, depending on the exit code).

2. When the Master gets notified that an executor has exited, it launches another one to replace it, regardless of why the executor exited. When the reason was that the job had finished, the unnecessary replacement got subsequently killed when the App disassociated. This launching and killing of unnecessary executors shows up in the log and is confusing to users. I added a check for the executor exit status and avoid launching (and subsequently killing) unnecessary replacements when executors exit cleanly.

One could ask the scheduler to tell the Master the job is done so that the Master wouldn't launch the replacement executor. However, there is a race condition between the App telling the Master the job is done and the Worker telling the Master an executor had exited. There is no guarantee the former will happen before the latter. Instead, I chose to check the exit code when the executor exits. If the exit code is 0, I assume the executor has been asked to shut down by the driver, and the Master will not launch replacements.

Due to the race condition, it could also happen (although it didn't happen on my local cluster) that the Master detects the App disassociation event before the executor exits by itself. In such cases, the executor will be rightfully killed and labeled as KILLED, while the App state will show FINISHED.
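
A sketch of the two checks described above, with simplified types (the real logic lives in the Worker and Master):

```scala
sealed trait ExecutorState
case object EXITED extends ExecutorState // executor exited voluntarily
case object FAILED extends ExecutorState // worker killed or lost the executor

// Worker side: a voluntary exit (any exit code) is EXITED, not FAILED.
def stateFor(voluntaryExit: Boolean): ExecutorState =
  if (voluntaryExit) EXITED else FAILED

// Master side: only launch a replacement when the exit was not clean.
def shouldRelaunch(state: ExecutorState, exitCode: Option[Int]): Boolean =
  !(state == EXITED && exitCode.contains(0))
```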

Author: Kan Zhang <kzhang@apache.org>

Closes #306 from kanzhang/SPARK-1118 and squashes the following commits:

cb0cc86 [Kan Zhang] [SPARK-937] adding EXITED executor state and not relaunching cleanly exited executors
2014-06-15 14:55:34 -07:00
Kan Zhang 7dd9fc67a6 [SPARK-1837] NumericRange should be partitioned in the same way as other...
... sequences

Author: Kan Zhang <kzhang@apache.org>

Closes #776 from kanzhang/SPARK-1837 and squashes the following commits:

e48f018 [Kan Zhang] [SPARK-1837] code refactoring
67c33b5 [Kan Zhang] minor change
403f9b1 [Kan Zhang] [SPARK-1837] NumericRange should be partitioned in the same way as other sequences
2014-06-14 14:31:28 -07:00
nravi 70c8116c0a Workaround in Spark for ConcurrentModification issue (JIRA Hadoop-10456, Spark-1097)
This fix has gone into Hadoop 2.4.1. For developers using < 2.4.1, it would be good to have a workaround in Spark as well.

Fix has been tested for performance as well, no regressions found.

Author: nravi <nravi@c1704.halxg.cloudera.com>

Closes #1000 from nishkamravi2/master and squashes the following commits:

eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
2014-06-13 10:52:21 -07:00
Andrew Or 44daec5abd [Minor] Fix style, formatting and naming in BlockManager etc.
This is a precursor to a bigger change. I wanted to separate out the relatively insignificant changes so the ultimate PR is not inflated.

(Warning: this PR is full of unimportant nitpicks)

Author: Andrew Or <andrewor14@gmail.com>

Closes #1058 from andrewor14/bm-minor and squashes the following commits:

8e12eaf [Andrew Or] SparkException -> BlockException
c36fd53 [Andrew Or] Make parts of BlockManager more readable
0a5f378 [Andrew Or] Entry -> MemoryEntry
e9762a5 [Andrew Or] Tone down string interpolation (minor reverts)
c4de9ac [Andrew Or] Merge branch 'master' of github.com:apache/spark into bm-minor
b3470f1 [Andrew Or] More string interpolation (minor)
7f9dcab [Andrew Or] Use string interpolation (minor)
94a425b [Andrew Or] Refactor against duplicate code + minor changes
8a6a7dc [Andrew Or] Exception -> SparkException
97c410f [Andrew Or] Deal with MIMA excludes
2480f1d [Andrew Or] Fixes in StorageLevel.scala
abb0163 [Andrew Or] Style, formatting and naming fixes
2014-06-12 20:40:58 -07:00
Doris Xin 1de1d703bf SPARK-1939 Refactor takeSample method in RDD to use ScaSRS
Modified the takeSample method in RDD to use the ScaSRS sampling technique to improve performance. Added a private method that computes sampling rate > sample_size/total to ensure sufficient sample size with success rate >= 0.9999. Added a unit test for the private method to validate choice of sampling rate.
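
A sketch of the oversampling computation, assuming a ScaSRS-style binomial bound of this form (the exact bound used by the PR may differ):

```scala
// Pick a fraction slightly above sampleSize/total so that, with probability
// >= 1 - delta, the sample is large enough; delta = 1e-4 gives >= 0.9999.
def computeFraction(sampleSize: Int, total: Long, delta: Double = 1e-4): Double = {
  val fraction = sampleSize.toDouble / total
  val gamma = -math.log(delta) / total
  math.min(1.0, fraction + gamma + math.sqrt(gamma * gamma + 2 * gamma * fraction))
}
```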

Author: Doris Xin <doris.s.xin@gmail.com>
Author: dorx <doris.s.xin@gmail.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #916 from dorx/takeSample and squashes the following commits:

5b061ae [Doris Xin] merge master
444e750 [Doris Xin] edge cases
3de882b [dorx] Merge pull request #2 from mengxr/SPARK-1939
82dde31 [Xiangrui Meng] update pyspark's takeSample
48d954d [Doris Xin] remove unused imports from RDDSuite
fb1452f [Doris Xin] allowing num to be greater than count in all cases
1481b01 [Doris Xin] washing test tubes and making coffee
dc699f3 [Doris Xin] give back imports removed by accident in rdd.py
64e445b [Doris Xin] logwarnning as soon as it enters the while loop
55518ed [Doris Xin] added TODO for logging in rdd.py
eff89e2 [Doris Xin] addressed reviewer comments.
ecab508 [Doris Xin] fixed checkstyle violation
0a9b3e3 [Doris Xin] "reviewer comment addressed"
f80f270 [Doris Xin] Merge branch 'master' into takeSample
ae3ad04 [Doris Xin] fixed edge cases to prevent overflow
065ebcd [Doris Xin] Merge branch 'master' into takeSample
9bdd36e [Doris Xin] Check sample size and move computeFraction
e3fd6a6 [Doris Xin] Merge branch 'master' into takeSample
7cab53a [Doris Xin] fixed import bug in rdd.py
ffea61a [Doris Xin] SPARK-1939: Refactor takeSample method in RDD
1441977 [Doris Xin] SPARK-1939 Refactor takeSample method in RDD to use ScaSRS
2014-06-12 19:44:27 -07:00
Ariel Rabkin 0154587ab7 document laziness of parallelize
Took me several hours to figure out this behavior. It would be good to highlight it in the documentation.

Author: Ariel Rabkin <asrabkin@cs.princeton.edu>

Closes #1070 from asrabkin/master and squashes the following commits:

29a076e [Ariel Rabkin] doc fix
2014-06-12 17:51:33 -07:00
Patrick Wendell 1c04652c8f SPARK-1843: Replace assemble-deps with env variable.
(This change is actually small; I moved some logic into
compute-classpath that was previously in spark-class.)

Assemble deps has existed for a while to allow developers to
run local code with new changes quickly. When I'm developing I
typically use a simpler approach which just prepends the Spark
classes to the classpath before the assembly jar. This is well
defined in the JVM and the Spark classes take precedence over those
in the assembly.

This approach is portable across both builds which is the main reason I'd
like to switch to it. It's also a bit easier to toggle on and off quickly.

The way you use this is the following:
```
$ ./bin/spark-shell # Use spark with the normal assembly
$ export SPARK_PREPEND_CLASSES=true
$ ./bin/spark-shell # Now it's using compiled classes
$ unset SPARK_PREPEND_CLASSES
$ ./bin/spark-shell # Back to normal
```

Author: Patrick Wendell <pwendell@gmail.com>

Closes #877 from pwendell/assemble-deps and squashes the following commits:

8a11345 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into assemble-deps
faa3168 [Patrick Wendell] Adding a warning for compatibility
3f151a7 [Patrick Wendell] Small fix
bbfb73c [Patrick Wendell] Review feedback
328e9f8 [Patrick Wendell] SPARK-1843: Replace assemble-deps with env variable.
2014-06-12 15:43:32 -07:00
Marcelo Vanzin ecde5b8375 [SPARK-2080] Yarn: report HS URL in client mode, correct user in cluster mode.
Yarn client mode was not setting the app's tracking URL to the
History Server's URL when configured by the user. Now client mode
behaves the same as cluster mode.

In SparkContext.scala, the "user.name" system property had precedence
over the SPARK_USER environment variable. This means that SPARK_USER
was never used, since "user.name" is always set by the JVM. In Yarn
cluster mode, this means the application always reported itself as
being run by user "yarn" (or whatever user was running the Yarn NM).
One could argue that the correct fix would be to use UGI.getCurrentUser()
here, but at least for Yarn that will match what SPARK_USER is set
to.

Author: Marcelo Vanzin <vanzin@cloudera.com>

This patch had conflicts when merged, resolved by
Committer: Thomas Graves <tgraves@apache.org>

Closes #1002 from vanzin/yarn-client-url and squashes the following commits:

4046e04 [Marcelo Vanzin] Set HS link in yarn-alpha also.
4c692d9 [Marcelo Vanzin] Yarn: report HS URL in client mode, correct user in cluster mode.
2014-06-12 16:19:36 -05:00
Doris Xin 83c226d454 [SPARK-2088] fix NPE in toString
After deserialization, the transient field creationSiteInfo does not get backfilled with the default value, but the toString method, which is invoked by the serializer, expects the field to always be non-null. An NPE is thrown when toString is called by the serializer when creationSiteInfo is null.
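
An illustration of the failure mode and a null-safe guard (the class is a stand-in, not the real RDD internals):

```scala
class Tracked extends Serializable {
  // @transient fields are not serialized and come back null after
  // deserialization; toString must not assume they are set.
  @transient val creationSiteInfo: String = "RDD at Example.scala:42"
  override def toString: String =
    Option(creationSiteInfo).getOrElse("<unknown call site>")
}
```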

Author: Doris Xin <doris.s.xin@gmail.com>

Closes #1028 from dorx/toStringNPE and squashes the following commits:

f20021e [Doris Xin] unit test for toString after deserialization
6f0a586 [Doris Xin] Merge branch 'master' into toStringNPE
f47fecf [Doris Xin] Merge branch 'master' into toStringNPE
76199c6 [Doris Xin] [SPARK-2088] fix NPE in toString
2014-06-12 12:53:07 -07:00
Sandy Ryza ce92a9c18f SPARK-554. Add aggregateByKey.
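
A minimal usage sketch of the new method, assuming an existing SparkContext `sc`:

```scala
val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
// zero value per key, a seqOp folding a value into the accumulator within a
// partition, and a combOp merging accumulators across partitions
val sums = pairs.aggregateByKey(0)(_ + _, _ + _)
sums.collect() // e.g. Array((a,3), (b,3))
```
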
Author: Sandy Ryza <sandy@cloudera.com>

Closes #705 from sryza/sandy-spark-554 and squashes the following commits:

2302b8f [Sandy Ryza] Add MIMA exclude
f52e0ad [Sandy Ryza] Fix Python tests for real
2f3afa3 [Sandy Ryza] Fix Python test
0b735e9 [Sandy Ryza] Fix line lengths
ae56746 [Sandy Ryza] Fix doc (replace T with V)
c2be415 [Sandy Ryza] Java and Python aggregateByKey
23bf400 [Sandy Ryza] SPARK-554.  Add aggregateByKey.
2014-06-12 08:14:25 -07:00
Henry Saputra 4d8ae709fb Cleanup on Connection and ConnectionManager
Simple cleanup on Connection and ConnectionManager to make the IDE happy while working on the issue:
1. Replace var with val where the variable is never reassigned.
2. Add parentheses to Queue#dequeue to be consistent with its side effects.
3. Remove return on the final line of a method.

Author: Henry Saputra <henry.saputra@gmail.com>

Closes #1060 from hsaputra/cleanup_connection_classes and squashes the following commits:

245fd09 [Henry Saputra] Cleanup on Connection and ConnectionManager to make the IDE happy while working on the issue: 1. Replace var with val where the variable is never reassigned. 2. Add parentheses to Queue#dequeue to be consistent with its side effects. 3. Remove return on the final line of a method.
2014-06-11 23:17:51 -07:00
Yadong e056320cc8 'killFuture' is never used
Author: Yadong <qiyadong2010@gmail.com>

Closes #1052 from watermen/bug-fix1 and squashes the following commits:

409d09a [Yadong] 'killFuture' is never used
2014-06-11 20:58:39 -07:00
Matei Zaharia 508fd371d6 [SPARK-2044] Pluggable interface for shuffles
This is a first cut at moving shuffle logic behind a pluggable interface, as described at https://issues.apache.org/jira/browse/SPARK-2044, to let us more easily experiment with new shuffle implementations. It moves the existing shuffle code to a class HashShuffleManager behind a general ShuffleManager interface.

Two things are still missing to make this complete:
* MapOutputTracker needs to be hidden behind the ShuffleManager interface; this will also require adding methods to ShuffleManager that will let the DAGScheduler interact with it as it does with the MapOutputTracker today
* The code to do map-side and reduce-side combine in ShuffledRDD, PairRDDFunctions, etc. needs to be moved into the ShuffleManager's readers and writers

However, some of these may also be done later after we merge the current interface.

Author: Matei Zaharia <matei@databricks.com>

Closes #1009 from mateiz/pluggable-shuffle and squashes the following commits:

7a09862 [Matei Zaharia] review comments
be33d3f [Matei Zaharia] review comments
1513d4e [Matei Zaharia] Add ASF header
ac56831 [Matei Zaharia] Bug fix and better error message
4f681ba [Matei Zaharia] Move write part of ShuffleMapTask to ShuffleManager
f6f011d [Matei Zaharia] Move hash shuffle reader behind ShuffleManager interface
55c7717 [Matei Zaharia] Changed RDD code to use ShuffleReader
75cc044 [Matei Zaharia] Partial work to move hash shuffle in
2014-06-11 20:45:29 -07:00
Prashant Sharma e508f599f8 [SPARK-2108] Mark SparkContext methods that return block information as developer APIs
Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #1047 from ScrapCodes/SPARK-2108/mark-as-dev-api and squashes the following commits:

073ee34 [Prashant Sharma] [SPARK-2108] Mark SparkContext methods that return block information as developer APIs
2014-06-11 10:49:34 -07:00
witgo c48b6222ea Resolve scalatest warnings during build
Author: witgo <witgo@qq.com>

Closes #1032 from witgo/ShouldMatchers and squashes the following commits:

7ebf34c [witgo] Resolve scalatest warnings during build
2014-06-10 20:24:05 -07:00
Tathagata Das 4823bf470e [SPARK-1940] Enabling rolling of executor logs, and automatic cleanup of old executor logs
Currently, in the default log4j configuration, all the executor logs get sent to the file <code>[executor-working-dir]/stderr</code>. This does not allow log files to be rolled, so old logs cannot be removed.

Using the log4j RollingFileAppender allows log4j logs to be rolled, but all the logs get sent to a different set of files, other than the files <code>stdout</code> and <code>stderr</code>. So the logs are not visible in the Spark web UI any more, as the Spark web UI only reads the files <code>stdout</code> and <code>stderr</code>. Furthermore, it still does not allow the stdout and stderr to be cleared periodically in case a large amount of stuff gets written to them (e.g. by explicit `println` inside a map function).

This PR solves this by implementing a simple `RollingFileAppender` within Spark (disabled by default). When enabled (using configuration parameter `spark.executor.rollingLogs.enabled`), the logs can get rolled over either by time interval (set with `spark.executor.rollingLogs.interval`, set to daily by default), or by size of logs (set with  `spark.executor.rollingLogs.size`). Finally, old logs can be automatically deleted by specifying how many of the latest log files to keep (set with `spark.executor.rollingLogs.keepLastN`).  The web UI has also been modified to show the logs across the rolled-over files.

You can test this locally (without waiting a whole day) by setting the configuration `spark.executor.rollingLogs.enabled=true` and `spark.executor.rollingLogs.interval=minutely`. Continuously generate logs by running Spark jobs, and the generated log files will look like this (`stderr` and `stdout` are the most current log files being written to).

```
stderr
stderr--2014-05-27--14-37
stderr--2014-05-27--14-47
stderr--2014-05-27--15-05
stdout
stdout--2014-05-27--14-47
```

The web ui should show logs across these files.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #895 from tdas/rolling-logs and squashes the following commits:

fd8f87f [Tathagata Das] Minor change.
d326aee [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into rolling-logs
ad956c1 [Tathagata Das] Scala style fix.
1f0a6ec [Tathagata Das] Some more changes based on Patrick's PR comments.
c8bfe4e [Tathagata Das] Refactore FileAppender to a package spark.util.logging and broke up the file into multiple files. Changed configuration parameter names.
4224409 [Tathagata Das] Style fix.
108a9f8 [Tathagata Das] Added better constraint handling for rolling policies.
f7da977 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into rolling-logs
9134495 [Tathagata Das] Simplified rolling logs by removing Daily/Hourly/MinutelyRollingFileAppender, and removing the setting rollingLogs.enabled
312d874 [Tathagata Das] Minor fixes based on PR comments.
8a67d83 [Tathagata Das] Fixed comments.
b36cfd6 [Tathagata Das] Implemented RollingPolicy, TimeBasedRollingPolicy and SizeBasedRollingPolicy, and changed RollingFileAppender accordingly.
b7e8272 [Tathagata Das] Style fix,
374c9a9 [Tathagata Das] Added missing license.
24354ea [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into rolling-logs
6cc09c7 [Tathagata Das] Fixed bugs in rolling logs, and added more debug statements.
adf4910 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into rolling-logs
931f8fb [Tathagata Das] Changed log viewer in Spark web UI to handle rolling log files.
cb4fb6d [Tathagata Das] Added FileAppender and RollingFileAppender to generate rolling executor logs.
2014-06-10 20:22:02 -07:00
Nick Pentreath f971d6cb60 SPARK-1416: PySpark support for SequenceFile and Hadoop InputFormats
So I finally resurrected this PR. It seems the old one against the incubator mirror is no longer available, so I cannot reference it.

This adds initial support for reading Hadoop ```SequenceFile```s, as well as arbitrary Hadoop ```InputFormat```s, in PySpark.

# Overview
The basics are as follows:
1. ```PythonRDD``` object contains the relevant methods, that are in turn invoked by ```SparkContext``` in PySpark
2. The SequenceFile or InputFormat is read on the Scala side and converted from ```Writable``` instances to the relevant Scala classes (in the case of primitives)
3. Pyrolite is used to serialize Java objects. If this fails, the fallback is ```toString```
4. ```PickleSerializer``` on the Python side deserializes.

This works "out the box" for simple ```Writable```s:
* ```Text```
* ```IntWritable```, ```DoubleWritable```, ```FloatWritable```
* ```NullWritable```
* ```BooleanWritable```
* ```BytesWritable```
* ```MapWritable```

It also works for simple, "struct-like" classes. Due to the way Pyrolite works, this requires that the classes satisfy the JavaBeans conventions (i.e. with fields and a no-arg constructor and getters/setters). (Perhaps in the future some sugar for case classes and reflection could be added.)

I've tested it out with ```ESInputFormat```  as an example and it works very nicely:
```python
conf = {"es.resource" : "index/type" }
rdd = sc.newAPIHadoopRDD("org.elasticsearch.hadoop.mr.EsInputFormat", "org.apache.hadoop.io.NullWritable", "org.elasticsearch.hadoop.mr.LinkedMapWritable", conf=conf)
rdd.first()
```

I suspect for things like HBase/Cassandra it will be a bit trickier to get it to work out of the box.

# Some things still outstanding:
1. ~~Requires ```msgpack-python``` and will fail without it. As originally discussed with Josh, add a ```as_strings``` argument that defaults to ```False```, that can be used if ```msgpack-python``` is not available~~
2. ~~I see from https://github.com/apache/spark/pull/363 that Pyrolite is being used there for SerDe between Scala and Python. @ahirreddy @mateiz what is the plan behind this - is Pyrolite preferred? It seems from a cursory glance that adapting the ```msgpack```-based SerDe here to use Pyrolite wouldn't be too hard~~
3. ~~Support the key and value "wrapper" that would allow a Scala/Java function to be plugged in that would transform whatever the key/value Writable class is into something that can be serialized (e.g. convert some custom Writable to a JavaBean or ```java.util.Map``` that can be easily serialized)~~
4. Support ```saveAsSequenceFile``` and ```saveAsHadoopFile``` etc. This would require SerDe in the reverse direction, that can be handled by Pyrolite. Will work on this as a separate PR

Author: Nick Pentreath <nick.pentreath@gmail.com>

Closes #455 from MLnick/pyspark-inputformats and squashes the following commits:

268df7e [Nick Pentreath] Documentation changes per @pwendell comments
761269b [Nick Pentreath] Address @pwendell comments, simplify default writable conversions and remove registry.
4c972d8 [Nick Pentreath] Add license headers
d150431 [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
cde6af9 [Nick Pentreath] Parameterize converter trait
5ebacfa [Nick Pentreath] Update docs for PySpark input formats
a985492 [Nick Pentreath] Move Converter examples to own package
365d0be [Nick Pentreath] Make classes private[python]. Add docs and @Experimental annotation to Converter interface.
eeb8205 [Nick Pentreath] Fix path relative to SPARK_HOME in tests
1eaa08b [Nick Pentreath] HBase -> Cassandra app name oversight
3f90c3e [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
2c18513 [Nick Pentreath] Add examples for reading HBase and Cassandra InputFormats from Python
b65606f [Nick Pentreath] Add converter interface
5757f6e [Nick Pentreath] Default key/value classes for sequenceFile are None
085b55f [Nick Pentreath] Move input format tests to tests.py and clean up docs
43eb728 [Nick Pentreath] PySpark InputFormats docs into programming guide
94beedc [Nick Pentreath] Clean up args in PythonRDD. Set key/value converter defaults to None for PySpark context.py methods
1a4a1d6 [Nick Pentreath] Address @mateiz style comments
01e0813 [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
15a7d07 [Nick Pentreath] Remove default args for key/value classes. Arg names to camelCase
9fe6bd5 [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
84fe8e3 [Nick Pentreath] Python programming guide space formatting
d0f52b6 [Nick Pentreath] Python programming guide
7caa73a [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
93ef995 [Nick Pentreath] Add back context.py changes
9ef1896 [Nick Pentreath] Recover earlier changes lost in previous merge for serializers.py
077ecb2 [Nick Pentreath] Recover earlier changes lost in previous merge for context.py
5af4770 [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
35b8e3a [Nick Pentreath] Another fix for test ordering
bef3afb [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
e001b94 [Nick Pentreath] Fix test failures due to ordering
78978d9 [Nick Pentreath] Add doc for SequenceFile and InputFormat support to Python programming guide
64eb051 [Nick Pentreath] Scalastyle fix
e7552fa [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
44f2857 [Nick Pentreath] Remove msgpack dependency and switch serialization to Pyrolite, plus some clean up and refactoring
c0ebfb6 [Nick Pentreath] Change sequencefile test data generator to easily be called from PySpark tests
1d7c17c [Nick Pentreath] Amend tests to auto-generate sequencefile data in temp dir
17a656b [Nick Pentreath] remove binary sequencefile for tests
f60959e [Nick Pentreath] Remove msgpack dependency and serializer from PySpark
450e0a2 [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
31a2fff [Nick Pentreath] Scalastyle fixes
fc5099e [Nick Pentreath] Add Apache license headers
4e08983 [Nick Pentreath] Clean up docs for PySpark context methods
b20ec7e [Nick Pentreath] Clean up merge duplicate dependencies
951c117 [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
f6aac55 [Nick Pentreath] Bring back msgpack
9d2256e [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
1bbbfb0 [Nick Pentreath] Clean up SparkBuild from merge
a67dfad [Nick Pentreath] Clean up Msgpack serialization and registering
7237263 [Nick Pentreath] Add back msgpack serializer and hadoop file code lost during merging
25da1ca [Nick Pentreath] Add generator for nulls, bools, bytes and maps
65360d5 [Nick Pentreath] Adding test SequenceFiles
0c612e5 [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
d72bf18 [Nick Pentreath] msgpack
dd57922 [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
e67212a [Nick Pentreath] Add back msgpack dependency
f2d76a0 [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
41856a5 [Nick Pentreath] Merge branch 'master' into pyspark-inputformats
97ef708 [Nick Pentreath] Remove old writeToStream
2beeedb [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
795a763 [Nick Pentreath] Change name to WriteInputFormatTestDataGenerator. Cleanup some var names. Use SPARK_HOME in path for writing test sequencefile data.
174f520 [Nick Pentreath] Add back graphx settings
703ee65 [Nick Pentreath] Add back msgpack
619c0fa [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
1c8efbc [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
eb40036 [Nick Pentreath] Remove unused comment lines
4d7ef2e [Nick Pentreath] Fix indentation
f1d73e3 [Nick Pentreath] mergeConfs returns a copy rather than mutating one of the input arguments
0f5cd84 [Nick Pentreath] Remove unused pair UTF8 class. Add comments to msgpack deserializer
4294cbb [Nick Pentreath] Add old Hadoop api methods. Clean up and expand comments. Clean up argument names
818a1e6 [Nick Pentreath] Add seqencefile and Hadoop InputFormat support to PythonRDD
4e7c9e3 [Nick Pentreath] Merge remote-tracking branch 'upstream/master' into pyspark-inputformats
c304cc8 [Nick Pentreath] Adding supporting sequncefiles for tests. Cleaning up
4b0a43f [Nick Pentreath] Refactoring utils into own objects. Cleaning up old commented-out code
d86325f [Nick Pentreath] Initial WIP of PySpark support for SequenceFile and arbitrary Hadoop InputFormat
2014-06-09 22:21:03 -07:00
Kay Ousterhout 6cf335d79a Added a TaskSetManager unit test.
This test ensures that when there are no
alive executors that satisfy a particular locality level,
the TaskSetManager doesn't ever use that as the maximum
allowed locality level (this optimization ensures that a
job doesn't wait extra time in an attempt to satisfy
a scheduling locality level that is impossible).

@mateiz and @lirui-intel this unit test illustrates an issue
with #892 (it fails with that patch).

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #1024 from kayousterhout/scheduler_unit_test and squashes the following commits:

de6a08f [Kay Ousterhout] Added a TaskSetManager unit test.
2014-06-09 13:13:53 -07:00
Andrew Ash 35630c86ff SPARK-1944 Document --verbose in spark-shell -h
https://issues.apache.org/jira/browse/SPARK-1944

Author: Andrew Ash <andrew@andrewash.com>

Closes #1020 from ash211/SPARK-1944 and squashes the following commits:

a831c4d [Andrew Ash] SPARK-1944 Document --verbose in spark-shell -h
2014-06-09 10:21:21 -07:00
Andrew Ash 32ee9f0668 Grammar: read -> reads
Author: Andrew Ash <andrew@andrewash.com>

Closes #1016 from ash211/patch-6 and squashes the following commits:

e3865c8 [Andrew Ash] Grammar: read -> reads
2014-06-08 23:20:10 -07:00
Neville Li 15ddbef414 [SPARK-2067] use relative path for Spark logo in UI
Author: Neville Li <neville@spotify.com>

Closes #1006 from nevillelyh/gh/SPARK-2067 and squashes the following commits:

9ee64cf [Neville Li] [SPARK-2067] use relative path for Spark logo in UI
2014-06-08 23:18:27 -07:00
Reynold Xin 219dc00b30 SPARK-1628 follow up: Improve RangePartitioner's documentation.
Adding a paragraph clarifying a weird behavior in RangePartitioner.

See also #549.

Author: Reynold Xin <rxin@apache.org>

Closes #1012 from rxin/partitioner-doc and squashes the following commits:

6f0109e [Reynold Xin] SPARK-1628 follow up: Improve RangePartitioner's documentation.
2014-06-08 18:39:57 -07:00
zsxwing a71c6d1cf0 SPARK-1628: Add missing hashCode methods in Partitioner subclasses
JIRA: https://issues.apache.org/jira/browse/SPARK-1628

Added `hashCode` in HashPartitioner, RangePartitioner, PythonPartitioner and PageRankUtils.CustomPartitioner.
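
The gist of the contract being fixed, as a sketch (the real classes live in Partitioner.scala and use a non-negative mod helper):

```scala
class SimpleHashPartitioner(val partitions: Int) extends Serializable {
  def getPartition(key: Any): Int = math.abs(key.hashCode % partitions)
  override def equals(other: Any): Boolean = other match {
    case h: SimpleHashPartitioner => h.partitions == partitions
    case _ => false
  }
  // equals is overridden, so hashCode must be too, and must agree with it.
  override def hashCode: Int = partitions
}
```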

Author: zsxwing <zsxwing@gmail.com>

Closes #549 from zsxwing/SPARK-1628 and squashes the following commits:

2620936 [zsxwing] SPARK-1628: Add missing hashCode methods in Partitioner subclasses
2014-06-08 14:18:52 -07:00
Neville Li 7b877b2705 SPARK-2056 Set RDD name to input path
Author: Neville Li <neville@spotify.com>

Closes #992 from nevillelyh/master and squashes the following commits:

3011739 [Neville Li] [SPARK-2056] Set RDD name to input path
2014-06-07 16:22:26 -07:00
witgo 41c4a33105 [SPARK-1841]: update scalatest to version 2.1.5
Author: witgo <witgo@qq.com>

Closes #713 from witgo/scalatest and squashes the following commits:

b627a6a [witgo] merge master
51fb3d6 [witgo] merge master
3771474 [witgo] fix RDDSuite
996d6f9 [witgo] fix TimeStampedWeakValueHashMap test
9dfa4e7 [witgo] merge bug
1479b22 [witgo] merge master
29b9194 [witgo] fix code style
022a7a2 [witgo] fix test dependency
a52c0fa [witgo] fix test dependency
cd8f59d [witgo] Merge branch 'master' of https://github.com/apache/spark into scalatest
046540d [witgo] fix RDDSuite.scala
2c543b9 [witgo] fix ReplSuite.scala
c458928 [witgo] update scalatest to version 2.1.5
2014-06-06 11:45:21 -07:00
Matei Zaharia b45c13e7d7 SPARK-2043: ExternalAppendOnlyMap doesn't always find matching keys
The current implementation reads one key with the next hash code as it finishes reading the keys with the current hash code, which may cause it to miss some matches of the next key. This can cause operations like join to give the wrong result when reduce tasks spill to disk and there are hash collisions, as values won't be matched together. This PR fixes it by not reading in that next key, using a peeking iterator instead.
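
A sketch of the peeking-iterator idea: check the next element's hash code without consuming it, so the first key of the next group stays unread:

```scala
import scala.collection.mutable.ArrayBuffer

def takeWhileHash[T](it: BufferedIterator[T], hash: Int): Seq[T] = {
  val group = ArrayBuffer[T]()
  while (it.hasNext && it.head.hashCode == hash) { // head peeks, next consumes
    group += it.next()
  }
  group.toSeq
}

val it = Iterator(1, 1, 2, 3, 3).buffered
takeWhileHash(it, 1) // Seq(1, 1); the 2 has not been consumed
```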

Author: Matei Zaharia <matei@databricks.com>

Closes #986 from mateiz/spark-2043 and squashes the following commits:

0959514 [Matei Zaharia] Added unit test for having many hash collisions
892debb [Matei Zaharia] SPARK-2043: don't read a key with the next hash code in ExternalAppendOnlyMap, instead use a buffered iterator to only read values with the current hash code.
2014-06-05 23:01:48 -07:00
CrazyJvm 3d3f8c8004 Use pluggable clock in DAGScheduler #SPARK-2031
DAGScheduler supports pluggable clock like what TaskSetManager does.
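
A sketch of the pluggable-clock pattern (names are illustrative):

```scala
trait Clock { def currentTime(): Long }

object SystemClock extends Clock {
  def currentTime(): Long = System.currentTimeMillis()
}

// Deterministic clock for tests.
class ManualClock(var now: Long = 0L) extends Clock {
  def currentTime(): Long = now
  def advance(ms: Long): Unit = { now += ms }
}

// The scheduler takes the clock as a constructor argument and never calls
// System.currentTimeMillis() directly.
class SchedulerLike(clock: Clock = SystemClock) {
  def stamp(): Long = clock.currentTime()
}
```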

Author: CrazyJvm <crazyjvm@gmail.com>

Closes #976 from CrazyJvm/clock and squashes the following commits:

6779a4c [CrazyJvm] Use pluggable clock in DAGScheduler
2014-06-05 17:44:46 -07:00
CodingCat 89cdbb087c SPARK-1677: allow user to disable output dir existence checking
https://issues.apache.org/jira/browse/SPARK-1677

For compatibility with older versions of Spark, it would be nice to have an option `spark.hadoop.validateOutputSpecs` (default true) for the user to disable the output directory existence checking.
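
Opting out would then look like this (using the option name proposed above):

```scala
import org.apache.spark.SparkConf

// Default is true, i.e. the existence check stays on unless disabled.
val conf = new SparkConf().set("spark.hadoop.validateOutputSpecs", "false")
```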

Author: CodingCat <zhunansjtu@gmail.com>

Closes #947 from CodingCat/SPARK-1677 and squashes the following commits:

7930f83 [CodingCat] miao
c0c0e03 [CodingCat] bug fix and doc update
5318562 [CodingCat] bug fix
13219b5 [CodingCat] allow user to disable output dir existence checking
2014-06-05 11:39:35 -07:00
Colin McCabe 1765c8d0dd SPARK-1518: FileLogger: Fix compile against Hadoop trunk
In Hadoop trunk (currently Hadoop 3.0.0), the deprecated
FSDataOutputStream#sync() method has been removed.  Instead, we should
call FSDataOutputStream#hflush, which does the same thing the
deprecated method used to do.
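
The replacement call in context, as a sketch (`hflush()` is the Syncable flush available on Hadoop 2+):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataOutputStream, FileSystem, Path}

val fs = FileSystem.get(new Configuration())
val out: FSDataOutputStream = fs.create(new Path("/tmp/spark-events/app.log"))
out.writeUTF("event")
out.hflush() // was out.sync() before this change
```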

Author: Colin McCabe <cmccabe@cloudera.com>

Closes #898 from cmccabe/SPARK-1518 and squashes the following commits:

752b9d7 [Colin McCabe] FileLogger: Fix compile against Hadoop trunk
2014-06-04 15:56:29 -07:00
Sean Owen d341b17c2a SPARK-1973. Add randomSplit to JavaRDD (with tests, and tidy Java tests)
I'd like to use randomSplit through the Java API, and would like to add a convenience wrapper for this method to JavaRDD. This is fairly trivial. (In fact, is the intent that JavaRDD not wrap every RDD method, and that sometimes users should just use JavaRDD.wrapRDD()?)

Along the way, I added tests for it, and also touched up the Java API test style and behavior. This is maybe the more useful part of this small change.
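
A usage sketch of randomSplit from Scala, assuming an existing SparkContext `sc`; the new Java wrapper mirrors this:

```scala
val rdd = sc.parallelize(1 to 100)
// Weights are normalized if they don't sum to 1; the seed makes it repeatable.
val Array(train, test) = rdd.randomSplit(Array(0.7, 0.3), seed = 42L)
```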

Author: Sean Owen <sowen@cloudera.com>
Author: Xiangrui Meng <meng@databricks.com>

This patch had conflicts when merged, resolved by
Committer: Xiangrui Meng <meng@databricks.com>

Closes #919 from srowen/SPARK-1973 and squashes the following commits:

148cb7b [Sean Owen] Some final Java test polish, while we are at it
1fc3f3e [Xiangrui Meng] more cleaning on Java 8 tests
9ebc57f [Sean Owen] Use accumulator instead of temp files to test foreach
5efb0be [Sean Owen] Add Java randomSplit, and unit tests (including for sample)
5dcc158 [Sean Owen] Simplified Java 8 test with new language features, and fixed the name of MLB's greatest team
91a1769 [Sean Owen] Touch up minor style issues in existing Java API suite test
2014-06-04 11:27:08 -07:00
Kan Zhang c402a4a685 [SPARK-1817] RDD.zip() should verify partition sizes for each partition
RDD.zip() will throw an exception if it finds partition sizes are not the same.
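
A sketch of the new behavior, assuming an existing SparkContext `sc`:

```scala
val a = sc.parallelize(1 to 4, 2)             // partitions of sizes 2 and 2
val b = sc.parallelize(Seq("x", "y", "z"), 2) // partitions of sizes 1 and 2
a.zip(b).collect() // now throws SparkException instead of mispairing elements
```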

Author: Kan Zhang <kzhang@apache.org>

Closes #944 from kanzhang/SPARK-1817 and squashes the following commits:

c073848 [Kan Zhang] [SPARK-1817] Cosmetic updates
524c670 [Kan Zhang] [SPARK-1817] RDD.zip() should verify partition sizes for each partition
2014-06-03 22:47:18 -07:00
Sean Owen 4ca0625669 SPARK-1806 (addendum) Use non-deprecated methods in Mesos 0.18
The update to Mesos 0.18 caused some deprecation warnings in the build. The change to the non-deprecated version is straightforward as it emulates what the Mesos driver does with the deprecated method anyway (c5aa1dd221/src/sched/sched.cpp (L1354))

Author: Sean Owen <sowen@cloudera.com>

Closes #920 from srowen/SPARK-1806 and squashes the following commits:

8d76b6a [Sean Owen] Use non-deprecated methods in Mesos 0.18
2014-06-03 22:37:20 -07:00
Reynold Xin 1faef149f7 SPARK-1941: Update streamlib to 2.7.0 and use HyperLogLogPlus instead of HyperLogLog.
I also corrected some errors made in the previous approximate HLL count API, including that relativeSD wasn't really a measure of error (and we used it to test error bounds in test results).
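
Direct streamlib usage, as a sketch of what now backs the approximate counts (the precision arguments shown are illustrative choices):

```scala
import com.clearspring.analytics.stream.cardinality.HyperLogLogPlus

val hll = new HyperLogLogPlus(14, 0) // precision p = 14, sparse mode off
Seq("a", "b", "a", "c").foreach(hll.offer)
val estimate: Long = hll.cardinality() // ~3; error roughly 1.054 / sqrt(2^p)
```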

Author: Reynold Xin <rxin@apache.org>

Closes #897 from rxin/hll and squashes the following commits:

4d83f41 [Reynold Xin] New error bound and non-randomness.
f154ea0 [Reynold Xin] Added a comment on the value bound for testing.
e367527 [Reynold Xin] One more round of code review.
41e649a [Reynold Xin] Update final mima list.
9e320c8 [Reynold Xin] Incorporate code review feedback.
e110d70 [Reynold Xin] Merge branch 'master' into hll
354deb8 [Reynold Xin] Added comment on the Mima exclude rules.
acaa524 [Reynold Xin] Added the right exclude rules in MimaExcludes.
6555bfe [Reynold Xin] Added a default method and re-arranged MimaExcludes.
1db1522 [Reynold Xin] Excluded util.SerializableHyperLogLog from MIMA check.
9221b27 [Reynold Xin] Merge branch 'master' into hll
88cfe77 [Reynold Xin] Updated documentation and restored the old incorrect API to maintain API compatibility.
1294be6 [Reynold Xin] Updated HLL+.
e7786cb [Reynold Xin] Merge branch 'master' into hll
c0ef0c2 [Reynold Xin] SPARK-1941: Update streamlib to 2.7.0 and use HyperLogLogPlus instead of HyperLogLog.
2014-06-03 18:37:40 -07:00
Ankur Dave b1feb60209 [SPARK-1991] Support custom storage levels for vertices and edges
This PR adds support for specifying custom storage levels for the vertices and edges of a graph. This enables GraphX to handle graphs larger than memory size by specifying MEMORY_AND_DISK and then repartitioning the graph to use many small partitions, each of which does fit in memory. Spark will then automatically load partitions from disk as needed.

The user specifies the desired vertex and edge storage levels when building the graph by passing them to the graph constructor. These are then stored in the `targetStorageLevel` attribute of the VertexRDD and EdgeRDD respectively. Whenever GraphX needs to cache a VertexRDD or EdgeRDD (because it plans to use it more than once, for example), it uses the specified target storage level. Also, when the user calls `Graph#cache()`, the vertices and edges are persisted using their target storage levels.

In order to facilitate propagating the target storage levels across VertexRDD and EdgeRDD operations, we remove raw calls to the constructors and instead introduce the `withPartitionsRDD` and `withTargetStorageLevel` methods.

I tested this change by running PageRank and triangle count on a severely memory-constrained cluster (1 executor with 300 MB of memory, and a 1 GB graph). Before this PR, these algorithms used to fail with OutOfMemoryErrors. With this PR, and using the DISK_ONLY storage level, they succeed.
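
A construction sketch using the new parameters, assuming an existing SparkContext `sc` (parameter names follow the PR's description of passing the levels to the graph constructor):

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.storage.StorageLevel

val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b")))
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1)))
val graph = Graph(vertices, edges, defaultVertexAttr = "",
  edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
  vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)
graph.cache() // persists vertices and edges at their target levels
```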

Author: Ankur Dave <ankurdave@gmail.com>

Closes #946 from ankurdave/SPARK-1991 and squashes the following commits:

ce17d95 [Ankur Dave] Move pickStorageLevel to StorageLevel.fromString
ccaf06f [Ankur Dave] Shadow members in withXYZ() methods rather than using underscores
c34abc0 [Ankur Dave] Exclude all of GraphX from compatibility checks vs. 1.0.0
c5ca068 [Ankur Dave] Revert "Exclude all of GraphX from binary compatibility checks"
34bcefb [Ankur Dave] Exclude all of GraphX from binary compatibility checks
6fdd137 [Ankur Dave] [SPARK-1991] Support custom storage levels for vertices and edges
2014-06-03 14:54:26 -07:00
Wenchen Fan(Cloud) 45e9bc85db [SPARK-1912] fix compress memory issue during reduce
When we need to read a compressed block, we first create a compression stream instance (LZF or Snappy) and use it to wrap that block.
Say a reducer task needs to read 1000 local shuffle blocks; it will first prepare to read those 1000 blocks, which means creating 1000 compression stream instances to wrap them. But the initialization of a compression instance allocates some memory, and having many compression instances alive at the same time is a problem.
In reality the reducer reads the shuffle blocks one by one, so we can initialize the compression instances lazily.
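
A sketch of the lazy-initialization idea:

```scala
// Defer building the (memory-hungry) decompression stream until the block's
// iterator is first consumed, instead of at wrap time for all 1000 blocks.
def lazyIterator[T](makeIterator: => Iterator[T]): Iterator[T] =
  new Iterator[T] {
    lazy val delegate = makeIterator // created on first hasNext/next call
    def hasNext: Boolean = delegate.hasNext
    def next(): T = delegate.next()
  }
```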

Author: Wenchen Fan(Cloud) <cloud0fan@gmail.com>

Closes #860 from cloud-fan/fix-compress and squashes the following commits:

0924a6b [Wenchen Fan(Cloud)] rename 'doWork' into 'getIterator'
07f32c2 [Wenchen Fan(Cloud)] move the LazyProxyIterator to dataDeserialize
d80c426 [Wenchen Fan(Cloud)] remove empty lines in short class
2c8adb2 [Wenchen Fan(Cloud)] add inline comment
8ebff77 [Wenchen Fan(Cloud)] fix compress memory issue during reduce
2014-06-03 13:18:20 -07:00
Syed Hashmi 7782a304ad [SPARK-1942] Stop clearing spark.driver.port in unit tests
stop resetting spark.driver.port in unit tests (scala, java and python).

Author: Syed Hashmi <shashmi@cloudera.com>
Author: CodingCat <zhunansjtu@gmail.com>

Closes #943 from syedhashmi/master and squashes the following commits:

885f210 [Syed Hashmi] Removing unnecessary file (created by mergetool)
b8bd4b5 [Syed Hashmi] Merge remote-tracking branch 'upstream/master'
b895e59 [Syed Hashmi] Revert "[SPARK-1784] Add a new partitioner"
57b6587 [Syed Hashmi] Revert "[SPARK-1784] Add a balanced partitioner"
1574769 [Syed Hashmi] [SPARK-1942] Stop clearing spark.driver.port in unit tests
4354836 [Syed Hashmi] Revert "SPARK-1686: keep schedule() calling in the main thread"
fd36542 [Syed Hashmi] [SPARK-1784] Add a balanced partitioner
6668015 [CodingCat] SPARK-1686: keep schedule() calling in the main thread
4ca94cc [Syed Hashmi] [SPARK-1784] Add a new partitioner
2014-06-03 12:04:47 -07:00
Aaron Davidson 9909efc10a SPARK-1839: PySpark RDD#take() shouldn't always read from driver
This patch simply ports over the Scala implementation of RDD#take(), which reads the first partition at the driver, then decides how many more partitions it needs to read and will possibly start a real job if it's more than 1. (Note that SparkContext#runJob(allowLocal=true) only runs the job locally if there's 1 partition selected and no parent stages.)

Author: Aaron Davidson <aaron@databricks.com>

Closes #922 from aarondav/take and squashes the following commits:

fa06df9 [Aaron Davidson] SPARK-1839: PySpark RDD#take() shouldn't always read from driver
2014-05-31 13:04:57 -07:00
Aaron Davidson 7d52777eff Super minor: Close inputStream in SparkSubmitArguments
`Properties#load()` doesn't close the InputStream, but it'd be closed after being GC'd anyway...

Also changed file.getName to file, because getName only shows the filename. This will show the full (possibly relative) path, which is less confusing if it's not found.

Author: Aaron Davidson <aaron@databricks.com>

Closes #914 from aarondav/tiny and squashes the following commits:

db9d072 [Aaron Davidson] Super minor: Close inputStream in SparkSubmitArguments
2014-05-31 12:36:58 -07:00
Chen Chao 9ecc40d3ae correct tiny comment error
Author: Chen Chao <crazyjvm@gmail.com>

Closes #928 from CrazyJvm/patch-8 and squashes the following commits:

144328b [Chen Chao] correct tiny comment error
2014-05-31 00:06:49 -07:00
Zhen Peng ff562b2396 [SPARK-1901] worker should make sure executor has exited before updating executor's info
https://issues.apache.org/jira/browse/SPARK-1901

Author: Zhen Peng <zhenpeng01@baidu.com>

Closes #854 from zhpengg/bugfix-worker-kills-executor and squashes the following commits:

21d380b [Zhen Peng] add some error messages
506cea6 [Zhen Peng] add some docs for killProcess()
a0b9860 [Zhen Peng] [SPARK-1901] worker should make sure executor has exited before updating executor's info
2014-05-30 10:12:51 -07:00
witgo 4dbb27b0cf [SPARK-1712]: TaskDescription instance is too big causes Spark to hang
Author: witgo <witgo@qq.com>

Closes #694 from witgo/SPARK-1712_new and squashes the following commits:

0f52483 [witgo] review commit
83ce29b [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
52e6752 [witgo] reset test SparkContext
63636b6 [witgo] review commit
44a59ee [witgo] review commit
3b6d48c [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
926bd6a [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
9a5cfad [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
03cc562 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
b0930b0 [witgo] review commit
b1174bd [witgo] merge master
f76679b [witgo] merge master
689495d [witgo] fix scala style bug
1d35c3c [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
062c182 [witgo] fix small bug for code style
0a428cf [witgo] add unit tests
158b2dc [witgo] review commit
4afe71d [witgo] review commit
9e4ffa7 [witgo] review commit
1d35c7d [witgo] fix hang
7965580 [witgo] fix Statement order
0e29eac [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
3ea1ca1 [witgo] remove duplicate serialize
743a7ad [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
86e2048 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
2a89adc [witgo] SPARK-1712: TaskDescription instance is too big causes Spark to hang
2014-05-28 15:57:05 -07:00
lianhuiwang 95e4c9c6fb bugfix worker DriverStateChanged state should match DriverState.FAILED

Author: lianhuiwang <lianhuiwang09@gmail.com>

Closes #864 from lianhuiwang/master and squashes the following commits:

480ce94 [lianhuiwang] address aarondav comments
f2b5970 [lianhuiwang] bugfix worker DriverStateChanged state should match DriverState.FAILED
2014-05-27 11:53:38 -07:00
zsxwing 549830b0db SPARK-1932: Fix race conditions in onReceiveCallback and cachedPeers
`var cachedPeers: Seq[BlockManagerId] = null` is used in `def replicate(blockId: BlockId, data: ByteBuffer, level: StorageLevel)` without proper protection.

There are two places that call `replicate(blockId, bytesAfterPut, level)`:
* 17f3075bc4/core/src/main/scala/org/apache/spark/storage/BlockManager.scala (L644) runs in `connectionManager.futureExecContext`
* 17f3075bc4/core/src/main/scala/org/apache/spark/storage/BlockManager.scala (L752) `doPut` runs in `connectionManager.handleMessageExecutor`. `org.apache.spark.storage.BlockManagerWorker` calls `blockManager.putBytes` in `connectionManager.handleMessageExecutor`.

As they run in different `Executor`s, this is a race condition that may cause the memory pointed to by `cachedPeers` to be incorrect even if `cachedPeers != null`.

The race condition of `onReceiveCallback` is that it's set in `BlockManagerWorker` but read in a different thread in `ConnectionManager.handleMessageExecutor`.
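
One common shape of a fix for such races, as a sketch (the actual change in this PR may differ):

```scala
class PeerCache(fetchPeers: () => Seq[String]) {
  // @volatile publishes writes safely across threads; double-checked locking
  // confines the one-time initialization to a single thread.
  @volatile private var cachedPeers: Seq[String] = null
  def get(): Seq[String] = {
    if (cachedPeers == null) synchronized {
      if (cachedPeers == null) cachedPeers = fetchPeers()
    }
    cachedPeers
  }
}
```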

Author: zsxwing <zsxwing@gmail.com>

Closes #887 from zsxwing/SPARK-1932 and squashes the following commits:

524f69c [zsxwing] SPARK-1932: Fix race conditions in onReceiveCallback and cachedPeers
2014-05-26 23:17:39 -07:00
Reynold Xin 90e281b55a SPARK-1933: Throw a more meaningful exception when a directory is passed to addJar/addFile.
https://issues.apache.org/jira/browse/SPARK-1933

Author: Reynold Xin <rxin@apache.org>

Closes #888 from rxin/addfile and squashes the following commits:

8c402a3 [Reynold Xin] Updated comment.
ff6c162 [Reynold Xin] SPARK-1933: Throw a more meaningful exception when a directory is passed to addJar/addFile.
2014-05-26 22:05:23 -07:00
Reynold Xin ef690e1f69 Fixed the error message for OutOfMemoryError in DAGScheduler. 2014-05-26 21:31:27 -07:00
Zhen Peng 8d271c90fa SPARK-1929 DAGScheduler suspended by local task OOM
DAGScheduler does not handle local task OOM properly, and will wait for the job result forever.

Author: Zhen Peng <zhenpeng01@baidu.com>

Closes #883 from zhpengg/bugfix-dag-scheduler-oom and squashes the following commits:

76f7eda [Zhen Peng] remove redundant memory allocations
aa63161 [Zhen Peng] SPARK-1929 DAGScheduler suspended by local task OOM
2014-05-26 21:30:25 -07:00
Patrick Wendell b6d22af040 HOTFIX: Add no-arg SparkContext constructor in Java
Self explanatory.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #878 from pwendell/java-constructor and squashes the following commits:

2cc1605 [Patrick Wendell] HOTFIX: Add no-arg SparkContext constructor in Java
2014-05-25 20:13:32 -07:00
Zhen Peng 4e4831b8fa [SPARK-1886] check executor id existence when executor exit
Author: Zhen Peng <zhenpeng01@baidu.com>

Closes #827 from zhpengg/bugfix-executor-id-not-found and squashes the following commits:

cd8bb65 [Zhen Peng] bugfix: check executor id existence when executor exit
2014-05-24 20:40:19 -07:00
Andrew Or 5081a0a9d4 [SPARK-1900 / 1918] PySpark on YARN is broken
If I run the following on a YARN cluster
```
bin/spark-submit sheep.py --master yarn-client
```
it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the file:
```
bin/spark-submit file:/path/to/sheep.py --master yarn-client
```
However, this also fails. This time it is because python does not understand URI schemes.

This PR fixes this by automatically resolving all paths passed as command line argument to `spark-submit` properly. This has the added benefit of keeping file and jar paths consistent across different cluster modes. For python, we strip the URI scheme before we actually try to run it.
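
A hedged sketch of the path handling described above (helper names are hypothetical, not the actual SparkSubmit code): a bare local path gets an explicit `file:` scheme, and for Python the scheme is stripped back off before handing the path to the interpreter.

```
import java.io.File
import java.net.{URI, URISyntaxException}

object PathResolution {
  // Give bare local paths an explicit file: scheme; keep already-schemed URIs.
  def resolveURI(path: String): URI = {
    try {
      val uri = new URI(path)
      if (uri.getScheme != null) return uri
    } catch { case _: URISyntaxException => }
    new File(path).getAbsoluteFile.toURI
  }

  // Python can't handle URI schemes, so strip file: back off for local files.
  def localPathForPython(path: String): String = {
    val uri = resolveURI(path)
    if (uri.getScheme == "file") new File(uri).getPath else path
  }
}
```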

Much of the code is originally written by @mengxr. Tested on YARN cluster. More tests pending.

Author: Andrew Or <andrewor14@gmail.com>

Closes #853 from andrewor14/submit-paths and squashes the following commits:

0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH
323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell
3c36587 [Andrew Or] Improve error messages (minor)
854aa6a [Andrew Or] Guard against NPE if user gives pathological paths
6638a6b [Andrew Or] Fix spark-shell jar paths after #849 went in
3bb0359 [Andrew Or] Update more comments (minor)
2a1f8a0 [Andrew Or] Update comments (minor)
6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
a68c4d1 [Andrew Or] Handle Windows python file path correctly
427a250 [Andrew Or] Resolve paths properly for Windows
a591a4a [Andrew Or] Update tests for resolving URIs
6c8621c [Andrew Or] Move resolveURIs to Utils
db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
f542dce [Andrew Or] Fix outdated tests
691c4ce [Andrew Or] Ignore special primary resource names
5342ac7 [Andrew Or] Add missing space in error message
02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly
2014-05-24 18:01:49 -07:00
Aaron Davidson f9f5fd5f4e Fix UISuite unit test that fails under Jenkins contention
Due to what are perhaps zombie processes on Jenkins, it seems that at least 10
Spark ports are in use. It also doesn't matter whether the port chosen on retry
is higher than the one already taken -- it could in fact be lower; the only part
that matters is that a different port is selected rather than the bind failing.
Changed the test to match this.
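
An illustrative shape of the behaviour being asserted (not the real UISuite code): what matters is only that *some* free port gets bound, not the direction the port number moves.

```
import java.io.IOException
import java.net.ServerSocket

object PortBinding {
  // Try the preferred port first; on failure let the OS pick any free port.
  def bindToFreePort(preferred: Int): (ServerSocket, Int) = {
    val socket =
      try new ServerSocket(preferred)
      catch { case _: IOException => new ServerSocket(0) } // 0 = OS chooses
    (socket, socket.getLocalPort)
  }
}
```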

Thanks to @andrewor14 for helping diagnose this.

Author: Aaron Davidson <aaron@databricks.com>

Closes #857 from aarondav/tiny and squashes the following commits:

c199ec8 [Aaron Davidson] Fix UISuite unit test that fails under Jenkins contention
2014-05-22 15:11:05 -07:00
Xiangrui Meng dba314029b [SPARK-1870] Make spark-submit --jars work in yarn-cluster mode.
Send secondary jars to the distributed cache of all containers and add the cached jars to the classpath before executors start. Tested on a YARN cluster (CDH-5.0).

`spark-submit --jars` also works in standalone server and `yarn-client`. Thanks for @andrewor14 for testing!

I removed "Doesn't work for drivers in standalone mode with "cluster" deploy mode." from `spark-submit`'s help message, though we haven't tested mesos yet.

CC: @dbtsai @sryza

Author: Xiangrui Meng <meng@databricks.com>

Closes #848 from mengxr/yarn-classpath and squashes the following commits:

23e7df4 [Xiangrui Meng] rename spark.jar to __spark__.jar and app.jar to __app__.jar to avoid confliction apped $CWD/ and $CWD/* to the classpath remove unused methods
a40f6ed [Xiangrui Meng] standalone -> cluster
65e04ad [Xiangrui Meng] update spark-submit help message and add a comment for yarn-client
11e5354 [Xiangrui Meng] minor changes
3e7e1c4 [Xiangrui Meng] use sparkConf instead of hadoop conf
dc3c825 [Xiangrui Meng] add secondary jars to classpath in yarn
2014-05-22 01:52:50 -07:00
Andrew Or 7c79ef7d43 [Minor] Move JdbcRDDSuite to the correct package
It was in the wrong package

Author: Andrew Or <andrewor14@gmail.com>

Closes #839 from andrewor14/jdbc-suite and squashes the following commits:

f948c5a [Andrew Or] cache -> cache()
b215279 [Andrew Or] Move JdbcRDDSuite to the correct package
2014-05-21 01:25:10 -07:00
Tathagata Das 52eb54d024 [Spark 1877] ClassNotFoundException when loading RDD with serialized objects
Updated version of #821

Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Ghidireac <bogdang@u448a5b0a73d45358d94a.ant.amazon.com>

Closes #835 from tdas/SPARK-1877 and squashes the following commits:

f346f71 [Tathagata Das] Addressed Patrick's comments.
fee0c5d [Ghidireac] SPARK-1877: ClassNotFoundException when loading RDD with serialized objects
2014-05-19 22:36:24 -07:00
Aaron Davidson b0ce22e071 SPARK-1689: Spark application should die when removed by Master
scheduler.error() will mask the error if there are active tasks. Being removed is a cataclysmic event for Spark applications, and should probably be treated as such.

Author: Aaron Davidson <aaron@databricks.com>

Closes #832 from aarondav/i-love-u and squashes the following commits:

9f1200f [Aaron Davidson] SPARK-1689: Spark application should die when removed by Master
2014-05-19 20:55:26 -07:00
Matei Zaharia 5af99d7617 SPARK-1879. Increase MaxPermSize since some of our builds have many classes
See https://issues.apache.org/jira/browse/SPARK-1879 -- builds with Hadoop2 and Hive ran out of PermGen space in spark-shell, when those things added up with the Scala compiler.

Note that users can still override it by setting their own Java options with this change. Their options will come later in the command string than the -XX:MaxPermSize=128m.

Author: Matei Zaharia <matei@databricks.com>

Closes #823 from mateiz/spark-1879 and squashes the following commits:

6bc0ee8 [Matei Zaharia] Increase MaxPermSize to 128m since some of our builds have lots of classes
2014-05-19 18:42:28 -07:00
Matei Zaharia 7b70a70718 [SPARK-1876] Windows fixes to deal with latest distribution layout changes
- Look for JARs in the right place
- Launch examples the same way as on Unix
- Load datanucleus JARs if they exist
- Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs
- Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was)

Author: Matei Zaharia <matei@databricks.com>

Closes #819 from mateiz/win-fixes and squashes the following commits:

d558f96 [Matei Zaharia] Fix comment
228577b [Matei Zaharia] Review comments
d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly
144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout
2014-05-19 15:02:35 -07:00
Patrick Wendell 442808a748 Make deprecation warning less severe
Just a small change. I think it's good not to scare people who are using the old options.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #810 from pwendell/warnings and squashes the following commits:

cb8a311 [Patrick Wendell] Make deprecation warning less severe
2014-05-16 22:58:47 -07:00
Andrew Or 4b8ec6fcfd [SPARK-1808] Route bin/pyspark through Spark submit
**Problem.** For `bin/pyspark`, there is currently no other way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`.

**Solution.** Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent.

**Details.** `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply pass the file directly to Spark submit and let it handle the rest.

For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change is to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case.

This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too.

Author: Andrew Or <andrewor14@gmail.com>

Closes #799 from andrewor14/pyspark-submit and squashes the following commits:

bf37e36 [Andrew Or] Minor changes
01066fa [Andrew Or] bin/pyspark for Windows
c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes)
1866f85 [Andrew Or] Windows is not cooperating
456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set
7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
b7ba0d8 [Andrew Or] Address a few comments (minor)
06eb138 [Andrew Or] Use shlex instead of writing our own parser
05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly
6fba412 [Andrew Or] Deal with quotes + address various comments
fe4c8a7 [Andrew Or] Update --help for bin/pyspark
afe47bf [Andrew Or] Fix spark shell
f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
a371d26 [Andrew Or] Route bin/pyspark through Spark submit
2014-05-16 22:34:38 -07:00
Michael Armbrust a80a6a139e SPARK-1864 Look in spark conf instead of system properties when propagating configuration to executors.
Author: Michael Armbrust <michael@databricks.com>

Closes #808 from marmbrus/confClasspath and squashes the following commits:

4c31d57 [Michael Armbrust] Look in spark conf instead of system properties when propagating configuration to executors.
2014-05-16 20:25:10 -07:00
Aaron Davidson bb98ecafce SPARK-1860: Do not cleanup application work/ directories by default
This causes an unrecoverable error for applications that are running for longer
than 7 days that have jars added to the SparkContext, as the jars are cleaned up
even though the application is still running.

Author: Aaron Davidson <aaron@databricks.com>

Closes #800 from aarondav/shitty-defaults and squashes the following commits:

a573fbb [Aaron Davidson] SPARK-1860: Do not cleanup application work/ directories by default
2014-05-15 21:37:58 -07:00
Huajian Mao 94c5139607 Typos in Spark
Author: Huajian Mao <huajianmao@gmail.com>

Closes #798 from huajianmao/patch-1 and squashes the following commits:

208a454 [Huajian Mao] A typo in Task
1b515af [Huajian Mao] A typo in the message
2014-05-15 18:20:16 -07:00
Prashant Sharma 46324279da Package docs
This is a few changes based on the original patch by @scrapcodes.

Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Patrick Wendell <pwendell@gmail.com>

Closes #785 from pwendell/package-docs and squashes the following commits:

c32b731 [Patrick Wendell] Changes based on Prashant's patch
c0463d3 [Prashant Sharma] added eof new line
ce8bf73 [Prashant Sharma] Added eof new line to all files.
4c35f2e [Prashant Sharma] SPARK-1563 Add package-info.java and package.scala files for all packages that appear in docs
2014-05-14 22:24:41 -07:00
Patrick Wendell 21570b4633 Documentation: Encourage use of reduceByKey instead of groupByKey.
Author: Patrick Wendell <pwendell@gmail.com>

Closes #784 from pwendell/group-by-key and squashes the following commits:

9b4505f [Patrick Wendell] Small fix
6347924 [Patrick Wendell] Documentation: Encourage use of reduceByKey instead of groupByKey.
2014-05-14 22:24:04 -07:00
Tathagata Das ad4e60ee7e [SPARK-1840] SparkListenerBus prints out scary error message when terminated normally
Running SparkPi example gave this error.
```
Pi is roughly 3.14374
14/05/14 18:16:19 ERROR Utils: Uncaught exception in thread SparkListenerBus
scala.runtime.NonLocalReturnControl$mcV$sp
```
This is due to the catch-all in the SparkListenerBus, which logged the control throwable (here, `NonLocalReturnControl`) that the Scala runtime uses for normal control flow.
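
A sketch of the shape of the fix (not the exact Utils code): rethrow Scala's control-flow throwables instead of logging them as uncaught errors.

```
import scala.util.control.ControlThrowable

object SafeLogging {
  def logUncaughtExceptions[T](body: => T): T = {
    try body
    catch {
      case ct: ControlThrowable => throw ct           // normal control flow, not an error
      case t: Throwable =>
        System.err.println(s"Uncaught exception: $t") // stand-in for real logging
        throw t
    }
  }
}
```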

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #783 from tdas/controlexception-fix and squashes the following commits:

a466c8d [Tathagata Das] Ignored control exceptions when logging all exceptions.
2014-05-14 21:13:41 -07:00
andrewor14 9ad096d55a [Typo] propertes -> properties
Author: andrewor14 <andrewor14@gmail.com>

Closes #780 from andrewor14/submit-typo and squashes the following commits:

e70e057 [andrewor14] propertes -> properties
2014-05-14 17:54:53 -07:00
Jacek Laskowski 601e37198b String interpolation + some other small changes
After having been invited to make the change in 6bee01dd04 (commitcomment-6284165) by @witgo.

Author: Jacek Laskowski <jacek@japila.pl>

Closes #748 from jaceklaskowski/sparkenv-string-interpolation and squashes the following commits:

be6ebac [Jacek Laskowski] String interpolation + some other small changes
2014-05-14 15:45:52 -07:00
Patrick Wendell 65533c7ec0 SPARK-1833 - Have an empty SparkContext constructor.
This is nicer than relying on new SparkContext(new SparkConf())

Author: Patrick Wendell <pwendell@gmail.com>

Closes #774 from pwendell/spark-context and squashes the following commits:

ef9f12f [Patrick Wendell] SPARK-1833 - Have an empty SparkContext constructor.
2014-05-14 12:53:30 -07:00
Andrew Ash a3315d7f4c SPARK-1829 Sub-second durations shouldn't round to "0 s"
As "99 ms" up to 99 ms
As "0.1 s" from 0.1 s up to 0.9 s

https://issues.apache.org/jira/browse/SPARK-1829

Compare the first image to the second here: http://imgur.com/RaLEsSZ,7VTlgfo#0

Author: Andrew Ash <andrew@andrewash.com>

Closes #768 from ash211/spark-1829 and squashes the following commits:

1c15b8e [Andrew Ash] SPARK-1829 Format sub-second durations more appropriately
2014-05-14 12:01:14 -07:00
Mark Hamstra 17f3075bc4 [SPARK-1620] Handle uncaught exceptions in function run by Akka scheduler
If the intended behavior was that uncaught exceptions thrown in functions being run by the Akka scheduler would end up being handled by the default uncaught exception handler set in Executor, and if that behavior is, in fact, correct, then this is a way to accomplish that.  I'm not certain, though, that we shouldn't be doing something different to handle uncaught exceptions from some of these scheduled functions.

In any event, this PR covers all of the cases I comment on in [SPARK-1620](https://issues.apache.org/jira/browse/SPARK-1620).
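
A sketch only, following the behaviour described above: wrap work handed to a scheduler so an uncaught exception reaches the thread's uncaught-exception handler rather than being silently swallowed by the scheduler.

```
object TryOrExit {
  def tryOrExit(block: => Unit): Unit = {
    try block
    catch {
      case t: Throwable =>
        val thread = Thread.currentThread()
        thread.getUncaughtExceptionHandler.uncaughtException(thread, t)
    }
  }
}
```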

Author: Mark Hamstra <markhamstra@gmail.com>

Closes #622 from markhamstra/SPARK-1620 and squashes the following commits:

071d193 [Mark Hamstra] refactored post-SPARK-1772
1a6a35e [Mark Hamstra] another style fix
d30eb94 [Mark Hamstra] scalastyle
3573ecd [Mark Hamstra] Use wrapped try/catch in Utils.tryOrExit
8fc0439 [Mark Hamstra] Make functions run by the Akka scheduler use Executor's UncaughtExceptionHandler
2014-05-14 10:07:25 -07:00
Andrew Or 69f750228f [SPARK-1769] Executor loss causes NPE race condition
This PR replaces the Schedulable data structures in Pool.scala with thread-safe ones from java. Note that Scala's `with SynchronizedBuffer` trait is soon to be deprecated in 2.11 because it is ["inherently unreliable"](http://www.scala-lang.org/api/2.11.0/index.html#scala.collection.mutable.SynchronizedBuffer). We should slowly drift away from `SynchronizedBuffer` in other places too.

Note that this PR introduces an API-breaking change; `sc.getAllPools` now returns an Array rather than an ArrayBuffer. This is because we want this method to return an immutable copy; returning a mutable one may confuse users who try to modify the copy, since such modifications have no effect on the original data structure.
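
An illustrative sketch of the pattern (names are hypothetical, not the real Pool.scala): back the scheduler pools with thread-safe Java collections instead of `SynchronizedBuffer`, and hand callers an immutable snapshot.

```
import java.util.concurrent.{ConcurrentHashMap, CopyOnWriteArrayList}
import scala.collection.JavaConverters._

case class Pool(name: String)

class Pools {
  private val queue = new CopyOnWriteArrayList[Pool]()
  private val byName = new ConcurrentHashMap[String, Pool]()

  def add(pool: Pool): Unit = {
    byName.put(pool.name, pool)
    queue.add(pool)
  }

  // Return a copy so callers cannot mutate internal state through the result.
  def getAllPools: Array[Pool] = queue.asScala.toArray
}
```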

Author: Andrew Or <andrewor14@gmail.com>

Closes #762 from andrewor14/pool-npe and squashes the following commits:

383e739 [Andrew Or] JavaConverters -> JavaConversions
3f32981 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pool-npe
769be19 [Andrew Or] Assorted minor changes
2189247 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pool-npe
05ad9e9 [Andrew Or] Fix test - contains is not the same as containsKey
0921ea0 [Andrew Or] var -> val
07d720c [Andrew Or] Synchronize Schedulable data structures
2014-05-14 00:54:33 -07:00
Koert Kuipers b22952fa1f SPARK-1801. expose InterruptibleIterator and TaskKilledException in deve...
...loper api

Author: Koert Kuipers <koert@tresata.com>

Closes #764 from koertkuipers/feat-rdd-developerapi and squashes the following commits:

8516dd2 [Koert Kuipers] SPARK-1801. expose InterruptibleIterator and TaskKilledException in developer api
2014-05-14 00:12:35 -07:00
Patrick Wendell 7bb9a521f3 Revert "[SPARK-1784] Add a new partitioner to allow specifying # of keys per partition"
This reverts commit 92cebada09.
2014-05-13 23:24:51 -07:00
larvaboy c33b8dcbf6 Implement ApproximateCountDistinct for SparkSql
Add the implementation for ApproximateCountDistinct to SparkSql. We use the HyperLogLog algorithm implemented in stream-lib, and do the count in two phases: 1) count the number of distinct elements in each partition, and 2) merge the HyperLogLog results from the different partitions.

A simple serializer and test cases are added as well.
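
A sketch of the two-phase structure described above. Note the swap: a plain `Set` stands in for the HyperLogLog sketch here, so this computes an exact count; only the per-partition / merge shape is the point.

```
import org.apache.spark.rdd.RDD

object TwoPhaseDistinct {
  def countDistinct[T](rdd: RDD[T]): Long = {
    rdd
      .mapPartitions(iter => Iterator(iter.toSet)) // phase 1: partial "sketches"
      .reduce(_ union _)                           // phase 2: merge partials
      .size.toLong
  }
}
```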

Author: larvaboy <larvaboy@gmail.com>

Closes #737 from larvaboy/master and squashes the following commits:

bd8ef3f [larvaboy] Add support of user-provided standard deviation to ApproxCountDistinct.
9ba8360 [larvaboy] Fix alignment and null handling issues.
95b4067 [larvaboy] Add a test case for count distinct and approximate count distinct.
f57917d [larvaboy] Add the parser for the approximate count.
a2d5d10 [larvaboy] Add ApproximateCountDistinct aggregates and functions.
7ad273a [larvaboy] Add SparkSql serializer for HyperLogLog.
1d9aacf [larvaboy] Fix a minor typo in the toString method of the Count case class.
653542b [larvaboy] Fix a couple of minor typos.
2014-05-13 21:26:08 -07:00
Syed Hashmi 92cebada09 [SPARK-1784] Add a new partitioner to allow specifying # of keys per partition
This change adds a new partitioner which allows users
to specify # of keys per partition.
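
A minimal sketch of the general shape such a partitioner could take (this is not the code from the PR, which was later reverted): derive the partition count from a desired number of keys per partition, then hash keys across those partitions.

```
import org.apache.spark.Partitioner

class FixedKeysPerPartitionPartitioner(keysPerPartition: Int, totalKeys: Int)
  extends Partitioner {

  require(keysPerPartition > 0 && totalKeys > 0)

  override val numPartitions: Int =
    math.ceil(totalKeys.toDouble / keysPerPartition).toInt

  override def getPartition(key: Any): Int = {
    val h = if (key == null) 0 else key.hashCode()
    ((h % numPartitions) + numPartitions) % numPartitions // non-negative modulo
  }
}
```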

Author: Syed Hashmi <shashmi@cloudera.com>

Closes #721 from syedhashmi/master and squashes the following commits:

4ca94cc [Syed Hashmi] [SPARK-1784] Add a new partitioner
2014-05-13 21:24:23 -07:00
Ye Xianjin 753b04dea4 [SPARK-1527] change rootDir*.getName to rootDir*.getAbsolutePath
JIRA issue: [SPARK-1527](https://issues.apache.org/jira/browse/SPARK-1527)

getName() only gets the last component of the file path. When deleting test-generated directories,
we should pass the generated directory's absolute path to DiskBlockManager.

Author: Ye Xianjin <advancedxy@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Patrick Wendell <pwendell@gmail.com>

Closes #436 from advancedxy/SPARK-1527 and squashes the following commits:

4678bab [Ye Xianjin] change rootDir*.getname to rootDir*.getAbsolutePath so the temporary directories are deleted when the test is finished.
2014-05-13 19:03:51 -07:00
Andrew Or 5c0dafc2c8 [SPARK-1816] LiveListenerBus dies if a listener throws an exception
The solution is to wrap a try / catch / log around the posting of each event to each listener.
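
A sketch of that pattern (hypothetical types, not the real LiveListenerBus): one misbehaving listener must not kill the bus or prevent the remaining listeners from seeing the event.

```
trait Listener { def onEvent(event: Any): Unit }

object ListenerBus {
  def postToAll(listeners: Seq[Listener], event: Any): Unit = {
    listeners.foreach { listener =>
      try {
        listener.onEvent(event)
      } catch {
        case e: Exception =>
          System.err.println(s"Listener ${listener.getClass.getName} threw: $e")
      }
    }
  }
}
```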

Author: Andrew Or <andrewor14@gmail.com>

Closes #759 from andrewor14/listener-die and squashes the following commits:

aee5107 [Andrew Or] Merge branch 'master' of github.com:apache/spark into listener-die
370939f [Andrew Or] Remove two layers of indirection
422d278 [Andrew Or] Explicitly throw an exception instead of 1 / 0
0df0e2a [Andrew Or] Try/catch and log exceptions when posting events
2014-05-13 18:32:32 -07:00
William Benton 16ffadcc4a SPARK-571: forbid return statements in cleaned closures
This patch checks top-level closure arguments to `ClosureCleaner.clean` for `return` statements and raises an exception if it finds any.  This is mainly a user-friendliness addition, since programs with return statements in closure arguments will currently fail upon RDD actions with a less-than-intuitive error message.
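
An illustrative example of the user error being guarded against (assumes a SparkContext is available; this is not the ClosureCleaner code itself). The `return` compiles, but it is a non-local return out of the closure, so when the closure runs on an executor the job fails with an unintuitive error; the cleaner now rejects such closures up front.

```
object ReturnInClosure {
  def badCount(sc: org.apache.spark.SparkContext): Long = {
    sc.parallelize(1 to 100).filter { x =>
      if (x > 50) return 0L // non-local return inside a closure argument
      true
    }.count()
  }
}
```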

Author: William Benton <willb@redhat.com>

Closes #717 from willb/spark-571 and squashes the following commits:

c41eb7d [William Benton] Another test case for SPARK-571
30c42f4 [William Benton] Stylistic cleanups
559b16b [William Benton] Stylistic cleanups from review
de13b79 [William Benton] Style fixes
295b6a5 [William Benton] Forbid return statements in closure arguments.
b017c47 [William Benton] Added a test for SPARK-571
2014-05-13 13:45:23 -07:00
Sandy Ryza 2792bd016a SPARK-1815. SparkContext should not be marked DeveloperApi
Author: Sandy Ryza <sandy@cloudera.com>

Closes #753 from sryza/sandy-spark-1815 and squashes the following commits:

957a8ac [Sandy Ryza] SPARK-1815. SparkContext should not be marked DeveloperApi
2014-05-12 20:08:30 -07:00
Andrew Or ba96bb3d59 [SPARK-1780] Non-existent SPARK_DAEMON_OPTS is lurking around
What they really mean is SPARK_DAEMON_***JAVA***_OPTS

Author: Andrew Or <andrewor14@gmail.com>

Closes #751 from andrewor14/spark-daemon-opts and squashes the following commits:

70c41f9 [Andrew Or] SPARK_DAEMON_OPTS -> SPARK_DAEMON_JAVA_OPTS
2014-05-12 19:42:35 -07:00
Andrew Ash a5150d199c Typo: resond -> respond
Author: Andrew Ash <andrew@andrewash.com>

Closes #743 from ash211/patch-4 and squashes the following commits:

c959f3b [Andrew Ash] Typo: resond -> respond
2014-05-12 18:46:28 -07:00
Patrick Wendell 925d8b249b SPARK-1623: Use File objects instead of String's in HTTPBroadcast
This seems strictly better, and I think it's justified on the grounds of
clean-up. It might also fix issues with path conversions, but I haven't
yet isolated any instance of that happening.

/cc @srowen @tdas

Author: Patrick Wendell <pwendell@gmail.com>

Closes #749 from pwendell/broadcast-cleanup and squashes the following commits:

d6d54f2 [Patrick Wendell] SPARK-1623: Use File objects instead of string's in HTTPBroadcast
2014-05-12 17:27:28 -07:00
Patrick Wendell 3ce526b168 Rename testExecutorEnvs --> executorEnvs.
This was changed, but in fact, it's used for things other than tests.
So I've changed it back.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #747 from pwendell/executor-env and squashes the following commits:

36a60a5 [Patrick Wendell] Rename testExecutorEnvs --> executorEnvs.
2014-05-12 17:09:13 -07:00
Sean Owen 7120a2979d SPARK-1798. Tests should clean up temp files
Three issues related to temp files that tests generate – these should be touched up for hygiene but are not urgent.

Modules have a log4j.properties which directs the unit-test.log output file to a directory like `[module]/target/unit-test.log`. But this ends up creating `[module]/[module]/target/unit-test.log` instead of the former.

The `work/` directory is not deleted by "mvn clean", in the parent and in modules. Neither is the `checkpoint/` directory created under the various external modules.

Many tests create a temp directory, which is not usually deleted. This can be largely resolved by calling `deleteOnExit()` at creation and trying to call `Utils.deleteRecursively` consistently to clean up, sometimes in an `@After` method.

_If anyone seconds the motion, I can create a more significant change that introduces a new test trait along the lines of `LocalSparkContext`, which provides management of temp directories for subclasses to take advantage of._
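
A hedged sketch of the cleanup pattern described above (not the exact Utils code): register `deleteOnExit()` as a backstop when creating a temp dir, and delete it recursively in test teardown.

```
import java.io.File

object TempDirs {
  def createTempDir(): File = {
    val dir = File.createTempFile("spark-test", "")
    dir.delete()        // replace the temp *file* with a directory of the same name
    dir.mkdirs()
    dir.deleteOnExit()  // backstop in case teardown never runs
    dir
  }

  def deleteRecursively(file: File): Unit = {
    if (file.isDirectory) {
      Option(file.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    file.delete()
  }
}
```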

Author: Sean Owen <sowen@cloudera.com>

Closes #732 from srowen/SPARK-1798 and squashes the following commits:

5af578e [Sean Owen] Try to consistently delete test temp dirs and files, and set deleteOnExit() for each
b21b356 [Sean Owen] Remove work/ and checkpoint/ dirs with mvn clean
bdd0f41 [Sean Owen] Remove duplicate module dir in log4j.properties output path for tests
2014-05-12 14:16:19 -07:00
Bernardo Gomez Palacio d9c97ba397 SPARK-1806: Upgrade Mesos dependency to 0.18.1
Enabled Mesos (0.18.1) dependency with shaded protobuf

Why is this needed?
Avoids any protobuf version collision between Mesos and any other
dependency in Spark e.g. Hadoop HDFS 2.2+ or 1.0.4.

Ticket: https://issues.apache.org/jira/browse/SPARK-1806

* Should close https://issues.apache.org/jira/browse/SPARK-1433

Author berngp

Author: Bernardo Gomez Palacio <bernardo.gomezpalacio@gmail.com>

Closes #741 from berngp/feature/SPARK-1806 and squashes the following commits:

5d70646 [Bernardo Gomez Palacio] SPARK-1806: Upgrade Mesos dependency to 0.18.1
2014-05-12 11:10:28 -07:00
Aaron Davidson 3af1f38643 SPARK-1772 Stop catching Throwable, let Executors die
The main issue this patch fixes is [SPARK-1772](https://issues.apache.org/jira/browse/SPARK-1772), in which Executors may not die when fatal exceptions (e.g., OOM) are thrown. This patch causes Executors to delegate to the ExecutorUncaughtExceptionHandler when a fatal exception is thrown.

This patch also continues the fight in the neverending war against `case t: Throwable =>`, by only catching Exceptions in many places, and adding a wrapper for Threads and Runnables to make sure any uncaught exceptions are at least printed to the logs.

It also turns out that it is unlikely that the IndestructibleActorSystem actually works, given testing ([here](https://gist.github.com/aarondav/ca1f0cdcd50727f89c0d)). The uncaughtExceptionHandler is not called from the places that we expected it would be.
[SPARK-1620](https://issues.apache.org/jira/browse/SPARK-1620) deals with part of this issue, but refactoring our Actor Systems to ensure that exceptions are dealt with properly is a much bigger change, outside the scope of this PR.
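
A sketch of the catch discipline described above (the helper name is illustrative): handle ordinary exceptions locally, but let fatal errors such as OutOfMemoryError escape so the executor process can die and be restarted.

```
import scala.util.control.NonFatal

object ErrorHandling {
  def logNonFatal[T](label: String)(body: => T): T = {
    try body
    catch {
      case NonFatal(e) =>
        System.err.println(s"$label failed: $e")
        throw e
      // fatal Throwables (OOM, etc.) are deliberately not caught here
    }
  }
}
```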

Author: Aaron Davidson <aaron@databricks.com>

Closes #715 from aarondav/throwable and squashes the following commits:

f9b9bfe [Aaron Davidson] Remove other redundant 'throw e'
e937a0a [Aaron Davidson] Address Prashant and Matei's comments
1867867 [Aaron Davidson] [RFC] SPARK-1772 Stop catching Throwable, let Executors die
2014-05-12 11:08:52 -07:00
Patrick Wendell 7d9cc9214b SPARK-1770: Load balance elements when repartitioning.
This patch adds better balancing when performing a repartition of an
RDD. Previously the elements in the RDD were hash partitioned, meaning
if the RDD was skewed certain partitions would end up being very large.

This commit adds load balancing of elements across the repartitioned
RDD splits. The load balancing is not perfect: a given output partition
can have up to N more elements than the average if there are N input
partitions. However, some randomization is used to minimize the
probability that this happens.
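
An illustrative sketch of the balancing idea (hypothetical helper, not the real repartition code): each input partition starts from a random output position and deals its elements out round-robin, so no output partition collects a skewed share.

```
import scala.util.Random

object Rebalance {
  def keyByOutputPartition[T](partitionIndex: Int, items: Iterator[T],
                              numOutputPartitions: Int): Iterator[(Int, T)] = {
    var position = new Random(partitionIndex).nextInt(numOutputPartitions)
    items.map { item =>
      position = (position + 1) % numOutputPartitions
      (position, item)
    }
  }
}
```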

Author: Patrick Wendell <pwendell@gmail.com>

Closes #727 from pwendell/load-balance and squashes the following commits:

f9da752 [Patrick Wendell] Response to Matei's feedback
acfa46a [Patrick Wendell] SPARK-1770: Load balance elements when repartitioning.
2014-05-11 17:11:55 -07:00
witgo 6bee01dd04 remove outdated runtime Information scala home
Author: witgo <witgo@qq.com>

Closes #728 from witgo/scala_home and squashes the following commits:

cdfd8be [witgo] Merge branch 'master' of https://github.com/apache/spark into scala_home
fac094a [witgo] remove outdated runtime Information scala home
2014-05-11 14:34:27 -07:00
Andrew Or 83e0424d87 [SPARK-1774] Respect SparkSubmit --jars on YARN (client)
SparkSubmit ignores `--jars` for YARN client. This is a bug.

This PR also automatically adds the application jar to `spark.jar`. Previously, when running as yarn-client, you must specify the jar additionally through `--files` (because `--jars` didn't work). Now you don't have to explicitly specify it through either.

Tested on a YARN cluster.

Author: Andrew Or <andrewor14@gmail.com>

Closes #710 from andrewor14/yarn-jars and squashes the following commits:

35d1928 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
c27bf6c [Andrew Or] For yarn-cluster and python, do not add primaryResource to spark.jar
c92c5bf [Andrew Or] Minor cleanups
269f9f3 [Andrew Or] Fix format
013d840 [Andrew Or] Fix tests
1407474 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
3bb75e8 [Andrew Or] Allow SparkSubmit --jars to take effect in yarn-client mode
2014-05-10 20:58:02 -07:00
Sean Owen 2b7bd29eb6 SPARK-1789. Multiple versions of Netty dependencies cause FlumeStreamSuite failure
TL;DR: there is a bit of JAR-hell trouble with Netty; it can be mostly resolved, and doing so fixes a test failure.

I hit the error described at http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-startup-time-out-td1753.html while running FlumeStreamingSuite, and have for a short while (is it just me?)

velvia notes:
"I have found a workaround.  If you add akka 2.2.4 to your dependencies, then everything works, probably because akka 2.2.4 brings in newer version of Jetty."

There are at least 3 versions of Netty in play in the build:

- the new Flume 1.4.0 dependency brings in io.netty:netty:3.4.0.Final, and that is the immediate problem
- the custom version of akka 2.2.3 depends on io.netty:netty:3.6.6.
- but, Spark Core directly uses io.netty:netty-all:4.0.17.Final

The POMs try to exclude other versions of netty, but are excluding org.jboss.netty:netty, when in fact older versions of io.netty:netty (not netty-all) are also an issue.

The org.jboss.netty:netty excludes are largely unnecessary. I replaced many of them with io.netty:netty exclusions until everything agreed on io.netty:netty-all:4.0.17.Final.

But this didn't work, since Akka 2.2.3 doesn't work with Netty 4.x. Down-grading to 3.6.6.Final across the board made some Spark code not compile.

If the build *keeps* io.netty:netty:3.6.6.Final as well, everything seems to work. Part of the reason seems to be that Netty 3.x used the old `org.jboss.netty` packages. This is less than ideal, but is no worse than the current situation.

So this PR resolves the issue and improves the JAR hell, even if it leaves the existing theoretical Netty 3-vs-4 conflict:

- Remove org.jboss.netty excludes where possible, for clarity; they're not needed except with Hadoop artifacts
- Add io.netty:netty excludes where needed -- except, let akka keep its io.netty:netty
- Change a bit of test code that actually depended on Netty 3.x, to use 4.x equivalent
- Update SBT build accordingly

A better change would be to update Akka far enough such that it agrees on Netty 4.x, but I don't know if that's feasible.

Author: Sean Owen <sowen@cloudera.com>

Closes #723 from srowen/SPARK-1789 and squashes the following commits:

43661b7 [Sean Owen] Update and add Netty excludes to prevent some JAR conflicts that cause test issues
2014-05-10 20:50:40 -07:00
Kan Zhang 6c2691d0a0 [SPARK-1690] Tolerating empty elements when saving Python RDD to text files
Tolerate empty strings in PythonRDD

Author: Kan Zhang <kzhang@apache.org>

Closes #644 from kanzhang/SPARK-1690 and squashes the following commits:

c62ad33 [Kan Zhang] Adding Python doctest
473ec4b [Kan Zhang] [SPARK-1690] Tolerating empty elements when saving Python RDD to text files
2014-05-10 14:01:08 -07:00
Bouke van der Bijl 3776f2f283 Add Python includes to path before depickling broadcast values
This fixes https://issues.apache.org/jira/browse/SPARK-1731 by adding the Python includes to the PYTHONPATH before depickling the broadcast values

@airhorns

Author: Bouke van der Bijl <boukevanderbijl@gmail.com>

Closes #656 from bouk/python-includes-before-broadcast and squashes the following commits:

7b0dfe4 [Bouke van der Bijl] Add Python includes to path before depickling broadcast values
2014-05-10 13:02:13 -07:00
Matei Zaharia 7eefc9d2b3 SPARK-1708. Add a ClassTag on Serializer and things that depend on it
This pull request contains a rebased patch from @heathermiller (https://github.com/heathermiller/spark/pull/1) to add ClassTags on Serializer and types that depend on it (Broadcast and AccumulableCollection). Putting these in the public API signatures now will allow us to use Scala Pickling for serialization down the line without breaking binary compatibility.

One question remaining is whether we also want them on Accumulator -- Accumulator is passed as part of a bigger Task or TaskResult object via the closure serializer so it doesn't seem super useful to add the ClassTag there. Broadcast and AccumulableCollection in contrast were being serialized directly.
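
A minimal sketch of what carrying a ClassTag through a serialization API looks like (a hypothetical trait, not Spark's actual Serializer): the tag travels with the call so a future pickler can recover the element type without breaking binary compatibility later.

```
import scala.reflect.ClassTag

trait SimpleSerializer {
  def serialize[T: ClassTag](value: T): Array[Byte]
  def deserialize[T: ClassTag](bytes: Array[Byte]): T
}
```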

CC @rxin, @pwendell, @heathermiller

Author: Matei Zaharia <matei@databricks.com>

Closes #700 from mateiz/spark-1708 and squashes the following commits:

1a3d8b0 [Matei Zaharia] Use fake ClassTag in Java
3b449ed [Matei Zaharia] test fix
2209a27 [Matei Zaharia] Code style fixes
9d48830 [Matei Zaharia] Add a ClassTag on Serializer and things that depend on it
2014-05-10 12:10:24 -07:00
CodingCat 2f452cbaf3 SPARK-1686: keep schedule() calling in the main thread
https://issues.apache.org/jira/browse/SPARK-1686

moved from original JIRA (by @markhamstra):

In deploy.master.Master, the completeRecovery method is the last thing to be called when a standalone Master is recovering from failure. It is responsible for resetting some state, relaunching drivers, and eventually resuming its scheduling duties.

There are currently four places in Master.scala where completeRecovery is called. Three of them are from within the actor's receive method, and aren't problems. The last starts from within receive when the ElectedLeader message is received, but the actual completeRecovery() call is made from the Akka scheduler. That means that it will execute on a different scheduler thread, and Master itself will end up running (i.e., schedule() ) from that Akka scheduler thread.

In this PR, I added a new master message TriggerSchedule to trigger the "local" call of schedule() in the scheduler thread
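
A sketch of that pattern (the message name comes from the PR; everything else is simplified and hypothetical): the timer only sends the actor a message, and the real scheduling work runs inside `receive`, i.e. on the actor's own thread rather than the Akka timer thread.

```
import akka.actor.Actor
import scala.concurrent.duration._

case object TriggerSchedule

class MasterLike extends Actor {
  import context.dispatcher

  override def receive: Receive = {
    case "ElectedLeader" =>
      // Finish recovery later, but on this actor, not on the timer thread.
      context.system.scheduler.scheduleOnce(1.second, self, TriggerSchedule)
    case TriggerSchedule =>
      schedule()
  }

  private def schedule(): Unit = { /* resume scheduling duties */ }
}
```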

Author: CodingCat <zhunansjtu@gmail.com>

Closes #639 from CodingCat/SPARK-1686 and squashes the following commits:

81bb4ca [CodingCat] rename variable
69e0a2a [CodingCat] style fix
36a2ac0 [CodingCat] address Aaron's comments
ec9b7bb [CodingCat] address the comments
02b37ca [CodingCat] keep schedule() calling in the main thread
2014-05-09 21:50:23 -07:00
Aaron Davidson 59577df14c SPARK-1770: Revert accidental(?) fix
Looks like this change was accidentally committed here: 06b15baab2
but the change does not show up in the PR itself (#704).

Besides not being intended to go in with that PR, this change also broke the test JavaAPISuite.repartition.

Author: Aaron Davidson <aaron@databricks.com>

Closes #716 from aarondav/shufflerand and squashes the following commits:

b1cf70b [Aaron Davidson] SPARK-1770: Revert accidental(?) fix
2014-05-09 14:51:34 -07:00
Tathagata Das 32868f31f8 Converted bang to ask to avoid scary warning when a block is removed
Removing a block through the blockmanager gave scary warning messages in the driver.
```
2014-05-08 20:16:19,172 WARN BlockManagerMasterActor: Got unknown message: true
2014-05-08 20:16:19,172 WARN BlockManagerMasterActor: Got unknown message: true
2014-05-08 20:16:19,172 WARN BlockManagerMasterActor: Got unknown message: true
```

This is because the [BlockManagerSlaveActor](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManagerSlaveActor.scala#L44) would send back an acknowledgement ("true"). But the BlockManagerMasterActor had sent the RemoveBlock message as a plain send, not as an ask(), so it rejected the "true" reply as an unknown message.
@pwendell
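
A minimal sketch of the tell-vs-ask distinction described above (the message type here is illustrative): only `ask` gives the acknowledgement somewhere to land, so the "true" reply is a normal response instead of an unknown message.

```
import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Future
import scala.concurrent.duration._

object RemoveBlockClient {
  case class RemoveBlock(blockId: String)

  def removeBlock(slave: ActorRef, blockId: String): Future[Boolean] = {
    implicit val timeout: Timeout = Timeout(30.seconds)
    // slave ! RemoveBlock(blockId)  // fire-and-forget: the reply would be orphaned
    (slave ? RemoveBlock(blockId)).mapTo[Boolean]
  }
}
```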

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #708 from tdas/bm-fix and squashes the following commits:

ed4ef15 [Tathagata Das] Converted bang to ask to avoid scary warning when a block is removed.
2014-05-08 22:34:08 -07:00
Patrick Wendell 4c60fd1e8c MINOR: Removing dead code.
Meant to do this when patching up the last merge.
2014-05-08 22:33:06 -07:00
Sandeep 7db47c463f SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo
This was used in the past to have a cache of deserialized ShuffleMapTasks, but that's been removed, so there's no need for a lock. It slows down Spark when task descriptions are large, e.g. due to large lineage graphs or local variables.

Author: Sandeep <sandeep@techaddict.me>

Closes #707 from techaddict/SPARK-1775 and squashes the following commits:

18d8ebf [Sandeep] SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo This was used in the past to have a cache of deserialized ShuffleMapTasks, but that's been removed, so there's no need for a lock. It slows down Spark when task descriptions are large, e.g. due to large lineage graphs or local variables.
2014-05-08 22:30:17 -07:00
Patrick Wendell 06b15baab2 SPARK-1565 (Addendum): Replace run-example with spark-submit.
Gives a nicely formatted message to the user when `run-example` is run to
tell them to use `spark-submit`.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #704 from pwendell/examples and squashes the following commits:

1996ee8 [Patrick Wendell] Feedback form Andrew
3eb7803 [Patrick Wendell] Suggestions from TD
2474668 [Patrick Wendell] SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`.
2014-05-08 22:26:36 -07:00
Andrew Or 8b78412994 [SPARK-1755] Respect SparkSubmit --name on YARN
Right now, SparkSubmit ignores the `--name` flag for both yarn-client and yarn-cluster. This is a bug.

In client mode, SparkSubmit treats `--name` as a [cluster config](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L170) and does not propagate this to SparkContext.

In cluster mode, SparkSubmit passes this flag to `org.apache.spark.deploy.yarn.Client`, which only uses it for the [YARN ResourceManager](https://github.com/apache/spark/blob/master/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L80), but does not propagate this to SparkContext.

This PR ensures that `spark.app.name` is always set if SparkSubmit receives the `--name` flag, which is what the usage promises. This makes it possible for applications to start a SparkContext with an empty conf `val sc = new SparkContext(new SparkConf)`, and inherit the app name from SparkSubmit.

Tested both modes on a YARN cluster.

Author: Andrew Or <andrewor14@gmail.com>

Closes #699 from andrewor14/yarn-app-name and squashes the following commits:

98f6a79 [Andrew Or] Fix tests
dea932f [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-app-name
c86d9ca [Andrew Or] Respect SparkSubmit --name on YARN
2014-05-08 20:45:29 -07:00
Andrew Or c3f8b78c21 [SPARK-1745] Move interrupted flag from TaskContext constructor (minor)
It makes little sense to start a TaskContext that is interrupted. Indeed, I searched for all use cases of it and didn't find a single instance in which `interrupted` is true on construction.

This was inspired by reviewing #640, which adds an additional `@volatile var completed` that is similar. These are not the most urgent changes, but I wanted to push them out before I forget.

Author: Andrew Or <andrewor14@gmail.com>

Closes #675 from andrewor14/task-context and squashes the following commits:

9575e02 [Andrew Or] Add space
69455d1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into task-context
c471490 [Andrew Or] Oops, removed one flag too many. Adding it back.
85311f8 [Andrew Or] Move interrupted flag from TaskContext constructor
2014-05-08 12:13:07 -07:00
Prashant Sharma 44dd57fb66 SPARK-1565, update examples to be used with spark-submit script.
Commit for initial feedback. Basically, I am curious whether we should prompt the user to provide args, especially when they are mandatory, and whether we can skip them when they are not.

A few other things also did not work, for example:
`bin/spark-submit examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop1.0.4.jar --class org.apache.spark.examples.SparkALS --arg 100 500 10 5 2`

Not all the args get passed properly; I may have messed something up and will try to sort it out.

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #552 from ScrapCodes/SPARK-1565/update-examples and squashes the following commits:

669dd23 [Prashant Sharma] Review comments
2727e70 [Prashant Sharma] SPARK-1565, update examples to be used with spark-submit script.
2014-05-08 10:23:05 -07:00
Andrew Or 5200872243 [SPARK-1688] Propagate PySpark worker stderr to driver
When at least one of the following conditions is true, PySpark cannot be loaded:

1. PYTHONPATH is not set
2. PYTHONPATH does not contain the python directory (or jar, in the case of YARN)
3. The jar does not contain pyspark files (YARN)
4. The jar does not contain py4j files (YARN)

However, we currently throw the same random `java.io.EOFException` for all of the above cases, when trying to read from the python daemon's output. This message is super unhelpful.

This PR includes the python stderr and the PYTHONPATH in the exception propagated to the driver. Now, the exception message looks something like:

```
Error from python worker:
  : No module named pyspark
PYTHONPATH was:
  /path/to/spark/python:/path/to/some/jar
java.io.EOFException
  <stack trace>
```

whereas before it was just

```
java.io.EOFException
  <stack trace>
```

Author: Andrew Or <andrewor14@gmail.com>

Closes #603 from andrewor14/pyspark-exception and squashes the following commits:

10d65d3 [Andrew Or] Throwable -> Exception, worker -> daemon
862d1d7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
a5ed798 [Andrew Or] Use block string and interpolation instead of var (minor)
cc09c45 [Andrew Or] Account for the fact that the python daemon may not have terminated yet
444f019 [Andrew Or] Use the new RedirectThread + include system PYTHONPATH
aab00ae [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
0cc2402 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
783efe2 [Andrew Or] Make python daemon stderr indentation consistent
9524172 [Andrew Or] Avoid potential NPE / error stream contention + Move things around
29f9688 [Andrew Or] Add back original exception type
e92d36b [Andrew Or] Include python worker stderr in the exception propagated to the driver
7c69360 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
cdbc185 [Andrew Or] Fix python attribute not found exception when PYTHONPATH is not set
dcc0353 [Andrew Or] Check both python and system environment variables for PYTHONPATH
6c09c21 [Andrew Or] Validate PYTHONPATH and PySpark modules before starting python workers
2014-05-07 14:35:22 -07:00
Andrew Ash 7f6f4a1035 Nicer logging for SecurityManager startup
Happy to open a jira ticket if you'd like to track one there.

Author: Andrew Ash <andrew@andrewash.com>

Closes #678 from ash211/SecurityManagerLogging and squashes the following commits:

2aa0b7a [Andrew Ash] Nicer logging for SecurityManager startup
2014-05-07 17:24:12 -04:00
Aaron Davidson 3308722ca0 SPARK-1579: Clean up PythonRDD and avoid swallowing IOExceptions
This patch includes several cleanups to PythonRDD, focused around fixing [SPARK-1579](https://issues.apache.org/jira/browse/SPARK-1579) cleanly. Listed in order of approximate importance:

- The Python daemon waits for Spark to close the socket before exiting,
  in order to avoid causing spurious IOExceptions in Spark's
  `PythonRDD::WriterThread`.
- Removes the Python Monitor Thread, which polled for task cancellations
  in order to kill the Python worker. Instead, we do this in the
  onCompleteCallback, since this is guaranteed to be called during
  cancellation.
- Adds a "completed" variable to TaskContext to avoid the issue noted in
  [SPARK-1019](https://issues.apache.org/jira/browse/SPARK-1019), where onCompleteCallbacks may be execution-order dependent.
  Along with this, I removed the "context.interrupted = true" flag in
  the onCompleteCallback.
- Extracts PythonRDD::WriterThread to its own class.

Since this patch provides an alternative solution to [SPARK-1019](https://issues.apache.org/jira/browse/SPARK-1019), I did test it with

```
sc.textFile("latlon.tsv").take(5)
```

many times without error.

Additionally, in order to test the unswallowed exceptions, I performed

```
sc.textFile("s3n://<big file>").count()
```

and cut my internet during execution. Prior to this patch, we got the "stdin writer exited early" message, which was unhelpful. Now, we get the SocketExceptions propagated through Spark to the user and get proper (though unsuccessful) task retries.

Author: Aaron Davidson <aaron@databricks.com>

Closes #640 from aarondav/pyspark-io and squashes the following commits:

b391ff8 [Aaron Davidson] Detect "clean socket shutdowns" and stop waiting on the socket
c0c49da [Aaron Davidson] SPARK-1579: Clean up PythonRDD and avoid swallowing IOExceptions
2014-05-07 09:48:31 -07:00
Kan Zhang 967635a242 [SPARK-1460] Returning SchemaRDD instead of normal RDD on Set operations...
... that do not change schema

Author: Kan Zhang <kzhang@apache.org>

Closes #448 from kanzhang/SPARK-1460 and squashes the following commits:

111e388 [Kan Zhang] silence MiMa errors in EdgeRDD and VertexRDD
91dc787 [Kan Zhang] Taking into account newly added Ordering param
79ed52a [Kan Zhang] [SPARK-1460] Returning SchemaRDD on Set operations that do not change schema
2014-05-07 09:41:31 -07:00
Patrick Wendell 913a0a9c0a SPARK-1746: Support setting SPARK_JAVA_OPTS on executors for backwards compatibility
Author: Patrick Wendell <pwendell@gmail.com>

Closes #676 from pwendell/worker-opts and squashes the following commits:

54456c4 [Patrick Wendell] SPARK-1746: Support setting SPARK_JAVA_OPTS on executors for backwards compatibility
2014-05-07 00:11:05 -07:00
Matei Zaharia 951a5d9398 [SPARK-1549] Add Python support to spark-submit
This PR updates spark-submit to allow submitting Python scripts (currently only with deploy-mode=client, but that's all that was supported before) and updates the PySpark code to properly find various paths, etc. One significant change is that we assume we can always find the Python files either from the Spark assembly JAR (which will happen with the Maven assembly build in make-distribution.sh) or from SPARK_HOME (which will exist in local mode even if you use sbt assembly, and should be enough for testing). This means we no longer need a weird hack to modify the environment for YARN.

This patch also updates the Python worker manager to run python with -u, which means unbuffered output (send it to our logs right away instead of waiting a while after stuff was written); this should simplify debugging.

In addition, it fixes https://issues.apache.org/jira/browse/SPARK-1709, setting the main class from a JAR's Main-Class attribute if not specified by the user, and fixes a few help strings and style issues in spark-submit.

In the future we may want to make the `pyspark` shell use spark-submit as well, but it seems unnecessary for 1.0.

Author: Matei Zaharia <matei@databricks.com>

Closes #664 from mateiz/py-submit and squashes the following commits:

15e9669 [Matei Zaharia] Fix some uses of path.separator property
051278c [Matei Zaharia] Small style fixes
0afe886 [Matei Zaharia] Add license headers
4650412 [Matei Zaharia] Add pyFiles to PYTHONPATH in executors, remove old YARN stuff, add tests
15f8e1e [Matei Zaharia] Set PYTHONPATH in PythonWorkerFactory in case it wasn't set from outside
47c0655 [Matei Zaharia] More work to make spark-submit work with Python:
d4375bd [Matei Zaharia] Clean up description of spark-submit args a bit and add Python ones
2014-05-06 15:12:35 -07:00
witgo ec09acdd4a SPARK-1734: spark-submit throws an exception: Exception in thread "main"...
... java.lang.ClassNotFoundException: org.apache.spark.broadcast.TorrentBroadcastFactory

Author: witgo <witgo@qq.com>

Closes #665 from witgo/SPARK-1734 and squashes the following commits:

cacf238 [witgo] SPARK-1734: spark-submit throws an exception: Exception in thread "main" java.lang.ClassNotFoundException: org.apache.spark.broadcast.TorrentBroadcastFactory
2014-05-06 14:17:39 -07:00
Mark Hamstra fbfe69de69 [SPARK-1685] Cancel retryTimer on restart of Worker or AppClient
See https://issues.apache.org/jira/browse/SPARK-1685 for a more complete description, but in essence: If the Worker or AppClient actor restarts before successfully registering with Master, multiple retryTimers will be running, which will lead to less than the full number of registration retries being attempted before the new actor is forced to give up.

Author: Mark Hamstra <markhamstra@gmail.com>

Closes #602 from markhamstra/SPARK-1685 and squashes the following commits:

11cc088 [Mark Hamstra] retryTimer -> registrationRetryTimer
69c348c [Mark Hamstra] Cancel retryTimer on restart of Worker or AppClient
2014-05-06 12:53:39 -07:00
ArcherShao 0a5a468114 Update OpenHashSet.scala
Fix the incorrect comment on the function addWithoutResize.

Author: ArcherShao <ArcherShao@users.noreply.github.com>

Closes #667 from ArcherShao/patch-3 and squashes the following commits:

a607358 [ArcherShao] Update OpenHashSet.scala
2014-05-06 10:12:59 -07:00
Andrew Or ea10b31261 Expose SparkListeners and relevant classes as DeveloperApi
Hopefully this can go into 1.0, as a few people on the user list have asked for this.

Author: Andrew Or <andrewor14@gmail.com>

Closes #648 from andrewor14/expose-listeners and squashes the following commits:

e45e1ef [Andrew Or] Add missing colons (minor)
350d643 [Andrew Or] Expose SparkListeners and relevant classes as DeveloperApi
2014-05-05 18:32:14 -07:00
Sandy Ryza 8e724dcbad SPARK-1728. JavaRDDLike.mapPartitionsWithIndex requires ClassTag
Author: Sandy Ryza <sandy@cloudera.com>

Closes #657 from sryza/sandy-spark-1728 and squashes the following commits:

4751443 [Sandy Ryza] SPARK-1728. JavaRDDLike.mapPartitionsWithIndex requires ClassTag
2014-05-05 18:26:34 -07:00
Bouke van der Bijl 3292e2a71b SPARK-1721: Reset the thread classLoader in the Mesos Executor
This is because Mesos calls it with a different environment or something; the result is that the Spark jar is missing and it can't load classes.

This fixes http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html

I have no idea whether this is the right fix, I can only confirm that it fixes the issue for us.

The `registered` method is called from mesos (765ff9bc2a/src/java/jni/org_apache_mesos_MesosExecutorDriver.cpp)

I am unsure which commit caused this regression
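
A sketch of the classloader fallback referenced in the squashed commit below (simplified, with a made-up helper name): prefer the thread's context classloader, and fall back to the loader that loaded our own classes when no context loader is set.

```
object ClassLoaders {
  def getContextOrDefaultClassLoader: ClassLoader =
    Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)

  def loadClass(name: String): Class[_] =
    Class.forName(name, true, getContextOrDefaultClassLoader)
}
```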

Author: Bouke van der Bijl <boukevanderbijl@gmail.com>

Closes #620 from bouk/mesos-classloader-fix and squashes the following commits:

c13eae0 [Bouke van der Bijl] Use getContextOrSparkClassLoader in SparkEnv and CompressionCodec
2014-05-05 11:19:36 -07:00
Sandeep b48a55ae9f SPARK-1710: spark-submit should print better errors than "InvocationTargetException"
Catch the InvocationTargetException and print its getTargetException.
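
An illustrative sketch of that unwrapping (not the exact SparkSubmit code): surface the underlying cause instead of the opaque reflection wrapper.

```
import java.lang.reflect.{InvocationTargetException, Method}

object MainRunner {
  def invokeMain(mainMethod: Method, args: Array[String]): Unit = {
    try {
      mainMethod.invoke(null, args)
    } catch {
      case e: InvocationTargetException =>
        e.getTargetException.printStackTrace() // the error users actually need to see
    }
  }
}
```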

Author: Sandeep <sandeep@techaddict.me>

Closes #630 from techaddict/SPARK-1710 and squashes the following commits:

834d79b [Sandeep] changes from srowen  suggestions
109d604 [Sandeep] SPARK-1710: spark-submit should print better errors than "InvocationTargetException"
2014-05-04 20:51:53 -07:00
Sean Owen f5041579ff SPARK-1629. Addendum: Depend on commons lang3 (already used by tachyon) as it's used in ReplSuite, and return to use lang3 utility in Utils.scala
For consideration. This was proposed in related discussion: https://github.com/apache/spark/pull/569

Author: Sean Owen <sowen@cloudera.com>

Closes #635 from srowen/SPARK-1629.2 and squashes the following commits:

a442b98 [Sean Owen] Depend on commons lang3 (already used by tachyon) as it's used in ReplSuite, and return to use lang3 utility in Utils.scala
2014-05-04 17:43:35 -07:00
Aaron Davidson 34719ba32e SPARK-1689 AppClient should indicate app is dead() when removed
Previously, we indicated disconnected(), which keeps the application in a limbo state where it has no executors but thinks it will get them soon.

This is a bug fix that hopefully can be included in 1.0.

Author: Aaron Davidson <aaron@databricks.com>

Closes #605 from aarondav/appremoved and squashes the following commits:

bea02a2 [Aaron Davidson] SPARK-1689 AppClient should indicate app is dead() when removed
2014-05-03 13:27:10 -07:00
Cheng Lian ce72c72aec [Bugfix] Tachyon file cleanup logical error
Should lookup `shutdownDeleteTachyonPaths` instead of `shutdownDeletePaths`. Together with a minor style clean up: `find {...}.isDefined` to `exists {...}`.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #575 from liancheng/tachyonFix and squashes the following commits:

deb8f31 [Cheng Lian] Fixed logical error in when cleanup Tachyon files and minor style cleanup
2014-05-03 13:23:52 -07:00
Thomas Graves 3d0a02dff3 [WIP] SPARK-1676: Cache Hadoop UGIs by default to prevent FileSystem leak
Move the doAs in Executor higher up so that we only have 1 ugi and aren't leaking filesystems.
Fix spark on yarn to work when the cluster is running as user "yarn" but the clients are launched as the user and want to read/write to hdfs as the user.

Note this hasn't been fully tested yet.  Need to test in standalone mode.

Putting this up for people to look at and possibly test.  I don't have access to a mesos cluster.

This is alternative to https://github.com/apache/spark/pull/607

Author: Thomas Graves <tgraves@apache.org>

Closes #621 from tgravescs/SPARK-1676 and squashes the following commits:

244d55a [Thomas Graves] fix line length
44163d4 [Thomas Graves] Rework
9398853 [Thomas Graves] change to have doAs in executor higher up.
2014-05-03 10:59:05 -07:00
Aaron Davidson 0a14421765 SPARK-1700: Close socket file descriptors on task completion
This will ensure that sockets do not build up over the course of a job, and that cancellation successfully cleans up sockets.

Tested in standalone mode. More file descriptors spawn than expected (around 1000ish rather than the expected 8ish) but they do not pile up between runs, or as high as before (where they went up to around 5k).

Author: Aaron Davidson <aaron@databricks.com>

Closes #623 from aarondav/pyspark2 and squashes the following commits:

0ca13bb [Aaron Davidson] SPARK-1700: Close socket file descriptors on task completion
2014-05-02 23:55:13 -07:00
wangfei 4bf24f7897 delete no use var
Author: wangfei <wangfei_hello@126.com>

Closes #613 from scwf/masterIndex and squashes the following commits:

1463056 [wangfei] delete no use var: masterIndex
2014-05-02 21:34:54 -07:00
Andrew Or 394d8cb1c4 Add tests for FileLogger, EventLoggingListener, and ReplayListenerBus
Modifications to Spark core are limited to exposing functionality to test files + minor style fixes.
(728 / 769 lines are from tests)

Author: Andrew Or <andrewor14@gmail.com>

Closes #591 from andrewor14/event-log-tests and squashes the following commits:

2883837 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-tests
c3afcea [Andrew Or] Compromise
2d5daf8 [Andrew Or] Use temp directory provided by the OS rather than /tmp
2b52151 [Andrew Or] Remove unnecessary file delete + add a comment
62010fd [Andrew Or] More cleanup (renaming variables, updating comments etc)
ad2beff [Andrew Or] Clean up EventLoggingListenerSuite + modify a few comments
862e752 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-tests
e0ba2f8 [Andrew Or] Fix test failures caused by race condition in processing/mutating events
b990453 [Andrew Or] ReplayListenerBus suite - tests do not all pass yet
ab66a84 [Andrew Or] Tests for FileLogger + delete file after tests
187bb25 [Andrew Or] Formatting and renaming variables
769336f [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-tests
5d38ffe [Andrew Or] Clean up EventLoggingListenerSuite + add comments
e12f4b1 [Andrew Or] Preliminary tests for EventLoggingListener (need major cleanup)
2014-05-01 21:42:06 -07:00
witgo 40cf6d3101 SPARK-1659: improvements spark-submit usage
Author: witgo <witgo@qq.com>

Closes #581 from witgo/SPARK-1659 and squashes the following commits:

0b2cf98 [witgo] Delete spark-submit obsolete usage: "--arg ARG"
2014-05-01 21:39:40 -07:00
wangfei 55c760ff9b fix the spelling mistake
Author: wangfei <wangfei_hello@126.com>

Closes #614 from scwf/pxcw and squashes the following commits:

d1016ba [wangfei] fix spelling mistake
2014-05-01 21:37:22 -07:00
witgo 55100daa65 Fix SPARK-1629: Spark should inline use of commons-lang `SystemUtils.IS_OS_WINDOWS`

Author: witgo <witgo@qq.com>

Closes #569 from witgo/SPARK-1629 and squashes the following commits:

31520eb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1629
fcaafd7 [witgo] merge master
49e248e [witgo] Fix SPARK-1629: Spark should inline use of commons-lang `SystemUtils.IS_OS_WINDOWS`
2014-04-30 09:49:45 -07:00
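
Inlining `SystemUtils.IS_OS_WINDOWS` amounts to reading the JVM's `os.name` property directly instead of depending on commons-lang. A hedged sketch of the equivalent check (names are illustrative):

```scala
object OsCheckSketch {
  // Equivalent of commons-lang's SystemUtils.IS_OS_WINDOWS without the dependency:
  // the JVM exposes the platform name via the "os.name" system property.
  val isWindows: Boolean =
    Option(System.getProperty("os.name")).exists(_.startsWith("Windows"))
}
```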
Sandy Ryza ff5be9a41e SPARK-1004. PySpark on YARN
This reopens https://github.com/apache/incubator-spark/pull/640 against the new repo

Author: Sandy Ryza <sandy@cloudera.com>

Closes #30 from sryza/sandy-spark-1004 and squashes the following commits:

89889d4 [Sandy Ryza] Move unzipping py4j to the generate-resources phase so that it gets included in the jar the first time
5165a02 [Sandy Ryza] Fix docs
fd0df79 [Sandy Ryza] PySpark on YARN
2014-04-29 23:24:34 -07:00
WangTao 7025dda8fa Handle vals that are never used
In XORShiftRandom.scala, use the val "million" instead of the bare constant "1e6.toInt" (see the sketch after this entry).
Delete vals that are never used in other files.

Author: WangTao <barneystinson@aliyun.com>

Closes #565 from WangTaoTheTonic/master and squashes the following commits:

17cacfc [WangTao] Handle the unused assignment, method parameters and symbol inspected by Intellij IDEA
37b4090 [WangTao] Handle vals that are never used
2014-04-29 22:07:20 -07:00
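
The XORShiftRandom change simply names the repeated magic constant. A tiny illustrative sketch (the surrounding benchmark code is elided and hypothetical):

```scala
object XorShiftBenchmarkSketch {
  // Name the magic number once instead of writing 1e6.toInt at every use site.
  val million: Int = 1e6.toInt

  def benchmark(iterations: Int = million): Unit = {
    // ... timing loop elided ...
  }
}
```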
Chen Chao b3d2ab6b35 Args for worker rather than master
Args for worker rather than master

Author: Chen Chao <crazyjvm@gmail.com>

Closes #587 from CrazyJvm/patch-6 and squashes the following commits:

b54b89f [Chen Chao] Args for worker rather than master
2014-04-29 22:05:40 -07:00
witgo 7d15058410 SPARK-1509: add zipWithIndex and zipWithUniqueId methods to the Java API
Author: witgo <witgo@qq.com>

Closes #423 from witgo/zipWithIndex and squashes the following commits:

039ec04 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
24d74c9 [witgo] review commit
763a5e4 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
59747d1 [witgo] review commit
7bf4d06 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
daa8f84 [witgo] review commit
4070613 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
18e6c97 [witgo] java api zipWithIndex test
11e2e7f [witgo] add zipWithIndex zipWithUniqueId methods to java api
2014-04-29 11:30:47 -07:00
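
A short sketch of the two methods using the Scala RDD API, which the new Java wrappers mirror; `local[2]` and the sample data are just for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ZipWithIndexSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("zip-sketch").setMaster("local[2]"))
    val rdd = sc.parallelize(Seq("a", "b", "c", "d"), 2)

    // zipWithIndex assigns consecutive indices 0, 1, 2, ... and, for RDDs with
    // more than one partition, runs an extra job to compute partition offsets.
    println(rdd.zipWithIndex().collect().mkString(", "))   // (a,0), (b,1), (c,2), (d,3)

    // zipWithUniqueId assigns ids that are unique but not necessarily
    // consecutive, and does not need the extra job.
    println(rdd.zipWithUniqueId().collect().mkString(", "))

    sc.stop()
  }
}
```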
Thomas Graves 8db0f7e28f SPARK-1557 Set permissions on event log files/directories
This adds minimal setting of permissions on event log directories and files (a sketch follows this entry). To have a secure environment, the user must manually create the top-level event log directory and set its permissions; we can add logic to do that automatically later if we want.

Author: Thomas Graves <tgraves@apache.org>

Closes #538 from tgravescs/SPARK-1557 and squashes the following commits:

e471d8e [Thomas Graves] rework
d8b6620 [Thomas Graves] update use of octal
3ca9b79 [Thomas Graves] Updated based on comments
5a09709 [Thomas Graves] add in missing import
3150ed6 [Thomas Graves] SPARK-1557 Set permissions on event log files/directories
2014-04-29 09:19:48 -05:00
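
A hedged sketch of setting owner/group-only permissions on an event log path with Hadoop's `FsPermission`; the octal value and helper names here are illustrative, not necessarily what the listener uses:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

object EventLogPermissionsSketch {
  // Octal 770: read/write/execute for owner and group, nothing for others.
  val LogDirPermissions = new FsPermission(Integer.parseInt("770", 8).toShort)

  def restrict(fs: FileSystem, eventLogPath: Path): Unit = {
    fs.setPermission(eventLogPath, LogDirPermissions)
  }
}
```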
Patrick Wendell 9f7a095184 SPARK-1652: Remove incorrect deprecation warning in spark-submit
This is a straightforward fix.

Author: Patrick Wendell <pwendell@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Patrick Wendell <pwendell@gmail.com>

Closes #578 from pwendell/spark-submit-yarn and squashes the following commits:

96027c7 [Patrick Wendell] Test fixes
b5be173 [Patrick Wendell] Review feedback
4ac9cac [Patrick Wendell] SPARK-1652: spark-submit for yarn prints warnings even when called as expected
2014-04-28 18:14:59 -07:00
Patrick Wendell 949e393101 SPARK-1654 and SPARK-1653: Fixes in spark-submit.
Deals with two issues:
1. Spark shell didn't correctly pass quoted arguments to spark-submit.
```./bin/spark-shell --driver-java-options "-Dfoo=f -Dbar=b"```
2. Spark submit used deprecated environment variables (SPARK_CLASSPATH)
   which triggered warnings. Now we use new, more narrowly scoped,
   variables.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #576 from pwendell/spark-submit and squashes the following commits:

67004c9 [Patrick Wendell] SPARK-1654 and SPARK-1653: Fixes in spark-submit.
2014-04-28 17:29:22 -07:00
Patrick Wendell cae054aaf4 SPARK-1652: Spark submit should fail gracefully if YARN not enabled
Author: Patrick Wendell <pwendell@gmail.com>

Closes #579 from pwendell/spark-submit-yarn-2 and squashes the following commits:

05e1b11 [Patrick Wendell] Small fix
d2a40ad [Patrick Wendell] SPARK-1652: Spark submit should fail gracefully if YARN support not enabled
2014-04-28 17:26:57 -07:00
witgo 71f4d2612a Fix SPARK-1609: Executor fails to start when Command.extraJavaOptions contains multiple Java options
Author: witgo <witgo@qq.com>

Closes #547 from witgo/SPARK-1609 and squashes the following commits:

deb6a4c [witgo] review commit
91da0bb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
0640852 [witgo] review commit
8f90b22 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
bcf36cb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
1185605 [witgo] fix extraJavaOptions split
f7c0ab7 [witgo] bugfix
86fc4bb [witgo] bugfix
8a265b7 [witgo] Fix SPARK-1609: Executor fails to start when using spark-submit
2014-04-27 19:41:02 -07:00
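
The underlying problem is that splitting `extraJavaOptions` on whitespace breaks values such as `-Dkey="a b"`. A simplified, quote-aware tokenizer sketch (not the exact parser Spark ended up using):

```scala
import scala.collection.mutable.ArrayBuffer

object JavaOptionsSplitSketch {
  def splitOptions(s: String): Seq[String] = {
    val out = ArrayBuffer.empty[String]
    val current = new StringBuilder
    var inQuotes = false
    s.foreach {
      case '"' => inQuotes = !inQuotes
      case c if c.isWhitespace && !inQuotes =>
        if (current.nonEmpty) { out += current.toString; current.clear() }
      case c => current += c
    }
    if (current.nonEmpty) out += current.toString
    out.toSeq
  }
}

// splitOptions("-Xmx512m -Dfoo=\"a b\"") == Seq("-Xmx512m", "-Dfoo=a b")
```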
Patrick Wendell 6b3c6e5dd8 SPARK-1145: Memory mapping with many small blocks can cause JVM allocation failures
This includes some minor code clean-up as well. The main change is that small files are no longer memory mapped (a sketch of the idea follows this entry). There is a nicer way to write that code block using Scala's `Try`, but to keep it easy to backport and as simple as possible, I opted for the more explicit but less pretty format.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #43 from pwendell/block-iter-logging and squashes the following commits:

1cff512 [Patrick Wendell] Small issue from merge.
49f6c269 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into block-iter-logging
4943351 [Patrick Wendell] Added a test and feedback on Matei's review
a637a18 [Patrick Wendell] Review feedback and adding rewind() when reading byte buffers.
b76b95f [Patrick Wendell] Review feedback
4e1514e [Patrick Wendell] Don't memory map for small files
d238b88 [Patrick Wendell] Some logging and clean-up
2014-04-27 17:40:56 -07:00
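
A hedged sketch of the "don't memory-map small files" idea: below some threshold, read the bytes directly into a heap buffer; above it, use `FileChannel.map`. The threshold and method names here are illustrative, not Spark's actual configuration:

```scala
import java.io.{File, IOException, RandomAccessFile}
import java.nio.ByteBuffer
import java.nio.channels.FileChannel.MapMode

object BlockReadSketch {
  // Below this size, the per-mapping overhead and address-space pressure of
  // mmap outweigh its benefits, so copy the bytes into a heap buffer instead.
  val memoryMapThreshold: Long = 2L * 1024 * 1024  // illustrative value

  def readSegment(file: File, offset: Long, length: Long): ByteBuffer = {
    val channel = new RandomAccessFile(file, "r").getChannel
    try {
      if (length < memoryMapThreshold) {
        val buf = ByteBuffer.allocate(length.toInt)
        channel.position(offset)
        while (buf.remaining() > 0) {
          if (channel.read(buf) == -1) {
            throw new IOException(s"Unexpected end of file: $file")
          }
        }
        buf.flip()
        buf
      } else {
        channel.map(MapMode.READ_ONLY, offset, length)
      }
    } finally {
      channel.close()
    }
  }
}
```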