This looked like perhaps a simple and important one. `combineByKey` looks like it should clean its arguments' closures, and that in turn covers apparently all remaining functions in `PairRDDFunctions` which delegate to it.
Author: Sean Owen <sowen@cloudera.com>
Closes #3690 from srowen/SPARK-785 and squashes the following commits:
8df68fe [Sean Owen] Clean context of most remaining functions in PairRDDFunctions, which ultimately call combineByKey
Hi all - I've renamed the methods referenced in this JIRA to clarify that they modify the provided arrays (find vs. deque).
Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
Closes #3665 from ilganeli/SPARK-1037B and squashes the following commits:
64c177c [Ilya Ganelin] Renamed deque to dequeue
f27d85e [Ilya Ganelin] Renamed private methods to clarify that they modify the provided parameters
683482a [Ilya Ganelin] Renamed private methods to clarify that they modify the provided parameters
This PR should fix SPARK-4826, an issue where a bug in how we generate temp. file names was causing spurious test failures in the write ahead log suites.
Closes #3695.
Closes #3701.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #3704 from JoshRosen/SPARK-4826 and squashes the following commits:
f2307f5 [Josh Rosen] Use Spark Utils class for directory creation/deletion
a693ddb [Josh Rosen] remove unused Random import
b275e41 [Josh Rosen] Move creation of temp. dir to beforeEach/afterEach.
9362919 [Josh Rosen] [SPARK-4826] Fix bug in generation of temp file names in WAL suites.
86c1944 [Josh Rosen] Revert "HOTFIX: Disabling failing block manager test"
I improved `IDFModel.transform` to allow using a single vector.
[[SPARK-4494] IDFModel.transform() add support for single vector - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-4494)
Author: Yuu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #3603 from yu-iskw/idf and squashes the following commits:
256ff3d [Yuu ISHIKAWA] Fix typo
a3bf566 [Yuu ISHIKAWA] - Fix typo - Optimize import order - Aggregate the assertion tests - Modify `IDFModel.transform` API for pyspark
d25e49b [Yuu ISHIKAWA] Add the implementation of `IDFModel.transform` for a term frequency vector
changed "form" to "from" in 3 documentation entries for Kafka integration
Author: Peter Klipfel <peter@klipfel.me>
Closes #3691 from peterklipfel/master and squashes the following commits:
0fe7fc5 [Peter Klipfel] fixed spelling errors in documentation
This commit exists to close the following pull requests on Github:
Closes #3488 (close requested by 'pwendell')
Closes #2939 (close requested by 'marmbrus')
Closes #3173 (close requested by 'marmbrus')
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3676 from adrian-wang/countexpr and squashes the following commits:
dc5765b [Daoyuan Wang] add rule to fold count(expr) if expr is not null
When I use a Parquet file as an output file using ParquetOutputFormat#getDefaultWorkFile, the file name is not zero-padded, while RDD#saveAsText does zero padding.
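The fix aligns Parquet output file names with the zero-padded `part-NNNNN` convention used by text output. A minimal sketch of the padding (the helper name and `part-r-` prefix here are illustrative, not Spark's actual internals):

```scala
// Hypothetical sketch: zero-pad a partition index the way
// saveAsTextFile names its files (part-00000, part-00001, ...).
def partFileName(partition: Int, extension: String): String =
  f"part-r-$partition%05d$extension"
```

With this, partition 3 produces `part-r-00003.parquet` instead of `part-r-3.parquet`, so files sort lexicographically in partition order.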
Author: Sasaki Toru <sasakitoa@nttdata.co.jp>
Closes #3602 from sasakitoa/parquet-zeroPadding and squashes the following commits:
6b0e58f [Sasaki Toru] Merge branch 'master' of git://github.com/apache/spark into parquet-zeroPadding
20dc79d [Sasaki Toru] Fixed the name of Parquet File generated by AppendingParquetOutputFormat
Fix a bug when running a query like:
```
test("save join to table") {
val testData = sparkContext.parallelize(1 to 10).map(i => TestData(i, i.toString))
sql("CREATE TABLE test1 (key INT, value STRING)")
testData.insertInto("test1")
sql("CREATE TABLE test2 (key INT, value STRING)")
testData.insertInto("test2")
testData.insertInto("test2")
sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").saveAsTable("test")
checkAnswer(
table("test"),
sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").collect().toSeq)
}
```
Author: Cheng Hao <hao.cheng@intel.com>
Closes #3673 from chenghao-intel/spark_4825 and squashes the following commits:
e8cbd56 [Cheng Hao] alternate the pattern matching order for logical plan:CTAS
e004895 [Cheng Hao] fix bug
This is fixed by SPARK-4318 #3184
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3445 from adrian-wang/emptyaggr and squashes the following commits:
982575e [Daoyuan Wang] enable empty aggr test case
So the optimizations are not valid. Also, I think the optimization here is rarely encountered, so removing it will not affect performance.
Can we merge #3445 before I add a comparison test case from this?
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3675 from adrian-wang/sumempty and squashes the following commits:
42df763 [Daoyuan Wang] sum and avg on empty table should always return null
A follow-up of #3547.
/cc marmbrus
Author: scwf <wangfei1@huawei.com>
Closes #3563 from scwf/rnc and squashes the following commits:
9395661 [scwf] remove unnecessary condition
Inserting data of type including `ArrayType.containsNull == false` or `MapType.valueContainsNull == false` or `StructType.fields.exists(_.nullable == false)` into Hive table will fail because `Cast` inserted by `HiveMetastoreCatalog.PreInsertionCasts` rule of `Analyzer` can't handle these types correctly.
Complex type cast rule proposal:
- Cast for non-complex types should be able to cast the same as before.
- Cast for `ArrayType` can evaluate if:
  - Element type can cast
  - Nullability rule doesn't break
- Cast for `MapType` can evaluate if:
  - Key type can cast
  - Nullability for casted key type is `false`
  - Value type can cast
  - Nullability rule for value type doesn't break
- Cast for `StructType` can evaluate if:
  - The field size is the same
  - Each field can cast
  - Nullability rule for each field doesn't break
  - The nested structure should be the same.
Nullability rule:
- If the casted type is `nullable == true`, the target nullability should be `true`
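The rules above can be sketched with toy types. These `DType` case classes are illustrative only, not Spark's actual `DataType` hierarchy; the point is that the nullability check recurses together with the element-type check:

```scala
// Toy model of the cast-resolvability rules (illustrative names).
sealed trait DType
case class Elem(nullable: Boolean) extends DType
case class Arr(element: DType, containsNull: Boolean) extends DType

// Nullability rule: a nullable source may only cast to a nullable target.
def nullabilityHolds(from: Boolean, to: Boolean): Boolean = !from || to

def canCast(from: DType, to: DType): Boolean = (from, to) match {
  case (Elem(fn), Elem(tn))       => nullabilityHolds(fn, tn)
  case (Arr(fe, fn), Arr(te, tn)) => canCast(fe, te) && nullabilityHolds(fn, tn)
  case _                          => false // structure must match
}
```

For example, an array with `containsNull == true` cannot be cast to one with `containsNull == false`, even if the element types match.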
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #3150 from ueshin/issues/SPARK-4293 and squashes the following commits:
e935939 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4293
ba14003 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4293
8999868 [Takuya UESHIN] Fix a test title.
f677c30 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4293
287f410 [Takuya UESHIN] Add tests to insert data of types ArrayType / MapType / StructType with nullability is false into Hive table.
4f71bb8 [Takuya UESHIN] Make Cast be able to handle complex types.
fix a TODO in Analyzer:
```
// TODO: pass this in as a parameter
val fixedPoint = FixedPoint(100)
```
Author: Jacky Li <jacky.likun@huawei.com>
Closes #3499 from jackylk/config and squashes the following commits:
4c1252c [Jacky Li] fix scalastyle
820f460 [Jacky Li] pass maxIterations in as a parameter
Whitelist more Hive unit tests:
"create_like_tbl_props"
"udf5"
"udf_java_method"
"decimal_1"
"udf_pmod"
"udf_to_double"
"udf_to_float"
"udf7" (this will fail in Hive 0.12)
Author: Cheng Hao <hao.cheng@intel.com>
Closes #3522 from chenghao-intel/unittest and squashes the following commits:
f54e4c7 [Cheng Hao] work around to clean up the hive.table.parameters.default in reset
16fee22 [Cheng Hao] Whitelist more unittest
Unpersisting an uncached RDD does not raise an exception, for example:
```
val data = Array(1, 2, 3, 4, 5)
val distData = sc.parallelize(data)
distData.unpersist(true)
```
But `SchemaRDD` raises an exception if it is not cached. Since `SchemaRDD` is a subclass of `RDD`, it should follow the same behavior.
Author: Cheng Hao <hao.cheng@intel.com>
Closes #3572 from chenghao-intel/try_uncache and squashes the following commits:
50a7a89 [Cheng Hao] SchemaRDD.unpersist() should not raise exception if it is not persisted
Important updates to the streaming programming guide
- Make the fault-tolerance properties easier to understand, with information about write ahead logs
- Update the information about deploying the spark streaming app with information about Driver HA
- Update Receiver guide to discuss reliable vs unreliable receivers.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Josh Rosen <joshrosen@databricks.com>
Author: Josh Rosen <rosenville@gmail.com>
Closes #3653 from tdas/streaming-doc-update-1.2 and squashes the following commits:
f53154a [Tathagata Das] Addressed Josh's comments.
ce299e4 [Tathagata Das] Minor update.
ca19078 [Tathagata Das] Minor change
f746951 [Tathagata Das] Mentioned performance problem with WAL
7787209 [Tathagata Das] Merge branch 'streaming-doc-update-1.2' of github.com:tdas/spark into streaming-doc-update-1.2
2184729 [Tathagata Das] Updated Kafka and Flume guides with reliability information.
2f3178c [Tathagata Das] Added more information about writing reliable receivers in the custom receiver guide.
91aa5aa [Tathagata Das] Improved API Docs menu
5707581 [Tathagata Das] Added Python API badge
b9c8c24 [Tathagata Das] Merge pull request #26 from JoshRosen/streaming-programming-guide
b8c8382 [Josh Rosen] minor fixes
a4ef126 [Josh Rosen] Restructure parts of the fault-tolerance section to read a bit nicer when skipping over the headings
65f66cd [Josh Rosen] Fix broken link to fault-tolerance semantics section.
f015397 [Josh Rosen] Minor grammar / pluralization fixes.
3019f3a [Josh Rosen] Fix minor Markdown formatting issues
aa8bb87 [Tathagata Das] Small update.
195852c [Tathagata Das] Updated based on Josh's comments, updated receiver reliability and deploying section, and also updated configuration.
17b99fb [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-doc-update-1.2
a0217c0 [Tathagata Das] Changed Deploying menu layout
67fcffc [Tathagata Das] Added cluster mode + supervise example to submitting application guide.
e45453b [Tathagata Das] Update streaming guide, added deploying section.
192c7a7 [Tathagata Das] Added more info about Python API, and rewrote the checkpointing section.
Modified ScalaReflection.schemaFor to take primary constructor of Product when there are multiple constructors. Added test to suite which failed before but works now.
Needed for [https://github.com/apache/spark/pull/3637]
CC: marmbrus
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #3646 from jkbradley/sql-reflection and squashes the following commits:
796b2e4 [Joseph K. Bradley] Modified ScalaReflection.schemaFor to take primary constructor of Product when there are multiple constructors. Added test to suite which failed before but works now.
Author: Zhang, Liye <liye.zhang@intel.com>
Closes #2793 from liyezhang556520/uniformHashMap and squashes the following commits:
5884735 [Zhang, Liye] [CORE]codeStyle: uniform ConcurrentHashMap define in StorageLevel.scala
cc kayousterhout
I have a few outstanding questions from compiling this documentation:
- What's the difference between NO_PREF and ANY? I understand the implications of the ordering but don't know what an example of each would be
- Why is NO_PREF ahead of RACK_LOCAL? I would think it'd be better to schedule rack-local tasks ahead of no preference if you could only do one or the other. Is the idea to wait longer and hope for the rack-local tasks to turn into node-local or better?
- Will there be a datacenter-local locality level in the future? Apache Cassandra for example has this level
Author: Andrew Ash <andrew@andrewash.com>
Closes #2519 from ash211/SPARK-3526 and squashes the following commits:
44cff28 [Andrew Ash] Link to spark.locality parameters rather than copying the list
6d5d966 [Andrew Ash] Stay focused on Spark, no astronaut architecture mumbo-jumbo
20e0e31 [Andrew Ash] SPARK-3526 Add section about data locality to the tuning guide
This commit exists to close the following pull requests on Github:
Closes #2883 (close requested by 'pwendell')
Closes #3364 (close requested by 'pwendell')
Closes #4458 (close requested by 'pwendell')
Closes #1574 (close requested by 'andrewor14')
Closes #2546 (close requested by 'andrewor14')
Closes #2516 (close requested by 'andrewor14')
Closes #154 (close requested by 'andrewor14')
The driver hangs sometimes when we coalesce RDD partitions. See JIRA for more details and reproduction.
This is because our use of the empty string as the default preferred location in `CoalescedRDDPartition` causes the `TaskSetManager` to schedule the corresponding task on host `""` (empty string). The intended semantics, however, are that the partition has no preferred location, and the TSM should schedule the corresponding task accordingly.
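The shape of the fix can be sketched as follows: represent "no preference" as an empty sequence rather than a sequence containing `""`, so the scheduler never sees a bogus host name. The helper name here is illustrative, not the actual `CoalescedRDDPartition` code:

```scala
// Hypothetical sketch: an absent or empty location yields no preferred
// locations at all, rather than Seq("") which the scheduler would treat
// as a real host named "".
def preferredLocations(loc: Option[String]): Seq[String] =
  loc.filter(_.nonEmpty).toSeq
```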
Author: Andrew Or <andrew@databricks.com>
Closes #3633 from andrewor14/coalesce-preferred-loc and squashes the following commits:
e520d6b [Andrew Or] Oops
3ebf8bd [Andrew Or] A few comments
f370a4e [Andrew Or] Fix tests
2f7dfb6 [Andrew Or] Avoid using empty string as default preferred location
Hi all - I've renamed the unhelpfully named variable and added a comment clarifying what's actually happening.
Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
Closes #3666 from ilganeli/SPARK-4569B and squashes the following commits:
1810394 [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator
e2d2092 [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator
d7cefec [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator
5b3f39c [Ilya Ganelin] [SPARK-4569] Rename in Aggregator
Sometimes I switch between different versions and do not want to rebuild Spark, so I rename assembly.jar to .jar.bak, but it is still caught by `compute-classpath.sh`.
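The fix tightens the match so only names that end in `.jar` count. A sketch of the intended predicate (the function name and the exact pattern matched by `compute-classpath.sh` are assumptions for illustration):

```scala
// Hypothetical sketch: only accept files that literally end in ".jar",
// so a renamed "assembly.jar.bak" is no longer picked up.
def isAssemblyJar(name: String): Boolean =
  name.contains("assembly") && name.endsWith(".jar")
```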
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3641 from adrian-wang/jar and squashes the following commits:
45cbfd0 [Daoyuan Wang] ensure .jar at end of line
Currently this doesn't do anything in other modes, so we might as well just disable it rather than having the user mistakenly rely on it.
Author: Andrew Or <andrew@databricks.com>
Closes #3615 from andrewor14/dynamic-allocation-yarn-only and squashes the following commits:
ce6487a [Andrew Or] Allow requesting / killing executors only in YARN mode
tdas, it looks like streaming already refers to the supervise mode, but the link from there is broken.
Author: Andrew Or <andrew@databricks.com>
Closes #3627 from andrewor14/document-supervise and squashes the following commits:
9ca0908 [Andrew Or] Wording changes
2b55ed2 [Andrew Or] Document standalone cluster supervise mode
Author: GuoQiang Li <witgo@qq.com>
Closes #3050 from witgo/SPARK-4161 and squashes the following commits:
abb6fa4 [GuoQiang Li] move usejavacp opt to spark-shell
89e39e7 [GuoQiang Li] review commit
c2a6f04 [GuoQiang Li] Spark shell class path is not correctly set if "spark.driver.extraClassPath" is set in defaults.conf
Accumulators keep thread-local copies of themselves. These copies were only cleared at the beginning of a task. This meant that (a) the memory they used was tied up until the next task ran on that thread, and (b) if a thread died, the memory it had used for accumulators was locked up forever on that worker.
This PR clears the thread-local copies of accumulators at the end of each task, in the task's `finally` block, to make sure they are cleaned up between tasks. It also stores them in a `ThreadLocal` object, so that if, for some reason, the thread dies, any memory they are using at the time should be freed up.
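The pattern described above can be sketched as follows. This is a minimal illustration of "ThreadLocal state cleared in the task's finally block", not the actual `Accumulators` code (the names `localAccums` and `runTask` are assumptions):

```scala
import scala.collection.mutable

// Per-thread accumulator copies; a ThreadLocal lets the JVM reclaim a
// dead thread's entry instead of pinning it in a shared map forever.
val localAccums = new ThreadLocal[mutable.Map[Long, Any]] {
  override def initialValue(): mutable.Map[Long, Any] = mutable.Map.empty
}

// Clear this thread's copies as soon as the task body finishes,
// whether it succeeds or throws.
def runTask(body: => Unit): Unit =
  try body finally localAccums.get().clear()
```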
Author: Nathan Kronenfeld <nkronenfeld@oculusinfo.com>
Closes #3570 from nkronenfeld/Accumulator-Improvements and squashes the following commits:
a581f3f [Nathan Kronenfeld] Change Accumulators to private[spark] instead of adding mima exclude to get around false positive in mima tests
b6c2180 [Nathan Kronenfeld] Include MiMa exclude as per build error instructions - this version incompatibility should be irrelevant, as it will only surface if a master is talking to a worker running a different version of spark.
537baad [Nathan Kronenfeld] Fuller refactoring as intended, incorporating JR's suggestions for ThreadLocal localAccums, and keeping clear(), but also calling it in tasks' finally block, rather than just at the beginning of the task.
39a82f2 [Nathan Kronenfeld] Clear local copies of accumulators as soon as we're done with them
Author: Reynold Xin <rxin@databricks.com>
Closes #3657 from rxin/conf-update and squashes the following commits:
7370eab [Reynold Xin] Config updates for the new shuffle transport.
It's been reported that when the number of disks is large and the number of nodes is small, Netty network throughput is low compared with NIO. We suspect the problem is that only a small number of disks are utilized to serve shuffle files at any given point, due to connection reuse. This patch adds a new config parameter to specify the number of concurrent connections between two peer nodes, defaulting to 2.
Author: Reynold Xin <rxin@databricks.com>
Closes #3625 from rxin/SPARK-4740 and squashes the following commits:
ad4241a [Reynold Xin] Updated javadoc.
f33c72b [Reynold Xin] Code review feedback.
0fefabb [Reynold Xin] Use double check in synchronization.
41dfcb2 [Reynold Xin] Added test case.
9076b4a [Reynold Xin] Fixed two NPEs.
3e1306c [Reynold Xin] Minor style fix.
4f21673 [Reynold Xin] [SPARK-4740] Create multiple concurrent connections between two peer nodes in Netty.
Allocate enough room for type byte as well as message, to avoid tripping assertion about capacity of the buffer
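The off-by-one being fixed is the classic "forgot the header byte" sizing bug. A minimal sketch (the `encode` helper is illustrative, not the actual Spark message framing code):

```scala
import java.nio.ByteBuffer

// Hypothetical sketch: reserve one byte for the message-type tag in
// addition to the payload, so put() never overruns the buffer's
// capacity (allocating payload.length alone would trip the assertion).
def encode(msgType: Byte, payload: Array[Byte]): ByteBuffer = {
  val buf = ByteBuffer.allocate(1 + payload.length)
  buf.put(msgType)
  buf.put(payload)
  buf.flip() // prepare for reading
  buf
}
```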
Author: Sean Owen <sowen@cloudera.com>
Closes #3650 from srowen/SPARK-4805 and squashes the following commits:
9e1d502 [Sean Owen] Allocate enough room for type byte as well as message, to avoid tripping assertion about capacity of the buffer
Author: Sandy Ryza <sandy@cloudera.com>
Closes #3426 from sryza/sandy-spark-4567 and squashes the following commits:
cb4b8d2 [Sandy Ryza] SPARK-4567. Make SparkJobInfo and SparkStageInfo serializable
After synchronizing on the `info` lock in the `removeBlock`/`dropOldBlocks`/`dropFromMemory` methods in BlockManager, the block that `info` represents may have already been removed.
The three methods have the same logic to get the `info` lock:
```
info = blockInfo.get(id)
if (info != null) {
  info.synchronized {
    // do something
  }
}
```
So, there is chance that when a thread enters the `info.synchronized` block, `info` has already been removed from the `blockInfo` map by some other thread who entered `info.synchronized` first.
The `removeBlock` and `dropOldBlocks` methods are idempotent, so it's safe for them to run on blocks that have already been removed.
But in `dropFromMemory` it may be problematic, since it may drop into the disk store data for a block that has already been removed, and it calls data store operations that are not designed to handle missing blocks.
This patch fixes this issue by adding a check to `dropFromMemory` to test whether blocks have been removed by a racing thread.
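The fix is a double-check: after acquiring the lock, verify the block is still the one we looked up. A self-contained sketch of the pattern (not the actual `BlockManager` code; `drop` stands in for the real eviction logic):

```scala
import java.util.concurrent.ConcurrentHashMap

val blockInfo = new ConcurrentHashMap[String, AnyRef]()

def dropFromMemory(id: String)(drop: AnyRef => Unit): Unit = {
  val info = blockInfo.get(id)
  if (info != null) {
    info.synchronized {
      // Double check: another thread may have removed the block between
      // our lookup and acquiring the lock, so only proceed if the map
      // still holds the same info object.
      if (blockInfo.get(id) eq info) {
        drop(info)
        blockInfo.remove(id)
      }
    }
  }
}
```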
Author: hushan[胡珊] <hushan@xiaomi.com>
Closes #3574 from suyanNone/refine-block-concurrency and squashes the following commits:
edb989d [hushan[胡珊]] Refine code style and comments position
55fa4ba [hushan[胡珊]] refine code
e57e270 [hushan[胡珊]] add check info is already remove or not while having gotten info.syn
This commit removes the GC time for each task from the set of
optional, additional metrics, and instead always shows it for
each task.
cc pwendell
Author: Kay Ousterhout <kayousterhout@gmail.com>
Closes #3622 from kayousterhout/gc_time and squashes the following commits:
15ac242 [Kay Ousterhout] Make TaskDetailsClassNames private[spark]
e71d893 [Kay Ousterhout] [SPARK-4765] Make GC time always shown in UI.
In HashShuffleReader.scala and HashShuffleWriter.scala, there is no need to check `dep.aggregator.isEmpty` again, as this is already covered by `dep.aggregator.isDefined`.
In SortShuffleWriter.scala, isn't `dep.aggregator.isEmpty` better than `!dep.aggregator.isDefined`?
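The point is that on `Option`, `isDefined` and `isEmpty` are exact complements, so one check suffices and the direct form reads better than a negation. A trivial sketch (`needsAggregation`/`skipAggregation` are illustrative names, not Spark's):

```scala
// dep.aggregator is an Option; these two predicates are complements.
def needsAggregation(aggregator: Option[String]): Boolean = aggregator.isDefined
// Preferred over !aggregator.isDefined for readability:
def skipAggregation(aggregator: Option[String]): Boolean = aggregator.isEmpty
```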
Author: maji2014 <maji3@asiainfo.com>
Closes #3553 from maji2014/spark-4691 and squashes the following commits:
bf7b14d [maji2014] change to a more elegant way for SortShuffleWriter.scala
10d0cf0 [maji2014] change to a more elegant way
d8f52dc [maji2014] code optimization for judgement
This PR adds a --wait flag to the `./sbin/stop-all.sh` script.
Author: jbencook <jbenjamincook@gmail.com>
Closes #3567 from jbencook/master and squashes the following commits:
d05c5bb [jbencook] [SPARK-874] adding a --wait flag
Sorry if this is a little premature with 1.2 still not out the door, but it will make other work like SPARK-4136 and SPARK-2089 a lot easier.
Author: Sandy Ryza <sandy@cloudera.com>
Closes #3215 from sryza/sandy-spark-4338 and squashes the following commits:
1c5ac08 [Sandy Ryza] Update building Spark docs and remove unnecessary newline
9c1421c [Sandy Ryza] SPARK-4338. Ditch yarn-alpha.
Different from Hive 0.12.0, in Hive 0.13.1 UDF/UDAF/UDTF (aka Hive function) objects should only be initialized once on the driver side and then serialized to executors. However, not all function objects are serializable (e.g. GenericUDF doesn't implement Serializable). Hive 0.13.1 solves this issue with Kryo or XML serializer. Several utility ser/de methods are provided in class o.a.h.h.q.e.Utilities for this purpose. In this PR we chose Kryo for efficiency. The Kryo serializer used here is created in Hive. Spark Kryo serializer wasn't used because there's no available SparkConf instance.
Author: Cheng Hao <hao.cheng@intel.com>
Author: Cheng Lian <lian@databricks.com>
Closes #3640 from chenghao-intel/udf_serde and squashes the following commits:
8e13756 [Cheng Hao] Update the comment
74466a3 [Cheng Hao] refactor as feedbacks
396c0e1 [Cheng Hao] avoid Simple UDF to be serialized
e9c3212 [Cheng Hao] update the comment
19cbd46 [Cheng Hao] support udf instance ser/de after initialization
Since `sequenceNumberToProcessor` and `stopped` are both protected by the `sequenceNumberToProcessor` lock, `ConcurrentHashMap` and `volatile` are unnecessary, so this PR updates them accordingly.
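The resulting shape can be sketched like this. It is an illustration of the locking discipline, not the actual receiver code; `register`/`stop` are assumed names:

```scala
import scala.collection.mutable

// Every access goes through synchronized on the map itself, so a plain
// mutable.HashMap and a non-volatile flag are sufficient.
val sequenceNumberToProcessor = mutable.HashMap.empty[Long, String]
var stopped = false // only read/written while holding the lock below

def register(seq: Long, processor: String): Unit =
  sequenceNumberToProcessor.synchronized {
    if (!stopped) sequenceNumberToProcessor(seq) = processor
  }

def stop(): Unit = sequenceNumberToProcessor.synchronized {
  stopped = true
  sequenceNumberToProcessor.clear()
}
```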
Author: zsxwing <zsxwing@gmail.com>
Closes #3634 from zsxwing/SPARK-3154 and squashes the following commits:
0d087ac [zsxwing] Replace ConcurrentHashMap with mutable.HashMap and remove @volatile from 'stopped'
This is the code refactor and follow-ups for #2570.
Author: Cheng Hao <hao.cheng@intel.com>
Closes #3336 from chenghao-intel/createtbl and squashes the following commits:
3563142 [Cheng Hao] remove the unused variable
e215187 [Cheng Hao] eliminate the compiling warning
4f97f14 [Cheng Hao] fix bug in unittest
5d58812 [Cheng Hao] revert the API changes
b85b620 [Cheng Hao] fix the regression of temp tabl not found in CTAS
Author: Jacky Li <jacky.likun@huawei.com>
Closes #3630 from jackylk/remove and squashes the following commits:
150e7e0 [Jacky Li] remove unnecessary import
...umented default is incorrect for YARN
Author: Sandy Ryza <sandy@cloudera.com>
Closes #3624 from sryza/sandy-spark-4770 and squashes the following commits:
bd81a3a [Sandy Ryza] SPARK-4770. [DOC] [YARN] spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN
My original 'fix' didn't fix at all. Now, there's a unit test to check whether it works. Of the two options to really fix it -- copy the `Map` to a `java.util.HashMap`, or copy and modify Scala's implementation in `Wrappers.MapWrapper`, I went with the latter.
Author: Sean Owen <sowen@cloudera.com>
Closes #3587 from srowen/SPARK-3926 and squashes the following commits:
8586bb9 [Sean Owen] Remove unneeded no-arg constructor, and add additional note about copied code in LICENSE
7bb0e66 [Sean Owen] Make SerializableMapWrapper actually serialize, and add unit test
Simple omission on my part.
Author: Andrew Or <andrew@databricks.com>
Closes #3612 from andrewor14/dynamic-allocation-synchronization and squashes the following commits:
1f03b60 [Andrew Or] Synchronize kills
HiveFromSpark read the kv1.txt file from SPARK_HOME/examples/src/main/resources/kv1.txt, which assumed you had a source tree checked out. Now we copy the kv1.txt file to a temporary file and delete it when the JVM shuts down. This allows us to run this example outside of a Spark source tree.
Author: Kostas Sakellis <kostas@cloudera.com>
Closes #3628 from ksakellis/kostas-spark-4774 and squashes the following commits:
6770f83 [Kostas Sakellis] [SPARK-4774] [SQL] Makes HiveFromSpark more portable
tempFile is created in the same directory as targetFile, so that the move from tempFile to targetFile is always atomic.
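The technique can be sketched with `java.nio.file`: write to a temp file in the target's own directory, then move it into place. A same-filesystem rename can be atomic, whereas writing into a system temp directory and moving across filesystems cannot. This is an illustrative sketch, not the actual `Utils.fetchFile` code:

```scala
import java.nio.file.{Files, Paths, StandardCopyOption}

// Hypothetical sketch: create the temp file next to the target so the
// final move stays on one filesystem and can use ATOMIC_MOVE.
def fetchAtomically(targetDir: String, name: String, data: Array[Byte]): Unit = {
  val dir = Paths.get(targetDir)
  val tmp = Files.createTempFile(dir, name, ".tmp") // same dir as target
  Files.write(tmp, data)
  Files.move(tmp, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE)
}
```

Readers of the target path therefore see either the old file or the complete new one, never a partially written file.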
Author: Christophe Préaud <christophe.preaud@kelkoo.com>
Closes #2855 from preaudc/master and squashes the following commits:
9ba89ca [Christophe Préaud] Ensure that files are fetched atomically
54419ae [Christophe Préaud] Merge remote-tracking branch 'upstream/master'
c6a5590 [Christophe Préaud] Revert commit 8ea871f8130b2490f1bad7374a819bf56f0ccbbd
7456a33 [Christophe Préaud] Merge remote-tracking branch 'upstream/master'
8ea871f [Christophe Préaud] Ensure that files are fetched atomically