Commit graph

5416 commits

Author SHA1 Message Date
Prashant Sharma 08ec10de17 Removed a repeated test and changed tests to not use uncommons jar 2014-01-02 17:32:11 +05:30
Prashant Sharma 436f3d2856 ignoring tests for now, contrary to what I assumed these tests make sense given what they are testing. 2014-01-02 16:08:35 +05:30
Prashant Sharma 6be4c11194 Removed sbt folder and changed docs accordingly 2014-01-02 14:09:37 +05:30
Prashant Sharma 8821c3a526 Deleted py4j jar and added to assembly dependency 2014-01-02 13:09:46 +05:30
Patrick Wendell 3713f8129a Merge pull request #309 from mateiz/conf2
SPARK-544. Migrate configuration to a SparkConf class

This is still a work in progress based on Prashant and Evan's code. So far I've done the following:

- Got rid of global SparkContext.globalConf
- Passed SparkConf to serializers and compression codecs
- Made SparkConf public instead of private[spark]
- Improved API of SparkContext and SparkConf
- Switched executor environment vars to be passed through SparkConf
- Fixed some places that were still using system properties
- Fixed some tests, though others are still failing

This still fails several tests in core, repl and streaming, likely due to properties not being set or cleared correctly (some of the tests run fine in isolation). But the API at least is hopefully ready for review. Unfortunately there was a lot of global stuff before due to a "SparkContext.globalConf" method that let you set a "default" configuration of sorts, which meant I had to make some pretty big changes.
2014-01-01 21:29:12 -08:00
Matei Zaharia 7e8d2e8a5c Fix Python code after change of getOrElse 2014-01-01 23:21:34 -05:00
Matei Zaharia 0f6060733d Fixed two uses of conf.get with no default value in Mesos 2014-01-01 22:09:42 -05:00
Matei Zaharia e2c68642c6 Miscellaneous fixes from code review.
Also replaced SparkConf.getOrElse with just a "get" that takes a default
value, and added getInt, getLong, etc to make code that uses this
simpler later on.
2014-01-01 22:03:39 -05:00
Matei Zaharia 45ff8f413d Merge remote-tracking branch 'apache/master' into conf2
Conflicts:
	core/src/main/scala/org/apache/spark/SparkContext.scala
	core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala
	core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala
2014-01-01 21:25:00 -05:00
Patrick Wendell c1d928a897 Merge pull request #312 from pwendell/log4j-fix-2
SPARK-1008: Logging improvments

1. Adds a default log4j file that gets loaded if users haven't specified a log4j file.
2. Isolates use of the tools assembly jar. I found this produced SLF4J warnings
   after building with SBT (and I've seen similar warnings on the mailing list).
2014-01-01 17:03:48 -08:00
Patrick Wendell f8d245bdfc Merge remote-tracking branch 'apache-github/master' into log4j-fix-2
Conflicts:
	streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
2014-01-01 16:10:51 -08:00
Andrew Or 92c304fd03 Simplify ExternalAppendOnlyMap on the assumption that the mergeCombiners function is specified 2014-01-01 11:42:33 -08:00
Matei Zaharia 0e5b2adb5c Merge remote-tracking branch 'apache/master' into conf2
Conflicts:
	project/SparkBuild.scala
2014-01-01 13:28:54 -05:00
Lian, Cheng dd6033e685 Aggregated all sample points to driver without any shuffle 2014-01-02 01:38:24 +08:00
Reynold Xin 9a0ff721c9 Merge pull request #314 from witgo/master
restore core/pom.xml file modification
2013-12-31 21:50:24 -08:00
Andrew Or 3bc9e391a3 Merge branch 'master' of github.com:andrewor14/incubator-spark 2013-12-31 20:02:12 -08:00
Andrew Or 83dfa16664 Address Patrick's and Reynold's comments 2013-12-31 20:02:05 -08:00
liguoqiang b5d0b3b0f7 restore core/pom.xml file modification 2014-01-01 11:30:08 +08:00
Reynold Xin 8b8e70ebde Merge pull request #73 from falaki/ApproximateDistinctCount
Approximate distinct count

Added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count distinct number of elements and distinct number of values per key, respectively. Both functions use HyperLogLog from stream-lib for counting. Both functions take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scala docs and test suites for both methods.
2013-12-31 17:48:24 -08:00
Aaron Davidson 08302b113a Rename IntermediateBlockId to TempBlockId 2013-12-31 17:44:15 -08:00
Patrick Wendell 37c43c9dd1 Adding outer checkout when initializing logging 2013-12-31 17:36:56 -08:00
Andrew Or 8bbe08b21e Merge branch 'master' of github.com:andrewor14/incubator-spark 2013-12-31 17:26:26 -08:00
Andrew Or 53d8d36684 Add support and test for null keys in ExternalAppendOnlyMap
Also add safeguard against use of destructively sorted AppendOnlyMap
2013-12-31 17:19:02 -08:00
Hossein Falaki bee445c927 Made the code more compact and readable 2013-12-31 16:58:18 -08:00
Hossein Falaki acb0323053 minor improvements 2013-12-31 15:34:26 -08:00
Matei Zaharia 42bcfb2bb2 Fix two compile errors introduced in merge 2013-12-31 18:26:23 -05:00
Matei Zaharia ba9338f104 Merge remote-tracking branch 'apache/master' into conf2
Conflicts:
	core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala
	streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
	streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
2013-12-31 18:23:14 -05:00
Patrick Wendell 63b411dd86 Merge pull request #238 from ngbinh/upgradeNetty
upgrade Netty from 4.0.0.Beta2 to 4.0.13.Final

the changes are listed at https://github.com/netty/netty/wiki/New-and-noteworthy
2013-12-31 14:31:28 -08:00
Andrew Or 3ce22df954 Add warning message for spilling 2013-12-31 11:33:10 -08:00
Andrew Or 94ddc91d06 Address Aaron's and Jerry's comments 2013-12-31 10:50:08 -08:00
Patrick Wendell 55b7e2fdff Merge pull request #289 from tdas/filestream-fix
Bug fixes for file input stream and checkpointing

- Fixed bugs in the file input stream that led the stream to fail due to transient HDFS errors (listing files when a background thread it deleting fails caused errors, etc.)
- Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, to allow checkpoints to be written to any HDFS compatible store requiring special configuration.
- Changed the API of SparkContext.setCheckpointDir() - eliminated the unnecessary 'useExisting' parameter. Now SparkContext will always create a unique subdirectory within the user specified checkpoint directory. This is to ensure that previous checkpoint files are not accidentally overwritten.
- Fixed bug where setting checkpoint directory as a relative local path caused the checkpointing to fail.
2013-12-31 10:12:51 -08:00
Tathagata Das fcd17a1e8e Fixed comments and long lines based on comments on PR 289. 2013-12-31 02:01:45 -08:00
Tathagata Das 977bcc36d4 Merge branch 'apache-master' into project-refactor 2013-12-31 00:43:38 -08:00
Tathagata Das 87b915f221 Removed extra empty lines. 2013-12-31 00:42:10 -08:00
Tathagata Das 3ab297adaa Removed unnecessary comments. 2013-12-31 00:38:19 -08:00
Tathagata Das 97630849ff Added pom.xml for external projects and removed unnecessary dependencies and repositoris from other poms and sbt. 2013-12-31 00:28:57 -08:00
Patrick Wendell 4abb0c57ab Tiny typo fix 2013-12-31 00:05:03 -08:00
Patrick Wendell 4d009dcac6 Removing use in test 2013-12-31 00:01:44 -08:00
Patrick Wendell 3c254f2eec Minor fixes 2013-12-30 23:55:33 -08:00
Aaron Davidson 375d11743c Add new line at end of file 2013-12-30 23:42:37 -08:00
Patrick Wendell 18181e6c41 Removing initLogging entirely 2013-12-30 23:39:47 -08:00
Aaron Davidson daa7792ad6 Refactor SamplingSizeTracker into SizeTrackingAppendOnlyMap 2013-12-30 23:39:02 -08:00
Hossein Falaki d6cded7155 Added Java unit tests for countApproxDistinct and countApproxDistinctByKey 2013-12-30 19:32:05 -08:00
Hossein Falaki c3073b6cf2 Added Java API for countApproxDistinct 2013-12-30 19:31:06 -08:00
Hossein Falaki ed06500d30 Added Java API for countApproxDistinctByKey 2013-12-30 19:30:42 -08:00
Hossein Falaki b75d7c98bc Added stream 2.5.1 jar depenency 2013-12-30 19:29:17 -08:00
Hossein Falaki a7de8e9b1c Renamed countDistinct and countDistinctByKey methods to include Approx 2013-12-30 19:28:03 -08:00
Matei Zaharia 0fa5809768 Updated docs for SparkConf and handled review comments 2013-12-30 22:17:28 -05:00
Hossein Falaki d50ccc5ca9 Using origin version 2013-12-30 15:08:34 -08:00
Andrew Or 347fafe4fc Fix CheckpointSuite test fail 2013-12-30 13:10:33 -08:00