12756 commits

Author SHA1 Message Date
Andrew Or ece00566e4 [SPARK-9561] Re-enable BroadcastJoinSuite
We can do this now that SPARK-9580 is resolved.

Author: Andrew Or <andrew@databricks.com>

Closes #8208 from andrewor14/reenable-sql-tests.
2015-08-14 12:37:21 -07:00
Davies Liu 3bc5528722 [SPARK-9946] [SPARK-9589] [SQL] fix NPE and thread-safety in TaskMemoryManager
Currently, we access `page.pageNumber` after the page is freed; by then it could have been modified by another thread, causing an NPE.

The same TaskMemoryManager could be used by multiple threads (for example, Python UDF and TransportScript), so allocating and freeing memory or pages should be thread-safe. The underlying Bitset and HashSet are not thread-safe, so we should access them inside a synchronized block.

cc JoshRosen

Author: Davies Liu <davies@databricks.com>

Closes #8177 from davies/memory_manager.
2015-08-14 12:32:35 -07:00
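The locking idea described in SPARK-9946 above can be sketched in a few lines. This is only an illustration in Python, not the Spark implementation: `PageTable` and `allocate_concurrently` are hypothetical names, and a `threading.Lock` stands in for the synchronized block around the non-thread-safe set.

```python
import threading


class PageTable:
    """Toy stand-in (hypothetical) for a page allocator shared by several threads."""

    def __init__(self, max_pages):
        self._max_pages = max_pages
        self._allocated = set()        # plain set: not safe to mutate concurrently
        self._lock = threading.Lock()  # guards every access to _allocated

    def allocate(self):
        # Find and claim the lowest free page number under the lock.
        with self._lock:
            for page_number in range(self._max_pages):
                if page_number not in self._allocated:
                    self._allocated.add(page_number)
                    return page_number
            raise MemoryError("no free pages")

    def free(self, page_number):
        # The caller passes a number it read before releasing the page.
        with self._lock:
            self._allocated.remove(page_number)

    def allocated_count(self):
        with self._lock:
            return len(self._allocated)


def allocate_concurrently(table, workers, pages_per_worker):
    """Allocate pages from several threads and return every page number handed out."""
    results = []
    results_lock = threading.Lock()

    def work():
        mine = [table.allocate() for _ in range(pages_per_worker)]
        with results_lock:
            results.extend(mine)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because the search for a free page and the insertion happen under the same lock, no two threads can be handed the same page number.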
Neelesh Srinivas Salian 57c2d08800 [SPARK-9923] [CORE] ShuffleMapStage.numAvailableOutputs should be an Int instead of Long
Modified type of ShuffleMapStage.numAvailableOutputs from Long to Int

Author: Neelesh Srinivas Salian <nsalian@cloudera.com>

Closes #8183 from nssalian/SPARK-9923.
2015-08-14 20:03:50 +01:00
Wenchen Fan 34d610be85 [SPARK-9929] [SQL] support metadata in withColumn
In MLlib we sometimes need to set metadata for the new column, so we alias the new column with metadata before calling `withColumn`, and `withColumn` then aliases this column again. Here I overloaded `withColumn` to allow the user to set metadata, just like what we did for `Column.as`.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #8159 from cloud-fan/withColumn.
2015-08-14 12:00:01 -07:00
Holden Karau a7317ccdc2 [SPARK-8744] [ML] Add a public constructor to StringIndexer
It would be helpful to allow users to pass a pre-computed index to create an indexer, rather than always going through StringIndexer to create the model.

Author: Holden Karau <holden@pigscanfly.ca>

Closes #7267 from holdenk/SPARK-8744-StringIndexerModel-should-have-public-constructor.
2015-08-14 11:22:10 -07:00
Joseph K. Bradley 7ecf0c4699 [SPARK-9956] [ML] Make trees work with one-category features
This modifies DecisionTreeMetadata construction to treat 1-category features as continuous, so that trees do not fail with such features.  It is important for the pipelines API, where VectorIndexer can automatically categorize certain features as categorical.

As stated in the JIRA, this is a temp fix which we can improve upon later by automatically filtering out those features. That will take longer, though, since it will require careful indexing.

Targeted for 1.5 and master

CC: manishamde  mengxr yanboliang

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #8187 from jkbradley/tree-1cat.
2015-08-14 10:48:02 -07:00
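As a rough illustration of the SPARK-9956 rule above (one-category features are handled as continuous), the following Python sketch partitions features by their reported number of categories. The function name and the dictionary layout are assumptions made for the example; they are not taken from DecisionTreeMetadata.

```python
def split_feature_kinds(feature_arity):
    """Partition features by arity (hypothetical helper).

    feature_arity maps a feature index to the number of categories reported for it.
    Returns (categorical, treated_as_continuous).
    """
    categorical = {}
    treated_as_continuous = []
    for index, arity in sorted(feature_arity.items()):
        if arity >= 2:
            categorical[index] = arity
        else:
            # A single category offers no categorical split, so the
            # feature is handled like a continuous one instead of failing.
            treated_as_continuous.append(index)
    return categorical, treated_as_continuous
```

For example, `split_feature_kinds({0: 3, 1: 1, 2: 2})` keeps features 0 and 2 as categorical and routes feature 1 to the continuous list.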
Xiangrui Meng a0e1abbd01 [SPARK-9661] [MLLIB] minor clean-up of SPARK-9661
Some minor clean-ups after SPARK-9661. See my inline comments. MechCoder jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #8190 from mengxr/SPARK-9661-fix.
2015-08-14 10:25:11 -07:00
zsxwing c8677d7366 [SPARK-9958] [SQL] Make HiveThriftServer2Listener thread-safe and update the tab name to "JDBC/ODBC Server"
This PR fixes the thread-safety issue in HiveThriftServer2Listener, and also changes the tab name to "JDBC/ODBC Server" since the old name conflicts with the new SQL tab.

Author: zsxwing <zsxwing@gmail.com>

Closes #8185 from zsxwing/SPARK-9958.
2015-08-14 14:41:53 +08:00
Liang-Chi Hsieh 7c7c7529a1 [MINOR] [SQL] Remove canEqual in Row
As `InternalRow` does not extend `Row` now, I think we can remove it.

Author: Liang-Chi Hsieh <viirya@appier.com>

Closes #8170 from viirya/remove_canequal.
2015-08-13 22:06:09 -07:00
Davies Liu bd35385d53 [SPARK-9945] [SQL] pageSize should be calculated from executor.memory
Currently, pageSize of TungstenSort is calculated from driver.memory; it should use executor.memory instead.

Also, in the worst case, the safeFactor could be 4 (because of rounding), increase it to 16.

cc rxin

Author: Davies Liu <davies@databricks.com>

Closes #8175 from davies/page_size.
2015-08-13 21:12:59 -07:00
Andrew Or 8187b3ae47 [SPARK-9580] [SQL] Replace singletons in SQL tests
A fundamental limitation of the existing SQL tests is that *there is simply no way to create your own `SparkContext`*. This is a serious limitation because the user may wish to use a different master or config. As a case in point, `BroadcastJoinSuite` is entirely commented out because there is no way to make it pass with the existing infrastructure.

This patch removes the singletons `TestSQLContext` and `TestData`, and instead introduces a `SharedSQLContext` that starts a context per suite. Unfortunately the singletons were so ingrained in the SQL tests that this patch necessarily needed to touch *all* the SQL test files.

Author: Andrew Or <andrew@databricks.com>

Closes #8111 from andrewor14/sql-tests-refactor.
2015-08-13 17:42:01 -07:00
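The per-suite context pattern described in SPARK-9580 above can be illustrated with Python's `unittest`. This is a sketch of the general idea only: `FakeContext`, `SharedContextTestCase`, `CustomMasterSuite` and `run_suite` are hypothetical stand-ins, and the master strings are example values.

```python
import io
import unittest


class FakeContext:
    """Hypothetical stand-in for an expensive context object."""

    def __init__(self, master):
        self.master = master
        self.stopped = False

    def stop(self):
        self.stopped = True


class SharedContextTestCase(unittest.TestCase):
    """Base class: one context per suite instead of one global singleton."""

    master = "local[2]"  # each suite may override this

    @classmethod
    def setUpClass(cls):
        cls.ctx = FakeContext(cls.master)

    @classmethod
    def tearDownClass(cls):
        cls.ctx.stop()


class CustomMasterSuite(SharedContextTestCase):
    # A suite that needs a different master simply overrides the attribute.
    master = "local-cluster[2,1,1024]"

    def test_uses_own_master(self):
        self.assertEqual(self.ctx.master, "local-cluster[2,1,1024]")


def run_suite(case):
    """Run one test case class quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

With a global singleton, a suite could not choose its own master or config; here each suite creates and stops its own context.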
Davies Liu c50f97dafd [SPARK-9943] [SQL] deserialized UnsafeHashedRelation should be serializable
When free memory in the executor runs low, cached broadcast objects need to be serialized to disk, but currently a deserialized UnsafeHashedRelation can't be serialized and fails with an NPE. This PR fixes that.

cc rxin

Author: Davies Liu <davies@databricks.com>

Closes #8174 from davies/serialize_hashed.
2015-08-13 17:35:11 -07:00
Davies Liu 693949ba40 [SPARK-8976] [PYSPARK] fix open mode in python3
This bug only happens on Python 3 and Windows.

I tested this manually with Python 3 and the Python daemon disabled; there is no unit test yet.

Author: Davies Liu <davies@databricks.com>

Closes #8181 from davies/open_mode.
2015-08-13 17:33:37 -07:00
Xiangrui Meng 6c5858bc65 [SPARK-9922] [ML] rename StringIndexerReverse to IndexToString
What `StringIndexerInverse` does is not strictly associated with `StringIndexer`, and the name does not clearly describe the transformation. Renaming it to `IndexToString` might be better.

~~I also changed `invert` to `inverse` without arguments. `inputCol` and `outputCol` could be set after.~~
I also removed `invert`.

jkbradley holdenk

Author: Xiangrui Meng <meng@databricks.com>

Closes #8152 from mengxr/SPARK-9922.
2015-08-13 16:52:17 -07:00
hyukjinkwon c2520f501a [SPARK-9935] [SQL] EqualNotNull not processed in ORC
https://issues.apache.org/jira/browse/SPARK-9935

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #8163 from HyukjinKwon/master.
2015-08-13 16:07:03 -07:00
Davies Liu a8d2f4c5f9 [SPARK-9942] [PYSPARK] [SQL] ignore exceptions while try to import pandas
If pandas is broken (it can't be imported and raises an exception other than ImportError), pyspark can't be imported either. We should ignore all exceptions raised by the import.

Author: Davies Liu <davies@databricks.com>

Closes #8173 from davies/fix_pandas.
2015-08-13 14:03:55 -07:00
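The guarded import described in SPARK-9942 above amounts to catching `Exception` rather than only `ImportError`. A minimal sketch follows; `optional_import` is a hypothetical helper written for this example, not a PySpark function.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if importing it fails for any reason."""
    try:
        return importlib.import_module(name)
    except Exception:
        # A broken installation may raise something other than ImportError
        # at import time, so catch every ordinary exception.
        return None


# Optional dependency: the rest of the code checks has_pandas before using it.
pandas = optional_import("pandas")
has_pandas = pandas is not None
```

Code that depends on the optional package can then branch on `has_pandas` instead of failing at import time.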
MechCoder 864de8eaf4 [SPARK-9661] [MLLIB] [ML] Java compatibility
I skimmed through the docs for various instances of Object and replaced them with Java-compatible versions of the same.

1. Some methods in LDAModel.
2. runMiniBatchSGD
3. kolmogorovSmirnovTest

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #8126 from MechCoder/java_incop.
2015-08-13 13:42:35 -07:00
Andrew Or 8815ba2f67 [SPARK-9649] Fix MasterSuite, third time's a charm
This particular test did not load the default configurations so
it continued to start the REST server, which causes port bind
exceptions.
2015-08-13 11:31:10 -07:00
Xiangrui Meng 65fec798ce [MINOR] [DOC] fix mllib pydoc warnings
Switch to correct Sphinx syntax. MechCoder

Author: Xiangrui Meng <meng@databricks.com>

Closes #8169 from mengxr/mllib-pydoc-fix.
2015-08-13 10:16:40 -07:00
Yanbo Liang 4b70798c96 [MINOR] [ML] change MultilayerPerceptronClassifierModel to MultilayerPerceptronClassificationModel
To follow the naming rule of ML, change `MultilayerPerceptronClassifierModel` to `MultilayerPerceptronClassificationModel` like `DecisionTreeClassificationModel`, `GBTClassificationModel` and so on.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8164 from yanboliang/mlp-name.
2015-08-13 09:31:14 -07:00
Rosstin 7a539ef3b1 [SPARK-8965] [DOCS] Add ml-guide Python Example: Estimator, Transformer, and Param
Added ml-guide Python Example: Estimator, Transformer, and Param
/docs/_site/ml-guide.html

Author: Rosstin <asterazul@gmail.com>

Closes #8081 from Rosstin/SPARK-8965.
2015-08-13 09:18:39 -07:00
lewuathe 2932e25da4 [SPARK-9073] [ML] spark.ml Models copy() should call setParent when there is a parent
Copied ML models must have the same parent as the original ones.

Author: lewuathe <lewuathe@me.com>
Author: Lewuathe <lewuathe@me.com>

Closes #7447 from Lewuathe/SPARK-9073.
2015-08-13 09:17:19 -07:00
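The SPARK-9073 requirement above (a copied model keeps its parent) can be sketched as follows. `Estimator`, `Model` and `set_parent` are simplified, hypothetical Python stand-ins for the spark.ml classes, used only to show the shape of the fix.

```python
class Estimator:
    """Hypothetical estimator that produces models."""

    def fit(self, values):
        return Model(list(values)).set_parent(self)


class Model:
    """Hypothetical model that remembers which estimator produced it."""

    def __init__(self, coefficients):
        self.coefficients = coefficients
        self.parent = None

    def set_parent(self, parent):
        self.parent = parent
        return self

    def copy(self):
        copied = Model(list(self.coefficients))
        # Propagate the parent only when the original has one.
        if self.parent is not None:
            copied.set_parent(self.parent)
        return copied
```

Without the `set_parent` call in `copy`, the copy would silently lose the link to its estimator.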
Cheng Lian 6993031011 [SPARK-9757] [SQL] Fixes persistence of Parquet relation with decimal column
PR #7967 enables us to save data source relations to metastore in Hive compatible format when possible. But it fails to persist Parquet relations with decimal column(s) to Hive metastore of versions lower than 1.2.0. This is because `ParquetHiveSerDe` in Hive versions prior to 1.2.0 doesn't support decimal. This PR checks for this case and falls back to Spark SQL specific metastore table format.

Author: Yin Huai <yhuai@databricks.com>
Author: Cheng Lian <lian@databricks.com>

Closes #8130 from liancheng/spark-9757/old-hive-parquet-decimal.
2015-08-13 16:16:50 +08:00
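The fallback rule from SPARK-9757 above can be expressed as a small decision function. This sketch is illustrative only: the function names, the string type names and the returned labels are assumptions made for the example, not Spark SQL identifiers.

```python
def parse_version(version):
    """Turn a dotted version string such as '0.13.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def choose_table_format(hive_version, column_types):
    """Pick a metastore format (hypothetical labels) for a Parquet relation."""
    has_decimal = any(t.startswith("decimal") for t in column_types)
    if has_decimal and parse_version(hive_version) < (1, 2, 0):
        # Older ParquetHiveSerDe versions do not support decimal,
        # so fall back to the Spark SQL specific format.
        return "spark-sql-specific"
    return "hive-compatible"
```

Only the combination of an old Hive metastore and a decimal column triggers the fallback; every other case keeps the Hive-compatible format.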
Yin Huai 84a27916a6 [SPARK-9885] [SQL] Also pass barrierPrefixes and sharedPrefixes to IsolatedClientLoader when hiveMetastoreJars is set to maven.
https://issues.apache.org/jira/browse/SPARK-9885

cc marmbrus liancheng

Author: Yin Huai <yhuai@databricks.com>

Closes #8158 from yhuai/classloaderMaven.
2015-08-13 15:08:57 +08:00
Xiangrui Meng 68f9957149 [SPARK-9918] [MLLIB] remove runs from k-means and rename epsilon to tol
This requires some discussion. I'm not sure whether `runs` is a useful parameter. It certainly complicates the implementation. We might want to optimize the k-means implementation with block matrix operations. In this case, having `runs` may not be worth the trade-off. Also it increases the communication cost in a single job, which might cause other issues.

This PR also renames `epsilon` to `tol` to have consistent naming among algorithms. The Python constructor is updated to include all parameters.

jkbradley yu-iskw

Author: Xiangrui Meng <meng@databricks.com>

Closes #8148 from mengxr/SPARK-9918 and squashes the following commits:

149b9e5 [Xiangrui Meng] fix constructor in Python and rename epsilon to tol
3cc15b3 [Xiangrui Meng] fix test and change initStep to initSteps in python
a0a0274 [Xiangrui Meng] remove runs from k-means in the pipeline API
2015-08-12 23:04:59 -07:00
Yijie Shen d0b18919d1 [SPARK-9927] [SQL] Revert 8049 since it's pushing wrong filter down
I made a mistake in #8049 by casting the literal value to the attribute's data type, which can simply truncate the literal value and push a wrong filter down.

JIRA: https://issues.apache.org/jira/browse/SPARK-9927

Author: Yijie Shen <henry.yijieshen@gmail.com>

Closes #8157 from yjshen/rever8049.
2015-08-13 13:33:39 +08:00
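To see why the cast described in SPARK-9927 above produces a wrong filter, consider an integer column compared against a fractional literal. The sketch below uses plain Python lists; the function names are hypothetical and the values are example data.

```python
def filter_correct(values, literal):
    """Evaluate `value < literal` without touching the literal."""
    return [v for v in values if v < literal]


def filter_with_truncated_literal(values, literal):
    """Evaluate the same predicate after casting the literal to the column's int type."""
    truncated = int(literal)  # 1.5 becomes 1
    return [v for v in values if v < truncated]


column = [0, 1, 2]  # example values of an integer column
kept = filter_correct(column, 1.5)
kept_after_cast = filter_with_truncated_literal(column, 1.5)
```

Here the row with value 1 satisfies `value < 1.5` but not `value < 1`, so the pushed-down filter would drop a row that belongs in the result.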
Xiangrui Meng d7eb371eb6 [SPARK-9914] [ML] define setters explicitly for Java and use setParam group in RFormula
The problem with defining setters in the base class is that it doesn't return the correct type in Java.

ericl

Author: Xiangrui Meng <meng@databricks.com>

Closes #8143 from mengxr/SPARK-9914 and squashes the following commits:

d36c887 [Xiangrui Meng] remove setters from model
a49021b [Xiangrui Meng] define setters explicitly for Java and use setParam group
2015-08-12 22:30:33 -07:00
shikai.tang df54389212 [SPARK-8922] [DOCUMENTATION, MLLIB] Add @since tags to mllib.evaluation
Author: shikai.tang <tar.sky06@gmail.com>

Closes #7429 from mosessky/master.
2015-08-12 21:53:15 -07:00
Xiangrui Meng 5fc058a1fc [SPARK-9917] [ML] add getMin/getMax and doc for originalMin/originalMax in MinMaxScaler
hhbyyh

Author: Xiangrui Meng <meng@databricks.com>

Closes #8145 from mengxr/SPARK-9917.
2015-08-12 21:33:38 -07:00
Davies Liu a8ab2634c1 [SPARK-9832] [SQL] add a thread-safe lookup for BytesToBytesMap
This patch adds a thread-safe lookup for BytesToBytesMap, and uses it in the broadcasted HashedRelation.

Author: Davies Liu <davies@databricks.com>

Closes #8151 from davies/safeLookup.
2015-08-12 21:26:00 -07:00
Yin Huai 2278219054 [SPARK-9920] [SQL] The simpleString of TungstenAggregate does not show its output
https://issues.apache.org/jira/browse/SPARK-9920

Taking `sqlContext.sql("select i, sum(j1) as sum from testAgg group by i").explain()` as an example, the output of our current master is
```
== Physical Plan ==
TungstenAggregate(key=[i#0], value=[(sum(cast(j1#1 as bigint)),mode=Final,isDistinct=false)]
 TungstenExchange hashpartitioning(i#0)
  TungstenAggregate(key=[i#0], value=[(sum(cast(j1#1 as bigint)),mode=Partial,isDistinct=false)]
   Scan ParquetRelation[file:/user/hive/warehouse/testagg][i#0,j1#1]
```
With this PR, the output will be
```
== Physical Plan ==
TungstenAggregate(key=[i#0], functions=[(sum(cast(j1#1 as bigint)),mode=Final,isDistinct=false)], output=[i#0,sum#18L])
 TungstenExchange hashpartitioning(i#0)
  TungstenAggregate(key=[i#0], functions=[(sum(cast(j1#1 as bigint)),mode=Partial,isDistinct=false)], output=[i#0,currentSum#22L])
   Scan ParquetRelation[file:/user/hive/warehouse/testagg][i#0,j1#1]
```

Author: Yin Huai <yhuai@databricks.com>

Closes #8150 from yhuai/SPARK-9920.
2015-08-12 21:24:15 -07:00
Burak Yavuz 2fb4901b71 [SPARK-9916] [BUILD] [SPARKR] removed left-over sparkr.zip copy/create commands from codebase
sparkr.zip is now built by SparkSubmit on a need-to-build basis.

cc shivaram

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #8147 from brkyvz/make-dist-fix.
2015-08-12 20:59:38 -07:00
Xiangrui Meng d7053bea98 [SPARK-9903] [MLLIB] skip local processing in PrefixSpan if there are no small prefixes
There exists a chance that the prefixes keep growing to the maximum pattern length. Then the final local processing step becomes unnecessary. feynmanliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8136 from mengxr/SPARK-9903.
2015-08-12 20:44:40 -07:00
Joseph K. Bradley d2d5e7fe2d [SPARK-9704] [ML] Made ProbabilisticClassifier, Identifiable, VectorUDT public APIs
Made ProbabilisticClassifier, Identifiable, VectorUDT public.  All are annotated as DeveloperApi.

CC: mengxr EronWright

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #8004 from jkbradley/ml-api-public-items and squashes the following commits:

7ebefda [Joseph K. Bradley] update per code review
7ff0768 [Joseph K. Bradley] attempting to add mima fix
756d84c [Joseph K. Bradley] VectorUDT annotated as AlphaComponent
ae7767d [Joseph K. Bradley] added another warning
94fd553 [Joseph K. Bradley] Made ProbabilisticClassifier, Identifiable, VectorUDT public APIs
2015-08-12 20:43:36 -07:00
Yin Huai 4413d0855a [SPARK-9908] [SQL] When spark.sql.tungsten.enabled is false, broadcast join does not work
https://issues.apache.org/jira/browse/SPARK-9908

Author: Yin Huai <yhuai@databricks.com>

Closes #8149 from yhuai/SPARK-9908.
2015-08-12 20:03:55 -07:00
Davies Liu 7c35746c91 [SPARK-9827] [SQL] fix fd leak in UnsafeRowSerializer
Currently, UnsafeRowSerializer does not close the InputStream, which causes an fd leak if the InputStream holds an open fd.

TODO: the fd could still be leaked if any items in the stream are not consumed. Currently it relies on GC to close the fd in this case.

cc JoshRosen

Author: Davies Liu <davies@databricks.com>

Closes #8116 from davies/fd_leak.
2015-08-12 20:02:55 -07:00
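The close-on-exhaustion behaviour described in SPARK-9827 above can be sketched with a Python generator. This is an analogy rather than the UnsafeRowSerializer code: `read_records` is a hypothetical helper and the fixed record size is an assumption made for the example.

```python
import io


def read_records(stream, record_size):
    """Yield fixed-size records and close the stream once iteration ends."""
    try:
        while True:
            chunk = stream.read(record_size)
            if not chunk:
                return  # end of stream: fall through to the finally block
            yield chunk
    finally:
        # Runs when the stream is exhausted or when the generator is closed.
        stream.close()


stream = io.BytesIO(b"abcdef")
records = list(read_records(stream, 2))
```

As in the TODO above, a consumer that abandons the iterator early only releases the stream when the generator is explicitly closed or garbage collected.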
Josh Rosen 7b13ed27c1 [SPARK-9870] Disable driver UI and Master REST server in SparkSubmitSuite
I think that we should pass additional configuration flags to disable the driver UI and Master REST server in SparkSubmitSuite and HiveSparkSubmitSuite. This might cut down on port-contention-related flakiness in Jenkins.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #8124 from JoshRosen/disable-ui-in-sparksubmitsuite.
2015-08-12 18:52:11 -07:00
Yu ISHIKAWA f4bc01f1f3 [SPARK-9855] [SPARKR] Add expression functions into SparkR whose params are simple
I added many expression functions for SparkR. This PR includes only functions whose params are `(Column)` or `(Column, Column)`. I think we need to improve how we test those functions, but that would be better handled in a separate issue.

## Diff Summary

- Add lots of functions in `functions.R` and their generic in `generic.R`
- Add aliases for `ceiling` and `sign`
- Move expression functions from `column.R` to `functions.R`
- Modify `rdname` from `column` to `functions`

I haven't added support for the `not` function, because its name collides with the `testthat` package. I couldn't think of a way to define it.

## New Supported Functions

```
approxCountDistinct
ascii
base64
bin
bitwiseNOT
ceil (alias: ceiling)
crc32
dayofmonth
dayofyear
explode
factorial
hex
hour
initcap
isNaN
last_day
length
log2
ltrim
md5
minute
month
negate
quarter
reverse
round
rtrim
second
sha1
signum (alias: sign)
size
soundex
to_date
trim
unbase64
unhex
weekofyear
year

datediff
levenshtein
months_between
nanvl
pmod
```

## JIRA
[[SPARK-9855] Add expression functions into SparkR whose params are simple - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9855)

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8123 from yu-iskw/SPARK-9855.
2015-08-12 18:33:27 -07:00
Rohit Agarwal 0d1d146c22 [SPARK-9724] [WEB UI] Avoid unnecessary redirects in the Spark Web UI.
Author: Rohit Agarwal <rohita@qubole.com>

Closes #8014 from mindprince/SPARK-9724 and squashes the following commits:

a7af5ff [Rohit Agarwal] [SPARK-9724] [WEB UI] Inline attachPrefix and attachPrefixForRedirect. Fix logic of attachPrefix
8a977cd [Rohit Agarwal] [SPARK-9724] [WEB UI] Address review comments: Remove unneeded code, update scaladoc.
b257844 [Rohit Agarwal] [SPARK-9724] [WEB UI] Avoid unnecessary redirects in the Spark Web UI.
2015-08-12 17:48:43 -07:00
cody koeninger 8ce60963cb [SPARK-9780] [STREAMING] [KAFKA] prevent NPE if KafkaRDD instantiation …
…fails

Author: cody koeninger <cody@koeninger.org>

Closes #8133 from koeninger/SPARK-9780 and squashes the following commits:

406259d [cody koeninger] [SPARK-9780][Streaming][Kafka] prevent NPE if KafkaRDD instantiation fails
2015-08-12 17:44:16 -07:00
Michael Armbrust 660e6dcff8 [SPARK-9449] [SQL] Include MetastoreRelation's inputFiles
Author: Michael Armbrust <michael@databricks.com>

Closes #8119 from marmbrus/metastoreInputFiles.
2015-08-12 17:07:29 -07:00
Xiangrui Meng fc1c7fd66e [SPARK-9915] [ML] stopWords should use StringArrayParam
hhbyyh

Author: Xiangrui Meng <meng@databricks.com>

Closes #8141 from mengxr/SPARK-9915.
2015-08-12 17:06:12 -07:00
Xiangrui Meng e6aef55766 [SPARK-9912] [MLLIB] QRDecomposition should use QType and RType for type names instead of UType and VType
hhbyyh

Author: Xiangrui Meng <meng@databricks.com>

Closes #8140 from mengxr/SPARK-9912.
2015-08-12 17:04:31 -07:00
Holden Karau 6e409bc135 [SPARK-9909] [ML] [TRIVIAL] move weightCol to shared params
As per the TODO move weightCol to Shared Params.

Author: Holden Karau <holden@pigscanfly.ca>

Closes #8144 from holdenk/SPARK-9909-move-weightCol-toSharedParams.
2015-08-12 16:54:45 -07:00
Xiangrui Meng caa14d9dc9 [SPARK-9913] [MLLIB] LDAUtils should be private
feynmanliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8142 from mengxr/SPARK-9913.
2015-08-12 16:53:47 -07:00
Yin Huai 7035d880a0 [SPARK-9894] [SQL] Json writer should handle MapData.
https://issues.apache.org/jira/browse/SPARK-9894

Author: Yin Huai <yhuai@databricks.com>

Closes #8137 from yhuai/jsonMapData.
2015-08-12 16:45:15 -07:00
Michel Lemay ab7e721cfe [SPARK-9826] [CORE] Fix cannot use custom classes in log4j.properties
Refactor Utils class and create ShutdownHookManager.

NOTE: Wasn't able to run /dev/run-tests on windows machine.
Manual tests were conducted locally using custom log4j.properties file with Redis appender and logstash formatter (bundled in the fat-jar submitted to spark)

ex:
log4j.rootCategory=WARN,console,redis
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.spark.graphx.Pregel=INFO

log4j.appender.redis=com.ryantenney.log4j.FailoverRedisAppender
log4j.appender.redis.endpoints=hostname:port
log4j.appender.redis.key=mykey
log4j.appender.redis.alwaysBatch=false
log4j.appender.redis.layout=net.logstash.log4j.JSONEventLayoutV1

Author: michellemay <mlemay@gmail.com>

Closes #8109 from michellemay/SPARK-9826.
2015-08-12 16:41:35 -07:00
Niranjan Padmanabhan 738f353988 [SPARK-9092] Fixed incompatibility when both num-executors and dynamic...
… allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, executorAllocationManager is not initialized in the SparkContext.

Author: Niranjan Padmanabhan <niranjan.padmanabhan@cloudera.com>

Closes #7657 from neurons/SPARK-9092.
2015-08-12 16:10:21 -07:00
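The precedence rule from SPARK-9092 above can be sketched as a function over a configuration dictionary. The helper name is hypothetical, and the sketch assumes that `--num-executors` surfaces as the `spark.executor.instances` key; it does not reproduce the actual SparkContext logic.

```python
def resolve_dynamic_allocation(conf):
    """Return a copy of conf in which an explicit executor count disables dynamic allocation."""
    resolved = dict(conf)
    if "spark.executor.instances" in resolved:
        # An explicit executor count takes precedence over dynamic allocation.
        resolved["spark.dynamicAllocation.enabled"] = "false"
    return resolved


def needs_allocation_manager(conf):
    """Dynamic allocation machinery is only needed when the flag is still enabled."""
    resolved = resolve_dynamic_allocation(conf)
    return resolved.get("spark.dynamicAllocation.enabled", "false") == "true"
```

With both settings present, `needs_allocation_manager` returns `False`, which mirrors the allocation manager not being initialized.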
Reynold Xin a17384fa34 [SPARK-9907] [SQL] Python crc32 is mistakenly calling md5
Author: Reynold Xin <rxin@databricks.com>

Closes #8138 from rxin/SPARK-9907.
2015-08-12 15:27:52 -07:00
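The two functions confused in SPARK-9907 above produce very different kinds of output, which is easy to check with the Python standard library. The wrapper names below are hypothetical; only `zlib.crc32` and `hashlib.md5` are real library calls.

```python
import hashlib
import zlib


def crc32_of(text):
    """CRC-32 checksum of the UTF-8 bytes, as an unsigned 32-bit integer."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def md5_of(text):
    """MD5 digest of the UTF-8 bytes, as a 32-character hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()
```

A test that compares the result type (an integer versus a hex string) is enough to catch a wrapper that calls the wrong function.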
Xiangrui Meng 6f60298b1d [SPARK-8967] [DOC] add Since annotation
Add `Since` as a Scala annotation. The benefit is that we can use it without having explicit JavaDoc. This is useful for inherited methods. The limitation is that it doesn't show up in the generated Java API documentation. This might be fixed by modifying genjavadoc. I think we could leave it as a TODO.

This is how the generated Scala doc looks:

`since` JavaDoc tag:

![screen shot 2015-08-11 at 10 00 37 pm](https://cloud.githubusercontent.com/assets/829644/9230761/fa72865c-40d8-11e5-807e-0f3c815c5acd.png)

`Since` annotation:

![screen shot 2015-08-11 at 10 00 28 pm](https://cloud.githubusercontent.com/assets/829644/9230764/0041d7f4-40d9-11e5-8124-c3f3e5d5b31f.png)

rxin

Author: Xiangrui Meng <meng@databricks.com>

Closes #8131 from mengxr/SPARK-8967.
2015-08-12 14:28:23 -07:00