[SPARK-18692][BUILD][DOCS] Test Java 8 unidoc build on Jenkins
## What changes were proposed in this pull request? This PR proposes to run Spark unidoc to test Javadoc 8 build as Javadoc 8 is easily re-breakable. There are several problems with it: - It introduces little extra bit of time to run the tests. In my case, it took 1.5 mins more (`Elapsed :[94.8746569157]`). How it was tested is described in "How was this patch tested?". - > One problem that I noticed was that Unidoc appeared to be processing test sources: if we can find a way to exclude those from being processed in the first place then that might significantly speed things up. (see joshrosen's [comment](https://issues.apache.org/jira/browse/SPARK-18692?focusedCommentId=15947627&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15947627)) To complete this automated build, It also suggests to fix existing Javadoc breaks / ones introduced by test codes as described above. There fixes are similar instances that previously fixed. Please refer https://github.com/apache/spark/pull/15999 and https://github.com/apache/spark/pull/16013 Note that this only fixes **errors** not **warnings**. Please see my observation https://github.com/apache/spark/pull/17389#issuecomment-288438704 for spurious errors by warnings. ## How was this patch tested? Manually via `jekyll build` for building tests. Also, tested via running `./dev/run-tests`. This was tested via manually adding `time.time()` as below: ```diff profiles_and_goals = build_profiles + sbt_goals print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ", " ".join(profiles_and_goals)) + import time + st = time.time() exec_sbt(profiles_and_goals) + print("Elapsed :[%s]" % str(time.time() - st)) ``` produces ``` ... ======================================================================== Building Unidoc API Documentation ======================================================================== ... [info] Main Java API documentation successful. ... Elapsed :[94.8746569157] ... Author: hyukjinkwon <gurwls223@gmail.com> Closes #17477 from HyukjinKwon/SPARK-18692.
This commit is contained in:
parent
2e1fd46e12
commit
ceaf77ae43
|
@ -35,7 +35,7 @@ private[spark] trait RpcEnvFactory {
|
|||
*
|
||||
* The life-cycle of an endpoint is:
|
||||
*
|
||||
* constructor -> onStart -> receive* -> onStop
|
||||
* {@code constructor -> onStart -> receive* -> onStop}
|
||||
*
|
||||
* Note: `receive` can be called concurrently. If you want `receive` to be thread-safe, please use
|
||||
* [[ThreadSafeRpcEndpoint]]
|
||||
|
@ -63,16 +63,16 @@ private[spark] trait RpcEndpoint {
|
|||
}
|
||||
|
||||
/**
|
||||
* Process messages from [[RpcEndpointRef.send]] or [[RpcCallContext.reply)]]. If receiving a
|
||||
* unmatched message, [[SparkException]] will be thrown and sent to `onError`.
|
||||
* Process messages from `RpcEndpointRef.send` or `RpcCallContext.reply`. If receiving a
|
||||
* unmatched message, `SparkException` will be thrown and sent to `onError`.
|
||||
*/
|
||||
def receive: PartialFunction[Any, Unit] = {
|
||||
case _ => throw new SparkException(self + " does not implement 'receive'")
|
||||
}
|
||||
|
||||
/**
|
||||
* Process messages from [[RpcEndpointRef.ask]]. If receiving a unmatched message,
|
||||
* [[SparkException]] will be thrown and sent to `onError`.
|
||||
* Process messages from `RpcEndpointRef.ask`. If receiving a unmatched message,
|
||||
* `SparkException` will be thrown and sent to `onError`.
|
||||
*/
|
||||
def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
|
||||
case _ => context.sendFailure(new SparkException(self + " won't reply anything"))
|
||||
|
|
|
@ -26,7 +26,7 @@ import org.apache.spark.SparkConf
|
|||
import org.apache.spark.util.{ThreadUtils, Utils}
|
||||
|
||||
/**
|
||||
* An exception thrown if RpcTimeout modifies a [[TimeoutException]].
|
||||
* An exception thrown if RpcTimeout modifies a `TimeoutException`.
|
||||
*/
|
||||
private[rpc] class RpcTimeoutException(message: String, cause: TimeoutException)
|
||||
extends TimeoutException(message) { initCause(cause) }
|
||||
|
|
|
@ -607,7 +607,7 @@ class DAGScheduler(
|
|||
* @param resultHandler callback to pass each result to
|
||||
* @param properties scheduler properties to attach to this job, e.g. fair scheduler pool name
|
||||
*
|
||||
* @throws Exception when the job fails
|
||||
* @note Throws `Exception` when the job fails
|
||||
*/
|
||||
def runJob[T, U](
|
||||
rdd: RDD[T],
|
||||
|
@ -644,7 +644,7 @@ class DAGScheduler(
|
|||
*
|
||||
* @param rdd target RDD to run tasks on
|
||||
* @param func a function to run on each partition of the RDD
|
||||
* @param evaluator [[ApproximateEvaluator]] to receive the partial results
|
||||
* @param evaluator `ApproximateEvaluator` to receive the partial results
|
||||
* @param callSite where in the user program this job was called
|
||||
* @param timeout maximum time to wait for the job, in milliseconds
|
||||
* @param properties scheduler properties to attach to this job, e.g. fair scheduler pool name
|
||||
|
|
|
@ -42,7 +42,7 @@ private[spark] trait ExternalClusterManager {
|
|||
|
||||
/**
|
||||
* Create a scheduler backend for the given SparkContext and scheduler. This is
|
||||
* called after task scheduler is created using [[ExternalClusterManager.createTaskScheduler()]].
|
||||
* called after task scheduler is created using `ExternalClusterManager.createTaskScheduler()`.
|
||||
* @param sc SparkContext
|
||||
* @param masterURL the master URL
|
||||
* @param scheduler TaskScheduler that will be used with the scheduler backend.
|
||||
|
|
|
@ -38,7 +38,7 @@ import org.apache.spark.util.{AccumulatorV2, ThreadUtils, Utils}
|
|||
|
||||
/**
|
||||
* Schedules tasks for multiple types of clusters by acting through a SchedulerBackend.
|
||||
* It can also work with a local setup by using a [[LocalSchedulerBackend]] and setting
|
||||
* It can also work with a local setup by using a `LocalSchedulerBackend` and setting
|
||||
* isLocal to true. It handles common logic, like determining a scheduling order across jobs, waking
|
||||
* up to launch speculative tasks, etc.
|
||||
*
|
||||
|
@ -708,8 +708,8 @@ private[spark] object TaskSchedulerImpl {
|
|||
* offers are ordered such that we'll allocate one container on each host before allocating a
|
||||
* second container on any host, and so on, in order to reduce the damage if a host fails.
|
||||
*
|
||||
* For example, given <h1, [o1, o2, o3]>, <h2, [o4]>, <h1, [o5, o6]>, returns
|
||||
* [o1, o5, o4, 02, o6, o3]
|
||||
* For example, given {@literal <h1, [o1, o2, o3]>}, {@literal <h2, [o4]>} and
|
||||
* {@literal <h3, [o5, o6]>}, returns {@literal [o1, o5, o4, o2, o6, o3]}.
|
||||
*/
|
||||
def prioritizeContainers[K, T] (map: HashMap[K, ArrayBuffer[T]]): List[T] = {
|
||||
val _keyList = new ArrayBuffer[K](map.size)
|
||||
|
|
|
@ -66,7 +66,7 @@ private[spark] trait BlockData {
|
|||
/**
|
||||
* Returns a Netty-friendly wrapper for the block's data.
|
||||
*
|
||||
* @see [[ManagedBuffer#convertToNetty()]]
|
||||
* Please see `ManagedBuffer.convertToNetty()` for more details.
|
||||
*/
|
||||
def toNetty(): Object
|
||||
|
||||
|
|
|
@ -243,7 +243,7 @@ private[spark] object AccumulatorSuite {
|
|||
import InternalAccumulator._
|
||||
|
||||
/**
|
||||
* Create a long accumulator and register it to [[AccumulatorContext]].
|
||||
* Create a long accumulator and register it to `AccumulatorContext`.
|
||||
*/
|
||||
def createLongAccum(
|
||||
name: String,
|
||||
|
@ -258,7 +258,7 @@ private[spark] object AccumulatorSuite {
|
|||
}
|
||||
|
||||
/**
|
||||
* Make an [[AccumulableInfo]] out of an [[Accumulable]] with the intent to use the
|
||||
* Make an `AccumulableInfo` out of an [[Accumulable]] with the intent to use the
|
||||
* info as an accumulator update.
|
||||
*/
|
||||
def makeInfo(a: AccumulatorV2[_, _]): AccumulableInfo = a.toInfo(Some(a.value), None)
|
||||
|
|
|
@ -27,7 +27,7 @@ import org.apache.spark.network.shuffle.{ExternalShuffleBlockHandler, ExternalSh
|
|||
/**
|
||||
* This suite creates an external shuffle server and routes all shuffle fetches through it.
|
||||
* Note that failures in this suite may arise due to changes in Spark that invalidate expectations
|
||||
* set up in [[ExternalShuffleBlockHandler]], such as changing the format of shuffle files or how
|
||||
* set up in `ExternalShuffleBlockHandler`, such as changing the format of shuffle files or how
|
||||
* we hash files into folders.
|
||||
*/
|
||||
class ExternalShuffleServiceSuite extends ShuffleSuite with BeforeAndAfterAll {
|
||||
|
|
|
@ -22,7 +22,7 @@ import org.scalatest.BeforeAndAfterAll
|
|||
import org.scalatest.BeforeAndAfterEach
|
||||
import org.scalatest.Suite
|
||||
|
||||
/** Manages a local `sc` {@link SparkContext} variable, correctly stopping it after each test. */
|
||||
/** Manages a local `sc` `SparkContext` variable, correctly stopping it after each test. */
|
||||
trait LocalSparkContext extends BeforeAndAfterEach with BeforeAndAfterAll { self: Suite =>
|
||||
|
||||
@transient var sc: SparkContext = _
|
||||
|
|
|
@ -95,12 +95,12 @@ abstract class SchedulerIntegrationSuite[T <: MockBackend: ClassTag] extends Spa
|
|||
}
|
||||
|
||||
/**
|
||||
* A map from partition -> results for all tasks of a job when you call this test framework's
|
||||
* A map from partition to results for all tasks of a job when you call this test framework's
|
||||
* [[submit]] method. Two important considerations:
|
||||
*
|
||||
* 1. If there is a job failure, results may or may not be empty. If any tasks succeed before
|
||||
* the job has failed, they will get included in `results`. Instead, check for job failure by
|
||||
* checking [[failure]]. (Also see [[assertDataStructuresEmpty()]])
|
||||
* checking [[failure]]. (Also see `assertDataStructuresEmpty()`)
|
||||
*
|
||||
* 2. This only gets cleared between tests. So you'll need to do special handling if you submit
|
||||
* more than one job in one test.
|
||||
|
|
|
@ -29,7 +29,7 @@ import org.apache.spark.serializer.KryoTest.RegistratorWithoutAutoReset
|
|||
/**
|
||||
* Tests to ensure that [[Serializer]] implementations obey the API contracts for methods that
|
||||
* describe properties of the serialized stream, such as
|
||||
* [[Serializer.supportsRelocationOfSerializedObjects]].
|
||||
* `Serializer.supportsRelocationOfSerializedObjects`.
|
||||
*/
|
||||
class SerializerPropertiesSuite extends SparkFunSuite {
|
||||
|
||||
|
|
|
@ -344,6 +344,19 @@ def build_spark_sbt(hadoop_version):
|
|||
exec_sbt(profiles_and_goals)
|
||||
|
||||
|
||||
def build_spark_unidoc_sbt(hadoop_version):
|
||||
set_title_and_block("Building Unidoc API Documentation", "BLOCK_DOCUMENTATION")
|
||||
# Enable all of the profiles for the build:
|
||||
build_profiles = get_hadoop_profiles(hadoop_version) + modules.root.build_profile_flags
|
||||
sbt_goals = ["unidoc"]
|
||||
profiles_and_goals = build_profiles + sbt_goals
|
||||
|
||||
print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ",
|
||||
" ".join(profiles_and_goals))
|
||||
|
||||
exec_sbt(profiles_and_goals)
|
||||
|
||||
|
||||
def build_spark_assembly_sbt(hadoop_version):
|
||||
# Enable all of the profiles for the build:
|
||||
build_profiles = get_hadoop_profiles(hadoop_version) + modules.root.build_profile_flags
|
||||
|
@ -352,6 +365,8 @@ def build_spark_assembly_sbt(hadoop_version):
|
|||
print("[info] Building Spark assembly (w/Hive 1.2.1) using SBT with these arguments: ",
|
||||
" ".join(profiles_and_goals))
|
||||
exec_sbt(profiles_and_goals)
|
||||
# Make sure that Java and Scala API documentation can be generated
|
||||
build_spark_unidoc_sbt(hadoop_version)
|
||||
|
||||
|
||||
def build_apache_spark(build_tool, hadoop_version):
|
||||
|
|
|
@ -21,7 +21,7 @@ import org.apache.spark.SparkConf
|
|||
import org.apache.spark.SparkContext
|
||||
|
||||
/**
|
||||
* Provides a method to run tests against a {@link SparkContext} variable that is correctly stopped
|
||||
* Provides a method to run tests against a `SparkContext` variable that is correctly stopped
|
||||
* after each test.
|
||||
*/
|
||||
trait LocalSparkContext {
|
||||
|
|
|
@ -74,7 +74,7 @@ abstract class Classifier[
|
|||
* and features (`Vector`).
|
||||
* @param numClasses Number of classes label can take. Labels must be integers in the range
|
||||
* [0, numClasses).
|
||||
* @throws SparkException if any label is not an integer >= 0
|
||||
* @note Throws `SparkException` if any label is a non-integer or is negative
|
||||
*/
|
||||
protected def extractLabeledPoints(dataset: Dataset[_], numClasses: Int): RDD[LabeledPoint] = {
|
||||
require(numClasses > 0, s"Classifier (in extractLabeledPoints) found numClasses =" +
|
||||
|
|
|
@ -230,7 +230,9 @@ class PipelineSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
|
|||
}
|
||||
|
||||
|
||||
/** Used to test [[Pipeline]] with [[MLWritable]] stages */
|
||||
/**
|
||||
* Used to test [[Pipeline]] with `MLWritable` stages
|
||||
*/
|
||||
class WritableStage(override val uid: String) extends Transformer with MLWritable {
|
||||
|
||||
final val intParam: IntParam = new IntParam(this, "intParam", "doc")
|
||||
|
@ -257,7 +259,9 @@ object WritableStage extends MLReadable[WritableStage] {
|
|||
override def load(path: String): WritableStage = super.load(path)
|
||||
}
|
||||
|
||||
/** Used to test [[Pipeline]] with non-[[MLWritable]] stages */
|
||||
/**
|
||||
* Used to test [[Pipeline]] with non-`MLWritable` stages
|
||||
*/
|
||||
class UnWritableStage(override val uid: String) extends Transformer {
|
||||
|
||||
final val intParam: IntParam = new IntParam(this, "intParam", "doc")
|
||||
|
|
|
@ -29,8 +29,10 @@ private[ml] object LSHTest {
|
|||
* the following property is satisfied.
|
||||
*
|
||||
* There exist dist1, dist2, p1, p2, so that for any two elements e1 and e2,
|
||||
* If dist(e1, e2) <= dist1, then Pr{h(x) == h(y)} >= p1
|
||||
* If dist(e1, e2) >= dist2, then Pr{h(x) == h(y)} <= p2
|
||||
* If dist(e1, e2) is less than or equal to dist1, then Pr{h(x) == h(y)} is greater than
|
||||
* or equal to p1
|
||||
* If dist(e1, e2) is greater than or equal to dist2, then Pr{h(x) == h(y)} is less than
|
||||
* or equal to p2
|
||||
*
|
||||
* This is called locality sensitive property. This method checks the property on an
|
||||
* existing dataset and calculate the probabilities.
|
||||
|
@ -38,8 +40,10 @@ private[ml] object LSHTest {
|
|||
*
|
||||
* This method hashes each elements to hash buckets using LSH, and calculate the false positive
|
||||
* and false negative:
|
||||
* False positive: Of all (e1, e2) sharing any bucket, the probability of dist(e1, e2) > distFP
|
||||
* False negative: Of all (e1, e2) not sharing buckets, the probability of dist(e1, e2) < distFN
|
||||
* False positive: Of all (e1, e2) sharing any bucket, the probability of dist(e1, e2) is greater
|
||||
* than distFP
|
||||
* False negative: Of all (e1, e2) not sharing buckets, the probability of dist(e1, e2) is less
|
||||
* than distFN
|
||||
*
|
||||
* @param dataset The dataset to verify the locality sensitive hashing property.
|
||||
* @param lsh The lsh instance to perform the hashing
|
||||
|
|
|
@ -377,7 +377,7 @@ class ParamsSuite extends SparkFunSuite {
|
|||
object ParamsSuite extends SparkFunSuite {
|
||||
|
||||
/**
|
||||
* Checks common requirements for [[Params.params]]:
|
||||
* Checks common requirements for `Params.params`:
|
||||
* - params are ordered by names
|
||||
* - param parent has the same UID as the object's UID
|
||||
* - param name is the same as the param method name
|
||||
|
|
|
@ -34,7 +34,7 @@ private[ml] object TreeTests extends SparkFunSuite {
|
|||
* Convert the given data to a DataFrame, and set the features and label metadata.
|
||||
* @param data Dataset. Categorical features and labels must already have 0-based indices.
|
||||
* This must be non-empty.
|
||||
* @param categoricalFeatures Map: categorical feature index -> number of distinct values
|
||||
* @param categoricalFeatures Map: categorical feature index to number of distinct values
|
||||
* @param numClasses Number of classes label can take. If 0, mark as continuous.
|
||||
* @return DataFrame with metadata
|
||||
*/
|
||||
|
@ -69,7 +69,9 @@ private[ml] object TreeTests extends SparkFunSuite {
|
|||
df("label").as("label", labelMetadata))
|
||||
}
|
||||
|
||||
/** Java-friendly version of [[setMetadata()]] */
|
||||
/**
|
||||
* Java-friendly version of `setMetadata()`
|
||||
*/
|
||||
def setMetadata(
|
||||
data: JavaRDD[LabeledPoint],
|
||||
categoricalFeatures: java.util.Map[java.lang.Integer, java.lang.Integer],
|
||||
|
|
|
@ -81,20 +81,20 @@ trait DefaultReadWriteTest extends TempDirectory { self: Suite =>
|
|||
/**
|
||||
* Default test for Estimator, Model pairs:
|
||||
* - Explicitly set Params, and train model
|
||||
* - Test save/load using [[testDefaultReadWrite()]] on Estimator and Model
|
||||
* - Test save/load using `testDefaultReadWrite` on Estimator and Model
|
||||
* - Check Params on Estimator and Model
|
||||
* - Compare model data
|
||||
*
|
||||
* This requires that [[Model]]'s [[Param]]s should be a subset of [[Estimator]]'s [[Param]]s.
|
||||
* This requires that `Model`'s `Param`s should be a subset of `Estimator`'s `Param`s.
|
||||
*
|
||||
* @param estimator Estimator to test
|
||||
* @param dataset Dataset to pass to [[Estimator.fit()]]
|
||||
* @param testEstimatorParams Set of [[Param]] values to set in estimator
|
||||
* @param testModelParams Set of [[Param]] values to set in model
|
||||
* @param checkModelData Method which takes the original and loaded [[Model]] and compares their
|
||||
* data. This method does not need to check [[Param]] values.
|
||||
* @tparam E Type of [[Estimator]]
|
||||
* @tparam M Type of [[Model]] produced by estimator
|
||||
* @param dataset Dataset to pass to `Estimator.fit()`
|
||||
* @param testEstimatorParams Set of `Param` values to set in estimator
|
||||
* @param testModelParams Set of `Param` values to set in model
|
||||
* @param checkModelData Method which takes the original and loaded `Model` and compares their
|
||||
* data. This method does not need to check `Param` values.
|
||||
* @tparam E Type of `Estimator`
|
||||
* @tparam M Type of `Model` produced by estimator
|
||||
*/
|
||||
def testEstimatorAndModelReadWrite[
|
||||
E <: Estimator[M] with MLWritable, M <: Model[M] with MLWritable](
|
||||
|
|
|
@ -105,8 +105,8 @@ class StopwatchSuite extends SparkFunSuite with MLlibTestSparkContext {
|
|||
private object StopwatchSuite extends SparkFunSuite {
|
||||
|
||||
/**
|
||||
* Checks the input stopwatch on a task that takes a random time (<10ms) to finish. Validates and
|
||||
* returns the duration reported by the stopwatch.
|
||||
* Checks the input stopwatch on a task that takes a random time (less than 10ms) to finish.
|
||||
* Validates and returns the duration reported by the stopwatch.
|
||||
*/
|
||||
def checkStopwatch(sw: Stopwatch): Long = {
|
||||
val ubStart = now
|
||||
|
|
|
@ -30,7 +30,9 @@ trait TempDirectory extends BeforeAndAfterAll { self: Suite =>
|
|||
|
||||
private var _tempDir: File = _
|
||||
|
||||
/** Returns the temporary directory as a [[File]] instance. */
|
||||
/**
|
||||
* Returns the temporary directory as a `File` instance.
|
||||
*/
|
||||
protected def tempDir: File = _tempDir
|
||||
|
||||
override def beforeAll(): Unit = {
|
||||
|
|
|
@ -21,7 +21,7 @@ import org.apache.spark.SparkFunSuite
|
|||
import org.apache.spark.mllib.tree.impurity.{EntropyAggregator, GiniAggregator}
|
||||
|
||||
/**
|
||||
* Test suites for [[GiniAggregator]] and [[EntropyAggregator]].
|
||||
* Test suites for `GiniAggregator` and `EntropyAggregator`.
|
||||
*/
|
||||
class ImpuritySuite extends SparkFunSuite {
|
||||
test("Gini impurity does not support negative labels") {
|
||||
|
|
|
@ -60,7 +60,7 @@ trait MLlibTestSparkContext extends TempDirectory { self: Suite =>
|
|||
* A helper object for importing SQL implicits.
|
||||
*
|
||||
* Note that the alternative of importing `spark.implicits._` is not possible here.
|
||||
* This is because we create the [[SQLContext]] immediately before the first test is run,
|
||||
* This is because we create the `SQLContext` immediately before the first test is run,
|
||||
* but the implicits import is needed in the constructor.
|
||||
*/
|
||||
protected object testImplicits extends SQLImplicits {
|
||||
|
|
|
@ -239,7 +239,7 @@ trait MesosSchedulerUtils extends Logging {
|
|||
}
|
||||
|
||||
/**
|
||||
* Converts the attributes from the resource offer into a Map of name -> Attribute Value
|
||||
* Converts the attributes from the resource offer into a Map of name to Attribute Value
|
||||
* The attribute values are the mesos attribute types and they are
|
||||
*
|
||||
* @param offerAttributes the attributes offered
|
||||
|
@ -296,7 +296,7 @@ trait MesosSchedulerUtils extends Logging {
|
|||
|
||||
/**
|
||||
* Parses the attributes constraints provided to spark and build a matching data struct:
|
||||
* Map[<attribute-name>, Set[values-to-match]]
|
||||
* {@literal Map[<attribute-name>, Set[values-to-match]}
|
||||
* The constraints are specified as ';' separated key-value pairs where keys and values
|
||||
* are separated by ':'. The ':' implies equality (for singular values) and "is one of" for
|
||||
* multiple values (comma separated). For example:
|
||||
|
@ -354,7 +354,7 @@ trait MesosSchedulerUtils extends Logging {
|
|||
* container overheads.
|
||||
*
|
||||
* @param sc SparkContext to use to get `spark.mesos.executor.memoryOverhead` value
|
||||
* @return memory requirement as (0.1 * <memoryOverhead>) or MEMORY_OVERHEAD_MINIMUM
|
||||
* @return memory requirement as (0.1 * memoryOverhead) or MEMORY_OVERHEAD_MINIMUM
|
||||
* (whichever is larger)
|
||||
*/
|
||||
def executorMemory(sc: SparkContext): Int = {
|
||||
|
|
|
@ -117,11 +117,11 @@ object RandomDataGenerator {
|
|||
}
|
||||
|
||||
/**
|
||||
* Returns a function which generates random values for the given [[DataType]], or `None` if no
|
||||
* Returns a function which generates random values for the given `DataType`, or `None` if no
|
||||
* random data generator is defined for that data type. The generated values will use an external
|
||||
* representation of the data type; for example, the random generator for [[DateType]] will return
|
||||
* instances of [[java.sql.Date]] and the generator for [[StructType]] will return a [[Row]].
|
||||
* For a [[UserDefinedType]] for a class X, an instance of class X is returned.
|
||||
* representation of the data type; for example, the random generator for `DateType` will return
|
||||
* instances of [[java.sql.Date]] and the generator for `StructType` will return a [[Row]].
|
||||
* For a `UserDefinedType` for a class X, an instance of class X is returned.
|
||||
*
|
||||
* @param dataType the type to generate values for
|
||||
* @param nullable whether null values should be generated
|
||||
|
|
|
@ -24,7 +24,7 @@ import org.apache.spark.sql.types._
|
|||
import org.apache.spark.util.Benchmark
|
||||
|
||||
/**
|
||||
* Benchmark [[UnsafeProjection]] for fixed-length/primitive-type fields.
|
||||
* Benchmark `UnsafeProjection` for fixed-length/primitive-type fields.
|
||||
*/
|
||||
object UnsafeProjectionBenchmark {
|
||||
|
||||
|
|
|
@ -510,7 +510,7 @@ abstract class Catalog {
|
|||
def refreshTable(tableName: String): Unit
|
||||
|
||||
/**
|
||||
* Invalidates and refreshes all the cached data (and the associated metadata) for any [[Dataset]]
|
||||
* Invalidates and refreshes all the cached data (and the associated metadata) for any `Dataset`
|
||||
* that contains the given data source path. Path matching is by prefix, i.e. "/" would invalidate
|
||||
* everything that is cached.
|
||||
*
|
||||
|
|
|
@ -56,7 +56,9 @@ object TestRegistrator {
|
|||
def apply(): TestRegistrator = new TestRegistrator()
|
||||
}
|
||||
|
||||
/** A [[Serializer]] that takes a [[KryoData]] and serializes it as KryoData(0). */
|
||||
/**
|
||||
* A `Serializer` that takes a [[KryoData]] and serializes it as KryoData(0).
|
||||
*/
|
||||
class ZeroKryoDataSerializer extends Serializer[KryoData] {
|
||||
override def write(kryo: Kryo, output: Output, t: KryoData): Unit = {
|
||||
output.writeInt(0)
|
||||
|
|
|
@ -44,8 +44,8 @@ abstract class FileStreamSourceTest
|
|||
import testImplicits._
|
||||
|
||||
/**
|
||||
* A subclass [[AddData]] for adding data to files. This is meant to use the
|
||||
* [[FileStreamSource]] actually being used in the execution.
|
||||
* A subclass `AddData` for adding data to files. This is meant to use the
|
||||
* `FileStreamSource` actually being used in the execution.
|
||||
*/
|
||||
abstract class AddFileData extends AddData {
|
||||
override def addData(query: Option[StreamExecution]): (Source, Offset) = {
|
||||
|
|
|
@ -569,7 +569,7 @@ class ThrowingIOExceptionLikeHadoop12074 extends FakeSource {
|
|||
|
||||
object ThrowingIOExceptionLikeHadoop12074 {
|
||||
/**
|
||||
* A latch to allow the user to wait until [[ThrowingIOExceptionLikeHadoop12074.createSource]] is
|
||||
* A latch to allow the user to wait until `ThrowingIOExceptionLikeHadoop12074.createSource` is
|
||||
* called.
|
||||
*/
|
||||
@volatile var createSourceLatch: CountDownLatch = null
|
||||
|
@ -600,7 +600,7 @@ class ThrowingInterruptedIOException extends FakeSource {
|
|||
|
||||
object ThrowingInterruptedIOException {
|
||||
/**
|
||||
* A latch to allow the user to wait until [[ThrowingInterruptedIOException.createSource]] is
|
||||
* A latch to allow the user to wait until `ThrowingInterruptedIOException.createSource` is
|
||||
* called.
|
||||
*/
|
||||
@volatile var createSourceLatch: CountDownLatch = null
|
||||
|
|
|
@ -642,8 +642,10 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter with Logging wi
|
|||
*
|
||||
* @param expectedBehavior Expected behavior (not blocked, blocked, or exception thrown)
|
||||
* @param timeoutMs Timeout in milliseconds
|
||||
* When timeoutMs <= 0, awaitTermination() is tested (i.e. w/o timeout)
|
||||
* When timeoutMs > 0, awaitTermination(timeoutMs) is tested
|
||||
* When timeoutMs is less than or equal to 0, awaitTermination() is
|
||||
* tested (i.e. w/o timeout)
|
||||
* When timeoutMs is greater than 0, awaitTermination(timeoutMs) is
|
||||
* tested
|
||||
* @param expectedReturnValue Expected return value when awaitTermination(timeoutMs) is used
|
||||
*/
|
||||
case class TestAwaitTermination(
|
||||
|
@ -667,8 +669,10 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter with Logging wi
|
|||
*
|
||||
* @param expectedBehavior Expected behavior (not blocked, blocked, or exception thrown)
|
||||
* @param timeoutMs Timeout in milliseconds
|
||||
* When timeoutMs <= 0, awaitTermination() is tested (i.e. w/o timeout)
|
||||
* When timeoutMs > 0, awaitTermination(timeoutMs) is tested
|
||||
* When timeoutMs is less than or equal to 0, awaitTermination() is
|
||||
* tested (i.e. w/o timeout)
|
||||
* When timeoutMs is greater than 0, awaitTermination(timeoutMs) is
|
||||
* tested
|
||||
* @param expectedReturnValue Expected return value when awaitTermination(timeoutMs) is used
|
||||
*/
|
||||
def assertOnQueryCondition(
|
||||
|
|
|
@ -41,11 +41,11 @@ import org.apache.spark.util.{UninterruptibleThread, Utils}
|
|||
/**
|
||||
* Helper trait that should be extended by all SQL test suites.
|
||||
*
|
||||
* This allows subclasses to plugin a custom [[SQLContext]]. It comes with test data
|
||||
* This allows subclasses to plugin a custom `SQLContext`. It comes with test data
|
||||
* prepared in advance as well as all implicit conversions used extensively by dataframes.
|
||||
* To use implicit methods, import `testImplicits._` instead of through the [[SQLContext]].
|
||||
* To use implicit methods, import `testImplicits._` instead of through the `SQLContext`.
|
||||
*
|
||||
* Subclasses should *not* create [[SQLContext]]s in the test suite constructor, which is
|
||||
* Subclasses should *not* create `SQLContext`s in the test suite constructor, which is
|
||||
* prone to leaving multiple overlapping [[org.apache.spark.SparkContext]]s in the same JVM.
|
||||
*/
|
||||
private[sql] trait SQLTestUtils
|
||||
|
@ -65,7 +65,7 @@ private[sql] trait SQLTestUtils
|
|||
* A helper object for importing SQL implicits.
|
||||
*
|
||||
* Note that the alternative of importing `spark.implicits._` is not possible here.
|
||||
* This is because we create the [[SQLContext]] immediately before the first test is run,
|
||||
* This is because we create the `SQLContext` immediately before the first test is run,
|
||||
* but the implicits import is needed in the constructor.
|
||||
*/
|
||||
protected object testImplicits extends SQLImplicits {
|
||||
|
@ -73,7 +73,7 @@ private[sql] trait SQLTestUtils
|
|||
}
|
||||
|
||||
/**
|
||||
* Materialize the test data immediately after the [[SQLContext]] is set up.
|
||||
* Materialize the test data immediately after the `SQLContext` is set up.
|
||||
* This is necessary if the data is accessed by name but not through direct reference.
|
||||
*/
|
||||
protected def setupTestData(): Unit = {
|
||||
|
@ -250,8 +250,8 @@ private[sql] trait SQLTestUtils
|
|||
}
|
||||
|
||||
/**
|
||||
* Turn a logical plan into a [[DataFrame]]. This should be removed once we have an easier
|
||||
* way to construct [[DataFrame]] directly out of local data without relying on implicits.
|
||||
* Turn a logical plan into a `DataFrame`. This should be removed once we have an easier
|
||||
* way to construct `DataFrame` directly out of local data without relying on implicits.
|
||||
*/
|
||||
protected implicit def logicalPlanToSparkQuery(plan: LogicalPlan): DataFrame = {
|
||||
Dataset.ofRows(spark, plan)
|
||||
|
@ -271,7 +271,9 @@ private[sql] trait SQLTestUtils
|
|||
}
|
||||
}
|
||||
|
||||
/** Run a test on a separate [[UninterruptibleThread]]. */
|
||||
/**
|
||||
* Run a test on a separate `UninterruptibleThread`.
|
||||
*/
|
||||
protected def testWithUninterruptibleThread(name: String, quietly: Boolean = false)
|
||||
(body: => Unit): Unit = {
|
||||
val timeoutMillis = 10000
|
||||
|
|
|
@ -22,7 +22,7 @@ import org.apache.spark.sql.SparkSession
|
|||
import org.apache.spark.sql.internal.{SessionState, SessionStateBuilder, SQLConf, WithTestConf}
|
||||
|
||||
/**
|
||||
* A special [[SparkSession]] prepared for testing.
|
||||
* A special `SparkSession` prepared for testing.
|
||||
*/
|
||||
private[sql] class TestSparkSession(sc: SparkContext) extends SparkSession(sc) { self =>
|
||||
def this(sparkConf: SparkConf) {
|
||||
|
|
|
@ -49,7 +49,7 @@ public interface Service {
|
|||
* The transition must be from {@link STATE#NOTINITED} to {@link STATE#INITED} unless the
|
||||
* operation failed and an exception was raised.
|
||||
*
|
||||
* @param config
|
||||
* @param conf
|
||||
* the configuration of the service
|
||||
*/
|
||||
void init(HiveConf conf);
|
||||
|
|
|
@ -51,7 +51,7 @@ public final class ServiceOperations {
|
|||
|
||||
/**
|
||||
* Initialize a service.
|
||||
* <p/>
|
||||
*
|
||||
* The service state is checked <i>before</i> the operation begins.
|
||||
* This process is <i>not</i> thread safe.
|
||||
* @param service a service that must be in the state
|
||||
|
@ -69,7 +69,7 @@ public final class ServiceOperations {
|
|||
|
||||
/**
|
||||
* Start a service.
|
||||
* <p/>
|
||||
*
|
||||
* The service state is checked <i>before</i> the operation begins.
|
||||
* This process is <i>not</i> thread safe.
|
||||
* @param service a service that must be in the state
|
||||
|
@ -86,7 +86,7 @@ public final class ServiceOperations {
|
|||
|
||||
/**
|
||||
* Initialize then start a service.
|
||||
* <p/>
|
||||
*
|
||||
* The service state is checked <i>before</i> the operation begins.
|
||||
* This process is <i>not</i> thread safe.
|
||||
* @param service a service that must be in the state
|
||||
|
@ -102,9 +102,9 @@ public final class ServiceOperations {
|
|||
|
||||
/**
|
||||
* Stop a service.
|
||||
* <p/>Do nothing if the service is null or not
|
||||
* in a state in which it can be/needs to be stopped.
|
||||
* <p/>
|
||||
*
|
||||
* Do nothing if the service is null or not in a state in which it can be/needs to be stopped.
|
||||
*
|
||||
* The service state is checked <i>before</i> the operation begins.
|
||||
* This process is <i>not</i> thread safe.
|
||||
* @param service a service or null
|
||||
|
|
|
@ -89,7 +89,7 @@ public final class HttpAuthUtils {
|
|||
* @param clientUserName Client User name.
|
||||
* @return An unsigned cookie token generated from input parameters.
|
||||
* The final cookie generated is of the following format :
|
||||
* cu=<username>&rn=<randomNumber>&s=<cookieSignature>
|
||||
* {@code cu=<username>&rn=<randomNumber>&s=<cookieSignature>}
|
||||
*/
|
||||
public static String createCookieToken(String clientUserName) {
|
||||
StringBuffer sb = new StringBuffer();
|
||||
|
|
|
@ -26,7 +26,7 @@ public interface PasswdAuthenticationProvider {
|
|||
* to authenticate users for their requests.
|
||||
* If a user is to be granted, return nothing/throw nothing.
|
||||
* When a user is to be disallowed, throw an appropriate {@link AuthenticationException}.
|
||||
* <p/>
|
||||
*
|
||||
* For an example implementation, see {@link LdapAuthenticationProviderImpl}.
|
||||
*
|
||||
* @param user The username received over the connection request
|
||||
|
|
|
@ -31,12 +31,9 @@ import org.slf4j.LoggerFactory;
|
|||
|
||||
/**
|
||||
* This class is responsible for setting the ipAddress for operations executed via HiveServer2.
|
||||
* <p>
|
||||
* <ul>
|
||||
* <li>IP address is only set for operations that calls listeners with hookContext</li>
|
||||
* <li>IP address is only set if the underlying transport mechanism is socket</li>
|
||||
* </ul>
|
||||
* </p>
|
||||
*
|
||||
* - IP address is only set for operations that calls listeners with hookContext
|
||||
* - IP address is only set if the underlying transport mechanism is socket
|
||||
*
|
||||
* @see org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext
|
||||
*/
|
||||
|
|
|
@ -38,7 +38,7 @@ public class CLIServiceUtils {
|
|||
* Convert a SQL search pattern into an equivalent Java Regex.
|
||||
*
|
||||
* @param pattern input which may contain '%' or '_' wildcard characters, or
|
||||
* these characters escaped using {@link #getSearchStringEscape()}.
|
||||
* these characters escaped using {@code getSearchStringEscape()}.
|
||||
* @return replace %/_ with regex search characters, also handle escaped
|
||||
* characters.
|
||||
*/
|
||||
|
|
|
@ -28,9 +28,9 @@ import org.apache.hadoop.hive.metastore.TableType;
|
|||
/**
|
||||
* ClassicTableTypeMapping.
|
||||
* Classic table type mapping :
|
||||
* Managed Table ==> Table
|
||||
* External Table ==> Table
|
||||
* Virtual View ==> View
|
||||
* Managed Table to Table
|
||||
* External Table to Table
|
||||
* Virtual View to View
|
||||
*/
|
||||
public class ClassicTableTypeMapping implements TableTypeMapping {
|
||||
|
||||
|
|
|
@ -31,7 +31,7 @@ public interface TableTypeMapping {
|
|||
|
||||
/**
|
||||
* Map hive's table type name to client's table type
|
||||
* @param clientTypeName
|
||||
* @param hiveTypeName
|
||||
* @return
|
||||
*/
|
||||
String mapToClientType(String hiveTypeName);
|
||||
|
|
|
@ -224,7 +224,9 @@ public class SessionManager extends CompositeService {
|
|||
* The username passed to this method is the effective username.
|
||||
* If withImpersonation is true (==doAs true) we wrap all the calls in HiveSession
|
||||
* within a UGI.doAs, where UGI corresponds to the effective user.
|
||||
* @see org.apache.hive.service.cli.thrift.ThriftCLIService#getUserName()
|
||||
*
|
||||
* Please see {@code org.apache.hive.service.cli.thrift.ThriftCLIService.getUserName()} for
|
||||
* more details.
|
||||
*
|
||||
* @param protocol
|
||||
* @param username
|
||||
|
|
|
@ -30,12 +30,12 @@ import org.apache.hadoop.hive.metastore.RawStore;
|
|||
* in custom cleanup code to be called before this thread is GC-ed.
|
||||
* Currently cleans up the following:
|
||||
* 1. ThreadLocal RawStore object:
|
||||
* In case of an embedded metastore, HiveServer2 threads (foreground & background)
|
||||
* In case of an embedded metastore, HiveServer2 threads (foreground and background)
|
||||
* end up caching a ThreadLocal RawStore object. The ThreadLocal RawStore object has
|
||||
* an instance of PersistenceManagerFactory & PersistenceManager.
|
||||
* an instance of PersistenceManagerFactory and PersistenceManager.
|
||||
* The PersistenceManagerFactory keeps a cache of PersistenceManager objects,
|
||||
* which are only removed when PersistenceManager#close method is called.
|
||||
* HiveServer2 uses ExecutorService for managing thread pools for foreground & background threads.
|
||||
* HiveServer2 uses ExecutorService for managing thread pools for foreground and background threads.
|
||||
* ExecutorService unfortunately does not provide any hooks to be called,
|
||||
* when a thread from the pool is terminated.
|
||||
* As a solution, we're using this ThreadFactory to keep a cache of RawStore objects per thread.
|
||||
|
|
|
@ -53,8 +53,8 @@ import org.apache.spark.unsafe.types.UTF8String
|
|||
* java.sql.Date
|
||||
* java.sql.Timestamp
|
||||
* Complex Types =>
|
||||
* Map: [[MapData]]
|
||||
* List: [[ArrayData]]
|
||||
* Map: `MapData`
|
||||
* List: `ArrayData`
|
||||
* Struct: [[org.apache.spark.sql.catalyst.InternalRow]]
|
||||
* Union: NOT SUPPORTED YET
|
||||
* The Complex types plays as a container, which can hold arbitrary data types.
|
||||
|
|
|
@ -24,7 +24,7 @@ import org.apache.spark.sql.catalyst.util._
|
|||
/**
|
||||
* A framework for running the query tests that are listed as a set of text files.
|
||||
*
|
||||
* TestSuites that derive from this class must provide a map of testCaseName -> testCaseFiles
|
||||
* TestSuites that derive from this class must provide a map of testCaseName to testCaseFiles
|
||||
* that should be included. Additionally, there is support for whitelisting and blacklisting
|
||||
* tests as development progresses.
|
||||
*/
|
||||
|
|
|
@ -43,7 +43,7 @@ private[sql] trait OrcTest extends SQLTestUtils with TestHiveSingleton {
|
|||
}
|
||||
|
||||
/**
|
||||
* Writes `data` to a Orc file and reads it back as a [[DataFrame]],
|
||||
* Writes `data` to a Orc file and reads it back as a `DataFrame`,
|
||||
* which is then passed to `f`. The Orc file will be deleted after `f` returns.
|
||||
*/
|
||||
protected def withOrcDataFrame[T <: Product: ClassTag: TypeTag]
|
||||
|
@ -53,7 +53,7 @@ private[sql] trait OrcTest extends SQLTestUtils with TestHiveSingleton {
|
|||
}
|
||||
|
||||
/**
|
||||
* Writes `data` to a Orc file, reads it back as a [[DataFrame]] and registers it as a
|
||||
* Writes `data` to a Orc file, reads it back as a `DataFrame` and registers it as a
|
||||
* temporary table named `tableName`, then call `f`. The temporary table together with the
|
||||
* Orc file will be dropped/deleted after `f` returns.
|
||||
*/
|
||||
|
|
|
@ -29,7 +29,7 @@ import org.apache.spark.streaming.util.{EmptyStateMap, StateMap}
|
|||
import org.apache.spark.util.Utils
|
||||
|
||||
/**
|
||||
* Record storing the keyed-state [[MapWithStateRDD]]. Each record contains a [[StateMap]] and a
|
||||
* Record storing the keyed-state [[MapWithStateRDD]]. Each record contains a `StateMap` and a
|
||||
* sequence of records returned by the mapping function of `mapWithState`.
|
||||
*/
|
||||
private[streaming] case class MapWithStateRDDRecord[K, S, E](
|
||||
|
@ -111,7 +111,7 @@ private[streaming] class MapWithStateRDDPartition(
|
|||
/**
|
||||
* RDD storing the keyed states of `mapWithState` operation and corresponding mapped data.
|
||||
* Each partition of this RDD has a single record of type [[MapWithStateRDDRecord]]. This contains a
|
||||
* [[StateMap]] (containing the keyed-states) and the sequence of records returned by the mapping
|
||||
* `StateMap` (containing the keyed-states) and the sequence of records returned by the mapping
|
||||
* function of `mapWithState`.
|
||||
* @param prevStateRDD The previous MapWithStateRDD on whose StateMap data `this` RDD
|
||||
* will be created
|
||||
|
|
|
@ -26,7 +26,7 @@ import org.apache.spark.internal.Logging
|
|||
* case of Spark Streaming the error is the difference between the measured processing
|
||||
* rate (number of elements/processing delay) and the previous rate.
|
||||
*
|
||||
* @see https://en.wikipedia.org/wiki/PID_controller
|
||||
* @see <a href="https://en.wikipedia.org/wiki/PID_controller">PID controller (Wikipedia)</a>
|
||||
*
|
||||
* @param batchIntervalMillis the batch duration, in milliseconds
|
||||
* @param proportional how much the correction should depend on the current
|
||||
|
|
|
@ -24,7 +24,7 @@ import org.apache.spark.streaming.Duration
|
|||
* A component that estimates the rate at which an `InputDStream` should ingest
|
||||
* records, based on updates at every batch completion.
|
||||
*
|
||||
* @see [[org.apache.spark.streaming.scheduler.RateController]]
|
||||
* Please see `org.apache.spark.streaming.scheduler.RateController` for more details.
|
||||
*/
|
||||
private[streaming] trait RateEstimator extends Serializable {
|
||||
|
||||
|
|
Loading…
Reference in a new issue