## What changes were proposed in this pull request?

Some tests fail on Windows due to incorrectly formatted paths and to the OS limit on path length, as shown below.

This PR proposes to fix the failing tests below by correcting the paths they use:

- `InsertSuite`

```
Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.sources.InsertSuite *** ABORTED *** (12 seconds, 547 milliseconds)
  org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark arget mpspark-177945ef-9128-42b4-8c07-de31f78bbbd6;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
```

- `PathOptionSuite`

```
- path option also exist for write path *** FAILED *** (1 second, 93 milliseconds)
  "C:[projectsspark arget mp]spark-5ab34a58-df8d-..." did not equal "C:[\projects\spark\target\tmp\]spark-5ab34a58-df8d-..." (PathOptionSuite.scala:93)
  org.scalatest.exceptions.TestFailedException:
  at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
  at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
  ...
```

- `UDFSuite`

```
- SPARK-8005 input_file_name *** FAILED *** (2 seconds, 234 milliseconds)
  "file:///C:/projects/spark/target/tmp/spark-e4e5720a-2006-48f9-8b11-797bf59794bf/part-00001-26fb05e4-603d-471d-ae9d-b9549e0c7765.snappy.parquet" did not contain "C:\projects\spark\target\tmp\spark-e4e5720a-2006-48f9-8b11-797bf59794bf" (UDFSuite.scala:67)
  org.scalatest.exceptions.TestFailedException:
  at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
  at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
  ...
```

It also proposes to skip the tests below, which fail on Windows due to the path length limitation:

- `SparkLauncherSuite`

```
Test org.apache.spark.launcher.SparkLauncherSuite.testChildProcLauncher failed: java.lang.AssertionError: expected:<0> but was:<1>, took 0.062 sec
  at org.apache.spark.launcher.SparkLauncherSuite.testChildProcLauncher(SparkLauncherSuite.java:177)
  ...
```

The stderr from the child process is `The filename or extension is too long`, which is the same error as in the failure below.

- `BroadcastJoinSuite`

```
04:09:40.882 ERROR org.apache.spark.deploy.worker.ExecutorRunner: Error running executor
java.io.IOException: Cannot run program "C:\Progra~1\Java\jdk1.8.0\bin\java" (in directory "C:\projects\spark\work\app-20161205040542-0000\51658"): CreateProcess error=206, The filename or extension is too long
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
  at org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:167)
  at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
Caused by: java.io.IOException: CreateProcess error=206, The filename or extension is too long
  at java.lang.ProcessImpl.create(Native Method)
  at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
  at java.lang.ProcessImpl.start(ProcessImpl.java:137)
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
  ... 2 more
04:09:40.929 ERROR org.apache.spark.deploy.worker.ExecutorRunner: Error running executor
(apparently the same error message repeats indefinitely)
...
```

Hedged sketches of the path-escaping problem and of the skip pattern follow the test links below.

## How was this patch tested?

Manually tested via AppVeyor.
**Before**

- `InsertSuite`: https://ci.appveyor.com/project/spark-test/spark/build/148-InsertSuite-pr
- `PathOptionSuite`: https://ci.appveyor.com/project/spark-test/spark/build/139-PathOptionSuite-pr
- `UDFSuite`: https://ci.appveyor.com/project/spark-test/spark/build/143-UDFSuite-pr
- `SparkLauncherSuite`: https://ci.appveyor.com/project/spark-test/spark/build/141-SparkLauncherSuite-pr
- `BroadcastJoinSuite`: https://ci.appveyor.com/project/spark-test/spark/build/145-BroadcastJoinSuite-pr

**After**

- `PathOptionSuite`: https://ci.appveyor.com/project/spark-test/spark/build/140-PathOptionSuite-pr
- `SparkLauncherSuite`: https://ci.appveyor.com/project/spark-test/spark/build/142-SparkLauncherSuite-pr
- `UDFSuite`: https://ci.appveyor.com/project/spark-test/spark/build/144-UDFSuite-pr
- `InsertSuite`: https://ci.appveyor.com/project/spark-test/spark/build/147-InsertSuite-pr
- `BroadcastJoinSuite`: https://ci.appveyor.com/project/spark-test/spark/build/149-BroadcastJoinSuite-pr

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #16147 from HyukjinKwon/fix-tests.
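The following is a hedged sketch (not code from this PR) of the path-escaping problem behind the first three failures. On Windows, filesystem paths contain backslashes; when a test embeds such a path verbatim into a SQL statement, string-literal escaping consumes them (`\t` becomes a tab), which is why the logs above show `C:projectsspark arget mp` instead of `C:\projects\spark\target\tmp`. Converting the path to a URI, as sketched below with a made-up path, sidesteps this:

```scala
import java.io.File

// Hypothetical Windows temp directory, as the JVM would report it.
val dir = "C:\\projects\\spark\\target\\tmp\\spark-12345"

// Embedding the raw path in a SQL string literal eats the backslashes:
// "\p" drops to "p" and "\t" becomes a tab, so "\target\tmp" degrades
// to " arget mp" -- the garbled path in the InsertSuite failure above.
// spark.sql(s"CREATE TABLE t USING parquet OPTIONS (path '$dir')")

// A URI always uses forward slashes, so it survives SQL embedding and
// can also be compared sanely against input_file_name() output.
val uri = new File(dir).toURI.toString
// On Windows this yields: file:/C:/projects/spark/target/tmp/spark-12345
```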
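For the path-length failures, here is a minimal sketch of skipping a test on Windows, assuming a ScalaTest suite and Apache Commons Lang for OS detection (the suite and test names are hypothetical, and the actual helpers used in Spark may differ):

```scala
import org.apache.commons.lang3.SystemUtils
import org.scalatest.FunSuite

class ChildProcLauncherSuite extends FunSuite {  // hypothetical suite
  test("child process exits with code 0") {
    // Windows' CreateProcess rejects command lines longer than roughly 32K
    // characters, so launching a JVM with the full test classpath fails
    // there with "error=206, The filename or extension is too long".
    // `assume` cancels (rather than fails) the test when it is false.
    assume(!SystemUtils.IS_OS_WINDOWS)
    // ... launch the child process and assert on its exit code ...
  }
}
```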
# Spark SQL
This module provides support for executing relational queries expressed in either SQL or the DataFrame/Dataset API.
Spark SQL is broken up into four subprojects:
- Catalyst (sql/catalyst) - An implementation-agnostic framework for manipulating trees of relational operators and expressions.
- Execution (sql/core) - A query planner / execution engine for translating Catalyst's logical query plans into Spark RDDs. This component also includes a new public interface, SQLContext, that allows users to execute SQL or LINQ statements against existing RDDs and Parquet files (a usage sketch follows this list).
- Hive Support (sql/hive) - Includes an extension of SQLContext called HiveContext that allows users to write queries using a subset of HiveQL and to access data from a Hive Metastore using Hive SerDes. There are also wrappers that allow users to run queries that include Hive UDFs, UDAFs, and UDTFs.
- HiveServer and CLI support (sql/hive-thriftserver) - Includes support for the SQL CLI (bin/spark-sql) and a HiveServer2-compatible server (for JDBC/ODBC).
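As a small illustration of the `SQLContext` entry point described above, here is a hedged sketch against the Spark 2.x API (the Parquet path is made up, and an existing `SparkContext` is assumed):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// An existing SparkContext is assumed, as the description above suggests.
val sc: SparkContext = ???  // provided by the surrounding application
val sqlContext = new SQLContext(sc)

// Load a Parquet file as a DataFrame and query it with SQL.
val people = sqlContext.read.parquet("examples/people.parquet")  // hypothetical path
people.createOrReplaceTempView("people")
sqlContext.sql("SELECT name FROM people WHERE age > 21").show()
```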