spark-instrumented-optimizer/python
Wenchen Fan d57daf1f77 [SPARK-13593] [SQL] improve the createDataFrame to accept data type string and verify the data
## What changes were proposed in this pull request?

This PR improves the `createDataFrame` method so that it also accepts a data type string, which lets users convert a Python RDD to a DataFrame easily, for example `df = rdd.toDF("a: int, b: string")`.
It also supports a flat schema, so users can convert an RDD of ints to a DataFrame directly; each int is automatically wrapped in a row.
If a schema is given, we now check whether the actual data matches it and throw an error if it doesn't.
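
The three behaviours above (parsing a data type string, wrapping flat values in rows, and verifying data against the schema) can be illustrated with a small pure-Python model. This is only a sketch and not the code in this PR: `parse_schema`, `to_rows`, the supported type names, and the `value` column name for flat schemas are all assumptions made for illustration.

```python
# Hypothetical, simplified model of the behaviour described above.
# None of these helpers are part of PySpark.

# Assumed subset of type names that may appear in a schema string.
_TYPES = {"int": int, "string": str, "double": float, "boolean": bool}


def _lookup(type_name):
    """Map a type name from the schema string to a Python type."""
    key = type_name.strip()
    if key not in _TYPES:
        raise ValueError("unsupported type: %r" % key)
    return _TYPES[key]


def parse_schema(schema):
    """Parse "a: int, b: string" into ([("a", int), ("b", str)], False).

    A bare type name such as "int" is treated as a flat schema with one
    column, named "value" here (an assumption of this sketch).
    """
    if ":" not in schema:
        return [("value", _lookup(schema))], True
    fields = []
    for part in schema.split(","):
        name, type_name = part.split(":", 1)
        fields.append((name.strip(), _lookup(type_name)))
    return fields, False


def to_rows(data, schema):
    """Wrap flat values in rows and verify every value against the schema."""
    fields, flat = parse_schema(schema)
    rows = []
    for record in data:
        values = (record,) if flat else tuple(record)
        if len(values) != len(fields):
            raise TypeError("expected %d fields, got %d" % (len(fields), len(values)))
        for (name, expected), value in zip(fields, values):
            # Exact type match, so that True is not accepted for an "int" column.
            if type(value) is not expected:
                raise TypeError("field %s: %r is not %s" % (name, value, expected.__name__))
        rows.append({name: value for (name, _), value in zip(fields, values)})
    return rows
```

In this model, `to_rows([(1, "x")], "a: int, b: string")` accepts the data, `to_rows([1, 2], "int")` wraps each int in a single-column row, and `to_rows([("1", "x")], "a: int, b: string")` raises a `TypeError` because the first field is a string rather than an int.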

## How was this patch tested?

New tests in `test.py` and a doctest in `types.py`.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #11444 from cloud-fan/pyrdd.
2016-03-08 14:00:03 -08:00
| Name | Last commit | Date |
| --- | --- | --- |
| .. | | |
| docs | [SPARK-13154][PYTHON] Add linting for pydocs | 2016-02-12 02:13:06 -08:00 |
| lib | [SPARK-12652][PYSPARK] Upgrade Py4J to 0.9.1 | 2016-01-12 14:27:05 -08:00 |
| pyspark | [SPARK-13593] [SQL] improve the createDataFrame to accept data type string and verify the data | 2016-03-08 14:00:03 -08:00 |
| test_support | [SPARK-13509][SPARK-13507][SQL] Support for writing CSV with a single function call | 2016-02-29 09:44:29 -08:00 |
| .gitignore | [SPARK-3946] gitignore in /python includes wrong directory | 2014-10-14 14:09:39 -07:00 |
| pylintrc | [SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs | 2016-03-07 14:48:02 -08:00 |
| run-tests | [SPARK-8583] [SPARK-5482] [BUILD] Refactor python/run-tests to integrate with dev/run-tests module system | 2015-06-27 20:24:34 -07:00 |
| run-tests.py | [SPARK-12243][BUILD][PYTHON] PySpark tests are slow in Jenkins. | 2016-03-07 12:06:46 -08:00 |