spark-instrumented-optimizer

History

Matei Zaharia 23f966f475 [SPARK-3930] [SPARK-3933] Support fixed-precision decimal in SQL, and some optimizations	2014-11-01 19:29:14 -07:00

- Adds optional precision and scale to Spark SQL's decimal type, which behave similarly to those in Hive 13 (https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf)
- Replaces our internal representation of decimals with a Decimal class that can store small values in a mutable Long, saving memory in this situation and letting some operations happen directly on Longs

This is still marked WIP because there are a few TODOs, but I'll remove that tag when done.

Author: Matei Zaharia <matei@databricks.com>

Closes #2983 from mateiz/decimal-1 and squashes the following commits:

35e6b02 [Matei Zaharia] Fix issues after merge
227f24a [Matei Zaharia] Review comments
31f915e [Matei Zaharia] Implement Davies's suggestions in Python
eb84820 [Matei Zaharia] Support reading/writing decimals as fixed-length binary in Parquet
4dc6bae [Matei Zaharia] Fix decimal support in PySpark
d1d9d68 [Matei Zaharia] Fix compile error and test issues after rebase
b28933d [Matei Zaharia] Support decimal precision/scale in Hive metastore
2118c0d [Matei Zaharia] Some test and bug fixes
81db9cb [Matei Zaharia] Added mutable Decimal that will be more efficient for small precisions
7af0c3b [Matei Zaharia] Add optional precision and scale to DecimalType, but use Unlimited for now
ec0a947 [Matei Zaharia] Make the result of AVG on Decimals be Decimal, not Double
mllib	[SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API	2014-10-30 22:25:18 -07:00
streaming	replace awaitTransformation with awaitTermination in scaladoc/javadoc	2014-10-21 09:37:17 -07:00
__init__.py	[SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs	2014-10-07 18:09:27 -07:00
accumulators.py	[SPARK-3478] [PySpark] Profile the Python tasks	2014-09-30 18:24:57 -07:00
broadcast.py	[SPARK-3430] [PySpark] [Doc] generate PySpark API docs using Sphinx	2014-09-16 12:51:58 -07:00
cloudpickle.py	[SPARK-3679] [PySpark] pickle the exact globals of functions	2014-09-24 13:00:05 -07:00
conf.py	[SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs	2014-10-07 18:09:27 -07:00
context.py	[SPARK-2652] [PySpark] do not use KryoSerializer as default serializer	2014-10-23 23:58:00 -07:00
daemon.py	[SPARK-4088] [PySpark] Python worker should exit after socket is closed by JVM	2014-10-25 01:20:39 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
heapq3.py	[SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey()	2014-08-26 16:57:40 -07:00
java_gateway.py	[SPARK-3167] Handle special driver configs in Windows	2014-08-26 22:52:16 -07:00
join.py	[SPARK-546] Add full outer join to RDD and DStream.	2014-09-24 20:39:09 -07:00
rdd.py	[SPARK-4150][PySpark] return self in rdd.setName	2014-10-31 12:07:48 -07:00
rddsampler.py	Fix for sampling error in NumPy v1.9 [SPARK-3995][PYSPARK]	2014-10-22 09:33:12 -07:00
resultiterable.py	[SPARK-2627] [PySpark] have the build enforce PEP 8 automatically	2014-08-06 12:58:24 -07:00
serializers.py	[SPARK-3993] [PySpark] fix bug while reuse worker after take()	2014-10-23 17:20:00 -07:00
shell.py	[SPARK-3273][SPARK-3301]We should read the version information from the same place	2014-09-06 15:08:43 -07:00
shuffle.py	[SPARK-3786] [PySpark] speedup tests	2014-10-06 14:07:53 -07:00
sql.py	[SPARK-3930] [SPARK-3933] Support fixed-precision decimal in SQL, and some optimizations	2014-11-01 19:29:14 -07:00
statcounter.py	StatCounter on NumPy arrays [PYSPARK][SPARK-2012]	2014-08-01 22:33:25 -07:00
storagelevel.py	[SPARK-3417] Use new-style classes in PySpark	2014-09-08 15:45:36 -07:00
tests.py	[SPARK-4133] [SQL] [PySpark] type conversion for python udf	2014-10-28 19:38:16 -07:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
worker.py	[SPARK-3993] [PySpark] fix bug while reuse worker after take()	2014-10-23 17:20:00 -07:00