# spark-instrumented-optimizer/python/pyspark/__init__.py

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
"""
PySpark is the Python API for Spark.
Public classes:
- L{SparkContext<pyspark.context.SparkContext>}
Main entry point for Spark functionality.
- L{RDD<pyspark.rdd.RDD>}
A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
2013-01-20 04:57:44 -05:00
- L{Broadcast<pyspark.broadcast.Broadcast>}
A broadcast variable that gets reused across tasks.
- L{Accumulator<pyspark.accumulators.Accumulator>}
An "add-only" shared variable that tasks can only add values to.
- L{SparkFiles<pyspark.files.SparkFiles>}
Access files shipped with jobs.
2013-09-07 17:41:31 -04:00
- L{StorageLevel<pyspark.storagelevel.StorageLevel>}
Finer-grained cache persistence levels.
"""
import sys
import os

# Make the Py4J egg bundled with Spark importable; Py4J is the bridge
# PySpark uses to communicate with the JVM.
sys.path.insert(0, os.path.join(os.environ["SPARK_HOME"], "python/lib/py4j0.7.egg"))
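# With the egg on sys.path, Py4J becomes importable. pyspark.java_gateway
# builds on it roughly like this (a sketch of the idea, not executed here;
# in normal use the gateway is started for you by SparkContext):
#
#   from py4j.java_gateway import JavaGateway
#   gateway = JavaGateway()  # connect to a JVM-side GatewayServer
#   jvm = gateway.jvm        # entry point for calling into Java classes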
from pyspark.context import SparkContext
from pyspark.rdd import RDD
from pyspark.files import SparkFiles
from pyspark.storagelevel import StorageLevel

__all__ = ["SparkContext", "RDD", "SparkFiles", "StorageLevel"]
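
# A minimal usage sketch of the public classes exported above (illustrative,
# not part of the module; assumes SPARK_HOME is set and a local Spark build):
#
#   from pyspark import SparkContext, StorageLevel
#
#   sc = SparkContext("local", "example")   # master URL, application name
#   rdd = sc.parallelize([1, 2, 3, 4])      # build an RDD from a local list
#   rdd.persist(StorageLevel.MEMORY_ONLY)   # choose a cache persistence level
#   print(rdd.map(lambda x: x * 2).reduce(lambda a, b: a + b))  # 20
#   sc.stop()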