## What changes were proposed in this pull request?
Add python api support and JavaSparkContext support for resources(). I needed the JavaSparkContext support for it to properly translate into python with the py4j stuff.
## How was this patch tested?
Unit tests added and manually tested in local cluster mode and on yarn.
Closes#25087 from tgravescs/SPARK-28234-python.
Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
Misc code cleanup from lgtm.com analysis. See comments below for details.
## How was this patch tested?
Existing tests.
Closes#23571 from srowen/SPARK-26640.
Lead-authored-by: Sean Owen <sean.owen@databricks.com>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
It's the follow-up PR for #22962, contains the following works:
- Remove `__init__` in TaskContext and BarrierTaskContext.
- Add more comments to explain the fix.
- Rewrite UT in a new class.
## How was this patch tested?
New UT in test_taskcontext.py
Closes#23435 from xuanyuanking/SPARK-25921-follow.
Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
Running a barrier job after a normal spark job causes the barrier job to run without a BarrierTaskContext. This is because while python worker reuse, BarrierTaskContext._getOrCreate() will still return a TaskContext after firstly submit a normal spark job, we'll get a `AttributeError: 'TaskContext' object has no attribute 'barrier'`. Fix this by adding check logic in BarrierTaskContext._getOrCreate() and make sure it will return BarrierTaskContext in this scenario.
## How was this patch tested?
Add new UT in pyspark-core.
Closes#22962 from xuanyuanking/SPARK-25921.
Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?
I made one pass over the Python APIs for barrier mode and updated them to match the Scala doc in #22240 . Major changes:
* export the public classes
* expand the docs
* add doc for BarrierTaskInfo.addresss
cc: jiangxb1987
Closes#22261 from mengxr/SPARK-25248.1.
Authored-by: Xiangrui Meng <meng@databricks.com>
Signed-off-by: Xiangrui Meng <meng@databricks.com>
This eliminates some duplication in the code to connect to a server on localhost to talk directly to the jvm. Also it gives consistent ipv6 and error handling. Two other incidental changes, that shouldn't matter:
1) python barrier tasks perform authentication immediately (rather than waiting for the BARRIER_FUNCTION indicator)
2) for `rdd._load_from_socket`, the timeout is only increased after authentication.
Closes#22247 from squito/py_connection_refactor.
Authored-by: Imran Rashid <irashid@cloudera.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
Add method `barrier()` and `getTaskInfos()` in python TaskContext, these two methods are only allowed for barrier tasks.
## How was this patch tested?
Add new tests in `tests.py`
Closes#22085 from jiangxb1987/python.barrier.
Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Xiangrui Meng <meng@databricks.com>
## What changes were proposed in this pull request?
This adds a new API `TaskContext.getLocalProperty(key)` to the Python TaskContext. It mirrors the Java TaskContext API of returning a string value if the key exists, or None if the key does not exist.
## How was this patch tested?
New test added.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes#21437 from tdas/SPARK-24397.
## What changes were proposed in this pull request?
Adds basic TaskContext information to PySpark.
## How was this patch tested?
New unit tests to `tests.py` & existing unit tests.
Author: Holden Karau <holden@us.ibm.com>
Closes#16211 from holdenk/SPARK-18576-pyspark-taskcontext.