[SPARK-4137] [EC2] Don't change working dir on user

This issue was uncovered after [this discussion](https://issues.apache.org/jira/browse/SPARK-3398?focusedCommentId=14187471&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14187471).

Don't change the working directory on the user. Doing so breaks relative paths the user may pass in, e.g., for the SSH identity file:

```
./ec2/spark-ec2 -i ../my.pem
```

This patch preserves the user's current working directory so that calls like the one above work as expected.
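
To make the failure mode concrete, here is a minimal sketch (hypothetical paths, not part of the patch) of how the old `cd` re-based a relative identity file path:

```python
import os.path

# Hypothetical paths, for illustration only.
invocation_cwd = "/home/user/spark"   # where the user ran ./ec2/spark-ec2
script_dir = "/home/user/spark/ec2"   # where the old wrapper cd'd to
identity_file = "../my.pem"           # the relative -i argument

# Resolved against the user's CWD (the behavior this patch preserves):
print(os.path.normpath(os.path.join(invocation_cwd, identity_file)))
# -> /home/user/my.pem  (the file the user meant)

# Resolved after the old `cd` into the script's directory:
print(os.path.normpath(os.path.join(script_dir, identity_file)))
# -> /home/user/spark/my.pem  (a different, probably nonexistent file)
```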

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #2988 from nchammas/spark-ec2-cwd and squashes the following commits:

f3850b5 [Nicholas Chammas] pep8 fix
fbc20c7 [Nicholas Chammas] revert to old commenting style
752f958 [Nicholas Chammas] specify deploy.generic path absolutely
bcdf6a5 [Nicholas Chammas] fix typo
77871a2 [Nicholas Chammas] add clarifying comment
ce071fc [Nicholas Chammas] don't change working dir

Authored by Nicholas Chammas on 2014-11-05 20:45:35 -08:00, committed by Shivaram Venkataraman
parent 3d2b5bc5bb
commit db45f5ad03
2 changed files with 17 additions and 3 deletions

**ec2/spark-ec2**

```
@@ -18,5 +18,9 @@
 # limitations under the License.
 #
 
-cd "`dirname $0`"
-PYTHONPATH="./third_party/boto-2.4.1.zip/boto-2.4.1:$PYTHONPATH" python ./spark_ec2.py "$@"
+# Preserve the user's CWD so that relative paths are passed correctly to
+#+ the underlying Python script.
+SPARK_EC2_DIR="$(dirname $0)"
+
+PYTHONPATH="${SPARK_EC2_DIR}/third_party/boto-2.4.1.zip/boto-2.4.1:$PYTHONPATH" \
+    python "${SPARK_EC2_DIR}/spark_ec2.py" "$@"
```

**ec2/spark_ec2.py**

```
@@ -40,6 +40,7 @@ from boto.ec2.blockdevicemapping import BlockDeviceMapping, BlockDeviceType, EBS
 from boto import ec2
 
 DEFAULT_SPARK_VERSION = "1.1.0"
+SPARK_EC2_DIR = os.path.dirname(os.path.realpath(__file__))
 
 MESOS_SPARK_EC2_BRANCH = "v4"
 # A URL prefix from which to fetch AMI information
```
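
On the Python side, `SPARK_EC2_DIR` pins the tool's own resources to the script's location rather than the CWD. A sketch of the pattern (the `template_dir` name is hypothetical, not from the patch):

```python
import os

# __file__ may be a relative path, and the script may be reached via a
# symlink; os.path.realpath makes it absolute and resolves symlinks, so
# the directory below is stable regardless of where the script is
# invoked from.
SPARK_EC2_DIR = os.path.dirname(os.path.realpath(__file__))

# Tool-owned files are addressed relative to the script itself...
template_dir = os.path.join(SPARK_EC2_DIR, "deploy.generic")

# ...while user-supplied relative paths keep resolving against the CWD
# the user launched from, which is exactly the point of this patch.
print(template_dir)
```
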
```
@@ -593,7 +594,14 @@ def setup_cluster(conn, master_nodes, slave_nodes, opts, deploy_ssh_key):
     )
 
     print "Deploying files to master..."
-    deploy_files(conn, "deploy.generic", opts, master_nodes, slave_nodes, modules)
+    deploy_files(
+        conn=conn,
+        root_dir=SPARK_EC2_DIR + "/" + "deploy.generic",
+        opts=opts,
+        master_nodes=master_nodes,
+        slave_nodes=slave_nodes,
+        modules=modules
+    )
 
     print "Running setup on master..."
     setup_spark_cluster(master, opts)
```
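
A small aside on the concatenation above: building `root_dir` as `SPARK_EC2_DIR + "/" + "deploy.generic"` works fine on POSIX; `os.path.join` is the more idiomatic equivalent, sketched here for comparison (not part of the patch):

```python
import os.path

SPARK_EC2_DIR = "/home/user/spark/ec2"  # hypothetical value for illustration

# os.path.join inserts the platform separator only where needed; on
# POSIX the two spellings produce the same string.
root_dir = os.path.join(SPARK_EC2_DIR, "deploy.generic")
assert root_dir == SPARK_EC2_DIR + "/" + "deploy.generic"
```
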
```
@@ -730,6 +738,8 @@ def get_num_disks(instance_type):
 # cluster (e.g. lists of masters and slaves). Files are only deployed to
 # the first master instance in the cluster, and we expect the setup
 # script to be run on that instance to copy them to other nodes.
+#
+# root_dir should be an absolute path to the directory with the files we want to deploy.
 def deploy_files(conn, root_dir, opts, master_nodes, slave_nodes, modules):
     active_master = master_nodes[0].public_dns_name
```
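
The new comment makes `root_dir` being absolute part of `deploy_files`'s contract. A defensive check along these lines could enforce it; this is a hypothetical addition, not something the patch includes:

```python
import os.path

def require_absolute(root_dir):
    # deploy_files expects an absolute path; failing fast beats silently
    # resolving a relative path against whatever the CWD happens to be.
    if not os.path.isabs(root_dir):
        raise ValueError("root_dir must be an absolute path, got: %r" % root_dir)
    return root_dir
```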