spark-instrumented-optimizer

History

Matei Zaharia 90a04dab8d Initial work towards scheduler refactoring: - Replace use of hostPort vs host in Task.preferredLocations with a TaskLocation class that contains either an executorId and a host or just a host. This is part of a bigger effort to eliminate hostPort based data structures and just use executorID, since the hostPort vs host stuff is confusing (and not checkable with static typing, leading to ugly debug code), and hostPorts are not provided by Mesos. - Replaced most hostPort-based data structures and fields as above. - Simplified ClusterTaskSetManager to deal with preferred locations in a more concise way and generally be more concise. - Updated the way ClusterTaskSetManager handles racks: instead of enqueueing a task to a separate queue for all the hosts in the rack, which would create lots of large queues, have one queue per rack name. - Removed non-local fallback stuff in ClusterScheduler that tried to launch less-local tasks on a node once the local ones were all assigned. This change didn't work because many cluster schedulers send offers for just one node at a time (even the standalone and YARN ones do so as nodes join the cluster one by one). Thus, lots of non-local tasks would be assigned even though a node with locality for them would be able to receive tasks just a short time later. - Renamed MapOutputTracker "generations" to "epochs".	2013-08-18 19:51:06 -07:00
..
src	Initial work towards scheduler refactoring:	2013-08-18 19:51:06 -07:00
pom.xml	Use the JSON formatter from Scala library and removed dependency on lift-json.	2013-08-15 18:23:01 -07:00

Matei Zaharia 90a04dab8d Initial work towards scheduler refactoring:

- Replace use of hostPort vs host in Task.preferredLocations with a
  TaskLocation class that contains either an executorId and a host or
  just a host. This is part of a bigger effort to eliminate hostPort
  based data structures and just use executorID, since the hostPort vs
  host stuff is confusing (and not checkable with static typing, leading
  to ugly debug code), and hostPorts are not provided by Mesos.

- Replaced most hostPort-based data structures and fields as above.

- Simplified ClusterTaskSetManager to deal with preferred locations in a
  more concise way and generally be more concise.

- Updated the way ClusterTaskSetManager handles racks: instead of
  enqueueing a task to a separate queue for all the hosts in the rack,
  which would create lots of large queues, have one queue per rack name.

- Removed non-local fallback stuff in ClusterScheduler that tried to
  launch less-local tasks on a node once the local ones were all
  assigned. This change didn't work because many cluster schedulers send
  offers for just one node at a time (even the standalone and YARN ones
  do so as nodes join the cluster one by one). Thus, lots of non-local
  tasks would be assigned even though a node with locality for them
  would be able to receive tasks just a short time later.

- Renamed MapOutputTracker "generations" to "epochs".

2013-08-18 19:51:06 -07:00

src

Initial work towards scheduler refactoring:

2013-08-18 19:51:06 -07:00

pom.xml

Use the JSON formatter from Scala library and removed dependency on lift-json.

2013-08-15 18:23:01 -07:00