<?xml version="1.0" encoding="UTF-8"?>
<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one or more
  ~ contributor license agreements.  See the NOTICE file distributed with
  ~ this work for additional information regarding copyright ownership.
  ~ The ASF licenses this file to You under the Apache License, Version 2.0
  ~ (the "License"); you may not use this file except in compliance with
  ~ the License.  You may obtain a copy of the License at
  ~
  ~    http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
  -->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-parent</artifactId>
    <version>1.1.0-SNAPSHOT</version>
    <relativePath>../pom.xml</relativePath>
  </parent>

  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.10</artifactId>
  <properties>
      <sbt.project.name>mllib</sbt.project.name>
  </properties>
  <packaging>jar</packaging>
  <name>Spark Project ML Library</name>
  <url>http://spark.apache.org/</url>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-server</artifactId>
    </dependency>
    <dependency>
      <groupId>org.jblas</groupId>
      <artifactId>jblas</artifactId>
      <version>${jblas.version}</version>
    </dependency>
    <dependency>
      <groupId>org.scalanlp</groupId>
      <artifactId>breeze_${scala.binary.version}</artifactId>
      <version>0.7</version>
      <exclusions>
        <!-- jtransforms, a dependency of breeze, declares junit as a compile-scoped
             dependency. Exclude it so junit classes do not end up in the assembly jar. -->
        <exclusion>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.binary.version}</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalacheck</groupId>
      <artifactId>scalacheck_${scala.binary.version}</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>com.novocode</groupId>
      <artifactId>junit-interface</artifactId>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <profiles>
    <profile>
      <id>netlib-lgpl</id>
      <dependencies>
        <dependency>
          <groupId>com.github.fommil.netlib</groupId>
          <artifactId>all</artifactId>
          <version>1.1.2</version>
          <type>pom</type>
        </dependency>
      </dependencies>
    </profile>
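    <!-- Usage sketch (assumed invocation; adjust goals to your build):
           mvn -Pnetlib-lgpl clean package -DskipTests
         The netlib-lgpl profile adds com.github.fommil.netlib:all, a pom-type
         artifact that pulls in the netlib-java native BLAS/LAPACK loaders.
         Those loaders are LGPL-licensed, so they are opt-in rather than
         bundled by default. -->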
  </profiles>
  <build>
    <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
    <testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
    <plugins>
      <plugin>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest-maven-plugin</artifactId>
      </plugin>
    </plugins>
    <resources>
      <resource>
        <directory>../python</directory>
        <includes>
          <include>pyspark/mllib/*.py</include>
        </includes>
      </resource>
    </resources>
  </build>
</project>