LICENSE and NOTICE policy is explained here:
http://www.apache.org/dev/licensing-howto.htmlhttp://www.apache.org/legal/3party.html
This leads to the following changes.
First, this change enables two extensions to maven-shade-plugin in assembly/ that will try to include and merge all NOTICE and LICENSE files. This can't hurt.
This generates a consolidated NOTICE file that I manually added to NOTICE.
Next, a list of all dependencies and their licenses was generated:
`mvn ... license:aggregate-add-third-party`
to create: `target/generated-sources/license/THIRD-PARTY.txt`
Each dependency is listed with one or more licenses. Determine the most-compatible license for each if there is more than one.
For "unknown" license dependencies, I manually evaluateD their license. Many are actually Apache projects or components of projects covered already. The only non-trivial one was Colt, which has its own (compatible) license.
I ignored Apache-licensed and public domain dependencies as these require no further action (beyond NOTICE above).
BSD and MIT licenses (permissive Category A licenses) are evidently supposed to be mentioned in LICENSE, so I added a section without output from the THIRD-PARTY.txt file appropriately.
Everything else, Category B licenses, are evidently mentioned in NOTICE (?) Same there.
LICENSE contained some license statements for source code that is redistributed. I left this as I think that is the right place to put it.
Author: Sean Owen <sowen@cloudera.com>
Closes#770 from srowen/SPARK-1827 and squashes the following commits:
a764504 [Sean Owen] Add LICENSE and NOTICE info for all transitive dependencies as of 1.0
Continue our discussions from https://github.com/apache/incubator-spark/pull/575
This PR is WIP because it depends on a SNAPSHOT version of breeze.
Per previous discussions and benchmarks, I switched to breeze for linear algebra operations. @dlwh and I made some improvements to breeze to keep its performance comparable to the bare-bone implementation, including norm computation and squared distance. This is why this PR needs to depend on a SNAPSHOT version of breeze.
@fommil , please find the notice of using netlib-core in `NOTICE`. This is following Apache's instructions on appropriate labeling.
I'm going to update this PR to include:
1. Fast distance computation: using `\|a\|_2^2 + \|b\|_2^2 - 2 a^T b` when it doesn't introduce too much numerical error. The squared norms are pre-computed. Otherwise, computing the distance between the center (dense) and a point (possibly sparse) always takes O(n) time.
2. Some numbers about the performance.
3. A released version of breeze. @dlwh, a minor release of breeze will help this PR get merged early. Do you mind sharing breeze's release plan? Thanks!
Author: Xiangrui Meng <meng@databricks.com>
Closes#117 from mengxr/sparse-kmeans and squashes the following commits:
67b368d [Xiangrui Meng] fix SparseVector.toArray
5eda0de [Xiangrui Meng] update NOTICE
67abe31 [Xiangrui Meng] move ArrayRDDs to mllib.rdd
1da1033 [Xiangrui Meng] remove dependency on commons-math3 and compute EPSILON directly
9bb1b31 [Xiangrui Meng] optimize SparseVector.toArray
226d2cd [Xiangrui Meng] update Java friendly methods in Vectors
238ba34 [Xiangrui Meng] add VectorRDDs with a converter from RDD[Array[Double]]
b28ba2f [Xiangrui Meng] add toArray to Vector
e69b10c [Xiangrui Meng] remove examples/JavaKMeans.java, which is replaced by mllib/examples/JavaKMeans.java
72bde33 [Xiangrui Meng] clean up code for distance computation
712cb88 [Xiangrui Meng] make Vectors.sparse Java friendly
27858e4 [Xiangrui Meng] update breeze version to 0.7
07c3cf2 [Xiangrui Meng] change Mahout to breeze in doc use a simple lower bound to avoid unnecessary distance computation
6f5cdde [Xiangrui Meng] fix a bug in filtering finished runs
42512f2 [Xiangrui Meng] Merge branch 'master' into sparse-kmeans
d6e6c07 [Xiangrui Meng] add predict(RDD[Vector]) to KMeansModel
42b4e50 [Xiangrui Meng] line feed at the end
a4ace73 [Xiangrui Meng] Merge branch 'fast-dist' into sparse-kmeans
3ed1a24 [Xiangrui Meng] add doc to BreezeVectorWithSquaredNorm
0107e19 [Xiangrui Meng] update NOTICE
87bc755 [Xiangrui Meng] tuned the KMeans code: changed some for loops to while, use view to avoid copying arrays
0ff8046 [Xiangrui Meng] update KMeans to use fastSquaredDistance
f355411 [Xiangrui Meng] add BreezeVectorWithSquaredNorm case class
ab74f67 [Xiangrui Meng] add fastSquaredDistance for KMeans
4e7d5ca [Xiangrui Meng] minor style update
07ffaf2 [Xiangrui Meng] add dense/sparse vector data models and conversions to/from breeze vectors use breeze to implement KMeans in order to support both dense and sparse data
Author: Matei Zaharia <matei@databricks.com>
Closes#174 from mateiz/update-notice and squashes the following commits:
47fc1a5 [Matei Zaharia] Update copyright year in NOTICE to 2014