2e981b7bfa
This pull request adds a destructAndCreateExternalSorter method to UnsafeFixedWidthAggregationMap. The new method does the following: 1. Creates a new external sorter UnsafeKVExternalSorter 2. Adds all the data into an in-memory sorter, sorts them 3. Spills the sorted in-memory data to disk This method can be used to fallback to sort-based aggregation when under memory pressure. The pull request also includes accounting fixes from JoshRosen. TODOs (that can be done in follow-up PRs) - [x] Address Josh's feedbacks from #7849 - [x] More documentation and test cases - [x] Make sure we are doing memory accounting correctly with test cases (e.g. did we release the memory in BytesToBytesMap twice?) - [ ] Look harder at possible memory leaks and exception handling - [ ] Randomized tester for the KV sorter as well as the aggregation map Author: Reynold Xin <rxin@databricks.com> Author: Josh Rosen <joshrosen@databricks.com> Closes #7860 from rxin/kvsorter and squashes the following commits: 986a58c [Reynold Xin] Bug fix. 599317c [Reynold Xin] Style fix and slightly more compact code. fe7bd4e [Reynold Xin] Bug fixes. fd71bef [Reynold Xin] Merge remote-tracking branch 'josh/large-records-in-sql-sorter' into kvsorter-with-josh-fix 3efae38 [Reynold Xin] More fixes and documentation. 45f1b09 [Josh Rosen] Ensure that spill files are cleaned up f6a9bd3 [Reynold Xin] Josh feedback. 9be8139 [Reynold Xin] Remove testSpillFrequency. 7cbe759 [Reynold Xin] [SPARK-9531][SQL] UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter. ae4a8af [Josh Rosen] Detect leaked unsafe memory in UnsafeExternalSorterSuite. 52f9b06 [Josh Rosen] Detect ShuffleMemoryManager leaks in UnsafeExternalSorter.
140 lines
4.7 KiB
XML
140 lines
4.7 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!--
|
|
~ Licensed to the Apache Software Foundation (ASF) under one or more
|
|
~ contributor license agreements. See the NOTICE file distributed with
|
|
~ this work for additional information regarding copyright ownership.
|
|
~ The ASF licenses this file to You under the Apache License, Version 2.0
|
|
~ (the "License"); you may not use this file except in compliance with
|
|
~ the License. You may obtain a copy of the License at
|
|
~
|
|
~ http://www.apache.org/licenses/LICENSE-2.0
|
|
~
|
|
~ Unless required by applicable law or agreed to in writing, software
|
|
~ distributed under the License is distributed on an "AS IS" BASIS,
|
|
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
~ See the License for the specific language governing permissions and
|
|
~ limitations under the License.
|
|
-->
|
|
|
|
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
|
|
<modelVersion>4.0.0</modelVersion>
|
|
<parent>
|
|
<groupId>org.apache.spark</groupId>
|
|
<artifactId>spark-parent_2.10</artifactId>
|
|
<version>1.5.0-SNAPSHOT</version>
|
|
<relativePath>../../pom.xml</relativePath>
|
|
</parent>
|
|
|
|
<groupId>org.apache.spark</groupId>
|
|
<artifactId>spark-sql_2.10</artifactId>
|
|
<packaging>jar</packaging>
|
|
<name>Spark Project SQL</name>
|
|
<url>http://spark.apache.org/</url>
|
|
<properties>
|
|
<sbt.project.name>sql</sbt.project.name>
|
|
</properties>
|
|
|
|
<dependencies>
|
|
<dependency>
|
|
<groupId>org.apache.spark</groupId>
|
|
<artifactId>spark-core_${scala.binary.version}</artifactId>
|
|
<version>${project.version}</version>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>org.apache.spark</groupId>
|
|
<artifactId>spark-core_${scala.binary.version}</artifactId>
|
|
<version>${project.version}</version>
|
|
<type>test-jar</type>
|
|
<scope>test</scope>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>org.apache.spark</groupId>
|
|
<artifactId>spark-catalyst_${scala.binary.version}</artifactId>
|
|
<version>${project.version}</version>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>org.apache.spark</groupId>
|
|
<artifactId>spark-catalyst_${scala.binary.version}</artifactId>
|
|
<version>${project.version}</version>
|
|
<type>test-jar</type>
|
|
<scope>test</scope>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>org.apache.parquet</groupId>
|
|
<artifactId>parquet-column</artifactId>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>org.apache.parquet</groupId>
|
|
<artifactId>parquet-hadoop</artifactId>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>com.fasterxml.jackson.core</groupId>
|
|
<artifactId>jackson-databind</artifactId>
|
|
<version>${fasterxml.jackson.version}</version>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>junit</groupId>
|
|
<artifactId>junit</artifactId>
|
|
<scope>test</scope>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>org.scalacheck</groupId>
|
|
<artifactId>scalacheck_${scala.binary.version}</artifactId>
|
|
<scope>test</scope>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>com.h2database</groupId>
|
|
<artifactId>h2</artifactId>
|
|
<version>1.4.183</version>
|
|
<scope>test</scope>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>mysql</groupId>
|
|
<artifactId>mysql-connector-java</artifactId>
|
|
<version>5.1.34</version>
|
|
<scope>test</scope>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>org.postgresql</groupId>
|
|
<artifactId>postgresql</artifactId>
|
|
<version>9.3-1102-jdbc41</version>
|
|
<scope>test</scope>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>org.apache.parquet</groupId>
|
|
<artifactId>parquet-avro</artifactId>
|
|
<scope>test</scope>
|
|
</dependency>
|
|
<dependency>
|
|
<groupId>org.mockito</groupId>
|
|
<artifactId>mockito-core</artifactId>
|
|
<scope>test</scope>
|
|
</dependency>
|
|
</dependencies>
|
|
<build>
|
|
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
|
|
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
|
|
<plugins>
|
|
<plugin>
|
|
<groupId>org.codehaus.mojo</groupId>
|
|
<artifactId>build-helper-maven-plugin</artifactId>
|
|
<executions>
|
|
<execution>
|
|
<id>add-scala-test-sources</id>
|
|
<phase>generate-test-sources</phase>
|
|
<goals>
|
|
<goal>add-test-source</goal>
|
|
</goals>
|
|
<configuration>
|
|
<sources>
|
|
<source>src/test/gen-java</source>
|
|
</sources>
|
|
</configuration>
|
|
</execution>
|
|
</executions>
|
|
</plugin>
|
|
</plugins>
|
|
</build>
|
|
</project>
|