spark-instrumented-optimizer/external
Wenchen Fan a71f6a1750 [SPARK-25414][SS][TEST] make it clear that the numRows metrics should be counted for each scan of the source
## What changes were proposed in this pull request?

For a self-join or self-union, Spark produces a physical plan that contains multiple `DataSourceV2ScanExec` instances referring to the same `ReadSupport` instance. In this case the streaming source really is scanned multiple times, and the `numInputRows` metric should be counted for each scan.

We already have two test cases that verify this behavior:
1. `StreamingQuerySuite.input row calculation with same V2 source used twice in self-join`
2. `KafkaMicroBatchSourceSuiteBase.ensure stream-stream self-join generates only one offset in log and correct metrics`.

However, the two tests expect different results, which is confusing. It turns out that the first test doesn't trigger exchange reuse, so the source is scanned twice, whereas the second test does trigger exchange reuse, so the source is scanned only once.

This PR improves these two tests so that they cover the behavior both with and without exchange reuse.
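
The difference between the two expected results can be illustrated with a small toy model. This is only a sketch under simplifying assumptions and not Spark code: `ToySource`, `scan`, and `self_join` are hypothetical names, and "exchange reuse" is modeled simply as the right side of the join reusing the rows already produced for the left side instead of scanning the source again.

```python
from dataclasses import dataclass


@dataclass
class ToySource:
    """Hypothetical stand-in for a streaming source; not a Spark class."""
    rows: list
    num_input_rows: int = 0  # incremented on every scan, like a per-scan metric

    def scan(self):
        # Each scan reads every row, so the metric grows by len(rows) per scan.
        self.num_input_rows += len(self.rows)
        return list(self.rows)


def self_join(source, exchange_reuse):
    """Inner self-join on the row value, with or without modeled exchange reuse."""
    left = source.scan()
    # With reuse, the right side reads the left side's output; without it,
    # the source is scanned a second time.
    right = left if exchange_reuse else source.scan()
    return [l for l in left for r in right if l == r]


without_reuse = ToySource([1, 2, 3])
self_join(without_reuse, exchange_reuse=False)

with_reuse = ToySource([1, 2, 3])
self_join(with_reuse, exchange_reuse=True)

# Counts per scan: two scans of 3 rows versus one scan of 3 rows.
print(without_reuse.num_input_rows, with_reuse.num_input_rows)
```

In this model the join result is identical in both cases; only the number of scans, and therefore the row count attributed to the source, differs.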

## How was this patch tested?

Test-only change.

Closes #22402 from cloud-fan/bug.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2018-09-20 00:29:48 +08:00
| Name | Last commit | Date |
|---|---|---|
| avro | [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT | 2018-09-15 16:24:02 -07:00 |
| docker | [SPARK-23038][TEST] Update docker/spark-test (JDK/OS) | 2018-01-13 23:26:12 -08:00 |
| docker-integration-tests | [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT | 2018-09-15 16:24:02 -07:00 |
| flume | [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT | 2018-09-15 16:24:02 -07:00 |
| flume-assembly | [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT | 2018-09-15 16:24:02 -07:00 |
| flume-sink | [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT | 2018-09-15 16:24:02 -07:00 |
| kafka-0-8 | [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT | 2018-09-15 16:24:02 -07:00 |
| kafka-0-8-assembly | [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT | 2018-09-15 16:24:02 -07:00 |
| kafka-0-10 | [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT | 2018-09-15 16:24:02 -07:00 |
| kafka-0-10-assembly | [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT | 2018-09-15 16:24:02 -07:00 |
| kafka-0-10-sql | [SPARK-25414][SS][TEST] make it clear that the numRows metrics should be counted for each scan of the source | 2018-09-20 00:29:48 +08:00 |
| kinesis-asl | [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT | 2018-09-15 16:24:02 -07:00 |
| kinesis-asl-assembly | [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT | 2018-09-15 16:24:02 -07:00 |
| spark-ganglia-lgpl | [SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT | 2018-09-15 16:24:02 -07:00 |