Quality-Aware Machine Learning

Oliver Kennedy

Jaroslaw Zola

Matthew Knepley

Fixing Data is Expensive

(or impossible)

The "right" fix depends on use case

Re-using already fixed data is dangerous.

Idea: Track Errors

Incomplete Databases store possibilities, not just certainties.
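
As a rough illustration of "possibilities, not just certainties", the sketch below stores a list of candidate values in each cell and separates certain from possible query answers. The table, the `possible_worlds` and `query_answers` helpers, and the example query are all hypothetical, and enumerating every world is exponential, so this is a toy model and not how a real incomplete database engine would work.

```python
from itertools import product

# Hypothetical toy table: each cell holds a list of candidate values.
# One candidate means the value is certain; several mean an unresolved error.
rows = [
    {"name": ["Alice"], "age": [34]},
    {"name": ["Bob"], "age": [27, 72]},  # two plausible repairs, both kept
    {"name": ["Carol"], "age": [41]},
]

def possible_worlds(rows):
    """Yield every fully determined table consistent with the candidates."""
    columns = [list(r.keys()) for r in rows]
    choices = [r[c] for r, cols in zip(rows, columns) for c in cols]
    for combo in product(*choices):
        it = iter(combo)
        yield [{c: next(it) for c in cols} for cols in columns]

def query_answers(rows, query):
    """Return (certain, possible) answers: in all worlds vs. in at least one."""
    results = [set(query(world)) for world in possible_worlds(rows)]
    return set.intersection(*results), set.union(*results)

def over_30(world):
    return [r["name"] for r in world if r["age"] > 30]

certain, possible = query_answers(rows, over_30)
print("certain:", sorted(certain))    # Alice and Carol qualify in every world
print("possible:", sorted(possible))  # Bob qualifies only if his age is 72
```

Keeping both candidate ages for Bob defers the choice of "right" fix to whoever runs the query, rather than baking one repair into the data.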

Goals

  • Statistically rigorous techniques for training classifiers and neural networks on incomplete databases.
  • Models incorporating incompleteness information.
    "I didn't have enough training data" should be an allowed prediction.
  • Incompleteness as an aid to model debugging.
    Which errors have the biggest impact on a prediction?
    Which errors best explain an incorrect prediction?
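
One way to picture the last two goals is the sketch below. It is an assumption-laden toy, not the technique proposed in the talk: a 1-nearest-neighbor classifier is evaluated in every possible world of a tiny training set, abstains when the worlds disagree, and scores each uncertain cell by how often changing it alone flips the prediction. The data, the `ABSTAIN` value, and the `predict` and `impact` helpers are all hypothetical.

```python
from itertools import product

ABSTAIN = "not enough training data"

# Hypothetical training set: (candidate feature values, label).
train = [
    ([1.0], "low"),
    ([2.0, 9.0], "low"),     # uncertain cell 1
    ([8.0], "high"),
    ([10.0, 10.5], "high"),  # uncertain cell 3
]

def nearest_label(world, x):
    """1-nearest-neighbor prediction in one fully determined world."""
    return min(world, key=lambda ex: abs(ex[0] - x))[1]

def worlds(train):
    """Yield every training set obtained by picking one candidate per cell."""
    for combo in product(*(cands for cands, _ in train)):
        yield [(v, label) for v, (_, label) in zip(combo, train)]

def predict(train, x):
    """Return a label only if every possible world agrees; otherwise abstain."""
    labels = {nearest_label(w, x) for w in worlds(train)}
    return labels.pop() if len(labels) == 1 else ABSTAIN

def impact(train, x, i):
    """Fraction of settings of the other cells where varying cell i alone
    changes the prediction for x."""
    others = [cands for j, (cands, _) in enumerate(train) if j != i]
    flips = total = 0
    for combo in product(*others):
        labels = set()
        for v in train[i][0]:
            values = list(combo[:i]) + [v] + list(combo[i:])
            world = [(val, lab) for val, (_, lab) in zip(values, train)]
            labels.add(nearest_label(world, x))
        flips += len(labels) > 1
        total += 1
    return flips / total

print(predict(train, 1.2))  # every world agrees on "low"
print(predict(train, 8.8))  # worlds disagree, so the model abstains
print({i: impact(train, 8.8, i) for i in (1, 3)})  # cell 1 drives the disagreement
```

For the query point 8.8, the uncertain value in cell 1 alone decides the prediction while cell 3 never matters, which is the kind of ranking the two debugging questions ask for. Brute-force enumeration only works at this scale.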