Updated seminar

Oliver Kennedy 2021-03-23 11:28:39 -04:00
parent 0476162212
commit 4082bbd45e
Signed by: okennedy
GPG key ID: 3E5F9B3ABD3FDB60


@ -31,7 +31,7 @@ papers:
url: http://www.vldb.org/pvldb/vol14/p507-chapman.pdf
abstract: |
Data processing pipelines that are designed to clean, transform and alter data in preparation for learning predictive models, have an impact on those models' accuracy and performance, as well as on other properties, such as model fairness. It is therefore important to provide developers with the means to gain an in-depth understanding of how the pipeline steps affect the data, from the raw input to training sets ready to be used for learning. While other efforts track creation and changes of pipelines of relational operators, in this work we analyze the typical operations of data preparation within a machine learning process, and provide infrastructure for generating very granular provenance records from it, at the level of individual elements within a dataset. Our contributions include: (i) the formal definition of a core set of preprocessing operators, and the definition of provenance patterns for each of them, and (ii) a prototype implementation of an application-level provenance capture library that works alongside Python. We report on provenance processing and storage overhead and scalability experiments, carried out over both real ML benchmark pipelines and over TPC-DI, and show how the resulting provenance can be used to answer a suite of provenance benchmark queries that underpin some of the developers' debugging questions, as expressed on the Data Science Stack Exchange.
- title: "The Scored Society: DUe Process for Automated Predictions"
- title: "The Scored Society: Due Process for Automated Predictions"
url: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2376209
abstract: |
Big Data is increasingly mined to rank and rate individuals. Predictive algorithms assess whether we are good credit risks, desirable employees, reliable tenants, valuable customers — or deadbeats, shirkers, menaces, and “wastes of time.” Crucial opportunities are on the line, including the ability to obtain loans, work, housing, and insurance. Though automated scoring is pervasive and consequential, it is also opaque and lacking oversight. In one area where regulation does prevail — credit — the law focuses on credit history, not the derivation of scores from data.
@ -74,6 +74,10 @@ claims:
speaker: Bhavin
- title: "The importance of Model Fairness and Interpretability in AI Systems (video)"
name: Oliver
- title: "The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards"
name: Wei
- title: "The Scored Society: Due Process for Automated Predictions"
name: Xiaoxiao
schedule:
- date: Feb. 1
event: Introduction, Course Logistics, Scheduling
@ -106,8 +110,15 @@ schedule:
- title: Responsible Data Management
name: Bhavin
- date: March 29
event: No class
- date: April 5
speakers:
- title: "The Scored Society: Due Process for Automated Predictions"
name: Xiaoxiao
- date: April 12
speakers:
- title: "The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards"
name: Wei
- date: April 19
- date: April 26
- date: May 3