diff --git a/src/teaching/cse-7xx/2021sp.erb b/src/teaching/cse-7xx/2021sp.erb
index fb0a06cb..52106d88 100644
--- a/src/teaching/cse-7xx/2021sp.erb
+++ b/src/teaching/cse-7xx/2021sp.erb
@@ -31,7 +31,7 @@ papers:
     url: http://www.vldb.org/pvldb/vol14/p507-chapman.pdf
     abstract: |
       Data processing pipelines that are designed to clean, transform and alter data in preparation for learning predictive models, have an impact on those models’ accuracy and performance, as well on other properties, such as model fairness. It is therefore important to provide developers with the means to gain an in-depth understanding of how the pipeline steps affect the data, from the raw input to training sets ready to be used for learning. While other efforts track creation and changes of pipelines of relational operators, in this work we analyze the typical operations of data preparation within a machine learning process, and provide infrastructure for generating very granular provenance records from it, at the level of individual elements within a dataset. Our contributions include:(i) the formal definition of a core set of preprocessing operators,and the definition of provenance patterns for each of them, and(ii) a prototype implementation of an application-level provenance capture library that works alongside Python. We report on provenance processing and storage overhead and scalability experiments,carried out over both real ML benchmark pipelines and over TCP-DI, and show how the resulting provenance can be used to answer a suite of provenance benchmark queries that underpin some of the developers’ debugging questions, as expressed on the Data ScienceStack Exchange.
-  - title: "The Scored Society: DUe Process for Automated Predictions"
+  - title: "The Scored Society: Due Process for Automated Predictions"
     url: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2376209
     abstract: |
       Big Data is increasingly mined to rank and rate individuals. Predictive algorithms assess whether we are good credit risks, desirable employees, reliable tenants, valuable customers — or deadbeats, shirkers, menaces, and “wastes of time.” Crucial opportunities are on the line, including the ability to obtain loans, work, housing, and insurance. Though automated scoring is pervasive and consequential, it is also opaque and lacking oversight. In one area where regulation does prevail — credit — the law focuses on credit history, not the derivation of scores from data.
@@ -74,6 +74,10 @@ claims:
     speaker: Bhavin
   - title: "The importance of Model Fairness and Interpretability in AI Systems (video)"
     name: Oliver
+  - title: "The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards"
+    name: Wei
+  - title: "The Scored Society: Due Process for Automated Predictions"
+    name: Ciaoxiao
 schedule:
   - date: Feb. 1
     event: Introduction, Course Logistics, Scheduling
@@ -106,8 +110,15 @@ schedule:
       - title: Responsible Data Management
         name: Bhavin
   - date: March 29
+    event: No class
   - date: April 5
+    speakers:
+      - title: "The Scored Society: Due Process for Automated Predictions"
+        name: Xiaoxiao
   - date: April 12
+    speakers:
+      - title: "The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards"
+        name: Wei
   - date: April 19
   - date: April 26
   - date: May 3