Oliver Kennedy 2024-06-17 10:29:58 -04:00
commit efd2988601
Signed by: okennedy
GPG key ID: 3E5F9B3ABD3FDB60
13 changed files with 54 additions and 2 deletions

@@ -1,4 +1,12 @@
[
{ "talk" : "Data Preparation with Vizier", "date" : "Apr. 2024",
"venue" : "New York University" },
{ "talk" : "Principled management of notebook state in Vizier", "date" : "Apr. 2024",
"venue" : "University of Illinois: Chicago" },
{ "talk" : "ASTral: A Declarative Compiler Compiler", "date" : "Mar. 2024",
"venue" : "University of Massachusetts: Dartmouth" },
{ "talk" : "Microkernel Notebooks", "date" : "Feb. 2023",
"venue" : "Cornell University" },
{ "talk" : "Panel: On the Multifaceted Impact of Artificial Intelligence in Healthcare: Past, Present, and Emerging Trends", "date" : "May 2022",
"venue" : "UP-STAT 2022" },
{ "talk" : "Caveatting your data: Adding explainability to incomplete datasets", "date" : "Feb. 2022",

@@ -0,0 +1,21 @@
---
title: "ODIn @ HILDA '24"
author: Oliver Kennedy
---
'Grats to Pratik and Juseung on their #HILDA2024 accept for "Drag, Drop, Merge: A Tool for Streamlining Integration of Longitudinal Survey Instruments", which explores schema integration in longitudinal studies.
Longitudinal surveys, and specifically social sciences data collected through survey forms, are a really interesting case of schema integration.
The data being collected is, on the most fundamental level, about only a single class of entity.
However, each year brings new knowledge and new context to the survey, necessitating changes.
For example, researchers might learn that the culture of the study population uses different names in different social contexts, necessitating a change to the survey to clarify the social context of the name being recorded.
Alternatively, researchers might change a phrasing like "how many of your family members live nearby" to "how many people are in your support network" to better address nuanced situations.
Even without changes to the survey itself, changing context can result in changing interpretations of participant answers.
For example, take a multiple-choice question about income levels.
A single answer at the start of a 20-year study may indicate a wildly different socioeconomic status than the exact same answer given in the last year of the study.
The problem of integrating many years of forms is fundamentally similar to data integration, but is in some ways easier (there are few changes between successive years) and in some ways harder (there are *many* such changes over the lifetime of the survey). Changes are also nuanced, and the forms diverge further from one another as the changes accumulate.
The paper lays the groundwork for a tool to help researchers conducting longitudinal studies prepare their data for publication, and to help researchers using this study data reliably develop derived, 'clean' datasets suited to the needs of their specific study.
**Side Note**: This paper is the result of a massively interdisciplinary collaboration between CS, Linguistics, Medicine, and Stats (soon to include Environmental Health). I'm really excited that we've hit on an opportunity to develop techniques that will benefit such a diverse range of fields of study.

@@ -0,0 +1,10 @@
---
title: PL/DB Sp 2024
author: Oliver Kennedy
---
With a talk from [Manos Athanassoulis](https://cs-people.bu.edu/mathan/) earlier this week, we've wrapped up another semester of the PL/DB seminar here at UB. We had a *really* fantastic lineup this year, including five guest speakers ([Jelle Hellings](https://jhellings.nl/), [Hannah Gommerstadt](https://www.cs.vassar.edu/~hgommerstadt/), [Ryan Kavanagh](https://rak.ac/), [Boris Glavic](https://www.cs.uic.edu/~bglavic/dbgroup/members/bglavic.html), and Manos).
Talks this semester spanned a range of different subjects, from distributed programming models to indexing and data access methods, query processing, compiler optimization, and provenance. On the one hand, it's amazing to see such a diverse range of topics represented. On the other, it was also nifty to see students from across the board engaging with all of the speakers (student or otherwise).
Major props to [Andrew Hirsch](https://akhirsch.science), who is more or less single-handedly responsible for reviving the seminar and bringing new life into it.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

@@ -93,6 +93,9 @@ schedule:
detail: The statistical tricks that allow expensive data statistics to be computed using Count-Min and HyperLogLog
docs:
slides: slide/15-sketches.html
flajolet_martin: papers/flajolet_martin.pdf
count: papers/frequent.pdf
count_min: papers/count_min.pdf
deliverables:
- item: "Project 0: Setup"
due: Jan 28
@@ -124,20 +127,27 @@ deliverables:
submit: https://autolab.cse.buffalo.edu/courses/cse410-s24/assessments/P2-B-Trees
- item: "Written 2: B+ Tree Analysis"
due: Mar 17
links:
assignment: assignments/w2.pdf
- item: "Project 3: Joins"
due: Apr 9
links:
assignment: assignments/p3.pdf
- item: "Written 3: Joins Analysis"
due: Apr 9
links:
assignment: assignments/w3.pdf
- item: "Project 4: MiniDB"
due: May 5
- item: "Written 4: MiniDB Analysis"
due: May 12
links:
assignment: assignments/p4.pdf
dates:
- event: Midterm
dates: March 4, In Class
links:
review: slide/09-review.pdf
annotated: slide/09-review-annotated.pdf
rubric: assignments/midterm.pdf
- event: Oliver Traveling, No Class
dates: March 8
- event: Spring Break, No Class
@@ -146,6 +156,9 @@ dates:
dates: April 16
- event: Final Exam
dates: May 10, 3:30-6:30
links:
review: slide/16-review.pdf
rubric: assignments/final.pdf
---
<style>
p {

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.