diff --git a/src/teaching/cse-562/2019sp/index.erb b/src/teaching/cse-562/2019sp/index.erb index d54bda22..afb5df96 100644 --- a/src/teaching/cse-562/2019sp/index.erb +++ b/src/teaching/cse-562/2019sp/index.erb @@ -44,6 +44,8 @@ schedule: topic: "Physical Data Layout" note: Serialization, Paging, Columnar, Buffer Manager textbook: "Ch. 13.1-13.7, 15.7, 16.7" + materials: + slides: "slide/2019-02-15-Physical.html" - date: "Feb 18" topic: "Indexing (Intro + Tree Indexes)" textbook: "Ch. 8.3-8.4, 14.1-14.2, 14.4" @@ -53,8 +55,8 @@ schedule: note: Learned indexes, LSM Trees textbook: "Ch. 14.3" materials: - bLSMTrees: "https://dl.acm.org/citation.cfm?id=2213862" - LearnedIndexes: "https://dl.acm.org/ft_gateway.cfm?id=3196909&type=pdf" + bLSMTrees: "https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?id=2213862" + LearnedIndexes: "https://dl-acm-org.gate.lib.buffalo.edu/ft_gateway.cfm?id=3196909&type=pdf" - date: "Feb 22" topic: "External (2-Pass) Algorithms" textbook: "Ch. 15.4-15.5, 15.8" diff --git a/src/teaching/cse-562/2019sp/slide/2019-02-15-Physical.html b/src/teaching/cse-562/2019sp/slide/2019-02-15-Physical.html index e7237f29..c7bec1bc 100644 --- a/src/teaching/cse-562/2019sp/slide/2019-02-15-Physical.html +++ b/src/teaching/cse-562/2019sp/slide/2019-02-15-Physical.html @@ -1,281 +1,12 @@ - - - - - - - CSE 4/562 - Spring 2018 - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- - CSE 4/562 - Database Systems -
- -
- -
-

SQL &
Physical Layout

-

CSE 4/562 – Database Systems

-
January 31, 2018
-
- -
-
-

SQL

-
    -
  • Developed by IBM (for System R) in the 1970s.
  • -
  • Standard used by many vendors.
      -
    • SQL-86 (original standard)
    • -
    • SQL-89 (minor revisions; integrity constraints)
    • -
    • SQL-92 (major revision; basis for modern SQL)
    • -
    • SQL-99 (XML, window queries, generated default values)
    • -
    • SQL 2003 (major revisions to XML support)
    • -
    • SQL 2008 (minor extensions)
    • -
    • SQL 2011 (minor extensions; temporal databases)
    • -
    -
-
- -
-

A Basic SQL Query

- -
- -
-

-            SELECT  [DISTINCT] targetlist
-            FROM    relationlist
-            WHERE   condition
-          
-
    -
  1. Compute the cross product (every combination of tuples) of the relations appearing in relationlist
  2. -
  3. Discard tuples that fail the condition
  4. -
  5. Delete attributes not in targetlist
  6. -
  7. If DISTINCT is specified, eliminate duplicate rows
  8. -
-

- This is the least efficient strategy to compute a query! - A good optimizer will find more efficient strategies to compute the same answer. -

-
- -
-

Example Data

- -
- -
-
SELECT * FROM Trees;
- -

Wildcards (*, tablename.*) are special targets that select all attributes.

- -
- - - - - - - - -
CREATED_ATTREE_IDBLOCK_IDTHE_GEOMTREE_DBHSTUMP_DIAMCURB_LOCSTATUSHEALTHSPC_LATINSPC_COMMONSTEWARDGUARDSSIDEWALKUSER_TYPEPROBLEMSROOT_STONEROOT_GRATEROOT_OTHERTRNK_WIRETRNK_LIGHTTRNK_OTHERBRNCH_LIGHBRNCH_SHOEBRNCH_OTHEADDRESSZIPCODEZIP_CITYCB_NUMBOROCODEBORONAMECNCLDISTST_ASSEMST_SENATENTANTA_NAMEBORO_CTSTATELATITUDELONGITUDEX_SPY_SP
'08/27/2015'180683348711'POINT (-73.84421521958048 40.723091773924274)'30'OnCurb''Alive''Fair''Acer rubrum''red maple''None''None''NoDamage''TreesCount Staff''None''No''No''No''No''No''No''No''No''No''108-005 70 AVENUE''11375''Forest Hills'4064'Queens'292816'QN17''Forest Hills'4073900'New York'40.72309177-73.844215221027431.14821202756.768749
'09/03/2015'200540315986'POINT (-73.81867945834878 40.79411066708779)'210'OnCurb''Alive''Fair''Quercus palustris''pin oak''None''None''Damage''TreesCount Staff''Stones''Yes''No''No''No''No''No''No''No''No''147-074 7 AVENUE''11357''Whitestone'4074'Queens'192711'QN49''Whitestone'4097300'New York'40.79411067-73.818679461034455.70109228644.837379
'09/05/2015'204026218365'POINT (-73.93660770459083 40.717580740099116)'30'OnCurb''Alive''Good''Gleditsia triacanthos var. inermis''honeylocust''1or2''None''Damage''Volunteer''None''No''No''No''No''No''No''No''No''No''390 MORGAN AVENUE''11211''Brooklyn'3013'Brooklyn'345018'BK90''East Williamsburg'3044900'New York'40.71758074-73.93660771001822.83131200716.891267
'09/05/2015'204337217969'POINT (-73.93445615919741 40.713537494833226)'100'OnCurb''Alive''Good''Gleditsia triacanthos var. inermis''honeylocust''None''None''Damage''Volunteer''Stones''Yes''No''No''No''No''No''No''No''No''1027 GRAND STREET''11211''Brooklyn'3013'Brooklyn'345318'BK90''East Williamsburg'3044900'New York'40.71353749-73.934456161002420.35833199244.253136
'08/30/2015'189565223043'POINT (-73.97597938483258 40.66677775537875)'210'OnCurb''Alive''Good''Tilia americana''American linden''None''None''Damage''Volunteer''Stones''Yes''No''No''No''No''No''No''No''No''603 6 STREET''11215''Brooklyn'3063'Brooklyn'394421'BK37''Park Slope-Gowanus'3016500'New York'40.66677776-73.97597938990913.775046182202.425999
... and 683783 more
-
-
- -
-

-            SELECT tree_id, spc_common, boroname
-            FROM Trees
-            WHERE boroname = 'Brooklyn'
-          
- -

In English, what does this query compute?

-

What is the ID, Common Name and Borough of Trees in Brooklyn?

- - - - - - - - - -
TREE_IDSPC_COMMONBORONAME
204026'honeylocust''Brooklyn'
204337'honeylocust''Brooklyn'
189565'American linden''Brooklyn'
192755'London planetree''Brooklyn'
189465'London planetree''Brooklyn'
... and 177287 more
-
- -
-

-      SELECT latitude, longitude 
-      FROM Trees, SpeciesInfo
-      WHERE Trees.spc_common = SpeciesInfo.name
-        AND SpeciesInfo.has_unpleasant_smell = 'Yes';
-          
- -

In English, what does this query compute?

-

What are the coordinates of Trees with bad smells?

- - - - - - - - - -
LATITUDELONGITUDE
40.59378755-73.9915968
40.69149917-73.97258754
40.74829709-73.98065645
40.68767857-73.96764605
40.739991-73.86526993
... and more
-
- -
-

-      SELECT Trees.latitude, Trees.longitude 
-      FROM Trees, SpeciesInfo
-      WHERE Trees.spc_common = SpeciesInfo.name
-        AND SpeciesInfo.has_unpleasant_smell = 'Yes';
-          
- -

... is the same as ...

- -

-      SELECT T.latitude, T.longitude 
-      FROM Trees T, SpeciesInfo S
-      WHERE T.spc_common = S.name
-        AND S.has_unpleasant_smell = 'Yes';
-          
- -

... is (usually) the same as ...

- -

-      SELECT latitude, longitude 
-      FROM Trees, SpeciesInfo
-      WHERE spc_common = name
-        AND has_unpleasant_smell = 'Yes';
-          
- -
- -
-

Expressions

- -

-            SELECT tree_id, 
-                   stump_diam / 2 AS stump_radius,
-                   stump_area = 3.14 * stump_diam * stump_diam / 4
-            FROM Trees;
-          
- -

- Arithmetic expressions can appear in targets or conditions. - Use ‘=’ or ‘AS’ to assign names to these attributes. - (The behavior of unnamed attributes is unspecified) -

-
- -
-

Expressions

- -

-  SELECT tree_id, spc_common FROM Trees WHERE spc_common LIKE '%maple'
-          
- - - - - - - -
TREE_IDSPC_COMMON
180683'red maple'
204325'sycamore maple'
205044'Amur maple'
184031'red maple'
208974'red maple'
-

SQL uses single quotes for ‘string literals’

-

LIKE is used for String Matches

-

%’ matches 0 or more characters

-
- -
-

Union

-

-    SELECT tree_id FROM Trees WHERE spc_common = 'red maple'
-    UNION [ALL]
-    SELECT tree_id FROM Trees WHERE spc_common = 'sycamore maple'
-          
-

Computes the set-union of any two union-compatible sets of tuples

-

Adding ALL preserves duplicates across the inputs (bag-union).

-
- -
-

Aggregate Queries

-

-    SELECT [DISTINCT] targetlist
-    FROM relationlist
-    WHERE condition
-    GROUP BY groupinglist
-    HAVING groupcondition
-          
-
-

The targetlist now contains (a) Grouped attributes, and (b) Aggregate expressions.

-

Targets of type (a) must be a subset of the grouping-list

-

(intuitively each answer tuple corresponds to a single group, and each group must have a single value for each attribute)

-

groupcondition is applied after aggregation and may contain aggregate expressions.

-
-
- -
-

Aggregate Queries

-

-    SELECT spc_common, count(*) FROM Trees GROUP BY spc_common
-          
- - - - - - - - -
SPC_COMMON COUNT
''Schubert' chokecherry' 4888
'American beech' 273
'American elm' 7975
'American hophornbeam' 1081
'American hornbeam' 1517
... and more
- -
- -
- -
-
-

Physical Layout

-
- -
-

+---
+template: templates/cse4562_2019_slides.erb
+title: "Physical Layout & Memory Management"
+date: February 15, 2019
+textbook: "Ch. 13.1-13.7, 15.7, 16.7"
+---
+
+
+

   from re import split;
 
   with open('Trees.csv', 'r') as f:
@@ -283,168 +14,242 @@
       fields = split(",", line);
       if(fields[30] == 'Brooklyn'):
         print(fields[0]);
-          
- -
+
+ +
-
-

Record Layouts

-
+
+

Record Layouts

-
-

Record Layout 1: Fixed

- -
+

How is data stored?

+
-
-

Record Layout 2: Delimiters

- -
+
+

Problem 1: How should you encode one tuple?

+
-
-

Record Layout 3: Headers

- -
+
+

Record Layout 1: Fixed

+ +
-
-

Record Formats

-
-
Fixed
-
Constant-size fields. Field $i$ at byte $\sum_{j < i} |Field_j|$
-
Delimited
-
Special character or string (e.g., ,) between fields
-
Header
-
Fixed-size header points to start of each field
-
 
-
 
-
-
+
+

Record Layout 2: Delimiters

+ +
-
-

File Formats

-
-
Fixed
-
Constant-size records. Record $i$ at byte $|Record| \times i$
-
Delimited
-
Special character or string (e.g., \r\n) at record end
-
Header
-
Index in file points to start of each record
-
Paged
-
Align records to paging boundaries
-
-
+
+

Record Layout 3: Headers

+ +
-
- -
+
+

Record Formats

+
+
Fixed
+
Constant-size fields. Field $i$ at byte $\sum_{j < i} |Field_j|$
+
Delimited
+
Special character or string (e.g., ,) between fields
+
Header
+
Fixed-size header points to start of each field
+
 
+
 
+
+
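The three record formats can be compared side by side with a minimal Python sketch. The three-field schema (an integer ID and two strings), the 16-byte string width, and the helper names are assumptions made for illustration; they are not part of the course code.

```python
import struct

# Hypothetical three-field tuple: (tree_id, spc_common, boroname)
record = (204026, "honeylocust", "Brooklyn")

# Fixed: every field has a constant width, so field i starts at the sum of
# the widths of the fields before it. Short strings are padded with NUL bytes.
FIXED = struct.Struct("<i16s16s")  # 4 + 16 + 16 = 36 bytes per record

def encode_fixed(rec):
    tree_id, species, boro = rec
    return FIXED.pack(tree_id, species.encode(), boro.encode())

def decode_fixed(buf):
    tree_id, species, boro = FIXED.unpack(buf)
    return (tree_id, species.rstrip(b"\0").decode(), boro.rstrip(b"\0").decode())

# Delimited: a special character separates fields (no escaping in this sketch).
def encode_delimited(rec, delim=","):
    return delim.join(str(field) for field in rec).encode()

def decode_delimited(buf, delim=","):
    tree_id, species, boro = buf.decode().split(delim)
    return (int(tree_id), species, boro)

# Header: a fixed-size array of 2-byte offsets points to the start of each
# field; one extra offset marks the end of the last field.
def encode_header(rec):
    fields = [str(field).encode() for field in rec]
    offsets = [2 * (len(fields) + 1)]          # data starts right after the header
    for field in fields:
        offsets.append(offsets[-1] + len(field))
    return struct.pack(f"<{len(offsets)}H", *offsets) + b"".join(fields)

def read_field(buf, i):
    # Jump straight to field i without scanning the fields before it.
    start, end = struct.unpack_from("<2H", buf, 2 * i)
    return buf[start:end].decode()
```

The fixed and header formats both allow direct access to field `i`; the delimited format has to scan past every earlier field but wastes no space on padding or offsets.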
-
- - openclipart.org -
+
+

Problem 2: How should you encode a file of tuples?

+
-
-
-
File
-
A collection of pages (or records)
-
Page
-
A fixed-size collection of records
-
Page size is usually dictated by hardware.
Mem Page $\approx$ 4KB   Cache Line $\approx$ 64B
-
Record
-
One or more fields (for now)
-
Field
-
A primitive value (for now)
-
-
+
+

File Formats

+
+
Fixed
+
Constant-size records. Record $i$ at byte $|Record| \times i$
+
Delimited
+
Special character or string (e.g., \r\n) at record end
+
Header
+
Index in file points to start of each record
+
Paged
+
Align records to paging boundaries
+
+
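For the fixed file format, the offset rule (record $i$ at byte $|Record| \times i$) can be sketched as follows. The two-field record layout and the function names are assumptions for illustration, and an in-memory `io.BytesIO` stands in for a real file.

```python
import io
import struct

# Hypothetical fixed-size record: a 4-byte tree_id plus a 16-byte borough name.
RECORD = struct.Struct("<i16s")

def write_records(f, rows):
    for tree_id, boro in rows:
        f.write(RECORD.pack(tree_id, boro.encode()))

def read_record(f, i):
    # Record i starts at byte |Record| * i, so one seek finds it.
    f.seek(RECORD.size * i)
    tree_id, boro = RECORD.unpack(f.read(RECORD.size))
    return tree_id, boro.rstrip(b"\0").decode()

f = io.BytesIO()
write_records(f, [(180683, "Queens"), (200540, "Queens"), (204026, "Brooklyn")])
third = read_record(f, 2)
```

Here `third` is the record at index 2, read without touching the two records before it.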
-
- -
+
+ +
-
-

-  with db_open('Trees') as data:
-    for record in data:
-      if(record['BORONAME'] == 'Brooklyn'):
-        print(record['TREE_ID']);
-          
-
-
+
+ + openclipart.org +
+ + +
+
+
+
File
+
A collection of pages (or records)
+
Page
+
A fixed-size collection of records
+
Page size is usually dictated by hardware.
Mem Page $\approx$ 4KB   Cache Line $\approx$ 64B
+
Record
+
One or more fields (for now)
+
Field
+
A primitive value (for now)
+
+
+ +
+

Problem 2.b: How should you store records in a page?

+

Key question: Where in the page is record #43?

+
+ +
+

Fixed-size records: $43 \cdot |\texttt{record}|$

+
+ +
+ +
+ +
+ +

Why store the key and records from opposite ends?

+
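One way to see why the two regions grow from opposite ends is a small slotted-page sketch for variable-size records. The 4 KB page size, the header and slot encodings, and the class name are assumptions chosen for illustration. The slot array grows forward from the header and the record data grows backward from the end of the page, so neither region needs a size fixed in advance; the page is full only when the two meet.

```python
import struct

PAGE_SIZE = 4096
HEADER = struct.Struct("<HH")  # (number of slots, offset where record data begins)
SLOT = struct.Struct("<HH")    # (record offset, record length)

class SlottedPage:
    def __init__(self):
        self.buf = bytearray(PAGE_SIZE)
        HEADER.pack_into(self.buf, 0, 0, PAGE_SIZE)  # no slots, data region empty

    def _header(self):
        return HEADER.unpack_from(self.buf, 0)

    def free_space(self):
        n, data_start = self._header()
        return data_start - (HEADER.size + n * SLOT.size)

    def insert(self, record: bytes) -> int:
        n, data_start = self._header()
        if len(record) + SLOT.size > self.free_space():
            raise ValueError("page full")
        data_start -= len(record)                    # records grow toward the front
        self.buf[data_start:data_start + len(record)] = record
        SLOT.pack_into(self.buf, HEADER.size + n * SLOT.size, data_start, len(record))
        HEADER.pack_into(self.buf, 0, n + 1, data_start)
        return n                                     # slot number of the new record

    def get(self, slot: int) -> bytes:
        n, _ = self._header()
        if not 0 <= slot < n:
            raise IndexError(slot)
        offset, length = SLOT.unpack_from(self.buf, HEADER.size + slot * SLOT.size)
        return bytes(self.buf[offset:offset + length])
```

Finding record #43 then costs one slot lookup, regardless of how long the earlier records are.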
+ +
+ +
+ +
+

Problem 3: How should you organize pages in a file?

+

Key question: What happens when all records on a page are deleted?

+

Idea: Track empty pages.

+
+
+ +
+
+ +
+
+ +
+
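The directory idea can be sketched in a few lines of Python. This is a simplified in-memory model under stated assumptions: pages hold a fixed number of slots, the directory is a plain list of free-slot counts, and the class and method names are hypothetical.

```python
class HeapFile:
    def __init__(self, slots_per_page=4):
        self.slots_per_page = slots_per_page
        self.pages = []   # page id -> list of records on that page
        self.free = []    # directory: page id -> number of free slots

    def insert(self, record):
        # Consult the directory for the first page with room.
        for pid, free in enumerate(self.free):
            if free > 0:
                break
        else:
            # No page has room: allocate a new one and register it.
            pid = len(self.pages)
            self.pages.append([])
            self.free.append(self.slots_per_page)
        self.pages[pid].append(record)
        self.free[pid] -= 1
        return pid

    def delete(self, pid, record):
        self.pages[pid].remove(record)
        self.free[pid] += 1

    def empty_pages(self):
        # Pages whose records were all deleted stay reusable via the directory.
        return [pid for pid, free in enumerate(self.free)
                if free == self.slots_per_page]
```

Because the directory remembers emptied pages, a later insert reuses them instead of growing the file.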
+

An Alternative Layout

+
+ +
+

Row-Wise Layouts

+ +
+
+

Column-Wise Layouts

+ +
+ +
+

Each file stores 2-tuples $\left< RowID, Value\right >$.

+

Values only for one attribute.

+
+ +
+

Benefits

+
    +
  • Only one attribute to sort per file.
  • +
  • No IO cost for unused attributes ($\pi$-pushdown!)
  • +
+

Drawbacks

+
    +
  • Result attributes must be stitched back together ($\bowtie$)
  • +
+

Great for wide, rarely updated tables where only a few attributes are used per query.

+
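The trade-off above can be illustrated with a short sketch, assuming a toy three-attribute table; the sample rows, the schema names, and the helper functions are chosen for illustration only. Each "file" is a list of $\left< RowID, Value \right>$ pairs, and a projection reads only the requested columns before stitching them back together on the row ID.

```python
rows = [
    (180683, "red maple", "Queens"),
    (204026, "honeylocust", "Brooklyn"),
    (189565, "American linden", "Brooklyn"),
]
schema = ("tree_id", "spc_common", "boroname")

def to_columns(rows, schema):
    # One "file" per attribute, each holding (row id, value) pairs.
    return {
        name: [(rid, row[i]) for rid, row in enumerate(rows)]
        for i, name in enumerate(schema)
    }

def project(columns, names):
    # Touch only the requested column files, then join them on row id.
    stitched = {}
    for name in names:
        for rid, value in columns[name]:
            stitched.setdefault(rid, []).append(value)
    return [tuple(stitched[rid]) for rid in sorted(stitched)]

columns = to_columns(rows, schema)
result = project(columns, ["tree_id", "boroname"])
```

The `spc_common` file is never read by this projection, which is the IO saving; the dictionary merge in `project` is the stitching cost.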
+ +
+

Example Column Stores

+ +

+ Cassandra logo + Vertica pos blk rgb.svg + MonetDB logo +

+ By Apache Software Foundation - https://svn.apache.org/repos/asf/cassandra/logo/cassandra.svg, Apache License 2.0, Link
+ By Ariolica - Own work, CC BY-SA 4.0, Link
+ By Source (WP:NFCC#4), Fair use, Link
+
+
+ +
+
+

Buffer Manager

+

Abstract the messy details of File-IO

+
+ +
+ + openclipart.org +
+ +
+
+
Frame
+
A "slot" managed by the buffer manager that holds one page.
+ +
Pinned Page
+
A page currently in use by part of the database. Must stay in its current frame until unpinned. (A page may be pinned multiple times)
+ +
Dirty Page
+
A page that has been modified since it was last read in.
+
+
+ +
+

When a page is requested

+ +

Is the page in the buffer pool? +

    +
  • Yes? Pin the page (again) and return the address.
  • +
  • No?
      +
    • Pick a frame for replacement with your favorite algorithm (e.g., LRU)...
    • +
    • If the frame is dirty, write it to disk
    • +
    • Read requested page into chosen frame
    • +
    • Pin the page and return its address
    • +
  • +
+
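The request steps above can be sketched as a small buffer pool with LRU replacement. This is a minimal model under stated assumptions: a Python dict stands in for the disk, each frame is a dict with `data`, `pins`, and `dirty` entries, and all names are hypothetical rather than taken from any real system.

```python
from collections import OrderedDict

class BufferPool:
    def __init__(self, disk, num_frames):
        self.disk = disk              # dict: page id -> bytes (stand-in for a file)
        self.num_frames = num_frames
        self.frames = OrderedDict()   # page id -> frame, least recently used first

    def pin(self, page_id):
        frame = self.frames.get(page_id)
        if frame is None:
            # Not in the pool: free a frame if needed, then read the page in.
            if len(self.frames) >= self.num_frames:
                self._evict()
            frame = {"data": bytearray(self.disk[page_id]), "pins": 0, "dirty": False}
            self.frames[page_id] = frame
        frame["pins"] += 1                 # a page may be pinned multiple times
        self.frames.move_to_end(page_id)   # mark as most recently used
        return frame["data"]

    def unpin(self, page_id, dirty=False):
        frame = self.frames[page_id]
        frame["pins"] -= 1
        frame["dirty"] = frame["dirty"] or dirty

    def _evict(self):
        # Pick the least recently used frame that nobody has pinned.
        victim = next((pid for pid, fr in self.frames.items() if fr["pins"] == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        frame = self.frames.pop(victim)
        if frame["dirty"]:
            self.disk[victim] = bytes(frame["data"])  # write back before reuse
```

Callers pair every `pin` with an `unpin`, passing `dirty=True` if they modified the page; a pinned page is never chosen for replacement.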
+ +
+

Does this all sound familiar?

+

Isn't this just Virtual Memory?

+
+ +
+

Yes!

+

(Many databases use memory-mapped files as a buffer manager)

+
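As a rough illustration of the memory-mapped approach, Python's standard `mmap` module lets the operating system decide which pages of a file are resident. The file name, the 4 KB page size, and the `read_page` helper are assumptions for this sketch.

```python
import mmap
import os
import tempfile

PAGE_SIZE = 4096

def read_page(path, page_no):
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages data in on first access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            start = page_no * PAGE_SIZE
            return m[start:start + PAGE_SIZE]

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "pages.dat")
    with open(path, "wb") as f:
        for i in range(3):
            f.write(bytes([i]) * PAGE_SIZE)   # page i is filled with byte value i
    page = read_page(path, 1)
```

The program never issues an explicit read for page 1; slicing the mapping triggers the OS to fetch it, which is the behavior a hand-written buffer manager would otherwise implement.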
+ +
+

Why Re-implement VMem?

+
+ +

Databases can predict the future!

+
+

SELECT * FROM R WHERE A > 500 AND A < 2000   →    Pages 10-12

+
+ +
+

How do we decide which pages hold the query results?

+

Answers next class!

+
+ +
-
- - - - - - - - diff --git a/src/teaching/cse-562/2019sp/slide/graphics/2018-01-29-db_as_mediator.svg b/src/teaching/cse-562/2019sp/slide/graphics/2018-01-29-db_as_mediator.svg deleted file mode 100644 index 525de239..00000000 --- a/src/teaching/cse-562/2019sp/slide/graphics/2018-01-29-db_as_mediator.svg +++ /dev/null @@ -1,962 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - image/svg+xml - - - - - - - - - - - - CSV - - - - ProtoBuff - - - - ??? - - - - - - Python - - ... - - - SQL - - - - - - Database - - - diff --git a/src/teaching/cse-562/2019sp/slide/graphics/2018-01-31-mem_hierarchy.png b/src/teaching/cse-562/2019sp/slide/graphics/2018-01-31-mem_hierarchy.png new file mode 100644 index 00000000..6f853456 Binary files /dev/null and b/src/teaching/cse-562/2019sp/slide/graphics/2018-01-31-mem_hierarchy.png differ diff --git a/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Buffer-Manager.svg b/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Buffer-Manager.svg new file mode 100644 index 00000000..8215b798 --- /dev/null +++ b/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Buffer-Manager.svg @@ -0,0 +1,577 @@ + + + +image/svg+xml +Higher levels of the DB +Disk Page +Free Frame +Pages allocated to frames as per + +page replacement policy + \ No newline at end of file diff --git a/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Heap-File-1.svg b/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Heap-File-1.svg new file mode 100644 index 00000000..53f2874f --- /dev/null +++ b/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Heap-File-1.svg @@ -0,0 +1,390 @@ + + + +image/svg+xml +ø +ø +Pages with + +Data +Empty +Pages +Directory +Page +Each page contains 2 pointers plus data + \ No newline at end of file diff --git a/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Heap-File-2.svg b/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Heap-File-2.svg new file mode 100644 index 
00000000..0d791aeb --- /dev/null +++ b/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Heap-File-2.svg @@ -0,0 +1,269 @@ + + + +image/svg+xml +Directory +Pages +Directories are a collection of pages (e.g., a linked list) + +Directories point to all data pages + +(entries can include # of free pages) + +Data + +Pages + \ No newline at end of file diff --git a/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Layout-ColumnWise.svg b/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Layout-ColumnWise.svg new file mode 100644 index 00000000..14aa294f --- /dev/null +++ b/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Layout-ColumnWise.svg @@ -0,0 +1,473 @@ + + + + + + + + + + image/svg+xml + + + + + + + + + A1 + + + + B1 + + + + C1 + + + + D1 + + + + A2 + + + + B2 + + + + C2 + + + + D2 + + + + A3 + + + + B3 + + + + C3 + + + + D3 + + + + A4 + + + + B4 + + + + C4 + + + + D4 + + File 1 + File 2 + File 3 + File 4 + + diff --git a/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Layout-RowWise.svg b/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Layout-RowWise.svg new file mode 100644 index 00000000..667af648 --- /dev/null +++ b/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Layout-RowWise.svg @@ -0,0 +1,416 @@ + + + + + + + + + + image/svg+xml + + + + + + + + + A1 + + + + B1 + + + + C1 + + + + D1 + + + + A2 + + + + B2 + + + + C2 + + + + D2 + + + + A3 + + + + B3 + + + + C3 + + + + D3 + + + + A4 + + + + B4 + + + + C4 + + + + D4 + + + diff --git a/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Page-Layouts-1.svg b/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Page-Layouts-1.svg new file mode 100644 index 00000000..e44c25ec --- /dev/null +++ b/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Page-Layouts-1.svg @@ -0,0 +1,729 @@ + + + +image/svg+xml +6 +1 +2 +3 +4 +5 +6 +7 +8 + +N +1 +2 +3 +4 +5 +6 +7 +8 + +N +N +01101011 + +Packed +Unpacked (Bitmap) +Number of Records +Bit array of occupied slots(and size of page) 
+Data Records +Free Space + \ No newline at end of file diff --git a/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Page-Layouts-2.svg b/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Page-Layouts-2.svg new file mode 100644 index 00000000..7f1a4b42 --- /dev/null +++ b/src/teaching/cse-562/2019sp/slide/graphics/2019-02-15-Page-Layouts-2.svg @@ -0,0 +1,385 @@ + + + +image/svg+xml1 2 3 4 … +R1 +R2 +R3 +Variable Size Records +Pointer to start of free space + \ No newline at end of file diff --git a/src/teaching/cse-562/2019sp/slide/graphics/Clipart/BenBois-Magic-ball.svg b/src/teaching/cse-562/2019sp/slide/graphics/Clipart/BenBois-Magic-ball.svg new file mode 100644 index 00000000..a2e740f4 --- /dev/null +++ b/src/teaching/cse-562/2019sp/slide/graphics/Clipart/BenBois-Magic-ball.svg @@ -0,0 +1,405 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + Openclipart + + + Magic ball + 2007-10-30T22:38:14 + + https://openclipart.org/detail/7655/magic-ball-by-benbois + + + BenBois + + + + + ball + crystal + magic + orb + + + + + + + + + + +