代写Database | sql代写- Database Systems

代写Database | sql代写 : 这是一个数据相关的代写任务

Transaction

Management &

Indexing

CS411:  database Systems
1
Announcements
  • Midterm: 10/29 in class 11-12:15 pm
2
Todays lecture
  • TM: theory of serializability
  • Storage
  • Indexing
    • What is an index? Why do we need it?
    • B+ Trees
3
Locking and Serializability
  • We said that a transaction must hold all locks until it terminates (a condition called strictlocking)
  • It turns out that this is crucial to guarantee serializability
    • Note that the first (bad) example could have been produced if transactions acquired and immediately released locks.
Well-Formed, Two-Phased Transactions
  • A transaction is well-formed if it acquires at least a shared lock on Q before reading Q or an exclusive lock on Q before writing Q and doesnt release the lock until the action is performed – Locks are also released by the end of the transaction
  • A transaction is two-phasedif it never acquires a lock after unlocking one – i.e., there are two phases: a growing phasein which the transaction acquires locks, and a shrinking phasein which locks are released
Two-Phased Locking Theorem
  • If all transactions are well-formed and two-phase, then any schedule in which conflicting locks are never granted ensures serializability – i.e., there is a very simple scheduler!
  • However, if some transaction is not well-formed or two- phase, then there is some schedule in which conflicting locks are never granted but which fails to be serializable – i.e., one bad apple spoils the bunch
 2018 A. Alawini & A. Parameswaran
Two Phase Locking Protocol (2PL)

2PLis a way of managing locks during a transaction T

  • T gets (S and X) locks gradually, as needed
  • T cannot request any additional locks once it releases any locks
time 
# of locks
held by a
transaction T
No locks can be
requested after
releasing lock 5

######## 0

######## 2

######## 1

######## 4

######## 3

######## 5

7
 Lois Delcambre, Len Shapiro, David Maier
End of T
 2018 A. Alawini & A. Parameswaran
Strict Two Phase Locking Protocol
(S2PL)

Strict 2PLis a way of managing locks during a transaction T

  • T gets (S and X) locks gradually, as needed
  • T holds all locks until end of transaction (commit/abort)
time 
# of locks
held by a
transaction T
All locks
are released
at the end,
upon commit or abort
0

######## 2

######## 1

######## 4

######## 3

######## 5

8
 Lois Delcambre, Len Shapiro, David Maier
 2018 A. Alawini & A. Parameswaran
Which of these schedules can arise
under strict 2PL

######## T1: R(A), W(A)

######## T2: R(A), W(A), R(B)

######## T1: R(B), W(A), W(B)

######## T2: R(A), W(A), R(B)

######## T1: R(A), R(B), W(B)

######## T2: W(A)

######## T3: W(A) R(B)

9
 Lois Delcambre, Len Shapiro, David Maier
 2018 A. Alawini & A. Parameswaran
Strict 2PL guarantees serializability
  • 2PLvs.strict2PL
    • s2PL avoids cascading abort
  • Can prove that a Strict 2PL schedule is equivalent to the serial schedule in which each transaction runs instantaneously at the time that it commits
  • This is huge: A property of each transaction (2PL) implies a property of any set of transactions (serializability) No need to check serializabilityof specific schedules
  • Most DBMSs use 2PL to enforce serializability
10
 Lois Delcambre, Len Shapiro, David Maier
Summary of TM
  • Transactions are all-or-nothing units of work guaranteed despite concurrency or failures in the system.
  • Theoretically, the correctexecution of transactions is serializable (i.e. equivalent to some serial execution).
  • Practically, this may adversely affect throughput isolation levels.
  • With isolation levels, users can specify the level of incorrectness they are willing to tolerate.
Outline
TM: theory of serializability
  • Storage
  • Indexing
    • What is an index? Why do we need it?
    • B+ Trees
12
So far, weve been talking about
databases in abstract terms
13
How are they implemented?
  • How are relations stored?
  • How are queries run?
Storage
Simplified Computer Architecture
15
Processor
RegistersCacheOn-Chip
Main
Memory
Persistent
Storage
(Disk)

Speed (ns) 10 ns 102 ns 107 ns

Size KB GB TB

Cost of Accessing Data on Disk
16
Main
Memory

Speed 107 ns 105 ns 102 ns

Memory
Block size vs. record size
17
Block 4KB or 8KB

######## …

######## ~100B 1KB

Table X
OK. So how do we do simple stuff?

####### Lookups. Insertions. Deletions

18
Outline
TM: theory of serializability
Storage
  • Indexing
    • What is an index? Why do we need it?
    • B+ Trees
19
Indexing

####### One catch!

Indexes in databases
  • An index speeds up selections on the search key field(s)
  • Search key = any subset of the fields of a relation
    • Search key is notnecessarily the same as a key
  • Entries in an index: (k, r), where:
    • k = the search key
    • r = the record OR record id OR record ids OR pointers
Some terminology
  • Data file: has the data corresponding to a relation
  • Index file: has the index
  • File consists of smaller units called blocks

####### (e.g. of size 4 KB or 8 KB)

  • index blocks < # data blocks.

####### Index may even fit into main memory.

######## 23

Characteristics of Indexes
  • Clustered/unclustered
    • Clustered: keys sorted
    • Unclustered: keys unsorted
  • Dense/sparse
    • Dense = each record has an entry in the index
    • Sparse = only some records have
  • Primary/secondary
    • Primary = on the primary key
    • Secondary = on any attribute
Ex: Clustered, Dense Index
  • Clustered: File is sorted on the index attribute
  • Dense: sequence of (key,pointer) pairs
10
25
30
40
50
60
70
90
10
25
30
40
50
60
70
90

######## INDEX

######## BLOCK

######## FILE

######## BLOCK

####### INDEX DATA FILE

Clustered, Sparse Index
10
30
50
70
90
110
130
150
10
20
30
40
50
60
70
80
  • Sparse index: one key per data block, corresponding to

####### the lowest search key in that block

What if there are duplicate keys?
Clustered Index with Duplicate Keys

####### Dense index: point to the first record with that key

####### (must have a pointer for each new key)

10
20
30
40
50
60
70
80
10
10
10
20
20
20
30
40
Clustered Index with Duplicate Keys
  • Sparse index: pointer to lowest search key in each block

####### Search for 20

10
10
25
30
10
10
10
20
25
25
30
40
Clustered Index with Duplicate Keys
  • Better: pointer to lowest new search key in each block

####### Search for 20 again

10
20
25
30
10
10
10
20
25
25
30
40
Unclustered Indexes
  • Often for indexing other attributes than primary key
  • Can it be sparse?
10
10
20
20
20
30
30
30
10 20
10 30
10 30
20 20
30 10
35 20
40 10
50 30
Secondary
Summary Clustered vs. Unclustered Index
Index
entries
(Index File)
(Data file)
Data Records
Index
entries
Data Records
CLUSTERED UNCLUSTERED

Space

Look up time
Dense Sparse
Clustered
Unclustered
An Index is a Function!
f(what: key) = where: file block

######## 34

B+ Trees
B+ Trees
  • Intuition:
    • The index can be very large.
    • Index of index?
    • Index of index of index?
    • How best to create such a multi-level index?
  • B+ trees:
    • Textbook refers to B+ trees (a popular variant) as B-trees (as most people do)

####### Focus on the dense version:

####### applies to clustered and unclustered settings

######## 36

  • B+ Trees are trees with nodes with keys and pointers to:
    • Other nodes [if the node is an internal node]
    • Data Records [if the node is a leaf]
B+ Trees Basics

######## 30 120 240

######## [X , 30) [30, 120) [120, 240) [240, Y)

######## 40 50 60

######## 40 50 60

####### next leaf

Data records:
  • Internal node:
  • Leaf:
# pointers = # keys + 1
  • Parameter d = the degree; n = max keys
  • When n is even [this is our focus for simplicity]
    • each node has [d, 2d] keys (except root); n = 2d
  • At least half full at all times
    • d is the minimum amount it needs to be full.
B+ Trees Basics
B+ Tree Example

######## 39

######## 80

######## 20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

######## 10 15 18 20 30 40 50 60 65 80 85 90

####### d = 2

####### n = 2d = 4

####### Root can have 1 or more filled in keys

####### Rest have at least d

B+ Tree Design
  • How large should d be?
  • Example:
    • Key size = 4 bytes
    • Pointer size = 8 bytes
    • Block size = 4096 byes
  • 2d x 4 + (2d+1) x 8 <= 4096
  • d = 170; 2d = 340

####### So up to 340 records in leaf blocks

B+ Trees in Practice

Typical d: 100. Typical fill-factor: 66.5%.

average fanout= 66.5 * 2 =133

Typical capacities:

  • Height 4: 133^4 = 312,900,700 records
  • Height 3: 133^3 = 2,352,637 records

Can often hold top levels in main memory:

  • Level 1 = 1 page = 8 Kbytes
  • Level 2 = 133 pages = 1 Mbyte
  • Level 3 = 17,689 pages = 133 MBytes
Searching a B+ Tree

######## 42

  • Exact key values:
Start at the root;
Proceed down to the leaf

####### Select name

####### From people

####### 80 Where age = 25

######## 20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

######## 10 15 18 20 30 40 50 60 65 80 85 90

Searching a B+ Tree

######## 43

  • Range queries:
As above
Then sequential traversal using next leaf pointers
80

######## 20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

######## 10 15 18 20 30 40 50 60 65 80 85 90

####### Select name

####### From people

####### Where 20 <= age

####### and age <= 70

Insertion in a B+ Tree

######## 80

######## 20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

######## 10 15 18 20 30 40 50 60 65 80 85 90

####### Insert K=19

####### Assume d=2.

######## DATA BLOCKS

Insertion in a B+ Tree

######## 45

######## 80

######## 20 60 100 120 140

10 15 18 19 20 30 40 50 60 65 80 85 90

######## 10 15 18 19 20 30 40 50 60 65 80 85 90

####### After insertion

Insertion in a B+ Tree

######## 46

######## 80

######## 20 60 100 120 140

10 15 18 19 20 30 40 50 60 65 80 85 90

######## 10 15 18 19 20 30 40 50 60 65 80 85 90

####### Now insert 25

Insertion in a B+ Tree

######## 47

######## 80

######## 20 60 100 120 140

10 15 18 19 20 25 30 40 50 60 65 80 85 90

######## 10 15 18 19 20 25 30 40 60 65 80 85 90

####### After insertion

######## 50

Insertion in a B+ Tree

######## 48

######## 80

######## 20 60 100 120 140

10 15 18 19 20 25 30 40 50 60 65 80 85 90

######## 10 15 18 19 20 25 30 40 60 65 80 85 90

####### But now have to split!

######## 50

Insertion in a B+ Tree

######## 49

######## 80

######## 20 30 60 100 120 140

10 15 18 19 20 25 60 65 80 85 90

######## 10 15 18 19 20 25 30 40 60 65 80 85 90

####### After the split

######## 50

30 40 50
Deletion from a B+ Tree

######## 50

######## 80

######## 20 30 60 100 120 140

10 15 18 19 20 25 60 65 80 85 90

######## 10 15 18 19 20 25 30 40 60 65 80 85 90

####### Delete 30

######## 50

30 40 50
Deletion from a B+ Tree

######## 51

######## 80

######## 20 30 60 100 120 140

10 15 18 19 20 25 60 65 80 85 90

######## 10 15 18 19 20 25 40 60 65 80 85 90

####### After deleting 30

######## 50

40 50
May change to
40, or not
Deletion from a B+ Tree

######## 52

######## 80

######## 20 30 60 100 120 140

10 15 18 19 20 25 60 65 80 85 90

######## 10 15 18 19 20 25 40 60 65 80 85 90

####### Now delete 25

######## 50

40 50
Deletion from a B+ Tree

######## 53

######## 80

######## 20 30 60 100 120 140

10 15 18 19 20 60 65 80 85 90

######## 10 15 18 19 20 40 60 65 80 85 90

####### After deleting 25

####### Need to rebalance

####### Rotate

######## 50

40 50

####### Rotation in general can involve

####### either sibling, but here only the

####### left sibling can help

Deletion from a B+ Tree

######## 54

######## 80

######## 20 30 60 100 120 140

10 15 18 19 20 60 65 80 85 90

######## 10 15 18 19 20 40 50 60 65 80 85 90

40 50
Deletion from a B+ Tree

######## 55

######## 80

######## 19 30 60 100 120 140

10 15 18 19 20 60 65 80 85 90

######## 10 15 18 19 20 40 50 60 65 80 85 90

40 50
Deletion from a B+ Tree

######## 56

######## 80

######## 19 30 60 100 120 140

10 15 18 19 20 60 65 80 85 90

######## 10 15 18 19 20 40 60 65 80 85 90

####### Now delete 40

######## 50

40 50
Deletion from a B+ Tree

######## 57

######## 80

######## 19 30 60 100 120 140

10 15 18 19 20 60 65 80 85 90

######## 10 15 18 19 20 60 65 80 85 90

####### After deleting 40

####### Rotation not possible

######## 50

50
Need to merge nodes
Deletion from a B+ Tree

######## 58

######## 80

######## 19 60 100 120 140

10 15 18 19 20 50 60 65 80 85 90

######## 10 15 18 19 20 60 65 80 85 90

####### Final tree

######## 50

Efficiency

####### Operation Run time

####### Search

####### Insert

####### Delete

59
Next Lecture
  • Finish up the remaining part
  • HashTables
  • Learned Indexes (if time allows)
60

Leave a Reply

Your email address will not be published. Required fields are marked *