代写Database | sql代写- Database Systems - 学霸代写 - CS代写, 程序代写, CS作业代写, 代码代写, CS编程代写, java代写, python代写, c++/c代写, R代写, 算法作业代写, web代写, CS assignment代写, MATH代写, 统计代写, 金融代写, business代写, economic, accounting代写等

代写Database | sql代写 : 这是一个数据相关的代写任务

Transaction

Management &

Indexing

CS411:  database Systems

Announcements

Midterm: 10/29 in class 11-12:15 pm

Todays lecture

TM: theory of serializability
Storage
Indexing
- What is an index? Why do we need it?
- B+ Trees

Locking and Serializability

We said that a transaction must hold all locks until it terminates (a condition called strictlocking)
It turns out that this is crucial to guarantee serializability
- Note that the first (bad) example could have been produced if transactions acquired and immediately released locks.

Well-Formed, Two-Phased Transactions

A transaction is well-formed if it acquires at least a shared lock on Q before reading Q or an exclusive lock on Q before writing Q and doesnt release the lock until the action is performed – Locks are also released by the end of the transaction
A transaction is two-phasedif it never acquires a lock after unlocking one – i.e., there are two phases: a growing phasein which the transaction acquires locks, and a shrinking phasein which locks are released

Two-Phased Locking Theorem

If all transactions are well-formed and two-phase, then any schedule in which conflicting locks are never granted ensures serializability – i.e., there is a very simple scheduler!
However, if some transaction is not well-formed or two- phase, then there is some schedule in which conflicting locks are never granted but which fails to be serializable – i.e., one bad apple spoils the bunch

 2018 A. Alawini & A. Parameswaran

Two Phase Locking Protocol (2PL)

2PLis a way of managing locks during a transaction T

T gets (S and X) locks gradually, as needed
T cannot request any additional locks once it releases any locks

time

# of locks
held by a
transaction T

No locks can be
requested after
releasing lock 5

######## 0

######## 2

######## 1

######## 4

######## 3

######## 5

7
 Lois Delcambre, Len Shapiro, David Maier

End of T

 2018 A. Alawini & A. Parameswaran

Strict Two Phase Locking Protocol

(S2PL)

Strict 2PLis a way of managing locks during a transaction T

T gets (S and X) locks gradually, as needed
T holds all locks until end of transaction (commit/abort)

time

# of locks
held by a
transaction T

All locks
are released
at the end,
upon commit or abort

######## 2

######## 1

######## 4

######## 3

######## 5

8
 Lois Delcambre, Len Shapiro, David Maier

 2018 A. Alawini & A. Parameswaran

Which of these schedules can arise

under strict 2PL

######## T1: R(A), W(A)

######## T2: R(A), W(A), R(B)

######## T1: R(B), W(A), W(B)

######## T2: R(A), W(A), R(B)

######## T1: R(A), R(B), W(B)

######## T2: W(A)

######## T3: W(A) R(B)

9
 Lois Delcambre, Len Shapiro, David Maier

 2018 A. Alawini & A. Parameswaran

Strict 2PL guarantees serializability

2PLvs.strict2PL
- s2PL avoids cascading abort
Can prove that a Strict 2PL schedule is equivalent to the serial schedule in which each transaction runs instantaneously at the time that it commits
This is huge: A property of each transaction (2PL) implies a property of any set of transactions (serializability) No need to check serializabilityof specific schedules
Most DBMSs use 2PL to enforce serializability

10
 Lois Delcambre, Len Shapiro, David Maier

Summary of TM

Transactions are all-or-nothing units of work guaranteed despite concurrency or failures in the system.
Theoretically, the correctexecution of transactions is serializable (i.e. equivalent to some serial execution).
Practically, this may adversely affect throughput isolation levels.
With isolation levels, users can specify the level of incorrectness they are willing to tolerate.

Outline

TM: theory of serializability

Storage
Indexing
- What is an index? Why do we need it?
- B+ Trees

So far, weve been talking about

databases in abstract terms

How are they implemented?

How are relations stored?
How are queries run?

Storage

Simplified Computer Architecture

Processor

RegistersCacheOn-Chip
Main
Memory

Persistent
Storage
(Disk)

Speed (ns) 10 ns 102 ns 107 ns

Size KB GB TB

Cost of Accessing Data on Disk

Main
Memory

Speed 107 ns 105 ns 102 ns

Memory

Block size vs. record size

Block 4KB or 8KB

######## …

######## ~100B 1KB

Table X

OK. So how do we do simple stuff?

####### Lookups. Insertions. Deletions

Outline

TM: theory of serializability

Storage

Indexing
- What is an index? Why do we need it?
- B+ Trees

Indexing

####### One catch!

Indexes in databases

An index speeds up selections on the search key field(s)
Search key = any subset of the fields of a relation
- Search key is notnecessarily the same as a key
Entries in an index: (k, r), where:
- k = the search key
- r = the record OR record id OR record ids OR pointers

Some terminology

Data file: has the data corresponding to a relation
Index file: has the index
File consists of smaller units called blocks

####### (e.g. of size 4 KB or 8 KB)

index blocks < # data blocks.

####### Index may even fit into main memory.

######## 23

Characteristics of Indexes

Clustered/unclustered
- Clustered: keys sorted
- Unclustered: keys unsorted
Dense/sparse
- Dense = each record has an entry in the index
- Sparse = only some records have
Primary/secondary
- Primary = on the primary key
- Secondary = on any attribute

Ex: Clustered, Dense Index

Clustered: File is sorted on the index attribute
Dense: sequence of (key,pointer) pairs

######## INDEX

######## BLOCK

######## FILE

######## BLOCK

####### INDEX DATA FILE

Clustered, Sparse Index

Sparse index: one key per data block, corresponding to

####### the lowest search key in that block

What if there are duplicate keys?

Clustered Index with Duplicate Keys

####### Dense index: point to the first record with that key

####### (must have a pointer for each new key)

Clustered Index with Duplicate Keys

Sparse index: pointer to lowest search key in each block

####### Search for 20

Clustered Index with Duplicate Keys

Better: pointer to lowest new search key in each block

####### Search for 20 again

Unclustered Indexes

Often for indexing other attributes than primary key
Can it be sparse?

Secondary

Summary Clustered vs. Unclustered Index

Index
entries

(Index File)

(Data file)

Data Records

Index
entries

Data Records
CLUSTERED UNCLUSTERED

Space

Look up time

Dense Sparse
Clustered
Unclustered

An Index is a Function!

f(what: key) = where: file block

######## 34

B+ Trees

Intuition:
- The index can be very large.
- Index of index?
- Index of index of index?
- How best to create such a multi-level index?
B+ trees:
- Textbook refers to B+ trees (a popular variant) as B-trees (as most people do)

####### Focus on the dense version:

####### applies to clustered and unclustered settings

######## 36

B+ Trees are trees with nodes with keys and pointers to:
- Other nodes [if the node is an internal node]
- Data Records [if the node is a leaf]

B+ Trees Basics

######## 30 120 240

######## [X , 30) [30, 120) [120, 240) [240, Y)

######## 40 50 60

####### next leaf

Data records:

Internal node:
Leaf:

# pointers = # keys + 1

Parameter d = the degree; n = max keys
When n is even [this is our focus for simplicity]
- each node has [d, 2d] keys (except root); n = 2d
At least half full at all times
- d is the minimum amount it needs to be full.

B+ Trees Basics

B+ Tree Example

######## 39

######## 80

######## 20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

######## 10 15 18 20 30 40 50 60 65 80 85 90

####### d = 2

####### n = 2d = 4

####### Root can have 1 or more filled in keys

####### Rest have at least d

B+ Tree Design

How large should d be?
Example:
- Key size = 4 bytes
- Pointer size = 8 bytes
- Block size = 4096 byes
2d x 4 + (2d+1) x 8 <= 4096
d = 170; 2d = 340

####### So up to 340 records in leaf blocks

B+ Trees in Practice

Typical d: 100. Typical fill-factor: 66.5%.

average fanout= 66.5 * 2 =133

Typical capacities:

Height 4: 133^4 = 312,900,700 records
Height 3: 133^3 = 2,352,637 records

Can often hold top levels in main memory:

Level 1 = 1 page = 8 Kbytes
Level 2 = 133 pages = 1 Mbyte
Level 3 = 17,689 pages = 133 MBytes

Searching a B+ Tree

######## 42

Exact key values:

Start at the root;
Proceed down to the leaf

####### Select name

####### From people

####### 80 Where age = 25

######## 20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

######## 10 15 18 20 30 40 50 60 65 80 85 90

Searching a B+ Tree

######## 43

Range queries:

As above
Then sequential traversal using next leaf pointers
80

######## 20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

######## 10 15 18 20 30 40 50 60 65 80 85 90

####### Select name

####### From people

####### Where 20 <= age

####### and age <= 70

Insertion in a B+ Tree

######## 80

######## 20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

######## 10 15 18 20 30 40 50 60 65 80 85 90

####### Insert K=19

####### Assume d=2.

######## DATA BLOCKS

Insertion in a B+ Tree

######## 45

######## 80

######## 20 60 100 120 140

10 15 18 19 20 30 40 50 60 65 80 85 90

######## 10 15 18 19 20 30 40 50 60 65 80 85 90

####### After insertion

Insertion in a B+ Tree

######## 46

######## 80

######## 20 60 100 120 140

10 15 18 19 20 30 40 50 60 65 80 85 90

######## 10 15 18 19 20 30 40 50 60 65 80 85 90

####### Now insert 25

Insertion in a B+ Tree

######## 47

######## 80

######## 20 60 100 120 140

10 15 18 19 20 25 30 40 50 60 65 80 85 90

######## 10 15 18 19 20 25 30 40 60 65 80 85 90

####### After insertion

######## 50

Insertion in a B+ Tree

######## 48

######## 80

######## 20 60 100 120 140

10 15 18 19 20 25 30 40 50 60 65 80 85 90

######## 10 15 18 19 20 25 30 40 60 65 80 85 90

####### But now have to split!

######## 50

Insertion in a B+ Tree

######## 49

######## 80

######## 20 30 60 100 120 140

10 15 18 19 20 25 60 65 80 85 90

######## 10 15 18 19 20 25 30 40 60 65 80 85 90

####### After the split

######## 50

30 40 50

Deletion from a B+ Tree

######## 50

######## 80

######## 20 30 60 100 120 140

10 15 18 19 20 25 60 65 80 85 90

######## 10 15 18 19 20 25 30 40 60 65 80 85 90

####### Delete 30

######## 50

30 40 50

Deletion from a B+ Tree

######## 51

######## 80

######## 20 30 60 100 120 140

10 15 18 19 20 25 60 65 80 85 90

######## 10 15 18 19 20 25 40 60 65 80 85 90

####### After deleting 30

######## 50

40 50

May change to
40, or not

Deletion from a B+ Tree

######## 52

######## 80

######## 20 30 60 100 120 140

10 15 18 19 20 25 60 65 80 85 90

######## 10 15 18 19 20 25 40 60 65 80 85 90

####### Now delete 25

######## 50

40 50

Deletion from a B+ Tree

######## 53

######## 80

######## 20 30 60 100 120 140

10 15 18 19 20 60 65 80 85 90

######## 10 15 18 19 20 40 60 65 80 85 90

####### After deleting 25

####### Need to rebalance

####### Rotate

######## 50

40 50

####### Rotation in general can involve

####### either sibling, but here only the

####### left sibling can help

Deletion from a B+ Tree

######## 54

######## 80

######## 20 30 60 100 120 140

10 15 18 19 20 60 65 80 85 90

######## 10 15 18 19 20 40 50 60 65 80 85 90

40 50

Deletion from a B+ Tree

######## 55

######## 80

######## 19 30 60 100 120 140

10 15 18 19 20 60 65 80 85 90

######## 10 15 18 19 20 40 50 60 65 80 85 90

40 50

Deletion from a B+ Tree

######## 56

######## 80

######## 19 30 60 100 120 140

10 15 18 19 20 60 65 80 85 90

######## 10 15 18 19 20 40 60 65 80 85 90

####### Now delete 40

######## 50

40 50

Deletion from a B+ Tree

######## 57

######## 80

######## 19 30 60 100 120 140

10 15 18 19 20 60 65 80 85 90

######## 10 15 18 19 20 60 65 80 85 90

####### After deleting 40

####### Rotation not possible

######## 50

Need to merge nodes

Deletion from a B+ Tree

######## 58

######## 80

######## 19 60 100 120 140

10 15 18 19 20 50 60 65 80 85 90

######## 10 15 18 19 20 60 65 80 85 90

####### Final tree

######## 50

Efficiency

####### Operation Run time

####### Search

####### Insert

####### Delete

Next Lecture

Finish up the remaining part
HashTables
Learned Indexes (if time allows)

Transaction

Management &

Indexing

Announcements

Todays lecture

Locking and Serializability

Well-Formed, Two-Phased Transactions

Two-Phased Locking Theorem

Two Phase Locking Protocol (2PL)

Strict Two Phase Locking Protocol

(S2PL)

Which of these schedules can arise

under strict 2PL

Strict 2PL guarantees serializability

Summary of TM

Outline

TM: theory of serializability

So far, weve been talking about

databases in abstract terms

How are they implemented?

Storage

Simplified Computer Architecture

Cost of Accessing Data on Disk

Block size vs. record size

OK. So how do we do simple stuff?

Outline

TM: theory of serializability

Storage

Indexing

Indexes in databases

Some terminology

index blocks < # data blocks.

Characteristics of Indexes

Ex: Clustered, Dense Index

Clustered, Sparse Index

What if there are duplicate keys?

Clustered Index with Duplicate Keys

Clustered Index with Duplicate Keys

Clustered Index with Duplicate Keys

Unclustered Indexes

Summary Clustered vs. Unclustered Index

An Index is a Function!

f(what: key) = where: file block

B+ Trees

B+ Trees

B+ Trees Basics

B+ Trees Basics

B+ Tree Example

B+ Tree Design

B+ Trees in Practice

Typical d: 100. Typical fill-factor: 66.5%.

average fanout= 66.5 * 2 =133

Typical capacities:

Can often hold top levels in main memory:

Searching a B+ Tree

Searching a B+ Tree

Insertion in a B+ Tree

Insertion in a B+ Tree

Insertion in a B+ Tree

Insertion in a B+ Tree

Insertion in a B+ Tree

Insertion in a B+ Tree

Deletion from a B+ Tree

Deletion from a B+ Tree

Deletion from a B+ Tree

Deletion from a B+ Tree

Deletion from a B+ Tree

Deletion from a B+ Tree

Deletion from a B+ Tree

Deletion from a B+ Tree

Deletion from a B+ Tree

Efficiency

Next Lecture

发表评论 取消回复

发表评论取消回复