Storage & Indexing 2

ICS 321 Data Storage & Retrieval Overview of Storage
& Indexing (ii) Prof. Lipyeow Lim InformaBon & Computer Science Department University of Hawaii at Manoa 1 Lipyeow Lim -‐-‐ University of Hawaii at Manoa

Indexes •  An index on a file speeds up
selecBons on the search key fields for the index. –  Any subset of the fields of a relaBon can be the search key for an index on the relaBon. –  Search key is not the same as key (minimal set of fields that uniquely idenBfy a record in a relaBon). •  An index contains a collecBon of data entries, and supports efficient retrieval of all data entries k* with a given key value k. –  A data entry is usually in the form <key, rid> –  Given data entry k*, we can find record with key k in at most one disk I/O. (Details soon …) Lipyeow Lim -‐-‐ University of Hawaii at Manoa 2

B+ Tree Indexes •  Leaf pages contain data entries,
and are chained (prev & next) •  A data entry typically contain a key value and a rid. •  Non-‐leaf pages have index entries; only used to direct searches: Lipyeow Lim -‐-‐ University of Hawaii at Manoa 3 Non-leaf Pages Pages (Sorted by search key) Leaf P 0 K 1 P 1 K 2 P 2 K m P m index entry

Example B+ Tree •  Find 28*? 29*? All >
15* and < 30* •  Insert/delete: Find data entry in leaf, then change it. Need to adjust parent someBmes. –  And change someBmes bubbles up the tree Lipyeow Lim -‐-‐ University of Hawaii at Manoa 4 2* 3* Root 17 30 14* 16* 33* 34* 38* 39* 13 5 7* 5* 8* 22* 24* 27 27* 29* Entries <= 17 Entries > 17 Note how data entries in leaf level are sorted

Point Queries using B+ Trees •  Use index to
find 30* •  Request tuple from buffer manager •  If not in bufferpool, fetch page from disk Lipyeow Lim -‐-‐ University of Hawaii at Manoa 5 2* 3* Root 17 30 14* 16* 33* 34* 38* 39* 13 5 7* 5* 8* 22* 24* 27 27* 29* Entries <= 17 Entries > 17 SELECT * FROM Employees WHERE age=30 Assume heap file data storage

Range Queries using B+ Trees •  Use index to
find 30* •  For each data entry to the right of 30* •  Request tuples from buffer manager •  If not in bufferpool, fetch page from disk Lipyeow Lim -‐-‐ University of Hawaii at Manoa 6 2* 3* Root 17 30 14* 16* 33* 34* 38* 39* 13 5 7* 5* 8* 22* 24* 27 27* 29* Entries <= 17 Entries > 17 SELECT * FROM Employees WHERE age>30 Assume heap file data storage

Hash-‐Based Indexes •  Index is a collecBon of buckets
that contain data entries –  Bucket = primary page plus zero or more overﬂow pages. •  Hashing func=on h: h(r) = bucket in which (data entry for) record r belongs. h looks at the search key ﬁelds of r. •  No “index entries” in this scheme. Lipyeow Lim -‐-‐ University of Hawaii at Manoa 7 H1(age) File Value for age Hash Index on Age 30 45 56 41 20 25 20 25 Overflow page

Index ClassiﬁcaBons •  What should be in a Data
Entry k* ? –  PossibiliBes: •  The data record itself with key value k •  <k, rid of data record with key value k> •  <k, list of rids of data records with key value k> –  Variable size data entries –  Applies to any indexing technique •  Primary vs Secondary –  Primary index : search key contains primary key –  Unique Index : search key contains candidate key •  Clustered vs unclustered –  Clustered index: order of data records same or close to order of data entries Lipyeow Lim -‐-‐ University of Hawaii at Manoa 8

Clustered vs Unclustered Index •  Suppose data records are
stored in a Heap file. –  To build clustered index, first sort the Heap file (with some free space on each page for future inserts). –  Overflow pages may be needed for inserts. (Thus, order of data recs is `close to’, but not idenBcal to, the sort order.) Lipyeow Lim -‐-‐ University of Hawaii at Manoa 9 Index entries Data entries direct search for (Index File) (Data file) Data Records data entries Data entries Data Records CLUSTERED UNCLUSTERED

Clustered File •  An index where the data entry
contains the data record itself (cf. just the key value, RID pair). •  No heap/sorted file is used, the index IS the file of record •  Steps to build a clustered file: –  Sort data records –  ParBBon into pages –  Build the tree on the pages Lipyeow Lim -‐-‐ University of Hawaii at Manoa 10 Tree-‐based index File

Storage & Indexing 2

Storage & Indexing 2

Lipyeow

More Decks by Lipyeow

Other Decks in Education

Featured

Transcript

ICS 321 Data Storage & Retrieval Overview of Storage

Indexes •  An index on a ﬁle speeds up

B+ Tree Indexes •  Leaf pages contain data entries,

Example B+ Tree •  Find 28? 29? All >

Point Queries using B+ Trees •  Use index to

Range Queries using B+ Trees •  Use index to

Hash-‐Based Indexes •  Index is a collecBon of buckets

Index ClassiﬁcaBons •  What should be in a Data

Clustered vs Unclustered Index •  Suppose data records are

Clustered File •  An index where the data entry