Database Basics
Data
Data is typically stores on
- non-volatile memory
- Decentralized servers (cloud)
Data has two flavours
- Structured
- tables with predefined columns
- Unstructured
- text Docs, images , videos
- semi-structures
- XML, JSON , webpages(maybe)
Databases are designed for structured data.
Unstructured data may be linked and then retrieved on request
Data Modeling
Defines the data space
considering the:
- Data items to be stored
- What are the data attributes
- What are the relationships
- Any attributes of the relationships
A data model can be developed
Database Management Systems ( DBMS )
Pieces of software to manage the data they will typically
- Manage the data
- Access the data
- Store the data
- Manage access controls
- Ensure the integrity of the data
Can be viewed of as “black box” there is little need to know what happens internally just what goes in and what will come out.
Manage the data
The DBMS will attempt to mange the data to ensure easy retrieval and insertion
- Create indexes to speed up data requests
- Find the best way to execute a query
- Find the best way to perform relational row operations
Data integrity
The DBMS needs to ensure that the data remains internally consistent with
- Attempts at unauthorized changes
- Changes and update to teh data
- Failures, program crashes, power outages hardware failure
- Concurrent changes happening at the same time
Schema
The conceptual schema is a layout of the logical DB,
- Shows entities, data types, relations.constraints.
- Hides The physical layout, how it the data stored, where is the data
Relational Data Model
Informal definition
- Relation is a table of values made from
- A set of rows, the data elements in each row represents certain facts that correspond to a real-world entity or relationship.
- Each column Represents a particular characteristic or attributes
- Gives meaning to the data
- Key of a relation
- Each row must me uniquely identifiable, the key of the row does this
- Sometimes row-id or sequential numbers are assigned as keys
- Relational Database Model
- Based on tables, table name and attribute (name,type) pairs
- Formulating Queries
- Specify the table name and attribute of interest
- Specify constraints (predicates) needed to be satisfied for data to be of interest
Formal Definition
Schema
also the description of a relation * With Attributes of \(A_1\),\(A_2\)…,\(A_n\) * where R is the name of the relation * is denoted by R(\(A_1\),\(A_2\)…,\(A_n\))
Domain
The set of all possible values that a particular attribute could be.
- A domain may be re-used for several attributes
- Typically contains ranged and/or format
- The domain of name could be the set fo all strings of length 2 to 25
tuple of a relation
An ordered set of values
- represented using (<..>)
- Each value is within the a defined domain
- A relation is an unordered set of such tuples
Relation State
The set of tuples currently in the relation
- Is a subset of all the cartesian product of the domains of it’s attributes
Relation instance
A set of tuples for a relation schema
- Denoted by r(R)
Short hand
given R(\(A_1\),\(A_2\)…,\(A_n\))
- Relation Schema: R(\(A_1\),\(A_2\)…,\(A_n\))
- Relation Name: R
- Relation Attributes: \(A_1\),\(A_2\)…,\(A_n\)
- r(R): {\(t_1\),\(t_2\)…,\(t_n\)}
- is a specific state of a relation with a set of tuples
- r(R) \(\subset\) dom(\(A_1\)) \(x\) dom(\(A_2\)) … dom(\(A_n\))
- \(t_i\): <\(v_1\),\(v_2\)…,\(v_n\)>
Other Terminology
Database
A collection of relations
Database Schema
A set of all relation schemas in the database
Intension
- Schema
Extension
- Values /data
Metadata
- Schema
- Other key info about the relations
Informal to formal
- Table = Relation
- Column Headers = Attributes
- Row = Tuple
- Table Elements = Relation instance
SQL
Sql is built on Bags not sets a back is a set that can have multiple of the same value e.g
SELECT fname FROM users WHERE Sname = "Smith"
will return many instances of John
Relational Database Constraints
Constraints determine the permissible states of relation instances
Explicit schema based constraints
- Key constraints
- Entity integrity constraints
- Referential integrity constraints
There are other domain constraints
- Type
- Maximum , minimum
- is NULL allowed
Constraints are typically at relation creation or with the CREATE TABLE
statement in sql
SuperKey
The super key of a relation R is a subset SK of R such that:
- in any valid state of r(R)
- for an distinct tuples in r(R) \(t_1\),\(t_2\)
- t1[SK] \(\neq\) t2[SK]
- for an distinct tuples in r(R) \(t_1\),\(t_2\)
Minimal SuperKEy
Any key is a minimal superKey or candidate key, if the removal of any of the attributes would result in the key not being a superKey.
If the relation hs several candidate keys one will be chosen arbitrarily to be the primary key
Entity Integrity Constraint
The primary key cannot be null of any tuple
Any other attribute of R may or may not have null values and can be specified at the table creation.
Referential Integrity
covers cross table relationships, a table may be linked to another table, by using a foreign key (FK) to reference a primary key (PK) in another table.
The domain of the foreign key in R1 that references R2 is the set of primary keys in r(R2) \(\cup\) {NULL}.
alternatively:
- if \(t_i\) in R1 references \(t_j\) in R2
- then \(t_i\)[FK] = \(t_j\)[PK]
- OR \(t_i\)[FK] is NULL
Updating with Constraints
The values in a relation instance can change on update so with each update checks are made to ensure that the update conforms to the constraints
If the update does nto meet the constraints the DBMS can:
- Reject the update and return an error.
- Inform the user fo the database.
- Attempt to perform correcting updates.
Insert
When adding new tuples to the database any of the constraint could be violated.
Domain
One of the attributes may not be in the attribute domain
Key
The value of the key attribute(s) may already exist in another tuple
Referential integrity
A foreign key in the tuple may refer to a primary key value that does not exist
Entity integrity
The primary key may not be NULL for any of it’s attributes
Update
Domain
One of the attributes may not be in the attribute domain
Key
Changing the key attribute(s) may mean it already exist in another tuple
Referential integrity
A new foreign key in the tuple may refer to a primary key value that does not exist
Changing the primary key may mean a foreign key is no-longer pointint to it
Entity integrity
The primary key may not be changed to NULL
Delete
Referential integrity
Removing the primary key may mean a foreign key is pointing to an instance that does not exist
Functional Dependencies
A formal tool that allows derivation of good DB designs.
Definition
Assume
- X,Y,Z are attributes
- A,B,C are single attributes
- notation ABC = {A,B,C}
Then
- x \(\rArr\) Y
- If the values of Y depends on the values of X
- If two tuples of the attribute agree on a value of X then they must agree on a value of y
- t1[X]=t2[X] \(\Arr\) t1[Y]=t2[Y]