Database Basics

Data

Data is typically stores on

  • non-volatile memory
  • Decentralized servers (cloud)

Data has two flavours

  • Structured
    • tables with predefined columns
  • Unstructured
    • text Docs, images , videos
  • semi-structures
    • XML, JSON , webpages(maybe)

Databases are designed for structured data.

Unstructured data may be linked and then retrieved on request

Data Modeling

Defines the data space

considering the:

  • Data items to be stored
  • What are the data attributes
  • What are the relationships
  • Any attributes of the relationships

A data model can be developed

Database Management Systems ( DBMS )

Pieces of software to manage the data they will typically

  • Manage the data
  • Access the data
  • Store the data
  • Manage access controls
  • Ensure the integrity of the data

Can be viewed of as “black box” there is little need to know what happens internally just what goes in and what will come out.

Manage the data

The DBMS will attempt to mange the data to ensure easy retrieval and insertion

  • Create indexes to speed up data requests
  • Find the best way to execute a query
  • Find the best way to perform relational row operations

Data integrity

The DBMS needs to ensure that the data remains internally consistent with

  • Attempts at unauthorized changes
  • Changes and update to teh data
  • Failures, program crashes, power outages hardware failure
  • Concurrent changes happening at the same time

Schema

The conceptual schema is a layout of the logical DB,

  • Shows entities, data types, relations.constraints.
  • Hides The physical layout, how it the data stored, where is the data



Relational Data Model

Informal definition

  • Relation is a table of values made from
    • A set of rows, the data elements in each row represents certain facts that correspond to a real-world entity or relationship.
    • Each column Represents a particular characteristic or attributes
      • Gives meaning to the data
  • Key of a relation
    • Each row must me uniquely identifiable, the key of the row does this
    • Sometimes row-id or sequential numbers are assigned as keys
  • Relational Database Model
    • Based on tables, table name and attribute (name,type) pairs
    • Formulating Queries
      • Specify the table name and attribute of interest
      • Specify constraints (predicates) needed to be satisfied for data to be of interest

Formal Definition

Schema

also the description of a relation * With Attributes of \(A_1\),\(A_2\)…,\(A_n\) * where R is the name of the relation * is denoted by R(\(A_1\),\(A_2\)…,\(A_n\))

Domain

The set of all possible values that a particular attribute could be.

  • A domain may be re-used for several attributes
  • Typically contains ranged and/or format
  • The domain of name could be the set fo all strings of length 2 to 25

tuple of a relation

An ordered set of values

  • represented using (<..>)
  • Each value is within the a defined domain
  • A relation is an unordered set of such tuples

Relation State

The set of tuples currently in the relation

  • Is a subset of all the cartesian product of the domains of it’s attributes

Relation instance

A set of tuples for a relation schema

  • Denoted by r(R)

    Short hand

    given R(\(A_1\),\(A_2\)…,\(A_n\))

  • Relation Schema: R(\(A_1\),\(A_2\)…,\(A_n\))
  • Relation Name: R
  • Relation Attributes: \(A_1\),\(A_2\)…,\(A_n\)
  • r(R): {\(t_1\),\(t_2\)…,\(t_n\)}
    • is a specific state of a relation with a set of tuples
    • r(R) \(\subset\) dom(\(A_1\)) \(x\) dom(\(A_2\)) … dom(\(A_n\))
  • \(t_i\): <\(v_1\),\(v_2\)…,\(v_n\)>

Other Terminology

Database

A collection of relations

Database Schema

A set of all relation schemas in the database

Intension

  • Schema

    Extension

  • Values /data

    Metadata

  • Schema
  • Other key info about the relations

Informal to formal

  • Table = Relation
  • Column Headers = Attributes
  • Row = Tuple
  • Table Elements = Relation instance

SQL

Sql is built on Bags not sets a back is a set that can have multiple of the same value e.g

SELECT fname FROM users WHERE Sname = "Smith" will return many instances of John




Relational Database Constraints

Constraints determine the permissible states of relation instances

Explicit schema based constraints

  • Key constraints
  • Entity integrity constraints
  • Referential integrity constraints

There are other domain constraints

  • Type
  • Maximum , minimum
  • is NULL allowed

Constraints are typically at relation creation or with the CREATE TABLE statement in sql

SuperKey

The super key of a relation R is a subset SK of R such that:

  • in any valid state of r(R)
    • for an distinct tuples in r(R) \(t_1\),\(t_2\)
      • t1[SK] \(\neq\) t2[SK]

Minimal SuperKEy

Any key is a minimal superKey or candidate key, if the removal of any of the attributes would result in the key not being a superKey.

If the relation hs several candidate keys one will be chosen arbitrarily to be the primary key

Entity Integrity Constraint

The primary key cannot be null of any tuple

Any other attribute of R may or may not have null values and can be specified at the table creation.

Referential Integrity

covers cross table relationships, a table may be linked to another table, by using a foreign key (FK) to reference a primary key (PK) in another table.

The domain of the foreign key in R1 that references R2 is the set of primary keys in r(R2) \(\cup\) {NULL}.

alternatively:

  • if \(t_i\) in R1 references \(t_j\) in R2
  • then \(t_i\)[FK] = \(t_j\)[PK]
  • OR \(t_i\)[FK] is NULL

Updating with Constraints

The values in a relation instance can change on update so with each update checks are made to ensure that the update conforms to the constraints

If the update does nto meet the constraints the DBMS can:

  • Reject the update and return an error.
  • Inform the user fo the database.
  • Attempt to perform correcting updates.

Insert

When adding new tuples to the database any of the constraint could be violated.

Domain

One of the attributes may not be in the attribute domain

Key

The value of the key attribute(s) may already exist in another tuple

Referential integrity

A foreign key in the tuple may refer to a primary key value that does not exist

Entity integrity

The primary key may not be NULL for any of it’s attributes

Update

Domain

One of the attributes may not be in the attribute domain

Key

Changing the key attribute(s) may mean it already exist in another tuple

Referential integrity

A new foreign key in the tuple may refer to a primary key value that does not exist

Changing the primary key may mean a foreign key is no-longer pointint to it

Entity integrity

The primary key may not be changed to NULL

Delete

Referential integrity

Removing the primary key may mean a foreign key is pointing to an instance that does not exist

Functional Dependencies

A formal tool that allows derivation of good DB designs.

Definition

Assume

  • X,Y,Z are attributes
  • A,B,C are single attributes
  • notation ABC = {A,B,C}

Then

  • x \(\rArr\) Y
  • If the values of Y depends on the values of X
  • If two tuples of the attribute agree on a value of X then they must agree on a value of y
  • t1[X]=t2[X] \(\Arr\) t1[Y]=t2[Y]