Data Modelling Patterns
Contents
- 1 Relational Schema Patterns
- 2 OLTP Patterns
- 3 OLAP Patterns
- 4 Star & Snowflake Schema Patterns
- 5 Spatial Schema Patterns
- 6 Predictive Analytics Schema
- 7 Ontology Schema Patterns
- 8 Provenance Schema Patterns
- 9 NoSQL Patterns
- 10 Process Management Schema
- 11 Search Engine Schema
- 12 User Profile Schema
- 13 Spatial Schemas
- 14 Dataset Schemas
Relational Schema Patterns
Standard Relational Schema Patterns, ER designs. Some 6 design patterns for Relational database are:
- First Normal Form (1NF)
- includes definition of a relation
- Second Normal Form (2NF)
- defined in terms of functional dependencies
- Third Normal Form (3NF)
- defined in terms of functional dependencies
- Most OTLP systems are in 3NF
- Boyce-Codd Normal Form (BCNF)
- defined in terms of functional dependencies
- Forth Normal Form (4NF)
- defined using multivalued dependencies
- Fifth Normal Form (5NF) or Project Join Normal Form (PJNF)
- defined using join dependencies
OLTP Patterns
Online Transactional Processing (OLTP) systems are characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The emphasis in OLTP Systems is on efficient and high-throughput query processing, strong data integrity in a multi-access environment. OLTP Systems are measured by number the number of transactions per second. OLTP database store detailed and current data. Transactional databases typically use the 3NF entity model. Master Data is typically stored in OLTP systems. Consider OTLP vs OLAP
- Insert Only Pattern (eg. History Tables)
OLAP Patterns
Online Analytical Processing (OLAP) are the methods used to design systems and schemas widely used in Data Mining systems, and Business Intelligence. OLAP is characterized by relatively low volume of transactions. However, queries are often very complex and involve aggregations across multiple datasets. The performance of OLAP Systems is measured by response times. (to generate a report). The typical OLAP database stores aggregated, post-processed, historical data in a multi-dimensional schema. (usually star schema, snowflake). Consider OTLP vs OLAP
A common short-cut is to join all relational tables together into one large table, frequently resulting in a huge view/table. For mid-sized data warehouses, this may be feasible, but for very large data-warehouses, data marts are needed.
Data cubes are then created from this.
There are variants of OLAP. These are:
Star & Snowflake Schema Patterns
- Ralph Kimball
- Star Schema Patterns
- Snowflake Schema Patterns
- Dimension Schema Patterns
- Inmon
Spatial Schema Patterns
There are a variety of spatial database schemas available. This data is stored differently from Spatial file-based systems.
Predictive Analytics Schema
The ability to store simulation and forecast data. This is different from when forecasted data may be replaced with actual future data.
Ontology Schema Patterns
Ontology is the science of discovering relationship between entities. In this case data.
Several identified methods are:
- String-based
- Language-based
- Linguistic-based (lexicons, thesaurus)
- Constraint-based
- Alignment-reuse
- Upper-level (logic-based ontologies)
- Graph-based
- Taxonomy-based
- Repository of structure-based
- Model-based
Ontology schemas are developed with an industry in mind. Semantics depends on ontology to derive meaning.
Ontology Schema Patterns
- ontology
- owl_instances (value?)
- owl_classes (needs subclasses)
- owl_properties
- owl_operators
Provenance Schema Patterns
Ways to determine how, when a certain dataset was derived. Provenance use caveats.
- OWL - Onotology Web Language
- PML - Provenance
- SPARQL queries on RDF
NoSQL Patterns
- NoSQL alternatives
Here are some reusable Data Schemas
Process Management Schema
Ways of managing processes within an application environment. Consider design patterns:
Search Engine Schema
Data structures related to search
User Profile Schema
Standard user profile schemas for multi-user systems.
Spatial Schemas
- See Oracle Spatial
Dataset Schemas
Specific schemas related to a dataset are noted with individual datasets. This is typically dependent on the Industry involved.