The world of SQL and databases can seem like a huge, untamable monster. With all the different clauses, phrases, and keywords, you’d be forgiven for simply not knowing where to start.
Whilst SQL can seem like an untamable monster, once you understand the basics, that huge monster starts to shrink.
In this article, we’ll be taking you through one of the most basic and intrinsic elements of a database: Dimensions.
In SQL and in databases, dimensions is more of an umbrella term, as there are numerous names for dimensions within a database. These names usually refer to the type of data these dimensions hold or the behavior of the data within a dimension.
If you’re looking for a definition of a dimension and a brief overview of the types of dimensions, you’re in the right place!
Dimension SQL: A Definition
In the context of SQL, a dimension is a digital structure that categorizes facts and measurements to enable users to answer business questions.
There are three primary functions of dimensions: to provide grouping, labeling, and filtering of data.
Within a data warehouse, a dimension gives us the ability to structure and label information that would otherwise be unordered numerical quantities. A dimension is a data set comprised of individual data elements.
What might the real-world use of a dimension in a data warehouse be?
Well, some of the most commonly used dimensions are quantities like customers, time, and products.
Say you work for a clothing retailer and want to analyze the sale of a particular product across the chain of stores.
Given that each sales channel gathers and stores data of sales of this product, and that this data is stored in the retailer’s data warehouse, you might be asked to answer questions about the sales of the product across time.
This would be a real-world use of a dimension in a data warehouse.
As each of the numerical quantities like products, customers, and time labeled as a dimension can be interpreted to give accurate answers to sales questions.
The Types Of Dimensions
Now that we’ve established what a dimension is, it’s time to discuss the types of dimensions. As mentioned above, the types of dimensions relate to the data held within a dimension.
Slowly Changing Dimensions
As the name suggests, Slowly Changing Dimensions hold a set of data attributes that you would expect to change slowly over time (a name or address, for example).
As these attributes change over a period of time, they are combined into a slowly changing dimension data set.
Within a set of slowly changing dimensions, there are classifications of types as follows:
Type 0: Attributes that never change (Retain Original)
Type 1: An old value is overwritten by a new value of attributes (Overwrite)
Type 2: A new row is created for a new value with a start and end date/version (Add New Row)
Type 3: In a new value, a new column is created (Add New Attribute)
Type 4: A table keeps its current value, but the history is of the value is saved in a second table (Add History Table)
Type 5: Is a combination of Type 1 and Type 4 (Combined Approach = 1 + 4)
Type 6: Is a combination of Type 1, Type 2, and Type 3 (Combined Approach = 1 + 2 + 3)
Type 7: Both a surrogate and a natural key are used (Hybrid Approach)
A set of data attributes that are physically referenced in multiple database tables are grouped into a Conformed Dimension.
These references can be the same attribute, structure, definition, or domain value (plus a few more). This means that a conformed dimension is able to cut across multiple facts.
In the context of a dimension, a conformed dimension is either completely identical or strict mathematical subsets of a detailed, granular dimension.
A convenient grouping of low-cardinality flags or indicators is known as a junk dimension.
When creating an abstract dimension, flags and indicators are removed from a fact table and placed within a useful dimensional framework.
A junk dimension consists of attributes that don’t belong in a fact table, with these attributes usually being text/various flags like, for example, non-generic comments, or yes/no indicators.
Degenerate dimensions hold information that is a key, like an invoice number, ticket number, or transaction number. These keys have no attributes and as such do not join with an actual dimension table.
A degenerate dimension often plays an integral role in a fact table’s primary key.
Within the context of a database, it is typically the case that dimensions are recycled for use in multiple applications within the same database.
As an example, a “Date” dimension could also be used for “Date of Sale,” “Date of Hire,” or “Date of Delivery.” These are referred to as role-playing dimensions.
An Outrigger dimension is usually referred to as a data warehouse “antipattern.”
This is because a dimension table doesn’t reference another via a foreign key. It’s best practice to fact tables that relate to two dimensions.
When a conformed dimension contains a subset of the rows/columns of the original dimension, this is known as a shrunken dimension.
Calendar Data Dimension
A Calendar Data Dimension is a very special kind of dimension that is used to show dates with a granularity of a day.
As we’ve mentioned at the beginning of this article, SQL and dimensions can take some time and practice to learn and perfect.
As the use of databases and the data within them is growing year on year, it stands you in good stead to learn the fundamentals of SQL and databases as the application of data is permeating an increasing number of job roles.