A dimension table is one of the 2 primary building blocks of a dimensional data model. The other primary type of table is the fact table. This post will describe 7 best practices for designing a dimension table.
What is the purpose of a Dimension Table
A dimension table is the primary way to filter and group results from a fact table. A fact table is almost useless by itself, as you need to give some descriptive attributes to provide context to the data. Imagine if your fact table said that you had $1 million in sales. Is this good or bad? What are the sales by year and region? Only with this descriptive data from your dimension tables does the $1 million in sales give actionable meaning. The attributes in a dimension table provide the descriptive elements that appear on a report or dashboard to provide needed context.
Use surrogate keys
The primary key for all dimension tables should be a single meaningless integer. The primary purpose of this surrogate key is to track slowly changing dimension (SCD) attributes changes. The surrogate key also helps to protect the data warehouse from changes to operational keys in the source systems. A bonus of this structure is that it also helps the performance of querying your dimensional data model because databases love to join tables based on single integer keys. The only reason you should add meaning to the primary key is if you need it for partitioning a fact table for performance. One example would be to smart number the primary key of the date dimension so that you can partition the fact tables by year and month.
Translate operational codes into meaning
All operational codes should be decomposed into something that everyone understands. If you have an operational code that is HCR, most people would not know what it means. But if you translate that into 3 attributes the data will be more descriptive:
- Entity (1st character) (H = Hospital)
- Type (2nd character) (C = Cost Center)
- Leader (3rd character) (R = Dr. Roberts)
Now you will know that this dimension is part of the Hospital entity, is of type Cost Center, and is led by Dr. Roberts. This will allow you to do analysis for all hospital entities combined by using a single attribute.
Flatten hierarchical relationships
Hierarchical relationships in your dimension should be flattened over multiple attributes so that they are easy and efficient to query against. Multiple different hierarchies in a dimension table are allowed and often appear. Here is one example on how to flatten peoplesoft summer trees. Other enterprise resource planning systems will have similar hierarchical constructs.
Flags and Indicators
Do not display dimension attribute values as TRUE or FALSE, or 0 or 1. Instead create textual values for these indicators and flags. If you have an indicator for Current Fund, you should use 2 values Current Fund, and Not Current Fund. This will help users understand the data more easily instead of having to remember that current fund is TRUE and not current fund is FALSE.
Attributes in a dimension table that don’t have a discrete value should list a default value or an indication that no value exists. Resist the temptation to include a null or blank value in dimension attributes. The users of your data warehouse will thank you.
Resist the temptation to normalize your dimension table. Normalizing dimensions is a performance reducing activity. You will save database space, but a data warehouse is built for performance and ease of querying data. In addition, dimension tables typically have fewer rows than fact tables. An exception to this rule is when using outrigger tables. When to use outrigger tables is beyond the scope of this post.
Dimension tables should be used across the data warehouse architecture. If you develop a Product dimension, make sure that you use the same dimension for all your fact tables that have Product. This will help to decrease the time it takes to build other subject areas and fact tables.
For more information about designing dimension tables I highly suggest you buy The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling by Ralph Kimball and Margy Ross.
If you have a best practice for designing a dimension table I have not listed, please share in the comments.