Semantic models

Semantic models are the foundation for data definition in MetricFlow, which powers the dbt Semantic Layer:

Think of semantic models as nodes connected by entities in a semantic graph.
MetricFlow uses YAML configuration files to create this graph for querying metrics.
Each semantic model corresponds to a dbt model in your DAG, requiring a unique YAML configuration for each semantic model.
You can create multiple semantic models from a single dbt model (SQL or Python), as long as you give each semantic model a unique name.
Configure semantic models in a YAML file within your dbt project directory. Refer to the best practices guide for more info on project structuring.
Organize them under a metrics: folder or within project sources as needed.

A semantic model is made up of different components: Entities, Measures, and Dimensions.

📹 Learn about the dbt Semantic Layer with on-demand video courses!

Here we describe the Semantic model components with examples:

Component	Description	Type
Name	Choose a unique name for the semantic model. Avoid using double underscores (__) in the name as they're not supported.	Required
Description	Includes important details in the description	Optional
Model	Specifies the dbt model for the semantic model using the `ref` function	Required
Defaults	The defaults for the model, currently only `agg_time_dimension` is supported.	Required
Entities	Uses the columns from entities as join keys and indicate their type as primary, foreign, or unique keys with the `type` parameter	Required
Primary Entity	If a primary entity exists, this component is Optional. If the semantic model has no primary entity, then this property is required.	Optional
Dimensions	Different ways to group or slice data for a metric, they can be `time` or `categorical`	Required
Measures	Aggregations applied to columns in your data model. They can be the final metric or used as building blocks for more complex metrics	Optional
Label	The display name for your semantic model `node`, `dimension`, `entity`, and/or `measures`	Optional
`config`	Use the `config` property to specify configurations for your metric. Supports `meta`, `group`, and `enabled` configs.	Optional

Semantic models components

The complete spec for semantic models is below:

semantic_models:
  - name: the_name_of_the_semantic_model ## Required
    description: same as always ## Optional
    model: ref('some_model') ## Required
    defaults: ## Required
      agg_time_dimension: dimension_name ## Required if the model contains measures
    entities: ## Required
      - see more information in entities
    measures: ## Optional
      - see more information in the measures section
    dimensions: ## Required
      - see more information in the dimensions section
    primary_entity: >-
      if the semantic model has no primary entity, then this property is required. #Optional if a primary entity exists, otherwise Required

You can refer to the best practices guide for more info on project structuring.

The following example displays a complete configuration and detailed descriptions of each field:

semantic_models:
  - name: transaction # A semantic model with the name Transactions
    model: ref('fact_transactions') # References the dbt model named `fact_transactions`
    description: "Transaction fact table at the transaction level. This table contains one row per transaction and includes the transaction timestamp."
    defaults:
      agg_time_dimension: transaction_date

    entities: # Entities included in the table are defined here. MetricFlow will use these columns as join keys.
      - name: transaction
        type: primary
        expr: transaction_id
      - name: customer
        type: foreign
        expr: customer_id

    dimensions: # dimensions are qualitative values such as names, dates, or geographical data. They provide context to metrics and allow "metric by group" data slicing.
      - name: transaction_date
        type: time
        type_params:
          time_granularity: day

      - name: transaction_location
        type: categorical
        expr: order_country

    measures: # Measures are columns we perform an aggregation over. Measures are inputs to metrics.
      - name: transaction_total
        description: "The total value of the transaction."
        agg: sum

      - name: sales
        description: "The total sale of the transaction."
        agg: sum
        expr: transaction_total

      - name: median_sales
        description: "The median sale of the transaction."
        agg: median
        expr: transaction_total

  - name: customers # Another semantic model called customers.
    model: ref('dim_customers')
    description: "A customer dimension table."

    entities:
      - name: customer
        type: primary
        expr: customer_id

    dimensions:
      - name: first_name
        type: categorical

Semantic models support meta, group, and enabled config property in either the schema file or at the project level:

Semantic model config in models/semantic.yml:

semantic_models:
  - name: orders
    config:
      enabled: true | false
      group: some_group
      meta:
        some_key: some_value

Semantic model config in dbt_project.yml:

semantic-models:
  my_project_name:
    +enabled: true | false
    +group: some_group
    +meta:
      some_key: some_value

For more information on dbt_project.yml and config naming conventions, see the dbt_project.yml reference page.

Name

Define the name of the semantic model. You must define a unique name for the semantic model. The semantic graph will use this name to identify the model, and you can update it at any time. Avoid using double underscores (__) in the name as they're not supported.

Description

Includes important details in the description of the semantic model. This description will primarily be used by other configuration contributors. You can use the pipe operator (|) to include multiple lines in the description.

Model

Specify the dbt model for the semantic model using the ref function.

Defaults

Defaults for the semantic model. Currently only agg_time_dimension. agg_time_dimension represents the default time dimensions for measures. This can be overridden by adding the agg_time_dimension key directly to a measure - see Dimensions for examples.

Entities

To specify the entities in your model, use their columns as join keys and indicate their type as primary, foreign, or unique keys with the type parameter.

Primary entity

MetricFlow requires that all dimensions be tied to an entity. This is to guarantee unique dimension names. If your data source doesn't have a primary entity, you need to assign the entity a name using the primary_entity: entity_name key. It doesn't necessarily have to map to a column in that table and assigning the name doesn't affect query generation.

You can define a primary entity using the following configs:

semantic_model:
  name: bookings_monthly_source
  description: bookings_monthly_source
  defaults:
    agg_time_dimension: ds
  model: ref('bookings_monthly_source')
  measures:
    - name: bookings_monthly
      agg: sum
      create_metric: true
  primary_entity: booking_id

Entity types
Sample config

Here are the types of keys:

Primary — Only one record per row in the table, and it includes every record in the data platform.
Unique — Only one record per row in the table, but it may have a subset of records in the data platform. Null values may also be present.
Foreign — Can have zero, one, or multiple instances of the same record. Null values may also be present.
Natural — A column or combination of columns in a table that uniquely identifies a record based on real-world data. For example, the sales_person_id can serve as a natural key in a sales_person_department dimension table.

This example shows a semantic model with three entities and their entity types: transaction (primary), order (foreign), and user (foreign).

To reference a desired column, use the actual column name from the model in the name parameter. You can also use name as an alias to rename the column, and the expr parameter to refer to the original column name or a SQL expression of the column.

entity:
  - name: transaction
    type: primary
  - name: order
    type: foreign
    expr: id_order
  - name: user
    type: foreign
    expr: substring(id_order FROM 2)

You can refer to entities (join keys) in a semantic model using the name parameter. Entity names must be unique within a semantic model, and identifier names can be non-unique across semantic models since MetricFlow uses them for joins.

Dimensions

Dimensions are different ways to organize or look at data. They are effectively the group by parameters for metrics. For example, you might group data by things like region, country, or job title.

MetricFlow takes a dynamic approach when making dimensions available for metrics. Instead of trying to figure out all the possible groupings ahead of time, MetricFlow lets you ask for the dimensions you need and constructs any joins necessary to reach the requested dimensions at query time. The advantage of this approach is that you don't need to set up a system that pre-materializes every possible way to group data, which can be time-consuming and prone to errors. Instead, you define the dimensions (group by parameters) you're interested in within the semantic model, and they will automatically be made available for valid metrics.

Dimensions have the following characteristics:

There are two types of dimensions: categorical and time. Categorical dimensions are for things you can't measure in numbers, while time dimensions represent dates and timestamps.
Dimensions are bound to the primary entity of the semantic model in which they are defined. For example, if a dimension called full_name is defined in a model with user as a primary entity, then full_name is scoped to the user entity. To reference this dimension, you would use the fully qualified dimension name user__full_name.
The naming of dimensions must be unique in each semantic model with the same primary entity. Dimension names can be repeated if defined in semantic models with a different primary entity.

For time groups

For semantic models with a measure, you must have a primary time group.

Measures

Measures are aggregations applied to columns in your data model. They can be used as the foundational building blocks for more complex metrics, or be the final metric itself.

Measures have various parameters which are listed in a table along with their descriptions and types.

Parameter	Description
`name`	Provide a name for the measure, which must be unique and can't be repeated across all semantic models in your dbt project.	Required
`description`	Describes the calculated measure.	Optional
`agg`	dbt supports the following aggregations: `sum`, `max`, `min`, `average`, `median`, `count_distinct`, `percentile`, and `sum_boolean`.	Required
`expr`	Either reference an existing column in the table or use a SQL expression to create or derive a new one.	Optional
`non_additive_dimension`	Non-additive dimensions can be specified for measures that cannot be aggregated over certain dimensions, such as bank account balances, to avoid producing incorrect results.	Optional
`agg_params`	Specific aggregation properties, such as a percentile.	Optional
`agg_time_dimension`	The time field. Defaults to the default agg time dimension for the semantic model.	Optional
`label`	String that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as orders_total or "orders_total"). Available in dbt version 1.7 or higher.	Optional
`create_metric`	Create a `simple` metric from a measure by setting `create_metric: True`. The `label` and `description` attributes will be automatically propagated to the created metric. Available in dbt version 1.7 or higher.	Optional

Dependencies

Metric nodes will reflect dependencies on semantic models based on their measures. However, dependencies based on filters should not be reflected in:

dbt selection syntax
Visualization of the DAGA DAG is a Directed Acyclic Graph, a type of graph whose nodes are directionally related to each other and don’t form a directional closed loop. in dbt-docs and the integrated development environment (IDE).

This is because metrics need to source nodes for their depends_on attribute from a few different places:

RATIO and DERIVED type metrics should reference Metric.type_params.input_metrics.
SIMPLE type metrics should reference Metric.type_params.measure.

For example, when you run the command dbt list --select my_semantic_model+, it will show you the metrics that belong to the specified semantic model.

But there's a condition: Only the metrics that actually use measures or derived metrics from that semantic model will be included in the list. In other words, if a metric only uses a dimension from the semantic model in its filters, it won't be considered as part of that semantic model.

Semantic models components​

Name​

Description​

Model​

Defaults​

Entities​

Primary entity​

Dimensions​

Measures​

Dependencies​

Related docs​