Cube, The Semantic Layer
Data democratization is the term which is making the buzz all around the Business. Data is never restricted to a subset of users or groups like Data Engineers, Data Analysts or Data scientists.
According to Collibra
Data democratization is when an organization makes data accessible to all employees and stakeholders, and educates them on how to work with data, regardless of their technical background.
Why do we need to Democratize the Data
Usage of data in different areas of the organization is different.
Every team has their own set of requirements and perception of the data. By letting these teams to use the data and educating them about the different areas and aspects of it would make everyone in the team more self reliable and more agnostic.
We are basically removing all the middlemen from the equation and all the team go and query the data.
Semantic Layer
Semantic layer is also known as a Translation layer that lies between the your data and the business stakeholders.
Semantic Layer helps the business users to use the data directly without having to know any of the nuances of the data technical terms.
Consider you have 100s of tables representing the dimension of a product and its sales. But business might only be asking simple questions like total sales and the items. Here by using the Semantic layer, you remove all the complexity of managing and cleaning of the data and only exposing the set of data which is required and keeping it as simple as possible.
Semantic layer keeps all the complexity withing by integrating the data with different sources like API, Database, Data warehouse etc, and exposing the given data in a simple term to the business users.
Ref: https://airbyte.com/blog/the-rise-of-the-semantic-layer-metrics-on-the-fly
Cube
Cube is an API-first, four-part semantic layer that enables data engineers and developers to make their data consistent, secure, performant, and accessible across every application.
Source: https://cube.dev/?ref=github-readme
Cube, is nothing but a Semantic Layer as we discussed in the beginning.
Cube is built to help the business to be able to access the data without having to know any of the complex data terms.
Cube helps you to connect to most of the Data sources like Snowflake, BigQuery, Redshift, Postgres etc and build the semantic layer on top of it.
Cube exposes the data using REST, GraphQL, and SQL APIs and also it controls the user accesses.
Cube has the built in charting library using which you can build the visualization on top of the given datasets.
Features
Data Modeling
Using Cube you can model your data. Semantic layer helps you to pull the given data into a perspecitve.
Once the data is ready the very next step is to model your data at different dimenttions and adding any measures.
Example for Product table:
{
"order": {
"products.count": "desc"
},
"dimensions": [
"products.name"
],
"measures": [
"products.count"
],
"timeDimensions": [
{
"dimension": "products.created_at"
}
],
"filters": [
{
"member": "products.count",
"operator": "lte",
"values": [
"100"
]
}
]
}
Query Format:
SELECT
"products".name "products__name",
count("products".id) "products__count"
FROM
public.products AS "products"
GROUP BY
1
HAVING
(count("products".id) <= $ 1)
ORDER BY
2 DESC
LIMIT
10000
Caching
Cube supports 2 types of caching.
- In-memory: which is enabled by default. When the same data is accessed multiple times, the given dataset is cached for future requests for a quick response to the queries.
- pre-aggregated: In here, the Cube reads the source table and runs your aggregation and stores the result in the per-aggregation schema in the source database.
name: product-agg
measures:
- products.count
dimensions:
- products.name
timeDimension: products.created_at
granularity: day
Ref: https://cube.dev/docs/caching
API Integration
Cube has rich integration of APIs to different source systems.
At a high level, Cube has an API connectivity of SQL, REST, GraphQL and orchestration API.
Ref:
Cube out of the box has the support to connect to Airflow, Dagster and Prefect
With that Cube has many more features like
Single Sign On: You can manage the user accesses and permissions.
SQL Runner: Where you can build and run your queries.
Playground: Its a web-based tool which allows executing queries and previewing their results.
CLI: to help you create new Cube services or build data models.
Monitoring: To alert for any outage or issues. Cube can also connect to Datadog or Grafana.
DEMO
Create a new folder and create a docker-compose.yaml
mkdir cube-demo
cd cube-demo
vim docker-compose.yaml
version: '2.2'
services:
cube:
image: cubejs/cube:latest
ports:
- 4000:4000
- 15432:15432
environment:
- CUBEJS_DEV_MODE=true
volumes:
- .:/cube/conf
Run the docker
docker compose up -d
Hit the URL
http://localhost:4000/
Let’s use the sample database: PostgresSQL
Host demo-db.cube.dev
Port 5432
Database ecom
Username cube
Password 12345
Build The Data Model using YAML
Let’s query the data
You can get the SQL quey for the given view
You can build the pre-aggregation
Conclusion
In the era of Data where everyone is asking questions based on the the historical performance and future implications, its very important to have the data democraised.
Semantic Layer is jut built for that. It acts as a middleman between your data and the organisation or end users. It takes away all the complecity and provides the simple solution where anyone can can business questions without having to know the data terms and nuances.
In this blog we have only provided very high level information about Semantic layer and how Cube is helping the organisation to connect your end users with the data.
I strogly recommend you to read the Cube documentation to get the full picture and the all the features provided by them in the Cube Cloud.
Ajith Shetty
Bigdata Engineer — Bigdata, Analytics, Cloud and Infrastructure.
Subscribe✉️ ||More blogs📝||LinkedIn📊||Profile Page📚||Git Repo👓
Interested in getting the weekly newsletter on the big data analytics around the world, do subscribe to my: Newsletter Just Enough Data