31. 10. 2023

What is Dataform?

Data Architecture

Until recently, it was difficult to automate complex data pipelines built on Google Analytics 4 data. That has changed with Dataform, a Google Cloud Platform service for automating data pipelines.

Dataform is an open-source platform that Google acquired in 2020. It became available in Google Cloud Platform a few months ago, and its purpose is still the same.

Dataform helps data specialists or teams build data pipelines, version code in Git, and orchestrate workflows.

What are the advantages of Dataform?

The creation of data pipelines

If you are serious about data, you can't avoid creating data pipelines. Without Dataform, building such pipelines is complex and fragile because you'll probably reach for scheduled queries. You have to order all dependencies manually by run time, so that each query starts only after the queries it depends on have finished. For more complex pipelines, this is time-consuming.

With Dataform, you just define the dependencies between tables, and you are done.
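To make the idea of declared dependencies concrete, here is a minimal Python sketch. It is not Dataform code and does not show how Dataform works internally. It only illustrates that once each table lists the tables it reads from, a valid run order can be derived automatically instead of being scheduled by hand. All table names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical table names; each table maps to the set of tables it reads from.
dependencies = {
    "ga4_events_clean": {"ga4_events_raw"},
    "sessions": {"ga4_events_clean"},
    "orders": {"ga4_events_clean"},
    "daily_report": {"sessions", "orders"},
}


def execution_order(deps):
    """Return table names so that every table comes after the tables it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

The relative order of `sessions` and `orders` is not fixed, because neither depends on the other. With scheduled queries, the same ordering would have to be maintained by hand through start times.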

Version control

The structure of data changes constantly, so as data specialists, we have to change our code regularly. Dataform supports this with another great feature: version control.

With version control, you can, for example, set up the same processes that developers use before releasing a change or new code.

SQLX

Dataform works not only with SQL but also with the SQLX language. SQLX is an extension of classic SQL that lets you use certain JavaScript features, specifically JS variables and macros.
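The following Python sketch is only an analogy for what variables and macros add on top of plain SQL. It is not SQLX syntax and does not use any Dataform API. The table name, the event name, and the `count_by` helper are hypothetical and exist only for illustration.

```python
from string import Template

# Hypothetical values standing in for variables defined alongside the query.
variables = {
    "source_table": "analytics.events",
    "event_name": "purchase",
}


def count_by(column):
    """Stand-in for a macro: returns a reusable SQL fragment for the select list."""
    return f"{column}, COUNT(*) AS row_count"


query_template = Template(
    "SELECT ${select_list} FROM ${source_table} "
    "WHERE event_name = '${event_name}' GROUP BY ${group_column}"
)

# Fill the placeholders with the variables and the fragment produced by the helper.
query = query_template.substitute(
    select_list=count_by("event_date"),
    group_column="event_date",
    **variables,
)
print(query)
```

The point of the analogy is reuse: a value or a fragment is defined once and inserted into many queries, instead of being copied into each one.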

Google Cloud Platform ecosystem

Dataform’s major advantage is that it’s already part of GCP, which means we can use other services there.

For example, you can use a Cloud Logging log router sink, Pub/Sub, and Workflows to trigger a Dataform run when a table is updated or created in BigQuery.

Pricing

Dataform itself is free to use. You pay only for the amount of data processed.

What is the alternative to Dataform?

The biggest alternative to Dataform at the moment is dbt, which is especially popular in the data community, mainly because it supports both SQL and Python.

The two tools offer nearly identical functionality. The main difference is that dbt's hosted version is a SaaS tool, i.e., it charges a fee if you want to add more than one person to it.

Our Dataform package

At Optimics, we believe in contributing to the community, which is why we have created an open-source Dataform package. This package processes Google Analytics 4 e-commerce data in BigQuery into a flat table. This table can then be used as a data source for data visualization in Looker Studio or another visualization tool.

In addition to creating the flat table, the package handles daily incremental processing, so each run only processes data for the previous day.

If you are interested in this package, check out our GitHub for a description of the installation process.
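As a rough illustration of what processing only the previous day can mean, the sketch below builds a filter for a BigQuery wildcard query over the date-sharded GA4 export tables (named `events_YYYYMMDD`). This is an assumption about one possible approach, not the package's actual implementation. Both function names are hypothetical.

```python
from datetime import date, timedelta


def previous_day_suffix(today):
    """Return the YYYYMMDD suffix for the day before `today`."""
    return (today - timedelta(days=1)).strftime("%Y%m%d")


def incremental_filter(today):
    """Build a WHERE clause limiting a wildcard query to the previous day's table."""
    return f"WHERE _TABLE_SUFFIX = '{previous_day_suffix(today)}'"


print(incremental_filter(date(2023, 10, 31)))
# prints: WHERE _TABLE_SUFFIX = '20231030'
```

Because only one daily table is scanned per run, the amount of data processed stays small, which matters when billing depends on data processed.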
