T-GCPDE-I | Official Google Curriculum

Data Engineering on Google Cloud

4 days | ILT | Intermediate

Overview

Get hands-on experience with designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning. This course covers structured, unstructured, and streaming data.

What You'll Learn

  • Design scalable data processing systems in Google Cloud
  • Differentiate data architectures and implement data lakehouse and pipeline concepts
  • Build and manage robust streaming and batch data pipelines
  • Utilize AI/ML tools to optimize performance and gain process and data insights

Who Should Attend

Data Engineers, Data Analysts, and Data Architects

Prerequisites

  • Understanding of data engineering principles, including ETL/ELT processes, data modeling, and common data formats (Avro, Parquet, JSON)
  • Familiarity with data architecture concepts, specifically Data Warehouses and Data Lakes
  • Proficiency in SQL for data querying
  • Proficiency in a common programming language (Python recommended)
  • Familiarity with using Command Line Interfaces (CLI)
  • Familiarity with core Google Cloud concepts and services (Compute, Storage, and Identity management)

Products Covered

AlloyDB, BigLake, BigQuery, Bigtable, Cloud Composer, Cloud Data Fusion, Cloud Logging, Cloud Monitoring, Dataflow, Dataform, Dataplex Universal Catalog, Dataproc, Managed Service for Apache Kafka, Pub/Sub, Serverless for Apache Spark, Vertex AI

Course Modules

1

Introduction to Data Engineering

Topics

  • Explore the role of a data engineer
  • Analyze data engineering use cases
  • Explore data engineering tasks
  • Enable data-driven decision making

Learning Outcomes

  • Explore the role of a data engineer
  • Analyze data engineering use cases
  • Explore data engineering tasks
  • Enable data-driven decision making

Activities

Lab: Analyzing Big Data using AI Platform Notebooks and BigQuery

2

Building a Data Lake

Topics

  • Data lakes and data warehouses
  • What is a data lake?
  • Data storage and ETL options on Google Cloud
  • Building a data lake using Cloud Storage

Learning Outcomes

  • Differentiate between data lakes and data warehouses
  • Describe data storage and ETL options on Google Cloud
  • Build a data lake using Cloud Storage

Activities

Lab: Loading Taxi Data into Cloud SQL

3

Building a Data Warehouse

Topics

  • The modern data warehouse
  • Introduction to BigQuery
  • Getting started with BigQuery
  • Loading data

Learning Outcomes

  • Describe a modern data warehouse
  • Explain BigQuery's architecture and how it supports a modern data warehouse
  • Load data into BigQuery

Activities

Lab: Loading Data into BigQuery
Lab: Working with JSON and Array Data in BigQuery

4

Introduction to Data Engineering on Google Cloud: Course Summary

5

Introduction to Building Batch Data Pipelines

Topics

  • EL, ELT, ETL
  • Quality considerations
  • How to carry out operations in BigQuery
  • Shortcomings of ELT

Learning Outcomes

  • Define EL, ELT, and ETL and their differences
  • Discuss quality considerations in data operations
  • Explain how to carry out operations in BigQuery
  • Describe the shortcomings of ELT

6

Executing Spark on Dataproc

Topics

  • The Hadoop ecosystem
  • Running Hadoop on Dataproc
  • Cloud Storage instead of HDFS
  • Optimizing Dataproc

Learning Outcomes

  • Discuss the Hadoop ecosystem
  • Run Hadoop on Dataproc
  • Use Cloud Storage instead of HDFS
  • Optimize Dataproc

Activities

Lab: Running Apache Spark jobs on Dataproc

7

Serverless Data Processing with Dataflow

Topics

  • Dataflow
  • Why customers value Dataflow
  • Building Dataflow pipelines in code
  • Key considerations when designing pipelines

Learning Outcomes

  • Describe Dataflow
  • Explain why customers value Dataflow
  • Build Dataflow pipelines in code
  • Explain key considerations when designing pipelines

Activities

Lab: A Simple Dataflow Pipeline (Python)
Lab: A Simple Dataflow Pipeline (Java)

8

Manage Data Pipelines with Data Fusion and Composer

Topics

  • Building batch data pipelines visually with Cloud Data Fusion
  • Components of Cloud Data Fusion
  • Orchestrating work between Google Cloud services with Cloud Composer
  • Apache Airflow environment
  • Monitoring and logging
  • DAGs and operators
  • Workflow scheduling

Learning Outcomes

  • Build batch data pipelines visually with Cloud Data Fusion
  • Identify components of Cloud Data Fusion
  • Orchestrate work between Google Cloud services with Cloud Composer
  • Understand the Apache Airflow environment
  • Monitor and log workflows

Activities

Lab: Building and Executing a Pipeline Graph with Data Fusion
Lab: An Introduction to Cloud Composer

9

Building Batch Data Pipelines on Google Cloud: Course Summary

10

Introduction to Processing Streaming Data

Topics

  • Challenges of streaming data
  • Message-oriented architectures with Pub/Sub
  • Designing streaming pipelines with Apache Beam
  • Implementing streaming pipelines on Dataflow

Learning Outcomes

  • Identify challenges of streaming data
  • Design message-oriented architectures with Pub/Sub
  • Design streaming pipelines with Apache Beam
  • Implement streaming pipelines on Dataflow

Activities

Lab: Publish Streaming Data into Pub/Sub

11

Serverless Messaging with Pub/Sub

Topics

  • Pub/Sub overview
  • Pub/Sub push vs pull
  • Publishing with Pub/Sub code

Learning Outcomes

  • Give an overview of Pub/Sub
  • Differentiate Pub/Sub push and pull
  • Publish messages with Pub/Sub code

Activities

Lab: Streaming Data Processing: Publish Streaming Data into Pub/Sub

12

Dataflow Streaming Features

Topics

  • Streaming data challenges
  • Dataflow windowing

Learning Outcomes

  • Identify streaming data challenges
  • Use Dataflow windowing

Activities

Lab: Streaming Data Processing: Streaming Data Pipelines

13

High-Throughput BigQuery and Bigtable Streaming Features

Topics

  • BigQuery basics
  • Streaming into BigQuery and visualizing results
  • High-throughput streaming with Cloud Bigtable
  • Optimizing Cloud Bigtable performance

Learning Outcomes

  • Review BigQuery basics
  • Stream into BigQuery and visualize results
  • Implement high-throughput streaming with Cloud Bigtable
  • Optimize Cloud Bigtable performance

Activities

Lab: Streaming Data Processing: Streaming Analytics and Dashboards
Lab: Streaming Data Processing: Streaming Data Pipelines into Bigtable

14

Advanced BigQuery Functionality and Performance

Topics

  • Analytic window functions
  • Using WITH clauses
  • GIS functions
  • Performance considerations
  • BigQuery best practices

Learning Outcomes

  • Use analytic window functions
  • Use WITH clauses
  • Use GIS functions
  • Explain performance considerations
  • Apply BigQuery best practices

Activities

Lab: Optimizing your BigQuery Queries for Performance

15

Building Streaming Data Pipelines on Google Cloud: Course Summary

16

Introduction to Analytics and AI

Topics

  • What is AI?
  • From ad-hoc data analysis to data-driven AI
  • Options for ML models on Google Cloud
  • Pre-built AI solutions

Learning Outcomes

  • Define AI
  • Explain the progression from ad-hoc data analysis to data-driven AI
  • Identify options for ML models on Google Cloud
  • Describe pre-built AI solutions

17

BigQuery ML for Quick Model Building

Topics

  • BigQuery ML project phases
  • Key required statements for model creation
  • Supported models and when to use them
  • Evaluation metrics

Learning Outcomes

  • Explain BigQuery ML project phases
  • Identify key required statements for model creation
  • Use supported models appropriately
  • Interpret evaluation metrics

Activities

Lab: Predict Visitor Purchases Using BigQuery ML

18

Cloud AI Platform Notebooks for Deep Learning

Topics

  • Notebooks for ML development
  • Introduction to Vertex AI
  • Introduction to AutoML
  • AI solutions

Learning Outcomes

  • Use notebooks for ML development
  • Describe Vertex AI
  • Explain AutoML
  • Identify AI solutions

Activities

Lab: Running AI Models on Vertex AI

19

Production ML Pipelines

Topics

  • ML pipelines
  • Kubeflow and Kubeflow Pipelines
  • Data and model versioning
  • Continuous evaluation and monitoring
  • Design considerations

Learning Outcomes

  • Explain ML pipelines
  • Describe Kubeflow and Kubeflow Pipelines
  • Implement data and model versioning
  • Apply continuous evaluation and monitoring
  • Understand design considerations


Course Details

Course Code: T-GCPDE-I
Duration: 4 days
Format: ILT
Level: Intermediate
Modules: 19
Activities: 18