Data Engineering on Google Cloud
Overview
Get hands-on experience with designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning. This course covers structured, unstructured, and streaming data.
What You'll Learn
- Design scalable data processing systems in Google Cloud
- Differentiate between data architectures and implement data lakehouse and pipeline concepts
- Build and manage robust streaming and batch data pipelines
- Use AI/ML tools to optimize performance and gain insights into processes and data
Who Should Attend
Data Engineers, Data Analysts, and Data Architects
Prerequisites
- Understanding of data engineering principles, including ETL/ELT processes, data modeling, and common data formats (Avro, Parquet, JSON)
- Familiarity with data architecture concepts, specifically data warehouses and data lakes
- Proficiency in SQL for data querying
- Proficiency in a common programming language (Python recommended)
- Familiarity with command-line interfaces (CLIs)
- Familiarity with core Google Cloud concepts and services (compute, storage, and identity management)
Course Modules
Introduction to Data Engineering
Topics
- Explore the role of a data engineer
- Analyze data engineering use cases
- Explore data engineering tasks
- Enable data-driven decision making
Learning Outcomes
- Explore the role of a data engineer
- Analyze data engineering use cases
- Explore data engineering tasks
- Enable data-driven decision making
Building a Data Lake
Topics
- Data lakes and data warehouses
- What is a data lake?
- Data storage and ETL options on Google Cloud
- Building a data lake using Cloud Storage
Learning Outcomes
- Differentiate between data lakes and data warehouses
- Describe data storage and ETL options on Google Cloud
- Build a data lake using Cloud Storage
Building a Data Warehouse
Topics
- The modern data warehouse
- Introduction to BigQuery
- Getting started with BigQuery
- Loading data
Learning Outcomes
- Describe a modern data warehouse
- Explain BigQuery's architecture and how it supports a modern data warehouse
- Load data into BigQuery
Introduction to Data Engineering on Google Cloud: Course Summary
Introduction to Building Batch Data Pipelines
Topics
- EL, ELT, ETL
- Quality considerations
- How to carry out operations in BigQuery
- Shortcomings of ELT
Learning Outcomes
- Define EL, ELT, and ETL and their differences
- Discuss quality considerations in data operations
- Explain how to carry out operations in BigQuery
- Describe the shortcomings of ELT
Executing Spark on Dataproc
Topics
- The Hadoop ecosystem
- Running Hadoop on Dataproc
- GCS instead of HDFS
- Optimizing Dataproc
Learning Outcomes
- Discuss the Hadoop ecosystem
- Run Hadoop on Dataproc
- Leverage GCS instead of HDFS
- Optimize Dataproc
Serverless Data Processing with Dataflow
Topics
- Dataflow
- Why customers value Dataflow
- Building Dataflow pipelines in code
- Key considerations with designing pipelines
Learning Outcomes
- Describe Dataflow
- Explain why customers value Dataflow
- Build Dataflow pipelines in code
- Explain key considerations with designing pipelines
Manage Data Pipelines with Data Fusion and Composer
Topics
- Building batch data pipelines visually with Cloud Data Fusion
- Components of Cloud Data Fusion
- Orchestrating work between GCP services with Cloud Composer
- Apache Airflow environment
- Monitoring and logging
- DAGs and operators
- Workflow scheduling
Learning Outcomes
- Build batch data pipelines visually with Cloud Data Fusion
- Identify components of Cloud Data Fusion
- Orchestrate work between GCP services with Cloud Composer
- Understand the Apache Airflow environment
- Monitor and log workflows
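Cloud Composer runs Apache Airflow, where a workflow is a DAG of tasks (operators) and, by default, a task runs only after all of its upstream tasks succeed. As a rough sketch of the execution order a DAG implies, the code below uses Python's standard-library `graphlib` rather than Airflow itself; the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key lists the tasks it depends on (its upstream tasks).
dag = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_to_bigquery": {"transform"},
    "notify": {"load_to_bigquery"},
}


def run_order(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; a task appears only after all of its upstream tasks."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output deterministic
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unblocking its downstream tasks
    return waves


for i, wave in enumerate(run_order(dag)):
    print(i, wave)
```

Tasks in the same wave have no dependency on each other, which is what lets a scheduler run them in parallel.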
Building Batch Data Pipelines on Google Cloud: Course Summary
Introduction to Processing Streaming Data
Topics
- Challenges of streaming data
- Message-oriented architectures with Pub/Sub
- Designing streaming pipelines with Apache Beam
- Implementing streaming pipelines on Dataflow
Learning Outcomes
- Identify challenges of streaming data
- Design message-oriented architectures with Pub/Sub
- Design streaming pipelines with Apache Beam
- Implement streaming pipelines on Dataflow
Serverless Messaging with Pub/Sub
Topics
- Pub/Sub overview
- Pub/Sub push vs pull
- Publishing with Pub/Sub code
Learning Outcomes
- Explain how Pub/Sub works
- Differentiate Pub/Sub push and pull
- Publish messages with Pub/Sub code
Dataflow Streaming Features
Topics
- Streaming data challenges
- Dataflow windowing
Learning Outcomes
- Identify streaming data challenges
- Use Dataflow windowing
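Windowing splits an unbounded stream into finite groups by event time so that aggregations can produce results. Apache Beam, which Dataflow runs, offers fixed, sliding, and session windows. The following is a plain-Python sketch of fixed (tumbling) windows only, not the Beam API; the 60-second window size and the sample events are assumptions.

```python
from collections import defaultdict


def fixed_window_counts(events, size_s=60):
    """Count events per key in fixed, non-overlapping event-time windows.

    events: iterable of (event_time_seconds, key) pairs, in any arrival order.
    Returns {(window_start, key): count}.
    """
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % size_s)  # start of the window that contains ts
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical events; the one at t=30 arrives out of order but still
# lands in the window for its event time, not its arrival time.
events = [(5, "click"), (61, "click"), (65, "view"), (30, "click"), (119, "click")]
print(fixed_window_counts(events))
```

A real streaming pipeline must also decide when a window is complete enough to emit; Beam handles that with watermarks and triggers, which this sketch ignores.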
High-Throughput BigQuery and Bigtable Streaming Features
Topics
- BigQuery basics
- Streaming into BigQuery and visualizing results
- High-throughput streaming with Cloud Bigtable
- Optimizing Cloud Bigtable performance
Learning Outcomes
- Review BigQuery basics
- Stream into BigQuery and visualize results
- Implement high-throughput streaming with Cloud Bigtable
- Optimize Cloud Bigtable performance
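A common Bigtable performance guideline is to design row keys so that writes spread across the key space instead of piling onto one node, for example by not starting keys with a timestamp. One widely used pattern puts a high-cardinality identifier first and can append a reversed timestamp so an entity's newest rows sort first. The helper below is a hypothetical sketch of that pattern; the `#` separator, the field names, and the 13-digit padding are assumptions.

```python
# Assumption: millisecond timestamps stay below this bound (roughly the year 2286).
MAX_TS_MS = 10**13 - 1


def row_key(device_id: str, ts_ms: int) -> str:
    """Build a hypothetical row key of the form '<device_id>#<reversed timestamp>'.

    The device ID comes first so writes from many devices spread across the key space;
    the reversed, zero-padded timestamp makes a device's newest readings sort first,
    because Bigtable orders rows lexicographically by key.
    """
    if not 0 <= ts_ms <= MAX_TS_MS:
        raise ValueError("timestamp out of range")
    reversed_ts = MAX_TS_MS - ts_ms
    return f"{device_id}#{reversed_ts:013d}"


keys = sorted([
    row_key("sensor-b", 1_700_000_000_000),
    row_key("sensor-a", 1_700_000_000_000),
    row_key("sensor-a", 1_700_000_060_000),  # one minute later
])
print(keys)
```

Zero-padding matters: without it, string ordering of the reversed timestamps would not match numeric ordering.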
Advanced BigQuery Functionality and Performance
Topics
- Analytic window functions
- Using WITH clauses
- GIS functions
- Performance considerations
- BigQuery best practices
Learning Outcomes
- Use analytic window functions
- Use WITH clauses
- Use GIS functions
- Explain performance considerations
- Apply BigQuery best practices
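An analytic (window) function computes a value for each row from a set of related rows without collapsing them the way `GROUP BY` does. For example, `ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC)` numbers rows within each department. The Python sketch below reproduces that result on hypothetical in-memory rows to illustrate the semantics; it is not how BigQuery executes the query.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows standing in for a BigQuery table.
rows = [
    {"name": "Ana", "department": "eng", "salary": 120},
    {"name": "Bo", "department": "sales", "salary": 90},
    {"name": "Cy", "department": "eng", "salary": 150},
    {"name": "Di", "department": "sales", "salary": 95},
]


def row_number(rows, partition_by, order_by, descending=False):
    """Emulate ROW_NUMBER() OVER (PARTITION BY partition_by ORDER BY order_by)."""
    # Order by the ORDER BY column, then stably by the partition column:
    # each partition becomes one contiguous run that is already in ORDER BY order.
    ordered = sorted(rows, key=itemgetter(order_by), reverse=descending)
    ordered = sorted(ordered, key=itemgetter(partition_by))
    out = []
    for _, partition in groupby(ordered, key=itemgetter(partition_by)):
        # Numbering restarts at 1 in every partition; every input row is kept.
        for n, row in enumerate(partition, start=1):
            out.append({**row, "row_number": n})
    return out


for r in row_number(rows, "department", "salary", descending=True):
    print(r["department"], r["name"], r["row_number"])
```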
Building Streaming Data Pipelines on Google Cloud: Course Summary
Introduction to Analytics and AI
Topics
- What is AI?
- From ad-hoc data analysis to data-driven AI
- Options for ML models on Google Cloud
- Pre-built AI solutions
Learning Outcomes
- Define AI
- Explain the progression from ad-hoc data analysis to data-driven AI
- Identify options for ML models on Google Cloud
- Describe pre-built AI solutions
BigQuery ML for Quick Model Building
Topics
- BigQuery ML project phases
- Key required statements for model creation
- Supported models and when to use them
- Evaluation metrics
Learning Outcomes
- Explain BigQuery ML project phases
- Identify key required statements for model creation
- Use supported models appropriately
- Interpret evaluation metrics
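For a binary classification model, BigQuery ML's `ML.EVALUATE` reports metrics including precision, recall, accuracy, and F1 score. To show what those four numbers mean, the plain-Python sketch below computes them from confusion-matrix counts; the counts are made up, and in practice BigQuery ML computes the metrics itself at a classification threshold.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that were right
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of actual positives that were found
    accuracy = (tp + tn) / total if total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1_score": f1}


# Hypothetical counts from evaluating a classifier on 100 held-out rows.
print(classification_metrics(tp=30, fp=10, fn=20, tn=40))
```

Precision and recall move in opposite directions as the threshold changes, which is why accuracy alone can be misleading on imbalanced data.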
Cloud AI Platform Notebooks for Deep Learning
Topics
- Notebooks for ML development
- Introduction to Vertex AI
- Introduction to AutoML
- AI solutions
Learning Outcomes
- Use notebooks for ML development
- Describe Vertex AI
- Explain AutoML
- Identify AI solutions
Production ML Pipelines
Topics
- ML pipelines
- Kubeflow and Kubeflow Pipelines
- Data and model versioning
- Continuous evaluation and monitoring
- Design considerations
Learning Outcomes
- Explain ML pipelines
- Describe Kubeflow and Kubeflow Pipelines
- Implement data and model versioning
- Apply continuous evaluation and monitoring
- Understand design considerations
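One simple way to version data is to derive the identifier from the content itself, so identical bytes always get the same version and any change produces a new one. The sketch below hashes a dataset's files with SHA-256; the function name, the file names, and the 12-character prefix are assumptions, and production pipelines often use dedicated tooling such as an ML metadata store or model registry instead of hand-rolled hashing.

```python
import hashlib


def content_version(files: dict[str, bytes], length: int = 12) -> str:
    """Return a short, deterministic version ID for a set of named files (hypothetical helper).

    Files are hashed in sorted-name order so the ID does not depend on dict order.
    """
    digest = hashlib.sha256()
    for name in sorted(files):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # null byte marks the end of the file name
        digest.update(len(files[name]).to_bytes(8, "big"))  # length prefix marks where content ends
        digest.update(files[name])
    return digest.hexdigest()[:length]


print(content_version({"train.csv": b"a,b\n1,2\n", "eval.csv": b"a,b\n3,4\n"}))
```

Recording this ID alongside each trained model links the model to the exact data it was trained on.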
Course Details
- Course Code: T-GCPDE-I
- Duration: 4 days
- Format: ILT (instructor-led training)
- Level: Intermediate
- Modules: 19
- Activities: 18