Logging, Monitoring, and Observability in Google Cloud
Overview
This course teaches participants techniques for monitoring and improving infrastructure and application performance in Google Cloud. Using a combination of presentations, demos, hands-on labs, and real-world case studies, attendees gain experience with full-stack monitoring, real-time log management and analysis, debugging code in production, tracing application performance bottlenecks, and profiling CPU and memory usage.
What You'll Learn
- Explain the purpose and capabilities of Google Cloud Observability
- Implement monitoring for multiple cloud projects
- Create alerting policies, uptime checks, and alerts
- Install and manage the Ops Agent to collect logs for Compute Engine
- Explain Cloud Operations for GKE
- Analyze VPC Flow Logs and firewall rules logs
- Analyze and export Cloud Audit Logs
- Profile and identify resource-intensive functions in an application
- Analyze resource utilization cost for monitoring-related components within Google Cloud
Who Should Attend
- Cloud architects, administrators, and SysOps personnel
- Cloud developers and DevOps personnel
Prerequisites
To get the most out of this course, participants should meet the following requirements:
- Complete the Google Cloud Fundamentals: Core Infrastructure course or have equivalent experience
- Have basic scripting or coding familiarity
- Be proficient with command-line tools and Linux operating system environments
Products Covered
Course Modules
Introduction to Google Cloud Observability
Learning Outcomes
- Describe the purpose and capabilities of Google Cloud Observability
- Explain the purpose of the Cloud Monitoring tool
- Explain the purpose of Cloud Logging and Error Reporting tools
- Explain the purpose of Application Performance Management tools
Activities
Monitoring critical systems
Learning Outcomes
- Use Cloud Monitoring to view metrics for multiple cloud projects
- Explain the different types of dashboards and charts that can be built
- Create an uptime check
- Explain the cloud operations architecture
- Explain and demonstrate the purpose of using Monitoring Query Language (MQL) for monitoring
Activities
Alerting policies
Learning Outcomes
- Explain alerting strategies
- Explain alerting policies
- Explain error budget
- Explain why service-level indicators (SLIs), service-level objectives (SLOs), and service-level agreements (SLAs) are important
- Identify types of alerts and common uses for each
- Use Cloud Monitoring to manage services
Activities
Advanced logging and analysis
Learning Outcomes
- Use Logs Explorer features
- Explain the features and benefits of logs-based metrics
- Define log sinks (inclusion filters) and exclusion filters
- Explain how BigQuery can be used to analyze logs
- Export logs to BigQuery for analysis
- Use Log Analytics on Google Cloud
Activities
Working with Cloud Audit Logs
Learning Outcomes
- Explain Cloud Audit Logs
- List and explain different audit logs
- Explain the features and functionalities of the different audit logs
- List the best practices to implement audit logs
Activities
Configuring Google Cloud services for observability
Learning Outcomes
- Use the Ops Agent with Compute Engine
- Enable and use Kubernetes Monitoring
- Explain the benefits of using Google Cloud Managed Service for Prometheus
- Explain the use of PromQL to query Cloud Monitoring metrics
- Explain the uses of OpenTelemetry
- Explain custom metrics
Activities
Monitoring the Google Cloud network
Learning Outcomes
- Collect and analyze VPC Flow Logs and firewall rules logs
- Enable and monitor Packet Mirroring
- Explain the capabilities of the Network Intelligence Center
Activities
Investigating application performance issues
Learning Outcomes
- Explain the features, benefits, and functionalities of Error Reporting, Cloud Trace, and Cloud Profiler
Activities
Optimizing the costs for Google Cloud Observability
Learning Outcomes
- Analyze resource utilization cost for monitoring-related components within Google Cloud
- Implement best practices for controlling the cost of monitoring within Google Cloud
Activities
What's Not Covered
- SRE concepts
- SRE best practices
- Incident response
Course Details
- Course Code: T-STACKD-B
- Duration: 2 days
- Format: ILT
- Level: Introductory
- Modules: 9
- Activities: 7
Questions About This Course?
Contact us for custom scheduling, group discounts, or curriculum customization.