GOOGLE CLOUD
Logging, Monitoring, and Observability in Google Cloud
Learn how to monitor, troubleshoot, and improve your infrastructure and application performance. Guided by the principles of Site Reliability Engineering (SRE), this course features a combination of lectures, demos, hands-on labs, and real-world case studies. In this course, you'll gain experience with full-stack monitoring, real-time log management and analysis, debugging code in production, and profiling CPU and memory usage.
Plan and implement a well-architected logging and monitoring infrastructure.
Define service level indicators (SLIs) and service level objectives (SLOs).
Create effective monitoring dashboards and alerts.
Monitor, troubleshoot, and improve Google Cloud infrastructure.
Analyze and export Google Cloud audit logs.
Find production code defects, identify bottlenecks, and improve performance.
Optimize monitoring costs.
Beginner
3 x 8-hour sessions
Delivered in English
Google Cloud Platform Fundamentals: Core Infrastructure, or equivalent experience
Basic scripting or coding familiarity
Proficiency with command-line tools and Linux operating system environments
Cloud architects, administrators, and SysOps personnel
Cloud developers and DevOps personnel
Understand the purpose and capabilities of Google Cloud operations-focused components: Logging, Monitoring, Error Reporting, and Service Monitoring
Understand the purpose and capabilities of Google Cloud application performance management-focused components: Debugger, Trace, and Profiler
Construct a monitoring base based on the four golden signals: latency, traffic, errors, and saturation
Measure customer pain with SLIs
Define critical performance measures
Create and use SLOs and SLAs
Achieve developer and operation harmony with error budgets
Develop alerting strategies
Define alerting policies
Add notification channels
Identify types of alerts and common uses for each
Construct and alert on resource groups
Manage alerting policies programmatically
Choose best-practice monitoring project architectures
Differentiate Cloud IAM roles for monitoring
Use the default dashboards appropriately
Build custom dashboards to show resource consumption and application load
Define uptime checks to track aliveness and latency
Integrate logging and monitoring agents into Compute Engine VMs and images
Enable and use Kubernetes Monitoring
Extend and clarify Kubernetes monitoring with Prometheus
Expose custom metrics through code and with the help of OpenCensus
Identify and choose among resource tagging approaches
Define log sinks (inclusion filters) and exclusion filters
Create metrics based on logs
Define custom metrics
Use Error Reporting to link application errors to Logging
Export logs to BigQuery
Collect and analyze VPC Flow logs and Firewall Rules logs.
Enable and monitor Packet Mirroring.
Explain the capabilities of Network Intelligence Center.
Use Admin Activity audit logs to track changes to the configuration or metadata of resources.
Use Data Access audit logs to track accesses or changes to user-provided resource data.
Use System Event audit logs to track Google Cloud administrative actions.
Define incident management roles and communication channels
Mitigate incident impact
Troubleshoot root causes
Resolve incidents
Document incidents in a post-mortem process
Understand Stackdriver billing
Analyze Stackdriver resource utilization
Implement best practices for Stackdriver cost control
Ref: T-STACKD-B-01