DP-601: Implementing a Lakehouse with Microsoft Fabric

Course Overview

This course builds your foundational skills in data engineering on Microsoft Fabric, focusing on the lakehouse concept. It explores Apache Spark for distributed data processing, and Delta Lake tables as the basis for efficient data management, versioning, and reliability. It also covers data ingestion and orchestration using Dataflows Gen2 and Data Factory pipelines.

The course combines lectures with hands-on exercises that prepare you to work with lakehouses in Microsoft Fabric.

Key Learning Areas

Describe end-to-end analytics in Microsoft Fabric
Describe core features and capabilities of lakehouses in Microsoft Fabric
Create a lakehouse
Ingest data into files and tables in a lakehouse
Query lakehouse tables with SQL
Configure Spark in a Microsoft Fabric workspace
Identify suitable scenarios for Spark notebooks and Spark jobs
Use Spark dataframes to analyze and transform data
Use Spark SQL to query data in tables and views
Visualize data in a Spark notebook
Understand Delta Lake and delta tables in Microsoft Fabric
Create and manage delta tables using Spark
Use Spark to query and transform data in delta tables
Use delta tables with Spark structured streaming
Describe Dataflow (Gen2) capabilities in Microsoft Fabric
Create Dataflow (Gen2) solutions to ingest and transform data
Include a Dataflow (Gen2) in a pipeline
Describe pipeline capabilities in Microsoft Fabric
Use the Copy Data activity in a pipeline
Create pipelines based on predefined templates
Run and monitor pipelines

Course Outline

Introduction to End-to-End Analytics using Microsoft Fabric

Explore end-to-end analytics with Microsoft Fabric
Data teams and Microsoft Fabric
Enable and use Microsoft Fabric

Get Started with Lakehouses in Microsoft Fabric

Explore the Microsoft Fabric Lakehouse
Work with Microsoft Fabric Lakehouses
Explore and transform data in a lakehouse

Use Apache Spark in Microsoft Fabric

Prepare to use Apache Spark
Run Spark code
Work with data in a Spark dataframe
Work with data using Spark SQL
Visualize data in a Spark notebook

Work with Delta Lake Tables in Microsoft Fabric

Understand Delta Lake
Create delta tables
Work with delta tables in Spark
Use delta tables with streaming data

Ingest Data with Dataflows Gen2 in Microsoft Fabric

Understand Dataflows Gen2 in Microsoft Fabric
Explore Dataflows Gen2 in Microsoft Fabric
Integrate Dataflows Gen2 and Pipelines in Microsoft Fabric

Use Data Factory Pipelines in Microsoft Fabric

Understand pipelines
Use the Copy Data activity
Use pipeline templates
Run and monitor pipelines

Who Benefits

The primary audience for this course is data professionals who are familiar with data modeling, extraction, and analytics. It is designed for those who want to learn about lakehouse architecture, the Microsoft Fabric platform, and how to enable end-to-end analytics using these technologies.

Prerequisites

You should be familiar with basic data concepts and terminology.

Want this course for your team?

Atmosera can provide this course virtually or on-site. Please reach out to discuss your requirements.
