Implementing a Lakehouse with Microsoft Fabric
- Harini Mallawaarachchi
- Jan 24, 2024
Fabric is a unified SaaS platform for all your analytical needs.
Learn about Microsoft Fabric, the analytics platform for the era of AI. Connect, ingest, store, and report on data with Data Factory, notebooks, lakehouses, data warehouses, and Power BI. Whether you are a data analyst, data engineer, or analytics engineer, Fabric helps you upskill and advance your career.
Data Factory: Data integration combining Power Query with the scale of Azure Data Factory to move and transform data.
Synapse Data Engineering: Data engineering with a Spark platform for data transformation at scale.
Synapse Data Warehouse: Data warehousing with industry-leading SQL performance and scale to support data use.
Synapse Data Science: Data science with Azure Machine Learning and Spark for model training and execution tracking in a scalable environment.
Synapse Real-Time Analytics: Real-time analytics to query and analyze large volumes of data in real-time.
Power BI: Business intelligence for translating data to decisions.
Data Activator: Real-time detection and monitoring of data that can trigger notifications and actions when it finds specified patterns in data.
All of the above components are built on top of OneLake.
Fabric can be used by people in many different roles. Previously the audience was mainly data analysts and data consumers; now it also includes data engineers, data scientists, and others.
Enable Fabric
An admin controls, through the Fabric settings, whether Fabric is enabled or disabled for the organization.
Enable Microsoft Fabric for your organization - Microsoft Fabric | Microsoft Learn
You do not need an Azure subscription to use Fabric.
However, a user with an Azure subscription can buy Fabric capacities.
Create Fabric Capacity
After you create a Fabric capacity, the corresponding setting is enabled and shows a list of the available Fabric capacities.
For more information about the pricing tiers, visit the Microsoft site: Microsoft Fabric - Pricing | Microsoft Azure.
OneLake
Fabric is a unified software-as-a-service (SaaS) offering, with all your data stored in a single open format in OneLake. Behind the scenes, this may be deployed across many ADLS accounts depending on the region. OneLake exposes the same APIs as ADLS, with only minor differences.
OneLake is Fabric's lake-centric architecture that provides a single, integrated environment for data professionals and businesses to collaborate on data projects. Fabric's OneLake architecture facilitates collaboration between data team members and saves time by eliminating the need to move and copy data between different systems and teams.
The default storage format for Fabric's OneLake is Delta.
OneLake is built on top of Azure Data Lake Storage (ADLS) and data can be stored in any format, including Delta, Parquet, CSV, JSON, and more.
One Copy is a key component of OneLake that allows you to read data from a single copy, without moving or duplicating data.
Shortcuts
Lakehouse - The OneDrive for your Data
An analytical store that combines the file storage flexibility of a data lake with the SQL-based query capabilities of a data warehouse.
URL
https://onelake.dfs.fabric.microsoft.com/<<tenant_Id>>/<<workspace_Id>>/Files/RealEstate/Sacramento Real Estate Transactions.csv
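Because the file name contains spaces, a URL like the one above usually needs percent-encoding before a script can use it. The following is a minimal sketch, assuming the path segments follow the placeholders shown above; the function name onelake_file_url and its parameters are hypothetical and are not part of any Fabric SDK.

```python
from urllib.parse import quote

# Host name taken from the URL shown above.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_file_url(tenant_id: str, workspace_id: str, relative_path: str) -> str:
    """Build an HTTPS URL in the shape shown above (hypothetical helper).

    The first two segments mirror the article's placeholders and are
    inserted as-is; only the file path is percent-encoded.
    """
    # quote() leaves "/" untouched by default, so folder separators
    # survive while spaces become %20.
    encoded_path = quote(relative_path.strip("/"))
    return f"https://{ONELAKE_HOST}/{tenant_id}/{workspace_id}/{encoded_path}"


url = onelake_file_url(
    "<<tenant_Id>>",      # placeholder, replace with your own value
    "<<workspace_Id>>",   # placeholder, replace with your own value
    "Files/RealEstate/Sacramento Real Estate Transactions.csv",
)
print(url)
```

Only the file path is encoded here; the identifier segments are passed through unchanged, so real values should already be URL-safe.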
Prepare to use Apache Spark
With Apache Spark you can run data engineering tasks at scale. The work is distributed across the nodes of the cluster and processed in parallel.
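The idea of splitting work and processing the pieces in parallel can be sketched in plain Python. This is only an analogy, not Spark code: it uses a local thread pool from the standard library instead of a cluster, and the prices and function names are hypothetical values made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    """Deal rows round-robin into num_partitions lists."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def partial_sum(partition):
    """Work done independently on one partition."""
    return sum(partition)


# Hypothetical sale prices standing in for one column of a large dataset.
prices = [100, 250, 175, 300, 125, 50]

partitions = split_into_partitions(prices, num_partitions=3)

# Each partition is handed to a worker; map() returns results in partition order.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_results = list(pool.map(partial_sum, partitions))

# Combine the per-partition results into the final answer.
total = sum(partial_results)
print(partial_results, total)
```

In a real Spark job the partitions live on different machines and the engine handles the splitting and combining, but the overall shape (partition, process independently, combine) is the same.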
Billing and utilization reporting in Fabric Spark - Microsoft Fabric | Microsoft Learn