Implementing a Lakehouse with Microsoft Fabric

  • Writer: Harini Mallawaarachchi
  • Jan 24, 2024
  • 1 min read


Fabric is a unified SaaS platform for all your analytical needs.


Learn about Microsoft Fabric, the analytics platform for the era of AI. Connect, ingest, store, and report on data with Data Factory, notebooks, lakehouses, data warehouses, and Power BI. Whether you are a data analyst, data engineer, or analytics engineer, Fabric helps you upskill and advance your career.







  • Data Factory: Data integration combining Power Query with the scale of Azure Data Factory to move and transform data.

  • Synapse Data Engineering: Data engineering with a Spark platform for data transformation at scale.

  • Synapse Data Warehouse: Data warehousing with industry-leading SQL performance and scale to support data use.

  • Synapse Data Science: Data science with Azure Machine Learning and Spark for model training and execution tracking in a scalable environment.

  • Synapse Real-Time Analytics: Real-time analytics to query and analyze large volumes of data in real-time.

  • Power BI: Business intelligence for translating data to decisions.

  • Data Activator: Real-time detection and monitoring of data that can trigger notifications and actions when it finds specified patterns in data.


All of the above components are built on top of OneLake.




Fabric is a tool that people in many different roles can use. Previously the audience was mainly data analysts and data consumers; now it also includes data engineers, data scientists, and others.





Enable Fabric

An admin controls whether Fabric is enabled or disabled for the organization.




Admin has control over the settings

Enable Microsoft Fabric for your organization - Microsoft Fabric | Microsoft Learn





You do not need an Azure subscription to use Fabric.

However, a user with an Azure subscription can buy Fabric capacities.


Create Fabric Capacity


After you create a Fabric capacity, the option below is enabled and shows a list of the available Fabric capacities.



For more information about the pricing tiers, see the Microsoft site: Microsoft Fabric - Pricing | Microsoft Azure.


OneLake

Fabric is a unified software-as-a-service (SaaS) offering, with all your data stored in a single open format in OneLake. Behind the scenes, OneLake may be deployed across many ADLS accounts, depending on the region. The APIs are the same as in ADLS, with only minor differences.
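
To make the ADLS compatibility a bit more concrete, here is a minimal sketch of how a OneLake path is addressed. It assumes the OneLake URI shape described in the Microsoft documentation (workspace first, then `<item>.<itemtype>`, then the path). The workspace, lakehouse, and file names are hypothetical and only there for illustration.

```python
# Sketch: building OneLake URIs. OneLake uses a single endpoint, and the
# workspace name sits where an ADLS container name would normally go.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_abfss_uri(workspace: str, item: str, item_type: str, path: str) -> str:
    """Return an abfss:// URI for a file or folder inside a Fabric item."""
    clean_path = path.strip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}/{clean_path}"


def onelake_https_url(workspace: str, item: str, item_type: str, path: str) -> str:
    """Return the equivalent https:// URL for the same file or folder."""
    clean_path = path.strip("/")
    return f"https://{ONELAKE_HOST}/{workspace}/{item}.{item_type}/{clean_path}"


# Hypothetical names, for illustration only.
uri = onelake_abfss_uri(
    "SalesWorkspace", "SalesLakehouse", "Lakehouse", "Files/raw/orders.csv"
)
print(uri)
```

Names that contain spaces or special characters would need extra handling, which this sketch leaves out.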


OneLake is Fabric's lake-centric architecture that provides a single, integrated environment for data professionals and businesses to collaborate on data projects. Fabric's OneLake architecture facilitates collaboration between data team members and saves time by eliminating the need to move and copy data between different systems and teams.


The default storage format for Fabric's OneLake is Delta.


OneLake is built on top of Azure Data Lake Storage (ADLS) and data can be stored in any format, including Delta, Parquet, CSV, JSON, and more.


One Copy is a key component of OneLake that allows you to read data from a single copy, without moving or duplicating data.


Shortcuts


Lakehouse - The OneDrive for your Data

An analytical store that combines the file storage flexibility of a data lake with the SQL-based query capabilities of a data warehouse.
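
As a rough illustration of those two sides, the sketch below reads a raw file by path (the data lake side), saves it as a Delta table, and then queries it with SQL (the warehouse side). It assumes a Fabric notebook, where a `spark` session is already available, with a default lakehouse attached. The file path and table name are hypothetical.

```python
def file_to_table(spark, file_path: str, table_name: str):
    """Load a CSV file from the lakehouse, save it as a Delta table,
    and query it back with SQL."""
    # Data lake side: read a raw file by its path.
    df = spark.read.format("csv").option("header", "true").load(file_path)

    # Save the data as a Delta table so it can be queried by name.
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)

    # Warehouse side: query the same data with SQL.
    return spark.sql(f"SELECT COUNT(*) AS row_count FROM {table_name}")


# In a Fabric notebook a call might look like this (hypothetical names):
# file_to_table(spark, "Files/raw/orders.csv", "orders").show()
```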






OneLake Explorer (Preview)






Explore, transform, and visualize data in the lakehouse






Prepare to use Apache Spark

With Apache Spark you can do data engineering tasks at scale. The work is distributed across the cluster and done in parallel.



Billing and utilization reporting in Fabric Spark - Microsoft Fabric | Microsoft Learn



Run Spark in Fabric

To work with Spark in Fabric, you use either a notebook or a Spark job definition.


  • Load data into a Spark DataFrame

  • Transform the data

  • Partition the output

Why partition? To improve scalability and performance.
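
The three steps above can be sketched roughly as follows. This is only an illustration, not a definitive pipeline: the column names `quantity` and `order_date` and both paths are hypothetical, and `spark` is assumed to be the session that a Fabric notebook provides.

```python
def load_transform_partition(spark, source_path: str, target_path: str):
    """Load a CSV file, apply a simple transformation, and write it partitioned."""
    # 1. Load: read the CSV file into a Spark DataFrame.
    df = (
        spark.read.format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load(source_path)
    )

    # 2. Transform: keep valid rows and derive the column to partition on.
    #    `quantity` and `order_date` are assumed column names.
    transformed = df.filter("quantity > 0").selectExpr(
        "*", "year(order_date) AS order_year"
    )

    # 3. Partition the output: Spark writes one folder per order_year value.
    (
        transformed.write.mode("overwrite")
        .partitionBy("order_year")
        .parquet(target_path)
    )
    return transformed


# In a Fabric notebook a call might look like this (hypothetical paths):
# load_transform_partition(spark, "Files/raw/orders.csv", "Files/curated/orders")
```

Partitioning by `order_year` produces folders such as `order_year=2023`, so a query that filters on that column only has to read the matching folders.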






Work with Delta Lake tables in Microsoft Fabric


Delta Lake tables allow time travel and versioning.
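
Here is a minimal sketch of what time travel looks like from a notebook, using the standard Delta Lake reader options `versionAsOf` and `timestampAsOf` and the `DESCRIBE HISTORY` command. The table path, table name, version number, and timestamp in the usage comments are hypothetical, and `spark` is assumed to be the notebook's session.

```python
def read_table_version(spark, table_path: str, version: int):
    """Read a Delta table as it was at a given version number."""
    return spark.read.format("delta").option("versionAsOf", version).load(table_path)


def read_table_at(spark, table_path: str, timestamp: str):
    """Read a Delta table as it was at a given point in time."""
    return (
        spark.read.format("delta").option("timestampAsOf", timestamp).load(table_path)
    )


def table_history(spark, table_name: str):
    """List the versions recorded in the table's transaction log."""
    return spark.sql(f"DESCRIBE HISTORY {table_name}")


# In a Fabric notebook the calls might look like this (hypothetical values):
# table_history(spark, "orders").show()
# read_table_version(spark, "Tables/orders", 0).show()
# read_table_at(spark, "Tables/orders", "2024-01-24").show()
```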















