L21-Use a SQL Warehouse in Azure Databricks
- Harini Mallawaarachchi
- Dec 22, 2023
- 1 min read
SQL is an industry-standard language for querying and manipulating data. Many data analysts perform data analytics by using SQL to query tables in a relational database. Azure Databricks includes SQL functionality that builds on Spark and Delta Lake technologies to provide a relational database layer over files in a data lake.
Before you start
You'll need an Azure subscription in which you have administrative-level access.
Review the What is data warehousing on Azure Databricks? article in the Azure Synapse Analytics documentation.
View and start a SQL Warehouse
When the Azure Databricks workspace resource has been deployed, go to it in the Azure portal.
In the Overview page for your Azure Databricks workspace, use the Launch Workspace button to open your Azure Databricks workspace in a new browser tab; signing in if prompted.
View the Azure Databricks workspace portal and note that the sidebar on the left side contains the names of the task categories.
In the sidebar, under SQL, select SQL Warehouses.
Observe that the workspace already includes a SQL Warehouse named Starter Warehouse.
In the Actions (⁝) menu for the SQL Warehouse, select Edit. Then set the Cluster size property to 2X-Small and save your changes.
Use the Start button to start the SQL Warehouse (which may take a minute or two).
Note: If your SQL Warehouse fails to start, your subscription may have insufficient quota in the region where your Azure Databricks workspace is provisioned. See Required Azure vCPU quota for details. If this happens, you can try requesting for a quota increase as detailed in the error message when the warehouse fails to start. Alternatively, you can try deleting your workspace and creating a new one in a different region. You can specify a region as a parameter for the setup script like this: ./setup.ps1 eastus
Create a database schema
When your SQL Warehouse is running, select SQL Editor in the sidebar.
In the Schema browser pane, observe that the hive_metastore catalogue contains a database named default.
In the New query pane, enter the following SQL code:
Use the ►Run (1000) button to run the SQL code.
When the code has been successfully executed, in the Schema browser pane, use the refresh button at the bottom of the pane to refresh the list. Then expand hive_metastore and adventureworks, and observe that the database has been created, but contains no tables.
You can use the default database for your tables, but when building an analytical data store its best to create custom databases for specific data.
Create a table
Download the products.csv file to your local computer, saving it as products.csv.
In the Azure Databricks workspace portal, in the sidebar, select (+) New and then select File Upload and upload the products.csv file you downloaded to your computer.
In the Upload data page, select the adventureworks schema and set the table name to products. Then select Create table on the bottom left corner of the page.
When the table has been created, review its details.
The ability to create a table by importing data from a file makes it easy to populate a database. You can also use Spark SQL to create tables using code. The tables themselves are metadata definitions in the hive metastore, and the data they contain is stored in Delta format in Databricks File System (DBFS) storage.
Create a query
In the sidebar, select (+) New and then select Query.
In the Schema browser pane, expand hive_metastore and adventureworks, and verify that the products table is listed.
In the New query pane, enter the following SQL code:
Use the ►Run (1000) button to run the SQL code.
When the query has completed, review the table of results.
Use the Save button at the top right of the query editor to save the query as Products and Categories.
Saving a query makes it easy to retrieve the same data again at a later time.
Create a dashboard
In the sidebar, select (+) New and then select Dashboard.
In the New dashboard dialog box, enter the name Adventure Works Products and select Save.
In the Adventure Works Products dashboard, in the Add drop-down list, select Visualization.
In the Add visualization widget dialog box, select the Products and Categories query. Then select Create new visualization, set the title to Products Per Category, and select Create visualization.
In the visualization editor, set the following properties:
Visualization type: bar
Horizontal chart: selected
Y column: Category
X columns: Product ID : Count
Group by: Leave blank
Stacking: Disabled
Normalize values to percentage: Unselected
Missing and NULL values: Do not display in chart
Save the visualization and view it in the dashboard.
Select Done editing to view the dashboard as users will see it.
Dashboards are a great way to share data tables and visualizations with business users. You can schedule the dashboards to be refreshed periodically, and emailed to subscribers.
Comments