Azure Synapse Analytics is Microsoft Azure's cloud-based platform for end-to-end enterprise analytics. It combines data warehousing and big data processing capabilities, allowing organizations to integrate, analyze, and gain insights from large volumes of structured and unstructured data. It supports data ingestion, preparation, and integration while providing scalability, security, and collaboration features for effective data analytics.
This article walks through an exercise from the dp-203-azure-data-engineer lab repository (microsoftlearning.github.io).
1. In a web browser, sign into the Azure portal at https://portal.azure.com.
2. Use the [>_] button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a PowerShell environment.
3. In the PowerShell pane, run the following commands. When prompted, enter a suitable password for the Azure Synapse SQL pool.
rm -r dp-203 -f
git clone https://github.com/MicrosoftLearning/dp-203-azure-data-engineer dp-203
cd dp-203/Allfiles/labs/01
./setup.ps1
Explore Synapse Studio
Synapse Studio is a web-based portal in which you can manage and work with the resources in your Azure Synapse Analytics workspace.
On the Data page, there are two tabs containing data sources:
- The Workspace tab contains databases defined in the workspace (including dedicated SQL databases and Data Explorer databases).
- The Linked tab contains data sources that are linked to the workspace, including Azure Data Lake storage.
The Develop page is where you can define scripts and other assets used to develop data processing solutions.
The Integrate page is where you manage data ingestion and integration assets, such as pipelines to transfer and transform data between data sources.
The Monitor page is where you can observe data processing jobs as they run and view their history.
The Manage page is where you manage the pools, runtimes, and other assets used in your Azure Synapse workspace.
Ingest data with a pipeline
To ingest data in Synapse Studio:
1. Open the Copy Data tool from the Home page in Synapse Studio.
2. Select the necessary settings for the source, including source type, connection, and authentication.
3. Configure the file format and settings for the destination, such as destination type and folder path.
4. Enter task details like task name and description, and review the settings.
5. Deploy the pipeline and monitor its progress in the Pipeline runs tab of the Monitor page.
6. Verify the presence of the pipeline named "Copy products" in the Integrate page.
View the ingested data
On the Data page, select the Linked tab and expand the synapsexxxxxxx (Primary) datalake container hierarchy until you see the files storage for your Synapse workspace. Select the file storage and verify that a folder named product_data containing a file named products.csv has been copied to this location.
Use a serverless SQL pool to analyze data
In Synapse Studio, right-click the products.csv file in the file storage for your Synapse workspace, point to New SQL script, and select Select TOP 100 rows.
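The generated script should look similar to the following sketch. The BULK path shown here is a placeholder; Synapse fills in the actual data lake URL for your workspace.

```sql
-- Sketch of the query Synapse generates for "Select TOP 100 rows".
-- The BULK path is a placeholder for your workspace's storage URL.
SELECT
    TOP 100 *
FROM
    OPENROWSET(
        BULK 'https://synapsexxxxxxx.dfs.core.windows.net/files/product_data/products.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'
    ) AS [result]
```

Run the script using the Built-in serverless SQL pool. Because products.csv includes a header row, you can add a HEADER_ROW = TRUE option inside OPENROWSET so that column names are inferred from the first row rather than defaulting to C1, C2, and so on.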
Use a Spark pool to analyze data
While SQL is a common language for querying structured datasets, many data analysts find languages like Python useful for exploring and preparing data for analysis. In Azure Synapse Analytics, you can run Python (and other) code in a Spark pool, which uses a distributed data processing engine based on Apache Spark.
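As a rough sketch, a notebook cell attached to a Spark pool could load and explore the ingested file like this. The abfss:// path is a placeholder for your workspace's data lake, the spark session and display function are provided by the Synapse notebook runtime, and the Category column is assumed from the lab's products.csv file.

```python
# Runs in a Synapse notebook cell attached to a Spark pool.
# 'spark' (a SparkSession) and 'display' are supplied by the notebook runtime.
# The abfss:// path below is a placeholder for your workspace's data lake.
df = spark.read.load(
    'abfss://files@synapsexxxxxxx.dfs.core.windows.net/product_data/products.csv',
    format='csv',
    header=True,  # products.csv includes a header row
)
display(df.limit(10))

# A simple aggregation, assuming the file has a Category column:
counts = df.groupBy('Category').count()
display(counts)
```

Because the DataFrame is distributed across the Spark pool, the same code scales from this small CSV file to much larger datasets without modification.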
Delete Azure resources
Now that you’ve finished exploring Azure Synapse Analytics, you should delete the resources you’ve created to avoid unnecessary Azure costs.