OPENROWSET using serverless SQL pool in Azure Synapse Analytics

Harini Mallawaarachchi
Jul 13, 2023
3 min read

In Transact-SQL (T-SQL), the OPENROWSET function plays a crucial role in accessing and manipulating data from external (OLE DB) data sources. It includes all connection information that is required to access remote data from an external data source. This blog post will delve into the details of the OPENROWSET function, its syntax, capabilities, and common usage scenarios.

The OPENROWSET function in Azure Synapse Analytics provides a way to read data from an external data source. It allows you to query data from various file formats, such as CSV, JSON, and Parquet, stored in Azure Blob Storage or Azure Data Lake Storage. This function is particularly useful when you want to access data residing outside of your Azure Synapse Analytics workspace.

OPENROWSET also supports bulk operations through a built-in BULK provider that enables data from a file to be read and returned as a rowset.

Note: The OPENROWSET function is not supported in dedicated SQL pool.

Setting up the Serverless SQL Pool

To get started, you need to set up a serverless SQL pool in your Azure Synapse Analytics workspace. Follow these steps:

Open the Azure portal and navigate to your Synapse Analytics workspace.
Click on the "Serverless SQL pool" option in the left-hand menu.
Click on "New" to create a new serverless SQL pool.
Provide a name, choose the desired storage account, and set the desired maximum data size.
Click on "Create" to provision the serverless SQL pool.

Granting Storage Account Access

In order to access data from your storage account, you need to grant appropriate access permissions. Follow these steps:

Navigate to your storage account in the Azure portal.
Click on "Access control (IAM)" in the left-hand menu.
Click on the "Add" button to add a new role assignment.
Select the appropriate role (e.g., Storage Blob Data Reader) and specify the Synapse Analytics workspace as the assignee.
Click on "Save" to grant the necessary permissions.

Querying Data using OPENROWSET

Now that your serverless SQL pool is set up and the necessary permissions are granted, you can start querying data using the OPENROWSET function.

Here's an example:

SELECT *
FROM OPENROWSET(
    BULK 'https://yourstorageaccount.blob.core.windows.net/container/data.csv',
    FORMAT = 'CSV',
    PARSE_QUOTE = '"',
    FIRSTROW = 2
) AS rows;

Important Notes:

Replace 'https://yourstorageaccount.blob.core.windows.net/container/data.csv' with the actual path to your data file.
Adjust the FORMAT and other parameters according to the file format and its specific configuration.
The FIRSTROW parameter specifies the row number from which to start reading data.

The Data Source is an Azure storage account and it can be explicitly referenced in the OPENROWSET function or can be dynamically inferred from URL of the files that you want to read.

Note: OPENROWSET without DATA_SOURCE provides quick and easy way to access the storage files but offers limited authentication options.

OPENROWSET without DATA_SOURCE can be used to directly read the contents of the files from the URL location specified as BULK option:

SELECT * FROM 
OPENROWSET(BULK'http://<storage account>.dfs.core.windows.net/container/folder/*.parquet',                 FORMAT = 'PARQUET') AS [file]

OPENROWSET with DATA_SOURCE can be used to access files on specified storage account:

SELECT * FROM 
OPENROWSET(BULK'/folder/*.parquet',                 
        DATA_SOURCE='storage', --> Root URL is in LOCATION of DATA SOURCE
        FORMAT = 'PARQUET') AS [file]

When using the OPENROWSET function in Azure Synapse Analytics, there are several possibilities for the URL parameter depending on the storage account and file format you're working with. Here are some examples of possible URL configurations:

1. Azure Blob Storage:

- CSV file:

BULK 'https://yourstorageaccount.blob.core.windows.net/container/data.csv'

- JSON file:

BULK 'https://yourstorageaccount.blob.core.windows.net/container/data.json'

- Parquet file:

BULK 'https://yourstorageaccount.blob.core.windows.net/container/data.parquet'

2. Azure Data Lake Storage Gen2:

- CSV file:

BULK 'https://yourdatalakeaccount.dfs.core.windows.net/container/data.csv'

- JSON file:

BULK 'https://yourdatalakeaccount.dfs.core.windows.net/container/data.json'

- Parquet file:

BULK 'https://yourdatalakeaccount.dfs.core.windows.net/container/data.parquet'

Notes:

- Replace "yourstorageaccount" with the name of your Azure storage account.

- Replace "yourdatalakeaccount" with the name of your Azure Data Lake Storage Gen2 account.

- Adjust the "container" parameter to match the container or folder where your data is stored.

- Ensure that your storage account or data lake has the appropriate permissions set to allow access from Azure Synapse Analytics.

Remember, these are just examples, and the actual URL configuration will depend on your specific storage account, container, and file format.

Performance Considerations

While using OPENROWSET with serverless SQL pools, keep the following performance considerations in mind:

Ensure that your data is properly partitioned and optimized for query execution.
Use appropriate file formats, such as Parquet or ORC, for better query performance.
Leverage partition pruning and predicate pushdown to minimize data scanning.

Conclusion

In this blog post, we explored how to use the OPENROWSET function with serverless SQL pools in Azure Synapse Analytics. By leveraging this functionality, you can easily query and analyze data residing in external sources. Remember to optimize your data and consider performance best practices for efficient query execution.