Scripting Azure Synapse Analytics

Background

Development on Azure Synapse Analytics (formerly Azure SQL Data Warehouse) is progressing at a rapid pace. Recently, in April 2020, the new workspaces preview was released. This is an exciting update that exposes the world to the Synapse ecosystem. It is the first time I’ve been able to see that the name change to Synapse is much more than just re-branding. Azure Synapse Analytics now truly is an integrated data platform for BI, AI, and continuous intelligence.

This post aims to introduce you to the new concepts this preview of Azure Synapse Analytics exposes and demonstrate how easy it can be to spin up these services via the Azure Cloud Shell.

Azure Synapse Analytics Workspaces

If you are familiar with Azure Synapse or Azure SQL Data Warehouse there are some new terms and concepts you need to become familiar with.

  • Workspaces – think of these as analogous to the previous Data Warehouse Server (but there is much more)
  • SQL Pool – your provisioned (DWU / vCore) resource
  • SQL On-Demand – serverless resource with a pay per TB processed pricing model
  • Spark Pool – a definition of a Spark instance
  • Synapse Studio – Web environment that ties everything together. This is the beginning of a true SSMS replacement for interacting with your data warehouse

While you can certainly create an Azure Synapse Analytics workspace via the portal, why not save time by parameterizing and scripting out the solution?

Please remember that this offering is still in preview, the service is provided without a Service Level Agreement (SLA), and the code explained below is subject to change.

Scripting

This script will spin up an Azure Synapse Analytics Workspace & SQL Pool. It will run a set of scripts against the newly created data warehouse and then pause the data warehouse.

Prerequisites

Microsoft / Azure Tools Used

  • Azure CLI (Cloud Shell) – where we will edit files & run scripts
  • Azure Storage Account
  • Azure Data Lake gen2
  • Azure Synapse Analytics Workspaces
  • Azure Synapse SQL Pool

Azure Cloud Shell

Azure Cloud Shell is a great browser-based shell which allows you to interact with your Azure resources. We will be using it to run Azure Command-Line Interface (CLI) commands.
Log into the Azure Portal, select your subscription, and bring up Cloud Shell by clicking the icon to the right of the search bar.

Image and Preview Themes on the toolbar

Alternatively you can go to https://shell.azure.com/ for a full-screen experience.

You now have your Cloud Shell instance. We will be working with it in bash mode but PowerShell is also available via the drop-down in the top left of the shell.

Create a directory for where you want to do save your scripts for this project. In this example I also create the main script and make it executable:

mkdir synapseScripting
cd synapseScripting
touch createSynapse.sh
chmod +x createSynapse.sh

Open the Editor

Azure Cloud Shell comes with a built-in editor based on Monaco Editor.  This is the same editor that powers VS Code.  The editor can be opened by typing

code . # Opens the editor where you can browse to any file
code ./createSynapse.sh # Opens directly to the file we created above

or by clicking the { } button.

The editor can be closed by pressing Ctrl-q

Define Script Variables

Begin the script by defining variables. These are the variables that will be used throughout the script. I have implemented a random 3 digit number to be appended to some of the values which require uniqueness. $ipaddress is set to the public IP of the Azure Cloud Shell instance and will be used to grant access through the firewall. The $createScripts variable holds SQL scripts that this script will automatically run on the newly created data warehouse. Modify these values as you wish.

#!/bin/bash
#### DEFINE VARIABLES ####
echo "DEFINING VARIABLES"
randomNumber=$((1 + RANDOM % 1000))
resourceGroupName="synapsePOC"
location="eastus2"
synapseStorageAccount="synapsestorageaccount$randomNumber" # synapseStorageAccount###
synapseFileSystem="synapsefilesystem$randomNumber" # synapseFileSystem### This is basically a container
synapseWorkspaceName="synapseworkspace$randomNumber"
synapseAdminName="sqladminuser"
synapseAdminPW="y0urg00dP@ssW0rd"
synapseSQLPoolName="sqlpool$randomNumber" # 15 char max
synapsePerformanceLevel="DW100c"
ipaddress=`curl -s checkip.dyndns.org | sed -e 's/.*Current IP Address: //' -e 's/<.*$//'`
createScripts="create_schemas.sql,create_myTable.sql"

Enable Synapse Extension

The synapse commands are not yet released by default in the az cli. We need to enable it by adding the extension. The commands are in preview and subject to change.

echo "ENABLING SYNAPSE EXTENSION"
az extension add --name synapse

Starting With A Fresh Resource Group

All resources built in this script are contained within one resource group ($resourceGroupName). To start fresh, the first step is to delete any previously created resource group with this name. Ensure you understand that this will delete all resources in this resource group. It will prompt you if you want to continue with the delete to prevent unintended consequences. (If you don’t want that prompt you can put a -y flag at the end of the az group delete command.)

#### DELETE RG IF EXISTS ####
echo "DELETING RESOURCE GROUP $resourceGroupName IF IT EXISTS"
if [ `az group exists --name $resourceGroupName` = "true" ]
    then 
        echo "IT EXISTED....DELETING"
        az group delete --name $resourceGroupName
    else
        echo "IT DIDN'T EXIST...CONTINUING"
fi

#### CREATE RG ####
echo "CREATING RESOURCE GROUP $resourceGroupName"
az group create --name $resourceGroupName --location $location

Create Storage Account & File System (Container)

Synapse Workspace requires a storage account & file system with Azure Data Lake Gen 2

#### CREATE STORAGE ACCOUNT #### --https://docs.microsoft.com/en-us/cli/azure/storage/account?view=azure-cli-latest#az-storage-account-create
echo "CREATING STORAGE ACCOUNT FOR SYNAPSE: $synapseStorageAccount"
az storage account create --name $synapseStorageAccount 
                          --resource-group $resourceGroupName 
                          --access-tier Hot 
                          --enable-hierarchical-namespace true 
                          --kind StorageV2 
                          --location $location 
                          --sku Standard_RAGRS 

#### CREATE FILE SYSTEM / CONTAINER ####
echo "CREATING FILE SYSTEM (CONTAINER): $synapseFileSystem"
connectString=`az storage account show-connection-string -g $resourceGroupName -n $synapseStorageAccount -o tsv`

az storage container create --name $synapseFileSystem 
                            --account-name $synapseStorageAccount 
                            --connection-string $connectString

Create Synapse Workspace & SQL Pool

#### CREATE SYNAPSE WORKSPACE ####
echo "CREATING SYNAPSE WORKSPACE"
az synapse workspace create --name $synapseWorkspaceName 
                            --resource-group $resourceGroupName 
                            --storage-account $synapseStorageAccount 
                            --file-system $synapseFileSystem 
                            --sql-admin-login-user $synapseAdminName 
                            --sql-admin-login-password $synapseAdminPW 
                            --location $location

#### CREATE SYNAPSE SQL POOL ####
echo "CREATING SYNAPSE SQL POOL"
az synapse sql pool create  --resource-group $resourceGroupName 
                            --workspace-name $synapseWorkspaceName 
                            --name $synapseSQLPoolName 
                            --performance-level $synapsePerformanceLevel

There is a known issue that Azure Synapse Studio workspaces created from the CLI using the SDK are not accessible. The documented workaround didn’t work verbatim so I’ve included my solution. Note that this enables connection to your Synapse Workspace from all IP addresses!

# Allow web access to workspace workaround
identity=`az synapse workspace show --name $synapseWorkspaceName --resource-group $resourceGroupName --query "identity.principalId" -o tsv`
az role assignment create --role "Storage Blob Data Contributor" --assignee-object-id $identity --scope `az storage account show -g $resourceGroupName -n $synapseStorageAccount --query "id" -o tsv`
az synapse workspace firewall-rule create -g $resourceGroupName --workspace-name $synapseWorkspaceName --name allowAll --start-ip-address 0.0.0.0 --end-ip-address 255.255.255.255

The next command also modifies the firewall to enable the following commands to interact with the DB. It isn’t technically needed since the above command opened up the full IP range but I’ve included it since I imagine the workaround won’t always be needed.

#### MODIFY FIREWALL ####
echo "MODIFYING FIREWALL TO ALLOW ACCESS FROM THIS ENVIRONMENT ($ipaddress)"
az synapse workspace firewall-rule create --resource-group $resourceGroupName 
                                          --workspace-name $synapseWorkspaceName 
                                          --name azureCLI 
                                          --start-ip-address $ipaddress 
                                          --end-ip-address $ipaddress

Connect To The DB

At this point the Workspace & SQL Pool are created. You could just connect via SSMS or Azure Studio and create tables and interact with the resource in the normal fashion. However, this post is focused on scripting and normally there are some initial scripts that need to be run whether it be creating schemas, tables, users, etc. The next steps show how that can be done using sqlcmd. The scripts need to be in the Azure Storage account associated with your Azure Shell. You can create them in the built-in editor just like we are building this script or you can upload them to the File share using az copy or Azure Storage Explorer. More details on the File share can be found here. Let’s create 2 files (sql scripts) which create a schema and a table.

create_schemas.sql

IF (NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'ods')) 
BEGIN
    EXEC ('CREATE SCHEMA [ods] AUTHORIZATION [dbo]')
END

create_myTable.sql

IF OBJECT_ID('ods.myTable', 'U') IS NOT NULL
    DROP TABLE ods.myTable 
CREATE TABLE ods.myTable 
(
    [foo]   int NOT NULL
)
WITH
( 
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX 
)

Now we will connect to the DB and run those scripts.

#### CONNECT TO DB & CREATE SCHEMA & TABLES  #### --https://docs.microsoft.com/en-us/sql/tools/sqlcmd-utility?view=sql-server-ver15
echo "CONNECTING TO NEW SYNAPSE DB & CREATING SCHEMA AND TABLES"
sqlcmd -S $synapseWorkspaceName.sql.azuresynapse.net    -U $synapseAdminName 
                                                        -P $synapseAdminPW 
                                                        -I 
                                                        -d $synapseSQLPoolName 
                                                        -i $createScripts

Pause the SQL Pool

Since in Synapse we pay for compute separately from storage it is a good practice to pause the compute when it is not needed. The following command will pause the SQL Pool. It can be resumed via a similar command or via the Azure Portal.

#### PAUSE SQL POOL ####
az synapse sql pool pause --name $synapseSQLPoolName --resource-group $resourceGroupName --workspace-name $synapseWorkspaceName

Conclusion

With all the code snippets above saved in createSynapse.sh and the .sql files created and named the same way they are in $createScripts, you should be able to run your script via Cloud Shell.

. ./createSynapse.sh

In the above script we have created an Azure Synapse Workspace and SQL Pool. We have run a set of initial SQL scripts and paused the SQL Pool. The full script takes about 15 minutes to run (including deleting the previous resource group). Be sure to explore the Synapse Pipelines, Synapse Studio, create a Spark Pool. Let me know if you’ve used this script or expanded it in any way. One thing I’m interested in creating and execute Spark jobs via bash script. Please leave a comment with any questions or things you’d like to see in the future! Also, feel free to reach out and follow me on Twitter @MattSzafirPro

 

Here is the full code of all the scripts mentioned above:

https://gist.github.com/ZMon3y/9328a919bc0b1581c4513b0768c52866

Stay Informed

Sign up for the latest blogs, events, and insights.

We deliver solutions that accelerate the value of Azure.
Ready to experience the full power of Microsoft Azure?

Atmosera is thrilled to announce that we have been named GitHub AI Partner of the Year.

X