At work, I use SQL a lot. It is not without annoying nuances and limitations, but it remains the foundation of every data profession, and competence in it is essential for anyone working in the land of data.
Although SQL is great, why settle for great? Why don’t we supercharge SQL?
The limitations of SQL stem from it being a declarative language, meaning we tell SQL what we want and SQL will fetch that from a specified database. For many data extraction or simple data manipulation tasks, this is all that is needed.
But what if we want more?
That is exactly what I will show you how to do in this article.
import pyodbc
from datetime import datetime

class Sql:
    def __init__(self, database, server="XXVIR00012,55000"):
        # here we are telling Python what to connect to (our SQL Server)
        self.cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                                   "Server="+server+";"
                                   "Database="+database+";"
                                   "Trusted_Connection=yes;")

        # initialise query attribute
        self.query = "-- {}\n\n-- Made in Python".format(
            datetime.now().strftime("%d/%m/%Y"))
mssql_connection_short.py
This code is the foundation for supercharging MS SQL Server. Once this is built, we can connect to SQL from Python with just:
sql = Sql('database123')
Easy right? There are a few things going on, so let’s dissect the code.
class Sql:
The first thing to note is that we contain everything within a class. I find this approach logical, as it lets us add or remove processes for a specific database. This will make more sense once we see how it works.
We initialise our class:
def __init__(self, database, server="XXVIR00012,55000"):
My colleagues and I almost always connect to the same server, so I set this common server name as the default argument for server.
Your server name can be found in the ‘Connect to Server’ dialog box, or at the top of the window in MS SQL Server Management Studio.
Next, we establish our connection to SQL:
self.cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                           "Server="+server+";"
                           "Database="+database+";"
                           "Trusted_Connection=yes;")
This is made incredibly easy thanks to the pyodbc module. We simply pass a connection string to the pyodbc.connect(...) function; more details on connection strings can be found in the pyodbc documentation.
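The connection string above uses Windows authentication (Trusted_Connection). As a side note, a SQL-login variant can be assembled the same way; the helper below is a minimal sketch of both forms (the build_connection_string name is my own illustration, not part of the article's class):

```python
def build_connection_string(database, server, uid=None, pwd=None):
    """Assemble a pyodbc connection string; uses Windows authentication
    unless a username/password pair is supplied."""
    parts = [
        "Driver={SQL Server Native Client 11.0}",
        "Server=" + server,
        "Database=" + database,
    ]
    if uid is None:
        parts.append("Trusted_Connection=yes")  # Windows authentication
    else:
        parts.append("UID=" + uid)  # SQL Server authentication
        parts.append("PWD=" + pwd)
    return ";".join(parts) + ";"

# the resulting string can be passed straight to pyodbc.connect(...)
print(build_connection_string("database123", "XXVIR00012,55000"))
```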
Finally, I like to create a query string within the Sql class, which is updated with every query passed to the class:
self.query = "-- {}\n\n-- Made in Python".format(
    datetime.now().strftime("%d/%m/%Y"))
This allows us to keep a log of our code and also doubles as a more readable output to colleagues that may be more comfortable reading SQL code.
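Since self.query accumulates plain SQL text, dumping it to a .sql file at the end of a session gives colleagues something they can open directly in Management Studio. A minimal sketch, assuming we just want the raw log on disk (the save_query_log name is my own, not part of the class):

```python
from datetime import datetime

def save_query_log(query_text, path="query_log.sql"):
    """Write the accumulated SQL log to a .sql file that can be
    opened directly in SQL Server Management Studio."""
    with open(path, "w") as f:
        f.write(query_text)
    return path

# build a log string the same way the Sql class initialises self.query
log = "-- {}\n\n-- Made in Python".format(datetime.now().strftime("%d/%m/%Y"))
log += "\n\n-- create table\nCREATE TABLE [raw_data] (...);"
save_query_log(log)
```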
Note that I will exclude the updates to _self.query_ in the following code snippets; please check the full code in the links provided if needed.
There are a few essential functions that I find incredibly useful and use almost daily. These all focus on transferring data in or out of a database.
Let’s say we have a directory of CSV files containing our generic data for January.
For our current project, we need to import each of those files into SQL Server, union them into a single table, and then split that table by the categories in one of its columns.
After fleshing out our Sql class a little more, this will be as easy as:
import sys
sys.path.insert(0, r'C:\User\medium\pysqlplus\lib')

import os
import pandas as pd
from data import Sql

sql = Sql('database123')  # initialise the Sql object

directory = r'C:\User\medium\data'  # this is where our generic data is stored
file_list = os.listdir(directory)  # get a list of all files

for file in file_list:  # loop to import files to sql
    df = pd.read_csv(os.path.join(directory, file))  # read file to dataframe
    sql.push_dataframe(df, file[:-4])

# now we convert our file_list names into the table names we have imported to SQL
table_names = [x[:-4] for x in file_list]

sql.union(table_names, 'generic_jan')  # union our files into one new table called 'generic_jan'
sql.drop(table_names)  # drop our original tables as we now have the full table

# get list of categories in colX, eg ['hr', 'finance', 'tech', 'c_suite']
sets = list(sql.manual("SELECT colX AS 'category' FROM generic_jan GROUP BY colX", response=True)['category'])

for category in sets:
    sql.manual("SELECT * INTO generic_jan_"+category+" FROM generic_jan WHERE colX = '"+category+"'")
pysqlplus_example.py
Let’s start at the top.
def push_dataframe(self, data, table="raw_data", batchsize=500):
    # create execution cursor
    cursor = self.cnxn.cursor()
    # activate fast execute
    cursor.fast_executemany = True

    # build the CREATE TABLE statement
    query = "CREATE TABLE [" + table + "] (\n"
    # iterate through each column to be included in the statement
    for i in range(len(list(data))):
        query += "\t[{}] varchar(255)".format(list(data)[i])  # add column (everything is varchar for now)
        # append the correct separator/end-of-statement code
        if i != len(list(data))-1:
            query += ",\n"
        else:
            query += "\n);"
    cursor.execute(query)  # execute the create table statement
    self.cnxn.commit()  # commit changes

    # append query to our SQL code logger
    self.query += ("\n\n-- create table\n" + query)

    # build the INSERT statement, with one '?' placeholder per column
    query = ("INSERT INTO [{}] ({})\n".format(table,
                                              '['+'], ['.join(list(data))+']') +
             "VALUES\n(?{})".format(", ?"*(len(list(data))-1)))

    # insert data into target table in batches of 'batchsize'
    for i in range(0, len(data), batchsize):
        if i+batchsize > len(data):
            batch = data[i: len(data)].values.tolist()
        else:
            batch = data[i: i+batchsize].values.tolist()
        # execute batch insert
        cursor.executemany(query, batch)
        # commit insert to SQL Server
        self.cnxn.commit()
mssql_push_dataframe_short.py
This function is contained inside our SQL class. It allows us to easily push a Pandas dataframe to the SQL database.
This is particularly useful when needing to upload a lot of files. However, the true power in allowing Python to import data to SQL comes with Python’s flexibility.
Importing a specific tab across a dozen Excel workbooks into SQL can be a nightmare. But with Python it’s easy. Now we have built a function that allows us to use Python to read those tabs and import them into SQL for us.
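As a sketch of that Excel scenario: the loop below assumes pandas (with an Excel engine such as openpyxl) is installed, that every workbook has a tab with the same name, and that sql is an instance of our class; the tab_to_table_name helper, the push_workbook_tabs name, and the file layout are my own illustration, not part of the article's code:

```python
import os

def tab_to_table_name(workbook_path, sheet):
    """Derive a SQL-friendly table name like 'report_jan_Totals'
    from a workbook filename and a sheet name."""
    stem = os.path.splitext(os.path.basename(workbook_path))[0]
    return "{}_{}".format(stem, sheet)

def push_workbook_tabs(directory, sheet, sql):
    """Read one named tab from every .xlsx file in a directory
    and push each to SQL Server via push_dataframe."""
    import pandas as pd  # assumes pandas and an Excel engine are installed
    for file in os.listdir(directory):
        if file.endswith(".xlsx"):
            df = pd.read_excel(os.path.join(directory, file), sheet_name=sheet)
            sql.push_dataframe(df, tab_to_table_name(file, sheet))
```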
def manual(self, query, response=False):
    cursor = self.cnxn.cursor()  # create execution cursor

    if response:
        # requires 'from pandas import read_sql' at the top of the module
        return read_sql(query, self.cnxn)  # get sql query output to dataframe
    try:
        cursor.execute(query)  # execute
    except pyodbc.ProgrammingError as error:
        print("Warning:\n{}".format(error))  # print error as a warning

    self.cnxn.commit()  # commit query to SQL Server
    return "Query complete."
This function is actually used by the union and drop functions. It simply allows us to execute SQL code as easily as possible.
The response argument gives us the option to extract the output of our query to a DataFrame, allowing us to extract all unique values from colX in our generic_jan table using just:
sets = list(sql.manual("SELECT colX AS 'category' FROM generic_jan GROUP BY colX", response=True)['category'])
Now that we have built the manual function, creating the union function is easy:
def union(self, table_list, name="union", join="UNION"):
    # initialise the query
    query = "SELECT * INTO ["+name+"] FROM (\n"

    # build the SQL query
    query += f'\n{join}\n'.join(
        [f'SELECT [{x}].* FROM [{x}]' for x in table_list]
    )

    query += ") x"  # add end of query
    self.manual(query)  # execute via the manual function
pysqlplus_union_short.py
This simply loops through the list of table names we provide via table_list, building a UNION query across all of them. The result is then executed with self.manual(query).
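To make the generated SQL concrete, here is the same string-building logic run outside the class for two hypothetical table names (build_union_query, jan_wk1, and jan_wk2 are my own example names, used only for illustration):

```python
def build_union_query(table_list, name="union", join="UNION"):
    """Replicates the query construction inside Sql.union."""
    query = "SELECT * INTO [" + name + "] FROM (\n"
    query += "\n{}\n".format(join).join(
        "SELECT [{0}].* FROM [{0}]".format(x) for x in table_list
    )
    query += ") x"
    return query

print(build_union_query(["jan_wk1", "jan_wk2"], name="generic_jan"))
# SELECT * INTO [generic_jan] FROM (
# SELECT [jan_wk1].* FROM [jan_wk1]
# UNION
# SELECT [jan_wk2].* FROM [jan_wk2]) x
```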
We now have the ability to upload a very large number of tables to SQL Server. This is great, but can quickly overpopulate our database. To remedy this, we create a drop function:
def drop(self, tables):
    # check if single or list
    if isinstance(tables, str):
        # if a single string, convert to a one-item list for the loop
        tables = [tables]

    for table in tables:
        # check for a pre-existing table and delete it if present
        query = ("IF OBJECT_ID ('["+table+"]', 'U') IS NOT NULL "
                 "DROP TABLE ["+table+"]")
        self.manual(query)  # execute
Again, this function is incredibly simple thanks to the manual function. It allows us to drop a single table by passing a string to tables, or multiple tables by passing a list of table names.
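For reference, the guarded statement that drop issues per table looks like this (the build_drop_query helper is just my own illustration of the string the class builds internally):

```python
def build_drop_query(table):
    """Replicates the guarded DROP statement built in Sql.drop:
    the table is only dropped if it already exists."""
    return ("IF OBJECT_ID ('[" + table + "]', 'U') IS NOT NULL "
            "DROP TABLE [" + table + "]")

print(build_drop_query("generic_jan"))
# IF OBJECT_ID ('[generic_jan]', 'U') IS NOT NULL DROP TABLE [generic_jan]
```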
When combined, these incredibly simple functions allow us to massively expand the capabilities of SQL Server by leveraging the strengths of Python.
If you are interested in seeing more, please check out the GitHub repo for pysqlplus. It is a small project, built simply to enhance interaction with MS SQL Server for my own workplace.
Nonetheless, I hope it helps other users find ways to incorporate some Python into their SQL routines. I personally use this almost daily; it is so simple, yet incredibly powerful.
Thanks for reading!