By March 6, 2014 2 Comments Read More →

A CLR Application: Deleting Three Billion Rows

In one of SQL Server Central’s recent daily newsletters, Steve Jones’s editorial was inquiring whether or not anyone has been using the CLR (common language runtime) integration within SQL Server.  Recently, a change in a business process dictated that we needed to delete over 3 billion rows from a 16 billion row table – not a small task. Rather then attempt to do this in a single transaction that would have had a very high impact to the users, we came up with an approach that utilized a CLR stored procedure.

The guts of the process involved creating two T-SQL stored procedures to supplement the CLR stored procedure.  One stored procedure identified the rows we needed to delete, but in thousands of smaller batches.  Another stored procedure performed the delete of a single batch.  Then, using the CLR, we were able to use managed code (we chose .NET, but you can use C# as well) to loop through and delete each batch as a separate transaction until all 3 billion rows were gone.  The solution below is documented using the AdventureWorks2008R2 database.  Even though this database is nowhere near 16 billion rows, the template is the same.

Step 1: Enable CLR

CLR integration is disabled by default, so we first need to enable it on our SQL Server instance.

sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'clr enabled', 1;
GO
RECONFIGURE;
GO

Step 2: Create the supporting T-SQL stored procedures

For this example, I’m going to use the Sales.SalesOrderHeader and Sales.SalesOrderDetail tables in the AdventureWorks2008R2 database.  The OrderHeader table is the parent, with the OrderDetail table being the child.  Use your imagination that these tables are much larger than they really are, and that we have a requirement to delete all sales orders older than a specific date.

First, we’ll create the stored procedure that identifies all of the SalesOrderIds that relate to the rows we want to delete.  The stored procedure will take in a date parameter that identifies the SalesOrderIds that are older than the date we pass in.

CREATE PROCEDURE usp_GetSalesOrdersToPurge
@OrderDate datetime

AS

BEGIN

SELECT SalesOrderId FROM Sales.SalesOrderHeader where OrderDate < @OrderDate

END

GO

Next comes the stored procedure that will perform each delete.  Since we will be deleting a single sale order at at a time, the stored procedure will take in a single SaleOrderId.

CREATE PROCEDURE usp_PurgeSalesOrders
@SalesOrderId int

AS

BEGIN

DELETE FROM Sales.SalesOrderHeader WHERE SalesOrderID = @SalesOrderId

END

GO

In this example, and in our real world situation, the foreign key from the child (SalesOrderDetail) to the parent (SalesOrderHeader) has the ‘ON DELETE CASCADE’ option enabled.  We just need to delete the record from SalesOrderHeader, and the cascading delete will take care of the rows in SalesOrderDetail.

 Step 3: Create the CLR stored procedure

Now that we have the T-SQL stored procedures in place, we need to create the CLR stored procedure.  We do this by creating a CLR database project within Visual Studio (screenshots below are from Visual Studio 2010).

Go to File–>New–>Project and select ‘Visual Basic SQL CLR Database Project’.  Again, the option for C# is there, but we’re using .NET for this example.

CLR_1

Next we’re prompted to connect to the database where we will be deploying the CLR stored procedure:

CLR2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

The following prompt seems harmless enough at first glance, however note the ‘during debugging all managed threads on the server will stop‘.  If you’re creating the CLR stored procedure on an instance that others are using, you’re going to want to click ‘No’ here, unless you want some upset coworkers.  If you do need to debug, make sure it’s in an isolated environment.  For more information regarding CLR debugging, check out books online: http://msdn.microsoft.com/en-us/library/ms165051%28v=vs.100%29.aspx

CLR3

 

 

 

 

 

 

 

Now that our project is created, we need to add our .NET class that will contain the CLR code.  Right click the project and select Add–>Class.

CLR4

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Select the ‘VB Class’ icon

CLR5

Step 4: Time for the CLR code

Double click the class you just created to begin coding.  I’ll provide the code in its entirely, with comments that explain what each part is doing.

'The following classes need to be imported for this example
Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server

Public Class StoredProcedures
'This is the name of the CLR stored procedure
<SqlProcedure(Name:="usp_SalesOrderPurge")> _ 
Public Shared Sub usp_SalesOrderPurge(ByVal dtOrderDate As Date) 'This is the same date that we will pass into usp_GetSalesOrderToPurge

'These are our variable declarations
Dim sqlConnection As SqlConnection = New SqlConnection("context connection=true")
Dim salesOrders As DataTable
Dim selDataAdapter As SqlDataAdapter
Dim delCommand As SqlCommand
Dim intSalesOrderId As Integer

Try
selDataAdapter = New SqlDataAdapter("usp_GetSalesOrdersToPurge", sqlConnection)
'We call usp_GetSalesOrderToPurge, passing in the same date parameter we used
'in the CLR procedure
With selDataAdapter
With .SelectCommand
.CommandType = CommandType.StoredProcedure
.Parameters.Clear()
.Parameters.Add("OrderDate", SqlDbType.DateTime).Value = dtOrderDate
End With

'A dataset is filled with the results of usp_GetSalesOrderToPurge
Dim datasetObj As New DataSet
.Fill(datasetObj)

salesOrders = datasetObj.Tables(0)
End With
'We close our current connection
sqlConnection.Close()

'We loop through the previously filled dataset, and call
'usp_PurgeSalesOrder for each record.  Note that a connection
'is opended and closed for each delete.
For Each record As DataRow In salesOrders.Rows
intSalesOrderId = record("SalesOrderId")

sqlConnection.Open()
delCommand = New SqlCommand("usp_PurgeSalesOrders", sqlConnection)
With delCommand
.CommandType = CommandType.StoredProcedure
.Parameters.Clear()
.Parameters.Add("SalesOrderId", SqlDbType.Int).Value = intSalesOrderId
.ExecuteNonQuery()
End With
sqlConnection.Close()
delCommand = Nothing
'This string in the SqlContext.Pipe.Send will be displayed in the 'Messages' tab in SSMS.
SqlContext.Pipe.Send("Sales Order " + intSalesOrderId.ToString + " has been purged")
Next record

'We catch any exception, and pipe it to the messages tab.  Finally, our connection is closed when
'the stored procedure has completed.
Catch ex As Exception
SqlContext.Pipe.Send("Error: " & ex.Message)
Finally
If Not sqlConnection Is Nothing AndAlso sqlConnection.State = ConnectionState.Open Then
sqlConnection.Close()
End If
End Try

End Sub
End Class

Step 5: Deploy and Execute

One ‘gotcha’ here is that depending on your version of SQL Server, you may need to alter the ‘Target Framework’ of the package.  The project I created defaulted to using the .NET 4.0 framework , which will work for SQL Server 2012.  If you’re using 2008R2, 2008 or 2005, it needs to be set to a framework between 2.0 and 3.5.  Since I was deploying to a 2008R2 instance, I needed to change it.  You can do so by going to the Properties–>Compile–>Advanced Compiler Settings of the project.

Once the targeted framework is straightened out, we can deploy.  Select Build–>Deploy ‘ProjectName’ from the Visual Studio menu.
CLR7

 

 

 

 

 

 

 

 

If in the output window you see the following, then the CLR has been deployed and is now present on the SQL Server instance you selected at the beginning of development.

========== Build: 1 succeeded or up-to-date, 0 failed, 0 skipped ==========
========== Deploy: 1 succeeded, 0 failed, 0 skipped ==========

If you need to change the instance you’re deploying to, you can do so on the ‘Database’ tab of the project properties.

With the CLR procedure deployed, you can execute it just like any other stored procedure. If we wanted to purge all records older than 7/30/2008, it would look like the following:

EXEC usp_SalesOrderPurge '2008-07-30 00:00:00.000'
GO

 

End Result

We knew this solution wouldn’t be the fastest, but run time wasn’t our concern.  We needed something that could run alongside other processes and not bring the database to its knees.  The CLR allowed us to do that by eliminating extensive blocking and giving us a process that could be stopped and pick up where it left off upon restart. We of course had to give the database a little TLC via index and statistics maintenance after each long purge session, but in the end, we accomplished our goal of purging 3 billion rows without much headache.

Posted in: SQL Server

2 Comments on "A CLR Application: Deleting Three Billion Rows"

Trackback | Comments RSS Feed

  1. jeffrey yao says:

    What is the benefit here using CLR SP ? Why cannot you use the regular T-SQL to do deletion by small chunk one at a time?

    • akreul says:

      Hey Jeffrey,
      Thanks for reading. We chose to use the CLR because it allowed us to easily split the deletes up into thousands of transactions. If you look at the CLR code, you’ll notice that we are grabbing a new connection with each delete. This way, other transactions could come in and out while our purge was running, and we wouldn’t end up blocking the entire time.
      -Adam

Post a Comment