SQLSaturday Johannesburg & Cape Town pre-con: Understanding SSIS internals and performance tuning

I am very happy and super excited to announce that I will be doing a full day pre-con the day before SQLSaturday #327 Johannesburg on 29 August, as well as SQLSaturday #301 Cape Town on 5 September 2014. Having lived in South Africa, I have had the pleasure of visiting Johannesburg, Cape Town, and Stellenbosch on a number of occasions. I must say, I can’t wait to go back!

Two years ago, I started speaking at SQL Server conferences, when Jody Roberts (twitter) pinged me on Twitter and asked if I wanted to give a talk at SQLSaturday #134 Johannesburg. I gave my first talk on Layered Partitioning, and it was such an awesome experience that I decided to go all in, which has resulted in three big conferences (and quite a few smaller ones) this year.

So, this year I am going back to South Africa where it all started to give a full day pre-con on Understanding SSIS internals and performance tuning.

Abstract:

You have worked with SQL Server Integration Services (SSIS) for a few years, but more often than not, you see performance problems with your packages. Some packages are running slow, while others are taking up more memory and CPU than before. But why are they slow or taking up a lot of resources? What is actually going on behind the scenes? Join this full-day session to find out.

The first part will cover the internals of SSIS. Starting with an overview, we will move into the details of the control flow engine and the data flow engine. At the end of the internals section, you will have a deep understanding of the strengths and weaknesses of SSIS, and of the best way to design your packages.

The second part will cover how we can use the knowledge about the internals of SSIS to gain better performance. This part will show practical examples of common performance problems, and how they can be solved. Furthermore, we will look at how designing and tuning your data flows can increase performance significantly.

If you are in Johannesburg or Cape Town (or feel like visiting the beautiful South Africa), I would be happy to see you at my pre-con.

Sign up for a full day of understanding SSIS internals and performance tuning in Johannesburg on 29 August.

Sign up for a full day of understanding SSIS internals and performance tuning in Cape Town on 5 September.


Speaking at SQLBits, SQL Server Days, and PASS Summit 2014


I am very honored and happy to announce that I will be speaking at three major SQL Server conferences this year.

Update: I will also be giving my SSIS internals and performance talk at SQLSaturday #310 Dublin, and am looking forward to battling Bob Duffy (Blog | Twitter) in Giant Jenga and a 5 minute “call my SQL bluff” lightning talk.


MSBuild error (and fix) with Visual Studio 2013 SQL Server tooling and SQL Server 2014

To show the possibilities with continuous integration with SQL Server development, I decided to spin up a virtual machine with Visual Studio 2013, Team Foundation Server 2013, and SQL Server 2014. While setting it up, I ran into an error that I thought I’d share here.

If you want to know more about how to set up continuous integration for SQL Server development, Jamie Thomson (blog | twitter) wrote a very good blog post about continuous deployment of SSDT database projects to Windows Azure using Team Foundation Service. This approach can also be used for on-premises build servers using Team Foundation Server, which is what I was looking for.

With Visual Studio 2013 you can no longer download SQL Server Data Tools separately, as the SQL Server tooling is included. You will need to use either one of the paid versions of Visual Studio or one of the free Express editions.

If you wish to deploy to SQL Server 2014, you also need to update the SQL Server tooling from within Visual Studio.

While I was trying to set up a build server, I ran into this error:

C:\Builds\3\MyProject\AdventureWorks Test Server Build\Sources\Main\AdventureWorks\AdventureWorks\AdventureWorks.sqlproj (63): The imported project "C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v11.0\SSDT\Microsoft.Data.Tools.Schema.SqlTasks.targets" was not found. Confirm that the path in the <Import> declaration is correct, and that the file exists on disk.

In my .sqlproj file, the Import declaration showed:

<Import Project="$(MSBuildExtensionsPath)\Microsoft\VisualStudio\v$(VisualStudioVersion)\SSDT\Microsoft.Data.Tools.Schema.SqlTasks.targets" />

The C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v11.0\SSDT path did not exist, but the C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v12.0\SSDT path did.

It looked like an old MSBuild version was being used, so the solution I found was to override the variable in the build definition to use the newest version.

This is what I did:

  1. Find the build definition under Builds in Team Explorer
  2. Click Edit build definition
  3. Click Process
  4. Click Advanced
  5. Add /p:VisualStudioVersion=12.0 under MSBuild arguments

My build arguments ended up looking like this, after adding the build, publish, and SqlPublishProfilePath arguments:

/t:Build /t:Publish /p:SqlPublishProfilePath=AdventureWorks.Test.publish.xml /p:VisualStudioVersion=12.0

After that, the build worked just fine.
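
If you want to verify the fix outside of Team Foundation Server, the same arguments can be passed directly to MSBuild from a Visual Studio command prompt. The project and profile names below are just examples:

msbuild AdventureWorks.sqlproj /t:Build /t:Publish /p:SqlPublishProfilePath=AdventureWorks.Test.publish.xml /p:VisualStudioVersion=12.0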


Create Data Warehouse extract tables from SQL Server source system

I have often found that I needed a quick script to give me CREATE TABLE statements for each table in a source system database, so I could create these tables in my extract schema in my Data Warehouse. This can obviously be done using SQL Server Management Studio, but I found that I would then have to manually edit each statement to remove keys, indexes, identity columns, and everything else that I usually do not want in my extract tables.

Not having been able to find one, I wrote my own and thought I’d share it here. The following script writes a CREATE TABLE statement for each table in a SQL Server source database, so they can be used for creating extract tables in a data warehouse.

DECLARE @ExtractSchema NVARCHAR(100) = 'Extract'

-- Builds one CREATE TABLE statement per table in the source database.
-- Keys, indexes, constraints, and identity properties are intentionally left out.
SELECT 'CREATE TABLE [' + @ExtractSchema + '].['
      + SCHEMA_NAME(t.schema_id)
      + '_' + t.name + '] (' + LEFT(cols.column_list, LEN(cols.column_list) - 1) + ')'
FROM sys.tables t
CROSS APPLY (SELECT '[' + c.name + '] ' + UPPER(ty.name)
                    + CASE WHEN ty.name IN ('decimal', 'numeric') THEN
                              '(' + CONVERT(NVARCHAR(10), c.precision)
                              + ', '
                              + CONVERT(NVARCHAR(10), c.scale)
                              + ')'
                      WHEN ty.name = 'float' THEN
                         '(' + CONVERT(NVARCHAR(10), c.precision) + ')'
                      WHEN ty.name IN ('datetime2', 'datetimeoffset', 'time') THEN
                         '(' + CONVERT(NVARCHAR(10), c.scale) + ')'
                      -- char/varchar: max_length is the length in characters (-1 = MAX)
                      WHEN ty.name IN ('char', 'varchar') THEN
                         '(' + CASE WHEN c.max_length = -1
                                   THEN 'MAX'
                                   ELSE CAST(c.max_length AS NVARCHAR(10))
                              END + ')'
                      -- nchar/nvarchar: max_length is in bytes, two bytes per character
                      WHEN ty.name IN ('nchar', 'nvarchar') THEN
                         '(' + CASE WHEN c.max_length = -1
                                   THEN 'MAX'
                                   ELSE CAST(c.max_length / 2 AS NVARCHAR(10))
                              END + ')'
                      ELSE ''
                      END
                    + CASE WHEN c.is_nullable = 0 THEN ' NOT'
                      ELSE ''
                      END
                    + ' NULL, '
               FROM sys.columns c
               INNER JOIN sys.types ty
               ON c.user_type_id = ty.user_type_id
               WHERE t.object_id = c.object_id
               ORDER BY c.column_id
               FOR XML PATH('')
      ) cols (column_list)
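
As an example, for a hypothetical source table Sales.Customer with an integer key, a name, and a balance column, the script would generate a statement along these lines:

CREATE TABLE [Extract].[Sales_Customer] ([CustomerID] INT NOT NULL, [Name] NVARCHAR(100) NULL, [Balance] DECIMAL(18, 2) NULL)

The generated statements can then be run against the extract schema in the data warehouse.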

When does SQL Server partition split move data?

One of the many benefits of using table partitioning in SQL Server is the metadata-only operations, such as switch, split, and merge. However, if we are not careful, splitting a partition can potentially take a long time if data ends up being moved between two partitions.

In this blog post we will take a look at when a SQL Server partition split is a metadata-only operation, and when data will be moved between partitions.

According to Books Online, we should avoid splitting or merging populated partitions:

Always keep empty partitions at both ends of the partition range to guarantee that the partition split (before loading new data) and partition merge (after unloading old data) do not incur any data movement. Avoid splitting or merging populated partitions. This can be extremely inefficient, as this may cause as much as four times more log generation, and may also cause severe locking.

But let’s take a look at what is actually happening when we split a partition. By using fn_dblog to read from the transaction log, we can see whether we have data movement or not when we split a partition.

To demonstrate this, I have created a partitioned table with two partitions: one empty and one with 20000 rows. The 20000 rows are split between 10000 rows with the date 20120301 and 10000 rows with the date 20120501. I am using right range partitioning. Using the DMVs (see my previous blog post for a query that gives you information about the partitions), let’s take a look at the two partitions:

[Image: viewpartitions1]
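
For reference, the test setup can be recreated along the lines of the following sketch. The partition function name CreditPF matches the split statements below, while the scheme, table, and column names are placeholders of my own:

-- Right range partitioning on an INT date key (YYYYMMDD); one boundary
-- point gives two partitions: (-inf, 20120101) and [20120101, +inf)
CREATE PARTITION FUNCTION CreditPF (INT)
AS RANGE RIGHT FOR VALUES (20120101)

CREATE PARTITION SCHEME CreditPS
AS PARTITION CreditPF ALL TO ([PRIMARY])

-- Hypothetical table definition (the original tests do not show it)
CREATE TABLE dbo.Credit (
     DateKey INT NOT NULL,
     Filler CHAR(100) NOT NULL DEFAULT ''
) ON CreditPS (DateKey)

-- Partition 1 stays empty; partition 2 gets 10000 rows per date
INSERT INTO dbo.Credit (DateKey)
SELECT d.DateKey
FROM (VALUES (20120301), (20120501)) AS d (DateKey)
CROSS JOIN (SELECT TOP (10000) 0 AS n
            FROM sys.all_objects a CROSS JOIN sys.all_objects b) AS t

-- Each SPLIT below needs a filegroup marked as next used
ALTER PARTITION SCHEME CreditPS NEXT USED [PRIMARY]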

Test #1: Split a partition with data on both sides of the new boundary point

In this test we split with a new boundary point 20120401, which means that we will have data on both sides of the new boundary point.

DECLARE @xact_id BIGINT

BEGIN TRAN

-- This is the actual split
ALTER PARTITION FUNCTION CreditPF()
SPLIT RANGE (20120401)
-- End of split

-- Get the xact ID from our current explicit transaction
SELECT @xact_id = transaction_id
FROM sys.dm_tran_current_transaction

COMMIT TRAN

-- Get the entries in the transaction log related to the
-- above transaction doing a split
SELECT [Current LSN], [Operation], [AllocUnitName], [Context]
     , [Transaction ID], [Transaction Name], [Xact ID]
FROM ::fn_dblog(NULL,NULL)
WHERE [Transaction ID] = (
     SELECT TOP 1 [Transaction ID]
     FROM ::fn_dblog(NULL,NULL)
     WHERE [Xact ID] = @xact_id)
GO

This results in the following output, where we can see that there are 20258 entries in the transaction log.

[Image: partitionsplit-test1]

If we take a further look, we can see that most of the entries are LOP_INSERT_ROWS and LOP_DELETE_ROWS, which tells us that the 10000 rows are inserted into the new partition and deleted from the old partition.

[Image: partitionsplit-test1-page2]

After this split, the partitions now look like this:

[Image: partitionsplit-test1-partitions]

Test #2: Split a partition with data on the left side of the new boundary point

In this test we split with a new boundary point 20120601, which means that we will have data on the left side of the new boundary point, and the right side will be empty.

DECLARE @xact_id BIGINT

BEGIN TRAN

ALTER PARTITION FUNCTION CreditPF()
SPLIT RANGE (20120601)

SELECT @xact_id = transaction_id
FROM sys.dm_tran_current_transaction

COMMIT TRAN

SELECT [Current LSN], [Operation], [AllocUnitName], [Context]
     , [Transaction ID], [Transaction Name], [Xact ID]
FROM ::fn_dblog(NULL,NULL)
WHERE [Transaction ID] = (
     SELECT TOP 1 [Transaction ID]
     FROM ::fn_dblog(NULL,NULL)
     WHERE [Xact ID] = @xact_id)
GO

This results in the following output, where we can see that there are only 19 entries in the transaction log.

[Image: partitionsplit-test2]

Even though we have data in the partition that we split, it is a metadata-only operation, quite a different case from our previous test. In this case, we are using right range partitioning, and if the data is only on the left side of the new boundary point, there is no data movement. This makes sense, as the new partition being created is empty.

[Image: partitionsplit-test2-partitions]

Test #3: Split an empty partition

In this test we split with a new boundary point 20120801, which means that we split the last partition, which is empty.

DECLARE @xact_id BIGINT

BEGIN TRAN

ALTER PARTITION FUNCTION CreditPF()
SPLIT RANGE (20120801)

SELECT @xact_id = transaction_id
FROM sys.dm_tran_current_transaction

COMMIT TRAN

SELECT [Current LSN], [Operation], [AllocUnitName], [Context]
     , [Transaction ID], [Transaction Name], [Xact ID]
FROM ::fn_dblog(NULL,NULL)
WHERE [Transaction ID] = (
     SELECT TOP 1 [Transaction ID]
     FROM ::fn_dblog(NULL,NULL)
     WHERE [Xact ID] = @xact_id)
GO

This results in the following output, where we can see that there are only 19 entries in the transaction log.

[Image: partitionsplit-test3]

This result is expected, as it is an empty partition we are splitting. Let’s take a look at our partitions.

[Image: partitionsplit-test3-partitions]

Test #4: Split a partition with data on the right side of the new boundary point

In this test we split with a new boundary point 20120201, which means that we will have data on the right side of the new boundary point, and the left side will be empty.

DECLARE @xact_id BIGINT

BEGIN TRAN

ALTER PARTITION FUNCTION CreditPF()
SPLIT RANGE (20120201)

SELECT @xact_id = transaction_id
FROM sys.dm_tran_current_transaction

COMMIT TRAN

SELECT [Current LSN], [Operation], [AllocUnitName], [Context]
     , [Transaction ID], [Transaction Name], [Xact ID]
FROM ::fn_dblog(NULL,NULL)
WHERE [Transaction ID] = (
     SELECT TOP 1 [Transaction ID]
     FROM ::fn_dblog(NULL,NULL)
     WHERE [Xact ID] = @xact_id)
GO

This results in the following output, where we can see that there are 20307 entries in the transaction log.

[Image: partitionsplit-test4]

If we again take a further look, we can see that most of the entries are LOP_INSERT_ROWS and LOP_DELETE_ROWS, which suggests that all rows are inserted into the new partition and deleted from the old partition.

[Image: partitionsplit-test4-page2]

If we take a look at our partitions, we can see that the old partition 2 is now empty (although 1 page is still allocated for it) and the new partition 3 contains all the rows. So in the case where the data is on the right side of the new boundary point, and the left side is empty, all rows are moved.

[Image: partitionsplit-test4-partitions]

Left range partitioning

If we had used left range partitioning instead of right range, the results would have been the opposite of the ones shown above. In that case, there would be data movement if there were data only on the left side of the new boundary point, and it would be a metadata-only operation if there were data only on the right side of the new boundary point.
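
To illustrate the difference, here is a hypothetical pair of partition functions (not the ones used in the tests above). The only difference between them is which partition the boundary value itself belongs to:

-- RANGE RIGHT: the boundary value belongs to the partition on its right,
-- giving the partitions (values < 20120401) and (values >= 20120401)
CREATE PARTITION FUNCTION PFRight (INT)
AS RANGE RIGHT FOR VALUES (20120401)

-- RANGE LEFT: the boundary value belongs to the partition on its left,
-- giving the partitions (values <= 20120401) and (values > 20120401)
CREATE PARTITION FUNCTION PFLeft (INT)
AS RANGE LEFT FOR VALUES (20120401)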

Conclusion

While the recommendation in Books Online is to only split empty partitions, it is possible to split a partition with data and still get a metadata-only operation. However, this behavior is undocumented and could possibly change. I have seen designs where the last (or first) partition was not kept empty, and as of SQL Server 2012 that seems safe, as long as we know that there will only be data on the left side of the new boundary point (when splitting the last partition, in right range partitioning) or on the right side (when splitting the first partition, in left range partitioning).
