The slide deck for my talk at SQLSaturday #280 Vienna about SQL Server Integration Services (SSIS) internals and performance.

I have often found that I needed a quick script for giving me CREATE TABLE statements for each table in a source system database, so I could create these tables in my extract schema in my Data Warehouse. This can obviously be done using SQL Server Management Studio, but I found that I would then manually have to edit each statement to remove keys, indexes, identity columns, and everything else that I usually do not want in my extract tables.

Having not being able to find one, I wrote my own, and though I’d share it here. The following script writes a CREATE TABLE statement for each table in a SQL Server source database, so they can be used for creating extract tables in a data warehouse.

DECLARE @ExtractSchema NVARCHAR(100) = 'Extract'

SELECT 'CREATE TABLE [' + @ExtractSchema + '].['
      + SCHEMA_NAME (t.schema_id)
      + '_' + t.name + '] (' + LEFT(cols .column_list, LEN(cols .column_list) - 1) + ')'
FROM sys.tables t
CROSS APPLY (SELECT '[' + c .name + '] ' + UPPER(ty .name)
                    + CASE WHEN ty.name IN ('decimal', 'numeric') THEN
                              '(' + CONVERT(NVARCHAR(10), c.precision)
                              + ', '
                              + CONVERT (NVARCHAR(10), c.scale)
                              + ')'
                      WHEN ty.name = 'float' THEN
                         '(' + CONVERT(NVARCHAR(10), c.precision ) + ')'
                      WHEN ty.name IN ('datetime2', 'datetimeoffset', 'time') THEN
                         '(' + CONVERT(NVARCHAR(10), c.scale) + ')'
                      WHEN ty.name IN ('char', 'nchar', 'varchar', 'nvarchar') THEN
                         '(' + CASE WHEN c.max_length = - 1
                                   THEN 'MAX'
                                   ELSE CAST (c.max_length AS NVARCHAR(10))
                              END + ')'
                      ELSE ''
                      END
                    + CASE WHEN c.is_nullable = 0 THEN ' NOT'
                      ELSE ''
                      END
                    + ' NULL, '
               FROM sys.columns c
               INNER JOIN sys.types ty
               ON c.user_type_id = ty.user_type_id
               WHERE t.object_id = c.object_id
               ORDER BY c.column_id
               FOR XML PATH('')
      ) cols (column_list)

One of the many benefits when using table partitioning in SQL Server is the metadata only operations, such as switch, split, and merge. However, if we are not careful, splitting partition can potential take a long time if data end up being moved between two partitions.

In this blog post we will take a look at when SQL Server partition split will be a metadata only operation, and when data will be moved between partitions.

According to Books Online, we should avoid splitting or merging populated partitions:

Always keep empty partitions at both ends of the partition range to guarantee that the partition split (before loading new data) and partition merge (after unloading old data) do not incur any data movement. Avoid splitting or merging populated partitions. This can be extremely inefficient, as this may cause as much as four times more log generation, and may also cause severe locking.

But let’s take a look at what is actually happening when we split a partition. By using fn_dblog to read from the transaction log, we can see whether we have data movement or not when we split a partition.

To demonstrate this, I have created a partitioned table with two partitions; one empty and one with 20000 rows. The 20000 rows are split between 10000 rows with the date 20120301 and 10000 rows with the date 20120501. I am using right range partitioning. Using the DMVs (see my previous blog post for a query that gives you information about the partitions), let’s take a look at the two partitions:

viewpartitions1

Test #1: Split a partition with data on both sides of the new boundary point

In this test we split with a new boundary point 20120401, which means that we will have data on both sides of the new boundary point.

DECLARE @xact_id BIGINT

BEGIN TRAN

-- This is the actual split
ALTER PARTITION FUNCTION CreditPF()
SPLIT RANGE (20120401)
-- End of split

-- Get the xact ID from our current explicit transaction
SELECT @xact_id = transaction_id
FROM sys.dm_tran_current_transaction

COMMIT TRAN

-- Get the entries in the transaction log related to the
-- above transaction doing a split
SELECT [Current LSN], [Operation], [AllocUnitName], [Context]
     , [Transaction ID] , [Transaction Name], [Xact ID]
FROM ::fn_dblog(NULL,NULL)
WHERE [Transaction ID] = (
     SELECT TOP 1 [Transaction ID]
     FROM ::fn_dblog(NULL,NULL)
     WHERE [Xact ID] = @xact_id)
GO

This results in the following output, where we can see that there are 20258 entries in the transaction log.

partitionsplit-test1

If we take a further look, we can see that most of the entries are LOB_INSERT_ROWS and LOB_DELETE_ROWS, which tells us that 10000 rows are inserted into the new partition and deleted the old partition.

partitionsplit-test1-page2

After this split, the partitions now look like this:

partitionsplit-test1-partitions

Test #2: Split a partition with data on the left side of the new boundary point

In this test we split with a new boundary point 20120601, which means that we will have data on the left side of the new boundary point, and the right side will be empty.

DECLARE @xact_id BIGINT

BEGIN TRAN

ALTER PARTITION FUNCTION CreditPF()
SPLIT RANGE (20120601)

SELECT @xact_id = transaction_id
FROM sys.dm_tran_current_transaction

COMMIT TRAN

SELECT [Current LSN], [Operation], [AllocUnitName], [Context]
     , [Transaction ID] , [Transaction Name], [Xact ID]
FROM ::fn_dblog(NULL,NULL)
WHERE [Transaction ID] = (
     SELECT TOP 1 [Transaction ID]
     FROM ::fn_dblog(NULL,NULL)
     WHERE [Xact ID] = @xact_id)
GO

This results in the following output, where we can see that there are only 19 entries in the transaction log.

partitionsplit-test2

Despite we have data in the partition that we split, it is a metadata only operation. Quite a different case than our previous test. In this case, we are using right range partitioning, and if the data is only on the left side of the new boundary point there is no data movement. This makes sense, as the new partition being created is empty.

partitionsplit-test2-partitions

Test #3: Split an empty partition

In this test we split with a new boundary point 20120801, which means that we split the last partition which is empty.

DECLARE @xact_id BIGINT

BEGIN TRAN

ALTER PARTITION FUNCTION CreditPF()
SPLIT RANGE (20120801)

SELECT @xact_id = transaction_id
FROM sys.dm_tran_current_transaction

COMMIT TRAN

SELECT [Current LSN], [Operation], [AllocUnitName], [Context]
     , [Transaction ID] , [Transaction Name], [Xact ID]
FROM ::fn_dblog(NULL,NULL)
WHERE [Transaction ID] = (
     SELECT TOP 1 [Transaction ID]
     FROM ::fn_dblog(NULL,NULL)
     WHERE [Xact ID] = @xact_id)
GO

This results in the following output, where we can see that there are only 19 entries in the transaction log.

partitionsplit-test3

This result is expected, as it is an empty partition we are splitting. Let’s take a look at our partitions.

partitionsplit-test3-partitions

Test #4: Split a partition with data on the left side of the new boundary point

In this test we split with a new boundary point 20120201, which means that we will have data on the right side of the new boundary point, and the left side will be empty.

DECLARE @xact_id BIGINT

BEGIN TRAN

ALTER PARTITION FUNCTION CreditPF()
SPLIT RANGE (20120201)

SELECT @xact_id = transaction_id
FROM sys.dm_tran_current_transaction

COMMIT TRAN

SELECT [Current LSN], [Operation], [AllocUnitName], [Context]
     , [Transaction ID] , [Transaction Name], [Xact ID]
FROM ::fn_dblog(NULL,NULL)
WHERE [Transaction ID] = (
     SELECT TOP 1 [Transaction ID]
     FROM ::fn_dblog(NULL,NULL)
     WHERE [Xact ID] = @xact_id)
GO

This results in the following output, where we can see that there are 20307 entries in the transaction log.

partitionsplit-test4

If we again take a further look, we can see that most of the entries are LOB_INSERT_ROWS and LOB_DELETE_ROWS, which suggests that all rows are inserted into the new partition and deleted the old partition.

partitionsplit-test4-page2

If we take a look at our partitions, we can see that the old partition 2 is now empty (although 1 page is still allocated for it) and the new partition 3 contains all the rows. So in the case where the data is on the right side of the new boundary point, and the left side is empty, all rows are moved.

partitionsplit-test4-partitions

Left range partitioning

If we had used left range partitioning instead of right range, the results would have been opposite the ones shown above. In this case, there would be data movement if there would be data only on the left side of the new boundary point, and it would be a metadata operation only if there would be data only on the right side of the new boundary point.

Conclusion

While the recommendation in books online is to only split empty partitions, it is possible to split a partition with data and still get a metadata operation only. However, this is undocumented, and could possible change. I have seen designs where the last (or first) partition was not kept empty, and as of SQL Server 2012 that seems safe. As long as we know that there only will be data on the left side (of the last partition, in right range partitioning) or the right side (of the first partition, in left range partitioning) of the new boundary point for the split.

Your SSIS packages are taking longer than they should, but you are not sure what to do about it. This session gave 10 tips and tricks for gaining better performance. Not only 10 reasons to why your packages are running slow, but also 10 ways to do something about it. Find out when to use T-SQL instead of SSIS transformations, why you should care about data types, what a blocking transformation is, why buffers are important, and much more.

This talk were given at SQLSaturday #228 Cambridge, SQLSaturday #229 Dublin, SQLRally Nordic 2013 and SQLRally Amsterdam 2013.

 

You may have heard about the new in-memory OLTP database engine component (formerly known as Hekaton) in SQL Server 2014, but would like to see it in action. I gave this demo-packed talk at Microsoft Campus Days 2013 in Copenhagen, where we took a first look at the in-memory OLTP capability, where mission critical data can be accessed and processed more effectively by declaring tables as memory-optimized. In this session, we looked at not only how In-Memory OLTP memory-optimized tables are stored and how SQL Server is working with the data, but also how natively compiled procedures can be used to improve performance further. Also we looked at the limitations of using In-Memory OLTP.

This talk was first given at Microsoft Campus Days 2013 on 9 October 2013.

Demo scripts can be downloaded here. Video is available on Channel 9.

Update: an updated slide deck for SQL Server 2014 CTP2 can be found here.

Manage very large data sets

Loading and managing large data sets in your Data Warehouse is not always a trivial task. Especially if the source systems can re-deliver data and you want to replace an old subset of your data. The quick answer you always hear is “just use partitioning”, but nobody really tells you how you should design or implement it. This talk shows one way to load and manage your very large data sets, by using layered partitioning. We will walk through partitioned tables and partitioned views, before moving on to the concept of layered partitioning.

The talk was presented at SQLBits XI.

Video is available at the SQLBits website.

Here is a few links to articles, white papers, and best practices on partitioning on Microsoft SQL Server. I have these at the end of my slide deck on layered partitioning, which uses a combination of partitioned tables and a partitioned view, and thought that I would put them up here as well.

This query is quite useful when looking at which partitions are in the database. It is originally taken from the book Microsoft SQL Server 2008 Internals (p. 438), and then I have tweaked it a bit and added some more information.

I used this query in my demo today at SQLSaturday #152 in Bulgaria, where I talked about layered partitioning.

DECLARE @TableName NVARCHAR(200) = N'Facts.Credit'

SELECT SCHEMA_NAME(o.schema_id) + '.' + OBJECT_NAME(i.object_id) AS [object]
     , p.partition_number AS [p#]
     , fg.name AS [filegroup]
     , p.rows
     , au.total_pages AS pages
     , CASE boundary_value_on_right
       WHEN 1 THEN 'less than'
       ELSE 'less than or equal to' END as comparison
     , rv.value
     , CONVERT (VARCHAR(6), CONVERT (INT, SUBSTRING (au.first_page, 6, 1) +
       SUBSTRING (au.first_page, 5, 1))) + ':' + CONVERT (VARCHAR(20),
       CONVERT (INT, SUBSTRING (au.first_page, 4, 1) +
       SUBSTRING (au.first_page, 3, 1) + SUBSTRING (au.first_page, 2, 1) +
       SUBSTRING (au.first_page, 1, 1))) AS first_page
FROM sys.partitions p
INNER JOIN sys.indexes i
     ON p.object_id = i.object_id
AND p.index_id = i.index_id
INNER JOIN sys.objects o
     ON p.object_id = o.object_id
INNER JOIN sys.system_internals_allocation_units au
     ON p.partition_id = au.container_id
INNER JOIN sys.partition_schemes ps
     ON ps.data_space_id = i.data_space_id
INNER JOIN sys.partition_functions f
     ON f.function_id = ps.function_id
INNER JOIN sys.destination_data_spaces dds
     ON dds.partition_scheme_id = ps.data_space_id
     AND dds.destination_id = p.partition_number
INNER JOIN sys.filegroups fg
     ON dds.data_space_id = fg.data_space_id
LEFT OUTER JOIN sys.partition_range_values rv
     ON f.function_id = rv.function_id
     AND p.partition_number = rv.boundary_id
WHERE i.index_id < 2
     AND o.object_id = OBJECT_ID(@TableName);

The result is something like the following.

View partitions in SQL Server

The query shows:

  • object – The table name.
  • p# – The partition number.
  • filegroup – The filegroup the partition is located on. Note that in this result, I am using the same filegroup for all partitions, but usually you would probably not do this.
  • rows – Number of rows in the partition.
  • comparison – shows “less than” if you are using right range partitioning or “less than or equal to” if you are using left range partitioning.
  • value – The boundary point between the two partitions. This value is either in the right or left partition of the boundary point, depending on whether or not you are using right range or left range partitioning.
  • first_page – The first file:page allocated for the partition.