Integration Services (SSIS) for the DBA

Below you will find slides, demos, and references for my PASS Summit 2014 talk on “Integration Services (SSIS) for the DBA”.

Slides

Demos

Demos can be found at github.

Further reading / references

Bulk load: methods for better data warehouse load performance

Slides and demo scripts are now available for my talk at SQLSaturday #337 Oregon today.

Abstract

When loading a data warehouse you want the data inserted into the tables as fast as possible. You know you have to use bulk loading, but what do you need to do to ensure a minimal footprint on the transaction log? In this session, you will learn about minimal logged operations. You will also learn about the different methods for bulk loading data into your data warehouse; using SSIS, BCP and T-SQL.

Slides

Demos

Demos can be found at github.

SQLRally Nordic in Copenhagen – Full day training on SSIS internals and performance

SQLRally 2015On 2 March 2015, I am giving a full day pre-conference training session on SSIS internals and performance at SQLRally Nordic in Copenhagen

You have worked with SQL Server Integration Services (SSIS) for a few years, but more often than not, you see performance problems with your packages. Some packages are running slow, while others are taking up more memory and CPU than before.

The first part of this full-day session will cover the internals of SSIS. Starting with an overview, we will move into the details of the control flow engine and the data flow engine. At the end of the internals section, you will have a deep understanding of the strengths and weaknesses of SSIS and what is the best way to design your packages.

The second part will cover the architecture of the SSIS Server, which came with SQL Server 2012. We will dive into the SSIS catalog and the host process used to execute the packages.

The third part will cover how we can use the knowledge about the internals of SSIS to gain better performance. This part will show practical examples of common performance problems, and how these can be solved. Furthermore we will look at how designing and tuning your data flows can increase performance significantly.

Join me for a day full of fun with SSIS. Super-early bird price is ending on 27 October 2014. Register before your colleague.

Regional and language settings for Oracle client used with SQL Server linked server

Today, I ran into a problem while migrating a Data Warehouse solution to another server. The solution have linked servers for extracting data from an Oracle source system.

The query would go something like this:

INSERT INTO dbo.MyTable
SELECT *
FROM OPENQUERY(UNDPFSP1ICC, 'SELECT
	TO_CHAR(COL1_DT, ''YYYY'') AS Col1Year
	, COL2
	, COL3
	, SUM(AMOUNT) AS Amount
	, SYSDATE
FROM MYTABLE
GROUP BY
	TO_CHAR(COL1_DT, ''YYYY'')
	, COL2
	, COL3
')

Everything had been migrated, but while running the extract job, which ran queries like the above, we would get errors like these:

Conversion failed when converting the nvarchar value '123456,78' to data type int.

Error converting data type nvarchar to numeric.

The problem was that instead of using a dot (.) as the decimal symbol, a comma (,) was being returned. This would make the data convert to nvarchar, and an implicit conversion to a NUMERIC or INT would fail.

Everything was running fine on the old server. The collation of the server and databases were identical (Latin1_General_CI_AS), but the regional and language settings were different. On the old server they were set to English (United States) and on the new server to Danish. Changing the settings on the new server to English (United States) did not help.

After troubleshooting a bit on #sqlhelp with Amit Banerjee (twitter | blog) and William Durkin (twitter | blog), as well as via Skype with Jørgen Guldmann (twitter | blog), Jørgen ended up pointing me in the right direction.

The problem as that the NLS_LANG parameter was set to Danish (where we use comma (,) as decimal symbol). It is stored in the registry under the following subkey:

HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOMEID\NLS_LANG

Where ID is the unique number identifying the Oracle home.

The trick was to change this subkey from:

DANISH_DENMARK.WE8MSWIN1252

To:

AMERICAN_AMERICA.WE8MSWIN1252

And then everything worked fine, and the decimal symbol was now a dot (.) instead of a comma (,).

SQLSaturday Johannesburg & Cape Town pre-con: Understanding SSIS internals and performance tuning

I am very happy and super excited to announce that I will be doing a full day pre-con the day before SQLSaturday #327 Johannesburg on 29 August, as well as SQLSaturday #301 Cape Town on 5 September 2014. Having lived in South Africa, I have had the pleasure to visit Johannesburg, Cape Town and Stellenbosch on a number of occasions. I must say, I can’t wait to go back!

Two years ago, I started speaking at SQL Server conferences, when Jody Roberts (twitter) pinged me on Twitter and asked if I didn’t want to give a talk at SQLSaturday #134 Johannesburg. I gave my first talk on Layered Partitioning, and it was such an awesome experience that I decided to go all in; which has resulted in three big conferences (and quite a few smaller ones) this year.

So, this year I am going back to South Africa where it all started to give a full day pre-con on Understanding SSIS internals and performance tuning.

Abstract:

You have worked with SQL Server Integration Services (SSIS) for a few years, but more often than not, you see performance problems with your packages. Some packages are running slow, while others are taking up more memory and CPU than before. But why are they slow or taking up a lot of resources? What is actually going on behind the scene? Join this full-day session to find out.

The first part will cover the internals of SSIS. Starting with an overview, we will move into the details of the control flow engine and the data flow engine. At the end of the internals section, you will have a deep understanding of the strengths and weaknesses of SSIS and what is the best way to design your packages.

The second part will cover how we can use the knowledge about the internals of SSIS to gain better performance. This part will show practical examples of common performance problems, and how these can be solved. Furthermore we will look at how designing and tuning your data flows can increase performance significantly.

If you are in Johannesburg or Cape Town (or feel like visiting the beautiful South Africa), I would be happy to see you at my pre-con.

Sign up for a full day of understanding SSIS internals and performance tuning in Johannesburg on 28 August.

Sign up for a full day of understanding SSIS internals and performance tuning in Cape Town on 5 September.

Speaking at SQLBits, SQL Server Days, and PASS Summit 2014

PASS Summit 2014 Speaking

I am very honored and happy to announce that I will be speaking at three major SQL Server conferences this year.

Update: I will also be giving my SSIS internals and performance talk at SQLSaturday #310 Dublin, and am looking forward to battling beating Bob Duffy (Blog | Twitter) in Giant Jenga and a 5 minute “call my SQL bluff” lightening talk.

MSBuild error (and fix) with Visual Studio 2013 SQL Server tooling and SQL Server 2014

To show the possibilities with continuous integration with SQL Server development, I decided to spin up a virtual machine with Visual Studio 2013, Team Foundation Server 2013, and SQL Server 2014. While setting it up, I ran into an error that I thought I’d share here.

If you want to know more about how to set up continuous integration for SQL Server development, Jamie Thomson (blog | twitter) wrote a very good blog post about continuous deployment of SSDT database projects to Windows Azure using Team Foundation Service. This approach can also be used for on premise build servers using Team Foundation Server, which is what I was looking for.

With Visual Studio 2013 you can no longer download SQL Server Data Tool separately, as SQL Server tooling is included. You will need to use either one paid versions of Visual Studio or one of the free express editions.

If you wish to deploy to SQL Server 2014, you also need to update the SQL Server tooling from within Visual Studio.

While I was trying to set up a build server, I ran into this error:

C:\Builds\3\MyProject\AdventureWorks Test Server Build\Sources\Main\AdventureWorks\AdventureWorks\AdventureWorks.sqlproj (63): The imported project "C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v11.0\SSDT\ Microsoft.Data.Tools.Schema.SqlTasks.targets" was not found. Confirm that the path in the <Import> declaration is correct, and that the file exists on disk.

In my .sqlproj file, the Import declaration showed:

<Import Project="$(MSBuildExtensionsPath)\Microsoft\VisualStudio\ v$(VisualStudioVersion)\SSDT\Microsoft.Data.Tools.Schema.SqlTasks.targets" />

The C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v11.0\SSDT\ path did not exist, but the C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v12.0\SSDT\ did.

It looked like an old MSBuild version was being used, so the solution I found was to override the variable in the build definition to use the newest version.

This is what I did:

  1. Find build definition under Builds in the Team Explorer
  2. Click Edit build definition
  3. Click Process
  4. Click Advanced
  5. Add /p:VisualStudioVersion=12.0 under MSBuild arguments

My build arguments ended up looking like this, when adding the build, publish and SqlPublishProfilePath arguments:

/t:Build /t:Publish /p:SqlPublishProfilePath=AdventureWorks.Test.publish.xml /p:VisualStudioVersion=12.0

After that, the build worked just fine.

Create Data Warehouse extract tables from SQL Server source system

I have often found that I needed a quick script for giving me CREATE TABLE statements for each table in a source system database, so I could create these tables in my extract schema in my Data Warehouse. This can obviously be done using SQL Server Management Studio, but I found that I would then manually have to edit each statement to remove keys, indexes, identity columns, and everything else that I usually do not want in my extract tables.

Having not being able to find one, I wrote my own, and though I’d share it here. The following script writes a CREATE TABLE statement for each table in a SQL Server source database, so they can be used for creating extract tables in a data warehouse.

DECLARE @ExtractSchema NVARCHAR(100) = 'Extract'

SELECT 'CREATE TABLE [' + @ExtractSchema + '].['
      + SCHEMA_NAME (t.schema_id)
      + '_' + t.name + '] (' + LEFT(cols .column_list, LEN(cols .column_list) - 1) + ')'
FROM sys.tables t
CROSS APPLY (SELECT '[' + c .name + '] ' + UPPER(ty .name)
                    + CASE WHEN ty.name IN ('decimal', 'numeric') THEN
                              '(' + CONVERT(NVARCHAR(10), c.precision)
                              + ', '
                              + CONVERT (NVARCHAR(10), c.scale)
                              + ')'
                      WHEN ty.name = 'float' THEN
                         '(' + CONVERT(NVARCHAR(10), c.precision ) + ')'
                      WHEN ty.name IN ('datetime2', 'datetimeoffset', 'time') THEN
                         '(' + CONVERT(NVARCHAR(10), c.scale) + ')'
                      WHEN ty.name IN ('char', 'nchar', 'varchar', 'nvarchar') THEN
                         '(' + CASE WHEN c.max_length = - 1
                                   THEN 'MAX'
                                   ELSE CAST (c.max_length AS NVARCHAR(10))
                              END + ')'
                      ELSE ''
                      END
                    + CASE WHEN c.is_nullable = 0 THEN ' NOT'
                      ELSE ''
                      END
                    + ' NULL, '
               FROM sys.columns c
               INNER JOIN sys.types ty
               ON c.user_type_id = ty.user_type_id
               WHERE t.object_id = c.object_id
               ORDER BY c.column_id
               FOR XML PATH('')
      ) cols (column_list)