Monday, November 26, 2007

New set of SSIS Tasks

These may have been around for a while, but I just discovered that there is a nice set of (affordable) SSIS tasks at CozyRoc.  There is a task for zipping files, managing partitions, SFTP and even using another dataflow as a source.  Here is the current info from their site:


Key Features

Beta! Package SSIS Connection Manager

  • Gain access to the current or different package at runtime.
  • Support for all current types of package storage: SQL, File, SSIS Package Store.
  • Support for package variables initialization before execution.
  • Easy to use interface.

Beta! Data Flow Source SSIS Data Flow Component

  • Based on Package Connection Manager.
  • Very fast, non-blocking, in-memory data read from another data flow.
  • Scalable alternative of the solution Microsoft is recommending with raw data file source.
  • User interface is based on the standard data flow source component dialog.

Beta! Data Flow Destination SSIS Data Flow Component

  • Exposes data from a data flow for in-memory read in another data flow.
  • Scalable alternative of the solution Microsoft is recommending with raw data file destination.
  • User interface is based on the standard data flow destination component dialog.

Database Partitions SSIS Control Flow Task

  • Saves you time by handling the intricacies of the partitioning and letting you concentrate on your workflow instead.
  • Create, insert and remove database partitions in Microsoft SQL Server 2005.
  • Combine smaller partitions into a larger one, for easy backup later.
  • Support for ADO.NET, OLEDB, ADO connection managers.
  • Easy to use interface.

Script Task Plus SSIS Control Flow Task

  • Extension of Microsoft SSIS Script Task, meaning you already have the skills to use it.
  • Greater reusability and easier maintenance of script functionality.
  • Customizable script user interface and setup.
  • SSIS Script export, import or linking to.
  • Keep your script code separate from your package and have easier time tracking changes in your source control system.
  • Requires SQL Server 2005 SP2.
  • New! Share your exported SSIS scripts with the community.

Zip SSIS Control Flow Task

  • Compression and decompression of Zip, GZip, BZip2, Unix, Tar archives.
  • Automatically switches to Zip64 format, if it is required.
  • Streamlined user interface.

SSH SSIS Connection Manager

  • Secure connections with SSH-enabled server.
  • Authenticate with password or private key file.
  • Support for Tunnel (HTTP), SOCKS4, SOCKS5 proxy.
  • Streamlined user interface.

SFTP SSIS Control Flow Task

  • Based on SSH Connection Manager.
  • Secure-FTP (SFTP) communications and management.
  • Support for SFTP commands send, receive and delete files, create and remove directory, rename file.
  • The user interface covers most of the functionality of the standard Microsoft SSIS FTP task and it is very similar.

SSH Execute SSIS Control Flow Task

  • Based on SSH Connection Manager.
  • Secure execution of shell commands on a remote SSH server.
  • Streamlined user interface.

Wednesday, October 17, 2007

More Interesting SSIS Components

I knew about the Konesans Trash Destination, but it turns out that they have a whole series of free, publicly available SSIS components.  The main product page is here, and below I have pasted in content for each of the transforms, with the link to the download page:

Checksum Transformation

 The Checksum Transformation computes a hash value, the checksum, across one or more columns, returning the result in the Checksum output column. The transformation provides functionality similar to the T-SQL CHECKSUM function, but is encapsulated within SQL Server Integration Services, for use within the pipeline without code or a SQL Server connection. Checksums can reduce network contention and increase process performance by allowing you to verify data through a single value rather than transferring all data values for comparison.

Data Generator Source

This source component literally generates data. Specify how many columns you want, and how many rows, then watch the data flow out. Build demonstration and research scenarios faster with this simple source.

File Watcher Task

The File Watcher Task does what it says really, it watches a folder waiting for files. When an available file is found the task completes, returning the name of the file for onward use within the package.

Regular Expression Transformation

The Regular Expression Transformation exposes the power of regular expression matching within the pipeline. One or more columns can be selected, and for each column an individual expression can be applied. If all columns selected pass their tests then rows are passed down the successful match output. Rows that fail to pass all tests are directed down the alternate output.

Row Count Plus Transformation

The Row Count Plus Transformation can replace the stock transformation. We have recreated and extended with more functionality and a user friendly interface for faster and easier package design.

Row Number Transformation

The Row Number Transformation calculates a row number for each row. It offers ROW_NUMBER or IDENTITY like behaviour within the Data Flow. Uses include surrogate key generation or data partitioning within the pipeline.

Trash Destination

The Trash Destination Adapter is a development aid. It allows you to quickly terminate a data flow path, and does not require any configuration. It will consume the rows without any side effects, and prevents warnings or errors you may otherwise receive when executing the data flow.

Tuesday, October 16, 2007

Additional SSIS Transforms that didn't ship with the product

There are a whole series of SSIS transforms that Microsoft has made available for download.  They have differing degrees of usefulness, and have been available for a while but I am putting them out here for future reference.  Off the top of my head, it looks like Regex would be useful, UnPackDecimal if you are deal with mainframe files, calendar transform (as I recall it has a dependency on a DLL from SSAS though).


Regex Transform
Regex is an SSIS dataflow component that applies a configured regular expression against an incoming column, matching, extracting, or splitting, as configured by the user.

ConfigureUnDouble Transform
ConfigureUnDouble takes a text column and removes bracketing quotes if present, plus replaces double quotes inside the text with sinqle quotes. Configure the quote character by, on the first tab of the advanced editor, setting the "Qualifier" custom property to the desired value.

RTrimPlus Transform
RTrimPlus takes a string or unicode column and removes trailing spaces, whether ASCII or Japanese.

UnpackDecimal Transform
UnpackDecimal takes an input column formatted in packed decimal (comp-3) and generates the corresponding Decimal value.

UnDoubleOut Transform
UnDoubleOut is an SSIS dataflow component that removes qualifiers from quoted text, either in place or via the creation of a new output column.

CodePageConvert Transform
CodePageConvert is an SSIS dataflow component that translates from and to any code page or unicode character representations.

SeeBuffer Transform
SeeBuffer is an SSIS dataflow component that sits in a data flow and is provided a look at each buffer that is presented to it.

UnDouble Transform
UnDouble takes a text column and removes bracketing quotes if present, plus replaces double quotes inside the text with sinqle quotes.

NullDetector Transform
NullDetector is an SSIS dataflow component that sits astride a data flow, and, depending on whether the value of a user-indicated column is null or not, routes rows to one or the other of its outputs.

Calendar Transform
CalendarTransform is an SSIS dataflow transform component that generates standard calendar attributes

Wednesday, September 26, 2007

Microsoft "Bulldog" (MDM)

Details are just coming to light regarding Microsoft's entry into the world of Master Data Management through their acquisition of Atlanta-based Stratature several months ago.  The product will be released to coincide with the next release of Office (Office 13), and will install as part of SharePoint.  According to the Bulldog product roadmap, the dev team is working with many Microsoft products such as PPS, Dynamics, SharePoint, Excel and more to integrate the master data hub with those products.  The stated goal is to have Bulldog remain almost identical to the Stratature +EDM product, with several enhancements aimed at improving the experience.

There is now a full product page at Microsoft, and Kirk Haseldon has updated his blog with some of the new details.  From his blog, it sounds like the planning has turned to development, and we should start to see something tangible in February 2008 for TAP program members and other early adopters.

One exciting aspect of the product is hierarchy management within an enterprise.  Taken directly from the product roadmap, here are some of the hierarchy scenario's that Bulldog will handle:

Master data contains a wide variety of hierarchies. The roll‐up hierarchies for accounts in a chart of accounts, the reporting hierarchy for employees, the organizational structure of a customer’s business, and categories of items in an item master are typical examples of hierarchies. These hierarchies are often hard coded into business systems or defined in a number of Excel spreadsheets. Hierarchies can be defined by attribute relationships: for example, city belongs to state and state belongs to region. These derived hierarchy relationships change whenever an attribute value changes. Hierarchies can also be defined as parent‐child relationships that may have an arbitrary depth along any path (ragged). The Stratature solution allows both types of hierarchies to be leveraged into a new derived hierarchy. The advantage of reuse is that it ensures that there is only one definition of the relationship at any point in time.

There is also extensive support for Business rules, workflow, flexible data models, etc.

Microsoft Performance Point Server 2007 finally RTM's

After years of hard work and a ProClarity acquisition later, Performance Point Server 2007 has finally been released.  More details here on the product home page.  Here are links to trial downloads and SDK.

PerformancePoint Server 2007 Evaluation Version (x86)

PerformancePoint Server 2007 Evaluation Version (x64)

Deployment Guide, Operations Guide, Whitepapers

Planning Server, Business rules development guide and Monitoring SDK

Thursday, September 20, 2007

Slick New MDX Editor/Debugger

This is a very nice looking tool for anyone that works with MDX.  It is currently in "CTP 1, version 0.1 Alpha" release, which is another way to say "Very early in development cycle".  It works with SSAS in all of its various versions from 7.0 to 2008.  Details and download here.

Thursday, August 16, 2007

Documenting a SQL Server 2005 Solution end to end

I had heard of this tool before, and it looks interesting if you ever need to document a full SQL Server 2005 solution.  It apparently documents SSIS, SQL and SSAS as well as SSRS.  The output shows data lineage, as well as the structure of your database.  Worth a look, pretty affordable too.  Too bad they couldn't make a version that worked as an add-on to Data Dude.  Here is the link to their homepage.

Monday, August 13, 2007

Load Testing SQL Server Analysis Services 2005

Have you ever had a need to load test SSAS and then report on the results?  Your life just got easier.  There is now a project on CodePlex for that very purpose.  Apparently MCS and other large internal projects have used it for a while, but it is now publicly available as a community project.  Download it here.

Taking the pulse of SQL Server

If you have ever been involved in a big SQL Server project, you might have wanted some easy way to figure out how things are performing on the server.  Fortunately, many other people have wanted to do the same.  There are several ways available now (and more to come in SQL 08, specifically for SSAS I believe).  You can use the SQL Server Health and History Tool (SQL H2) which tells you all kinds of interesting things about the internals of SQL and run reports against it.

The other new and cool thing that has come out is an actual data mart that is built on top of the man Dynamic Management Views (DMV) inside SQL Server.  The DMV's are what drive the reports inside SSMS, and you can use them to find out all kinds of things.  Now you no longer have to run them manually.  Thanks to the SQL Server Customer Advisory Best Practices Team, now you can just install SQL DMVStats Data Warehouse on your server and your good to go.

Here are the main features as listed on CodePlex:

• DMV data collection
• DMV data warehouse repository
• Analysis and reporting.

VSTE for DB Pros (Data Dude) Power Tools released

If you are using VSTE for DB Pros, you will probably want to pick up the newly released Power Tools (didn't they used to call these sorts of things "Power Toys"?).  It adds a lot to the product in the way of convenience with the dependency viewer (you don't have to pretend to rename something to see dependencies now), additional refactoring abilities, enhanced data generation and T-SQL code analysis among other things.  Download it here.

Read more about it at Gert Drapers' blog, including detailed explanations and screen shots.

Here's the list of new features, copied from Gert's blog:

  • Dependency Viewer
  • Refactoring
    • Move Schema
    • Expand Wildcard
    • Fully Quality Name
    • Refactor in to strongly typed DataSet definitions
    • Refactor Command Generator
  • Data Generation
    • Sequential Data Bound Generator
    • Editors for the Data Bound Generator, Sequential Data Bound Generator and RegEx String Generator to make configuration easier
    • The RegEx editor also tries to interpret your CHECK CONSTRAINTs and create a matching RegEx expression that you can use to generate data values that match the constraint definition
    • The RegEx editor can also be used for interactively defining and testing RegEx expressions and evaluate the output visually, which makes it a lot easier to create the right RegEx expression for your value domain.
  • MSBuild Tasks
    • SqlSchemaCompareTask; allows you to compare schemas between two database from the command line using MSBuild.
    • SqlDataCompareTask; allows you to compare the content of tables within two databases from the command line using MSBuild.
  • T-SQL Static Code Analysis
  •  Miscellaneous tools
    • SQL script pre-processor command-line utility, which will expand all SQLCMD includes and variable definitions (sqlspp.exe)
  • Schema Manager API

SR1 for Visual Studio 2005 Team Edition for Database Professionals (DataDude)

This news is about a month old, but the RTM version of SR1 for VSTE for DB Pros has been been released and is available for download.  One of the most important things that it adds is the ability to have "database references", which is a 3 or 4 part reference to a database other than the one that you are currently working with.  Much-needed functionality.

Wednesday, August 8, 2007

SSIS Pivot Transform is harder than it needs to be

Don't get me wrong - I am glad that they included the pivot transform, and I recognize that the dev team likely ran out of time to implement any sort of UI, but Wow!  That thing is way more difficult to use than it needs to be.

For instance, and I don't doubt that it is somewhat user error on my part, but in my first attempt to get it working today with 3 columns (UserID, AttributeName, AttributeValue), the transform coughed, choked and wouldn't start.  It just gave me errors.  Due to lack of time, I had to set it aside for now.  I appreciate its usefulness, but it is painful to use, and due to the limited info on the Internet about it (there are roughly 2 useful articles/blog posts), I am guessing that a lot of others just work around it as well.  A shame, since the Unpivot transform has a UI and everything.

In summary - I am not saying it is broken, I am not saying it is not useful.  Just that it not only is difficult to use, but that documentation and samples are lacking as well.  Here are the few useful posts I found:

Ashvini Sharma's blog


Database Journal

UPDATE: My friend Matt pointed me to a blog entry that he wrote while at our old company that is a still up in which he gives some tips on the pivot transform.

Tuesday, August 7, 2007

Useful information about Active Directory

Anyone who is new to Active Directory will likely have a hard time finding a single comprehensive source of information on the topic.  I think the problem is almost that there is so much information, that it is difficult to find the simplest things.  Take, for example, the User Properties dialog.  What if you simply wanted to tie the fields on this screen to the LDAP attribute behind each?  There is nothing out there (that is easy to find) in the Microsoft documentation that spells this out.  Fortunately R.L. Mueller put this very document (and others) into and Excel spreadsheet and RoudyBob posted it to his blog.  Now the rest of can benefit.  Here's the link to the blog post.

Here are the direct links to the various spreadsheets:

Spreadsheet of User Properties in Active Directory Users & Computers MMC
Spreadsheet of all Active Directory attributes
Spreadsheet of User Object Property Methods
Spreadsheet of attributes exposed by the WinNT provider

Monday, August 6, 2007

Interesting AJAX enabled SSAS client

I was checking my webmail today, and one of the ad-links caught my eye.  It was for an AJAX based SSAS client.  I clicked the link and it took me to IntraSight, by a company called Active Interface.  Screenshots are available at that link.  There are a lot of interesting SSAS web based interfaces floating around out there if you look hard enough.  Add this one to the list.  SharePoint integration is listed as one of the bullet points.  Pricing is pretty affordable: $1995/server.  It might be worth a look if you need a web-based OLAP tool.

Here is their feature list:

  • IE browser-based SSAS client
  • High-fidelity AJAX user interaction
  • Interactive table and chart views
  • Annotated public and private bookmarks
  • View and edit generated MDX
  • Sharepoint integration
  • Excel, Word, and PowerPoint generation
  • Publish static HTML site from bookmarks
  • Web tier caching
  • Search dimensions
  • Filter member sets
  • Sort and filter measures
  • Specialized business-centric and industry-specific views

Thursday, August 2, 2007

Referencing a Custom .NET DLL from within a SQL Server 2005 SSIS script task

It is not difficult to reference a custom .NET DLL from within an SSIS script, but there are a couple of things that you will need to do:

  1. Sign your DLL
  2. Put it in the GAC (copy it to c:\windows\assembly)
  3. Put a second copy in the C:\Windows\Microsoft.NET\Framework\v2.0.xxxx folder

Once you have done all of these things, then it will show up in the list of available libraries when you attempt to add a reference to your script task.  I found this out the hard way, so hopefully this will help at least one other person.

More details are available here.

Tuesday, July 31, 2007

Creating SQL 2005 Data Warehouse Metadata

One of my favorite tools to use when designing a data warehouse is the Dimensional Modeling Spreadsheet created by Joy Mundy and Warren Thornthwaite of the Kimball Group in conjunction with their must-have book, The Microsoft Data Warehouse Toolkit.  Not only does it allow you to fully build out your fact/dimension design, ETL logic, source-to-target mapping and more, it also generates a script that will create the first iteration of your data model once you have everything in place.  I always try to fully flesh out this spreadsheet before I create the first SQL table, it really helps get things off to a good start.  As an added bonus, Joy and Warren have a whole site full of useful tools and utilities for building a Data Warehouse.

Mark Garner has just released Beta 1 of a tool called Metashare.  He appears to have based it on the Kimball Group spreadsheet I mentioned above, and pulls the same data back out of your data model for the purposes of documentation.  I will look forward to checking out Metashare with the next DW that I build.

Microsoft SQL Server BI in Formula 1 - Case Study

I love F1 racing. It the perfect showcase for technology, and I posted a while ago that Microsoft had been selected, along with McLaren to provide BI for F1. It looks like the first case study is now available as of a week ago. Here is a white paper and a video for you to look over.

According to the white paper, each Formula One car will typically generate 1–2 gigabytes of data in a race for 100 different sensors on the car. Each team runs two cars, and there will be 17 races next season. As far as I am concerned, this is one of the coolest applications of BI that I have heard of yet. Additionally, this is all being analyzed in real-time. I hope that we get some more case studies, but I'm not counting on anything.

Monday, July 30, 2007

How to configure SQL Server Integration Services (SSIS) 2005 for Load Balancing

I should begin this post by mentioning that currently SQL Server 2005 Integration Services does not support Network Load Balancing (NLB) in the traditional sense.  You can't just string together a series of SSIS servers, point them at a list of jobs and expect them to chew through them dependent upon available processing time and bandwidth.

Instead Microsoft recommends a "manual procedure" to perform SSIS load balancing.  I am posting this for my own reference in the future, but it might come in handy to others searching for the same information.  Here is a link to the info on MSDN.

SSIS Project Icon Broken in Vista?

I just opened an SSIS project for the first time in Vista using SQL SP2 and noticed that my SSIS project used the same icon as my SSAS project. I opened an older project just to make sure, deleted, recreated, re-added, etc. and still incorrect icon. Not a big deal, but still an annoying little glitch. . .

Blog III

This is my third blog.  My first blog is at the official ASP.NET blog site, and I haven't posted there in almost a year.  Also, I don't do as much straight .NET coding as I did years ago, so my posts wouldn't be very relevant in that forum.

My second blog was at a company that I worked for.  I no longer work there, and they seem to have taken all of the blogs down as the company goes through a downsizing period.

This represents my third blog.  I am planning to post interesting things related to the work that I currently do (mostly Data Warehousing and BI, and mostly Microsoft) to things that I think are cool.  I am using Windows Live Writer for posting, and it is a slick (free) tool.  I used Beta 1, and Beta 2 seems improved.  I am just waiting on them to include native image uploading for Blogger, but will just use Picasa until then.