Josh's Stuff: Data Warehousing

Showing posts with label Data Warehousing. Show all posts

Monday, April 30, 2012

CSVed – The World’s Best (free) CSV Editor/Viewer

If you work with data, you won’t be able to avoid delimited files for very long. Sometimes you have to open these files. If you don’t have Excel installed, you might end up having to use trusty old Notepad, which is not ideal from either a formatting or large file size perspective. Excel is better, but no matter how you try to gloss it over, Excel is not designed as a delimited file viewer.

I have found that the right tool for the job in this case is a program that I have been using for years now called CSVed. The newest version updates the UI, but there are tons of tools packed into this program and it can handle almost any file I have ever thrown at it, including 1GB+ files and delimited files that have bad data. There is a unicode version of the program available as well should you need it.

Here is a link to the development homepage, you can download the latest version from there.

Complete SQL 2012 Demo Build Script

There is no need to download the 28GB BI Image X (which at the time of this writing hasn’t been updated to SQL 2012 RTM). There are instructions on how to build an all-up SQL 2012 and SharePoint 2010 demo VM with instructions for configuring all of the latest SQL 2012 tools including DQS, MDM, PowerPivot for SharePoint and more. The full instructions are here. This is a great way to familiarize yourself with a full deployment of the latest BI software from Microsoft.

Here’s the summary from the link above:

This how-to article will guide you through the process of building a Hyper-V virtual machine with Microsoft SQL Server 2012, the latest Visual Studio 2010 development tools, SQL Server Data Tools (SSDT), and an assortment of other software products to create a test environment ready for testing and product demos.

Index of all SQL Server build numbers since 7.0 Beta

Stumbled across a nice SQL Server build number site a while back that has a comprehensive (I think?) list of every SQL Server build from the SQL 7 beta 3 up through SQL 2012 RC1 including every CU and hotfix along the way. There is also a link to download the KB that corresponds to each build if appropriate. If you happen to know of a build that isn’t listed, it looks like this site is somewhat crowd-sourced, so post the details in the comments and it will likely make it up to the main page.

(If you look in the comments, you can even find little “anonymous” nuggets, such as the following: “SQL2012 RTM is 11.0.2100.19”)

Monday, March 17, 2008

Programmatically executing SSIS packages

For a good summary of many different ways to run an SSIS package programmatically, as well as advantages and drawbacks of each, look here at Michael Entin's blog. There are some good things in the comments as well. This will come in handy at some point, I am sure.

Wednesday, March 5, 2008

Upgrading Existing SSIS Packages to TableDifference Component 2.0

I am an avid user of Alberto Ferrari's SQL Server Integration Services "Table Diff" component, which you can download from http://www.sqlbi.eu. He recently released the new, improved version 2.0 of the component (the .dll still says that it is 1.0 though?) and I am upgrading a mid-size solution that makes heavy use of the prior version of TableDiff (1.x) to use the new component. Here are my observations of both the upgrade and the new capabilities of the component.

Here is a typical Type 1 style update package (in this case an import table, not a dimension) that uses TableDifference at its core to compare the new and old data flows and direct output accordingly. Behind the scenes, I overwrote the TableDifference.dll file with the new one in both the .NET Assembly Cache and the \PipelineComponents directory. When I opened the package, everything is working except the Update and Insert components have red X's. These are easily fixed by simply double-clicking on each to open properties, accepting the new automatically matched column references and saving the package. That's it, there is nothing else required to update your pacakages from TableDifference 1.x to the newly released 2.0.

Wednesday, February 27, 2008

The Data Warehouse Lifecycle Toolkit 2nd Edition is now available

The first one is a great reference, but was published in 1998. Since then the art and science of building the data warehouse has matured significantly. I don't think that the word BI really existed in 1998. . . Actually I don't know since I didn't start working in the DW/BI world until 1999 with Brio, then Business Objects.

The original had a lot of great material for organizing the pro cess around preparing for, implementing and maintaining the data warehouse, including a nice set of documents. Here is a link to the new book on Amazon.

Wednesday, October 17, 2007

More Interesting SSIS Components

I knew about the Konesans Trash Destination, but it turns out that they have a whole series of free, publicly available SSIS components. The main product page is here, and below I have pasted in content for each of the transforms, with the link to the download page:

Checksum Transformation

The Checksum Transformation computes a hash value, the checksum, across one or more columns, returning the result in the Checksum output column. The transformation provides functionality similar to the T-SQL CHECKSUM function, but is encapsulated within SQL Server Integration Services, for use within the pipeline without code or a SQL Server connection. Checksums can reduce network contention and increase process performance by allowing you to verify data through a single value rather than transferring all data values for comparison.

Data Generator Source

This source component literally generates data. Specify how many columns you want, and how many rows, then watch the data flow out. Build demonstration and research scenarios faster with this simple source.

File Watcher Task

The File Watcher Task does what it says really, it watches a folder waiting for files. When an available file is found the task completes, returning the name of the file for onward use within the package.

Regular Expression Transformation

The Regular Expression Transformation exposes the power of regular expression matching within the pipeline. One or more columns can be selected, and for each column an individual expression can be applied. If all columns selected pass their tests then rows are passed down the successful match output. Rows that fail to pass all tests are directed down the alternate output.

Row Count Plus Transformation

The Row Count Plus Transformation can replace the stock transformation. We have recreated and extended with more functionality and a user friendly interface for faster and easier package design.

Row Number Transformation

The Row Number Transformation calculates a row number for each row. It offers ROW_NUMBER or IDENTITY like behaviour within the Data Flow. Uses include surrogate key generation or data partitioning within the pipeline.

Trash Destination

The Trash Destination Adapter is a development aid. It allows you to quickly terminate a data flow path, and does not require any configuration. It will consume the rows without any side effects, and prevents warnings or errors you may otherwise receive when executing the data flow.

Thursday, August 16, 2007

Documenting a SQL Server 2005 Solution end to end

I had heard of this tool before, and it looks interesting if you ever need to document a full SQL Server 2005 solution. It apparently documents SSIS, SQL and SSAS as well as SSRS. The output shows data lineage, as well as the structure of your database. Worth a look, pretty affordable too. Too bad they couldn't make a version that worked as an add-on to Data Dude. Here is the link to their homepage.

Monday, August 13, 2007

Load Testing SQL Server Analysis Services 2005

Have you ever had a need to load test SSAS and then report on the results? Your life just got easier. There is now a project on CodePlex for that very purpose. Apparently MCS and other large internal projects have used it for a while, but it is now publicly available as a community project. Download it here.

Taking the pulse of SQL Server

If you have ever been involved in a big SQL Server project, you might have wanted some easy way to figure out how things are performing on the server. Fortunately, many other people have wanted to do the same. There are several ways available now (and more to come in SQL 08, specifically for SSAS I believe). You can use the SQL Server Health and History Tool (SQL H2) which tells you all kinds of interesting things about the internals of SQL and run reports against it.

The other new and cool thing that has come out is an actual data mart that is built on top of the man Dynamic Management Views (DMV) inside SQL Server. The DMV's are what drive the reports inside SSMS, and you can use them to find out all kinds of things. Now you no longer have to run them manually. Thanks to the SQL Server Customer Advisory Best Practices Team, now you can just install SQL DMVStats Data Warehouse on your server and your good to go.

Here are the main features as listed on CodePlex:

• DMV data collection
• DMV data warehouse repository
• Analysis and reporting.

VSTE for DB Pros (Data Dude) Power Tools released

If you are using VSTE for DB Pros, you will probably want to pick up the newly released Power Tools (didn't they used to call these sorts of things "Power Toys"?). It adds a lot to the product in the way of convenience with the dependency viewer (you don't have to pretend to rename something to see dependencies now), additional refactoring abilities, enhanced data generation and T-SQL code analysis among other things. Download it here.

Read more about it at Gert Drapers' blog, including detailed explanations and screen shots.

Here's the list of new features, copied from Gert's blog:

Dependency Viewer
Refactoring
- Move Schema
- Expand Wildcard
- Fully Quality Name
- Refactor in to strongly typed DataSet definitions
- Refactor Command Generator
Data Generation
- Sequential Data Bound Generator
- Editors for the Data Bound Generator, Sequential Data Bound Generator and RegEx String Generator to make configuration easier
- The RegEx editor also tries to interpret your CHECK CONSTRAINTs and create a matching RegEx expression that you can use to generate data values that match the constraint definition
- The RegEx editor can also be used for interactively defining and testing RegEx expressions and evaluate the output visually, which makes it a lot easier to create the right RegEx expression for your value domain.
MSBuild Tasks
- SqlSchemaCompareTask; allows you to compare schemas between two database from the command line using MSBuild.
- SqlDataCompareTask; allows you to compare the content of tables within two databases from the command line using MSBuild.
T-SQL Static Code Analysis
Miscellaneous tools
- SQL script pre-processor command-line utility, which will expand all SQLCMD includes and variable definitions (sqlspp.exe)
Schema Manager API

SR1 for Visual Studio 2005 Team Edition for Database Professionals (DataDude)

This news is about a month old, but the RTM version of SR1 for VSTE for DB Pros has been been released and is available for download. One of the most important things that it adds is the ability to have "database references", which is a 3 or 4 part reference to a database other than the one that you are currently working with. Much-needed functionality.

Tuesday, July 31, 2007

Creating SQL 2005 Data Warehouse Metadata

One of my favorite tools to use when designing a data warehouse is the Dimensional Modeling Spreadsheet created by Joy Mundy and Warren Thornthwaite of the Kimball Group in conjunction with their must-have book, The Microsoft Data Warehouse Toolkit. Not only does it allow you to fully build out your fact/dimension design, ETL logic, source-to-target mapping and more, it also generates a script that will create the first iteration of your data model once you have everything in place. I always try to fully flesh out this spreadsheet before I create the first SQL table, it really helps get things off to a good start. As an added bonus, Joy and Warren have a whole site full of useful tools and utilities for building a Data Warehouse.

Mark Garner has just released Beta 1 of a tool called Metashare. He appears to have based it on the Kimball Group spreadsheet I mentioned above, and pulls the same data back out of your data model for the purposes of documentation. I will look forward to checking out Metashare with the next DW that I build.