Thursday, October 27, 2011

Listing all columns in a SQL Server database that have empty strings in them using PowerShell

I needed to find all the columns in a database that contain empty strings, so that we could track down where this data was coming from. This proved to be fairly straightforward using PowerShell, but I will jot it down here to save time in case I need to do the same thing again in the future.

To run this one-liner, open SQL Server Management Studio and right click on the database you want to run it on, then choose Start PowerShell.

Modify the following command and run it in the PowerShell window you just opened. It may be possible to write a shorter command than this, but this one is straightforward and it works.

dir | % { dir ($_.displayname + '\columns') } | ? { $_.datatype -match 'varchar' } | select parent, datatype, name | % { invoke-sqlcmd -suppressprovidercontextwarning ("select '" + $_.parent + "', '" + $_.name + "', count(*) from " + $_.parent + " where `"" + $_.name + "`" = ''")  } | ? { $_[2] -ne 0 }
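
If the one-liner is hard to follow, here is a more readable, functionally equivalent sketch of the same pipeline. The column aliases TableName, ColumnName and EmptyCount are my addition, purely to make the output easier to read; everything else mirrors the command above and assumes the same database provider context:

dir |
  % { dir ($_.displayname + '\columns') } |
  ? { $_.datatype -match 'varchar' } |
  % {
    $query = "select '$($_.parent)' as TableName, '$($_.name)' as ColumnName, count(*) as EmptyCount "
    $query += "from $($_.parent) where `"$($_.name)`" = ''"
    invoke-sqlcmd -suppressprovidercontextwarning $query
  } |
  ? { $_.EmptyCount -ne 0 }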

Monday, October 24, 2011

x64 assembly sample program

To get a better understanding of how the new Microsoft x64 calling convention works, I thought it would be a good idea to reacquaint myself with assembly language, this time using ml64.exe. While I do not write assembly language in my day-to-day job, I still find it surprisingly useful to be able to read and decipher it, and understanding calling conventions and stack usage is a critical part of being able to debug at that level.

A key takeaway for me from this process is that the x64 convention passes the first four arguments in volatile registers, which makes it difficult to inspect the parameters passed to a function because they are not stored in memory. Space for these register parameters (the so-called shadow, or home, space) is allocated on the stack, but it is generally left uninitialised in release builds. The Advanced Windows Debugging and Troubleshooting blog has a good run-down of the difficulties this poses.

Below is a sample program I wrote, along with the command line to compile it:

extrn ExitProcess: PROC 
extrn MessageBoxA: PROC
extrn WriteConsoleA: PROC
extrn AllocConsole: PROC
extrn GetStdHandle: PROC

.data
caption db 'x64 Assembly Program', 0
message db 'Hello World!', 0

.code
main proc
  sub rsp, (5 + 1 + 1) * 8 ;5 WriteConsoleA parameter slots + chars-written local + spare slot to keep rsp 16-byte aligned (7 * 8 = 56 bytes)

  call AllocConsole
  test rax, rax
  jz error

  mov rcx, 0FFFFFFF5h ;STD_OUTPUT_HANDLE (-11)
  call GetStdHandle ;rax = console output handle

  mov rcx, rax ;console handle
  lea rdx, message ;pointer to the text to write
  mov r8, 12 ;length of 'Hello World!' in characters
  lea r9, [rsp + 6 * 8] ;receives the number of characters written
  mov qword ptr [rsp + 4 * 8], 0 ;5th parameter (lpReserved), passed on the stack, must be NULL

  call WriteConsoleA

  call DisplayMessage

  add rsp, 56
  ret

error:
  mov rcx, -1 ;exit code
  call ExitProcess
main endp

DisplayMessage proc
  sub rsp, 28h ;shadow space for 4 register parameters + alignment (40 bytes)
  mov   rcx, 0 ;hWnd = NULL
  lea   rdx, message ;lpText
  lea   r8, caption ;lpCaption
  mov   r9d, 0 ;uType = MB_OK
  call  MessageBoxA
  add   rsp, 28h
  ret
DisplayMessage endp
end

This is the command line that can be used to compile the code above:

ml64.exe /Zi Prog.asm /link /subsystem:windows /defaultlib:kernel32.lib /defaultlib:user32.lib /entry:main

Note that, in my installation of Visual Studio 2010, ml64.exe is in the C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\x86_amd64 folder. Make sure that you use the x64 version of the Visual Studio command prompt, otherwise the link process will silently fail.
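
If you are not sure which copy of ml64.exe a given command prompt will pick up, one quick way to check from PowerShell is the line below (where.exe ml64.exe does the same from a plain command prompt):

Get-Command ml64.exe | Select-Object -ExpandProperty Path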

Sunday, October 16, 2011

Backing up a Blogger blog

There seems to have been a flurry of activity around 2008 with a lot of tools being written to back up Blogger blogs, but not much since then. After a bit of searching, I found jblogbackup, which seems to be a reliable, command-line based solution. To get this to work, I needed to do the following:

  1. Download the Java Runtime Environment (JRE) and add its bin folder to my path environment variable (see the sketch after this list).
  2. Download jblogbackup.
  3. Set the Feedburner feed for my blog to download more than 25 entries by clicking on "Edit Feed Details…" and setting Original Feed to http://nzbart.blogspot.com/feeds/posts/default?max-results=999. Hopefully 999 entries will last a while.
  4. Run the tool:
    java -jar jblogbackup-onejar-1.1.jar -b BloggerBackup -c -imgthreads 4 -n nzbart
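
Putting steps 1 and 4 together, a session from PowerShell might look something like the following; the JRE and jblogbackup paths here are hypothetical, so substitute wherever you installed or unpacked them:

$env:Path += ';C:\Program Files\Java\jre6\bin'
cd C:\Tools\jblogbackup
java -jar jblogbackup-onejar-1.1.jar -b BloggerBackup -c -imgthreads 4 -n nzbart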

Wednesday, October 12, 2011

Converting a short (8.3) file name into a long file name in PowerShell

It took me a while to find, but converting a short DOS-style (8.3) Windows file name into a long file name in PowerShell is easy. For example, my system temp folder was configured using the short form of my user folder, so I used the following to retrieve the long name:

(get-item $env:temp).FullName
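
To see what was resolved, it can help to print the raw variable alongside the resolved name (the output will obviously depend on your own profile path):

$env:temp                       # the short (8.3) form as configured
(get-item $env:temp).FullName   # the resolved long form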

Wednesday, October 5, 2011

Demonstrating the effect of CPU pipelining on performance

When writing high-performance applications, you may need to understand some of the fundamentals of CPU architecture, such as pipelining, and often the easiest way to appreciate its effect is through a simple example.

The full source code for this example is below, but the most important part is the assembly code and results.

The test uses the Windows high performance timer to see how long it takes to execute a function written in Intel x64 assembly.

The simple loop below sets the rax register to a large number (in this case 2.33 billion, since I am running on a 2.33 GHz processor) and repeatedly decrements it until it hits zero. The console application then prints the number of milliseconds this took. On my machine it took roughly 1000 ms, which works out at about one iteration per clock cycle, even though each iteration is two instructions: a decrement followed by a conditional jump.

.CODE             

runAssemblyCode PROC
  mov rax, 2330 * 1000 * 1000
 start:
  dec rax
  jnz start
  ret 
runAssemblyCode ENDP 
END

To take this further, the next version performs more independent operations in the same loop: instead of one decrement there are now five, each on a different register, so there are no dependencies between them. The total time only rises to about 2000 milliseconds, which is roughly two cycles per iteration for six instructions, or about three instructions per cycle.

.CODE             

runAssemblyCode PROC
  mov rax, 2330 * 1000 * 1000
 start:
  dec rcx
  dec rdx
  dec r9
  dec r10
  dec rax
  jnz start
  ret 
runAssemblyCode ENDP 
END

I will not attempt to explain the finer details of Intel CPU pipelining, or how to calculate the expected execution time for these examples, but they do demonstrate quite well that a carefully crafted sequence of instructions can retire more than one instruction per clock cycle, which is faster than the raw clock speed would suggest.

To run this application yourself, create a Visual Studio 2010 (or newer) C++ project containing two files: a .ASM file with the assembly above, and a .CPP file with the C++ code below. You will also need to enable the MASM build customisation by right-clicking on the project and choosing Build Customisations.

#include <Windows.h>
#include <memory>
#include <iostream>

using namespace std;

extern "C" void runAssemblyCode();

// Thin wrapper around the Windows high-resolution performance counter.
class Timer
{
public:
    Timer()
    {
        QueryPerformanceFrequency(&_ticksPerSecond);
        Reset();
    }

    void Reset()
    {
        QueryPerformanceCounter(&_startedAt);
    }

    long long GetElapsedMilliseconds()
    {
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        return (now.QuadPart - _startedAt.QuadPart) * 1000 / _ticksPerSecond.QuadPart;
    }

private:
    LARGE_INTEGER _startedAt;
    LARGE_INTEGER _ticksPerSecond;
};

int wmain(int argc, wchar_t* argv[])
{
    Timer timer;

    runAssemblyCode();

    auto elapsed = timer.GetElapsedMilliseconds();
    cout << elapsed << endl;

    return 0;
}

Tuesday, October 4, 2011

Problem debugging stored procedures / T-SQL on (local) databases

If you run into the following error while trying to debug T-SQL in SQL Server Management Studio, the problem is that the DNS name (local) cannot be resolved:

Unable to start T-SQL Debugging. Could not connect to computer '(local)'. The requested name is valid, but no data of the requested type was found.

The solution is to connect to localhost rather than (local).

Monday, October 3, 2011

Comparing TFS source branch with destination branch

When a TFS branch has not been "enhanced" into a proper branch object with visualisations, it can be tricky to determine which version of the source branch it was taken from.

I recently wanted to find all the changes that had been made in a branch, including some tweaks that had been made when the branch operation was initially checked in. The most reliable way to do this is to compare the source branch, at the revision from which the branch was taken, with the latest code in the destination branch. To do this, I wanted to use the following command line:

tf folderdiff "$/SourceBranch;RevisionThatBranchWasTakenFrom" $/DestinationBranch

However, I needed a way to determine RevisionThatBranchWasTakenFrom. Since branching in TFS is essentially copying (the metadata of) each file in the source branch to the destination, there is no single concept of the 'revision' of the source branch at the point when the branch was made (in contrast to Subversion). The best I could come up with was to find the most recent changeset in the source branch before the branch was made. To do this, I used the tf merges command on the destination branch, which lists all the files that have been merged into the destination branch. I have made several assumptions, including that there was a single branch into the destination and no subsequent merges:

tf merges $/SourceBranch /r /showall /format:detailed | % {$_ -match '(\d+) ->' | out-null; $Matches[1]} | sort -Descending | select -first 1

This PowerShell one-liner returns the highest changeset number among all the branched files, assuming nothing untoward happens and it can parse the output of tf merges. In our case I knew that all the changesets in question had four digits, so the string-based sort would work fine.
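
If your changeset numbers are not all the same length, a slightly more defensive variation casts the captured changeset to an integer so the sort is numeric, and only emits lines that actually match the pattern; a minimal sketch of the same idea:

tf merges $/SourceBranch /r /showall /format:detailed | % { if ($_ -match '(\d+) ->') { [int]$Matches[1] } } | sort -Descending | select -first 1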

Now that I had the most recent changeset, I could run the command to get the changes:
tf folderdiff "$/SourceBranch;C12345" $/DestinationBranch