Tuesday 30 April 2019

Finding databases with biggest tables in SQL Server

I created a T-SQL script today for use with SQL Server that will list the tables that have most rows for SQL Server, able to loop through all databases.

DECLARE @currentDB AS VARCHAR(100)
DECLARE @mrsdb_sqlscript AS NVARCHAR(500)
DECLARE db_cursor CURSOR FOR 
SELECT name from sys.databases where state_desc <> 'offline' and name not in ('master', 'tempdb', 'model', 'msdb') order by name 
DROP TABLE IF EXISTS #largestTables 
CREATE TABLE #largestTables (
 DatabaseName VARCHAR(100),
 SchemaName VARCHAR(200), 
 TableName VARCHAR(200), 
 TotalRowCount INT 
)

OPEN db_cursor 
FETCH NEXT from db_cursor into @currentDB 

WHILE @@FETCH_STATUS = 0 
BEGIN
 SET @mrsdb_sqlscript = N'INSERT INTO #largestTables 
  SELECT ''' + @currentDB + ''' , SCHEMA_NAME(schema_id) AS [SchemaName],
 [Tables].name AS [TableName],
 SUM([Partitions].[rows]) AS [TotalRowCount]
 FROM ' + @currentDB + '.sys.tables AS [Tables]
 JOIN ' + @currentDB + '.sys.partitions AS [Partitions]
 ON [Tables].[object_id] = [Partitions].[object_id]
 AND [Partitions].index_id IN ( 0, 1 )
 GROUP BY SCHEMA_NAME(schema_id), [Tables].name';

 PRINT 'Looking for largest table in the following DB: ' + @currentDB 
 exec sp_executesql @mrsdb_sqlscript 

 FETCH NEXT FROM db_cursor into @currentDB
END

CLOSE db_cursor 
DEALLOCATE db_cursor 

SELECT TOP 100 * FROM #largestTables 
ORDER BY TotalRowCount DESC, Schemaname ASC


The SQL script above will use a temporary table variable and database cursor and loop through all databases, executing a SQL script that will insert data into a temporary table variable and after the cursor has looped through its dataset, the result in the temporary table variable is presented to the DBA monitoring. Use it to spot where you have much data in your databases of the data base server you are working with!

Friday 29 March 2019

Quickly search your Git log using wildcards and shell function

Imagine you got a Git repository with a long history - equivalent to log - and you want find a changeset with a comment containing a word or some words and you want to get the commit SHA to quickly cherry pick that commit to your target branch. This is easier using Azure Devops or some tool, but how to quickly search the log. First we define a search log shell function inside our git config file .gitconfig :

searchlog = "!f() { git --no-pager log --color-words --all --decorate --graph -i --grep \"$1\";  }; f"

Now we can search the log quickly using a wildcard:
git searchlog "visning.*av.*nåtidslinje.*"
We separate our search words with the .* regex syntax and we can search our git log for multiple keywords quickly and get a nice presentation of commits with commit messages matching our log. And now we have a quick way to find our Git commits through the log and the command line to overcome our needly in the haystack scenario.

Monday 25 March 2019

Importing datetime columns from Excel into SPSS

I have been working with SPSS and Excel import the latest days. The following Python script written in SPSS will import your Excel datetime columns properly.

* Encoding: UTF-8.
begin program.
import spss
print "Variable count: " + str(spss.GetVariableCount())
previousVar=""
for ind in range(spss.GetVariableCount()): #Loop through variable indices
    if (ind < 0): 
     previousVar = spss.GetVariableName(ind-1) #get variable name of previous column
    varNam = spss.GetVariableName(ind) #Look up each variable name
    
    recognizedDateColumns = ('date', 'lastupdate')
    if any(s in varNam.lower() for s in recognizedDateColumns): 
     print "Variable contains Dato: " + varNam
     adjustedVariable = varNam + "_Justert"
     spss.Submit("COMPUTE " + adjustedVariable + " = DATE.MDY(1,1,1900) +( (" + varNam  + " - 2) * 24 * 60 * 60).")
     spss.Submit("FORMATS " + adjustedVariable + "(DATETIME22).")
     spss.Submit("RENAME VARIABLES (" + varNam + "=" + adjustedVariable + ") (" + adjustedVariable + "= " + varNam + ").")
     spss.Submit("ADD FILES FILE = * /KEEP=PasientGUID to " + previousVar + " " + varNam + " ALL.")
     spss.Submit("EXECUTE.")
     spss.Submit("DELETE VARIABLES " + adjustedVariable + ".")
     spss.Submit("EXECUTE.")   
end program.






Note that in my example I use the convention that a date column in my dataset contains 'date' or 'lastupdate'.