Thursday 28 March 2013

Using Powershell to download multiple files

Powershell


Powershell is a command-line shell and scripting language based on .NET. It gives users and developers access to an abundance of system resources, as well as resources on intranets and the internet. This article shows how Powershell can be used to download multiple files from the internet and save them to a local folder.

As mentioned, Powershell is based on .NET, which gives it many capabilities as a command-line shell and scripting language. Syntax-wise, Powershell is in many respects similar to Perl or PHP, but there are also similarities to other .NET languages. At the same time, it is a command-line shell. Its role is to take over from the Command Prompt (CMD.exe) and MS-DOS. Instead of the .BAT files of CMD and MS-DOS, scripts are written as Powershell scripts with the extension .PS1. There are more advanced topics, such as Powershell modules, but this article covers only basic use of Powershell.

I have used the Windows Powershell ISE, the Integrated Scripting Environment. The term ISE is analogous to IDE, which stands for Integrated Development Environment. Powershell ISE is included in Windows 8, which I have used. Windows 7 users can download the Windows Management Framework 3.0 package, which is available at the following URL:

http://www.microsoft.com/en-us/download/details.aspx?id=34595


The above package is also available for Windows Server 2008 R2. Windows Server 2012, like Windows 8, already includes Powershell 3.0.

Downloading files with Powershell

Let's first review the script that downloads some files. In my example I create an Object array of URLs that point to Flickr thumbnail images of mine.

            
$flickrImages = 'http://farm9.static.flickr.com/8370/8595533065_4f19f05869_m.jpg',             
'http://farm9.static.flickr.com/8239/8566909448_cac6a5ea75_m.jpg',             
'http://farm9.static.flickr.com/8375/8566909032_70866e8ce7_m.jpg',             
'http://farm9.static.flickr.com/8520/8566908388_2d0bcdc572_m.jpg',             
'http://farm9.static.flickr.com/8049/8131277333_83586d9afc_m.jpg',             
'http://farm9.static.flickr.com/8291/7605276718_a7e8c2d7d8_m.jpg',             
'http://farm6.static.flickr.com/5321/7407130682_c4c26ca2d9_m.jpg',             
'http://farm8.static.flickr.com/7089/7379569808_8e0311918c_m.jpg',             
'http://farm8.static.flickr.com/7088/7378798094_ac05bedf72_m.jpg',             
'http://farm8.static.flickr.com/7245/7217153808_f4e5125b4e_m.jpg',             
'http://farm8.static.flickr.com/7243/7217105052_6b1fc818f8_m.jpg',             
'http://farm9.static.flickr.com/8143/7191404308_47bb667be6_m.jpg',             
'http://farm8.static.flickr.com/7089/7149219081_bfe587bd34_m.jpg',             
'http://farm8.static.flickr.com/7260/7145533071_c7241d0064_m.jpg',             
'http://farm8.static.flickr.com/7048/6999442724_e399e63e68_m.jpg',             
'http://farm9.static.flickr.com/8158/6986866246_d04ec09342_m.jpg',             
'http://farm8.static.flickr.com/7124/7057325243_db81f4c65a_m.jpg',             
'http://farm8.static.flickr.com/7089/6911179760_24654b32fd_m.jpg',             
'http://farm8.static.flickr.com/7268/6910639934_31cd36a854_m.jpg',             
'http://farm8.static.flickr.com/7274/7052136425_6be954f436_m.jpg',             
'http://farm8.static.flickr.com/7037/7052113599_2a249aa763_m.jpg',             
'http://farm8.static.flickr.com/7072/6906010348_1fd6513b31_m.jpg',             
'http://farm8.static.flickr.com/7089/6902820096_95a7ec11c4_m.jpg',             
'http://farm8.static.flickr.com/7280/7048911257_89ae08a75e_m.jpg',             
'http://farm6.static.flickr.com/5467/7048911105_7370eff5ef_m.jpg',            
'http://farm6.static.flickr.com/5031/6902819694_763fc65fe0_m.jpg',             
'http://farm6.static.flickr.com/5234/6902819562_0f78bc56f4_m.jpg',             
'http://farm8.static.flickr.com/7279/7048910545_f02faeda37_m.jpg',             
'http://farm8.static.flickr.com/7080/7047789289_c93ff1deac_m.jpg',             
'http://farm8.static.flickr.com/7220/7047688059_216d67e9d3_m.jpg',             
'http://farm8.static.flickr.com/7092/6901520442_637c138c1f_m.jpg'            
            
$targetDir = 'C:\users\Tore Aurstad\Documents\Powershell Scripts\WebClient\'            
            
            
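# Downloads each URL in $sourceFiles into $targetDirectory, keeping the original file name
function DownloadFile([Object[]] $sourceFiles, [string] $targetDirectory) {
 $wc = New-Object System.Net.WebClient

 foreach ($sourceFile in $sourceFiles) {
  # The file name is everything after the last '/' in the URL
  $sourceFileName = $sourceFile.SubString($sourceFile.LastIndexOf('/') + 1)
  # Join-Path adds the path separator if $targetDirectory lacks a trailing backslash
  $targetFileName = Join-Path $targetDirectory $sourceFileName
  $wc.DownloadFile($sourceFile, $targetFileName)
  Write-Host "Downloaded $sourceFile to file location $targetFileName"
 }

 # Release the resources held by the WebClient
 $wc.Dispose()
}

# Arguments are separated by spaces, not commas
DownloadFile $flickrImages $targetDir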

Many .NET developers are unfamiliar with Powershell syntax. Variable names are prefixed with the $ sign. I declare an Object array simply by assigning a comma-separated list of values, a syntax that will be familiar to Perl developers. Also note the absence of semicolons. Semicolons are optional at the end of a line in Powershell, so they are omitted.

Further on, I declare a variable for where to put the downloaded files. If you want to test this script yourself, you have to change this path. Next, I declare a Powershell function for downloading the files. This function takes an Object array and a string for the target directory.

The function is called without commas or parentheses. In Powershell, function arguments are separated by spaces. This is where the Powershell syntax feels a bit different from other Microsoft languages. If you put a comma between the two arguments when calling the function in this example, Powershell treats them as a single array: the first parameter receives an Object array containing both values, while the second parameter is empty. Be aware of this Powershell gotcha: use spaces when calling a function with multiple arguments.

Let's look at the function more closely. Powershell variables are dynamically typed, which means that a variable holding a string can later be assigned an integer. In that respect Powershell often feels like Javascript, where a variable can also change type. If you want more type safety, prefix each function parameter with the type you expect in square brackets, such as [Object[]] or [string]. The variable $wc is assigned a new System.Net.WebClient instance, created with the New-Object cmdlet (short for "command-let"). To download each file in the Object array that is passed in, I use a foreach loop.

This is where Powershell shows itself as a convenient scripting language. Iterating through a collection with a foreach loop makes coding much easier, though Perl has offered the same construct for many years. Inside the foreach loop, the call to the method $wc.DownloadFile downloads the file from the internet using the instantiated WebClient. I am not sure whether .BAT files could achieve this, but it shows how much functionality is available to Powershell users and developers. Powershell is well suited for system maintenance and administration, and its range of uses is larger than one might expect of a shell scripting language and command-line tool.

The fact that much of .NET is available means that .NET developers should in some cases rethink the relationship between compiled programs and scripts, now that Powershell is so readily available. The convenience, flexibility and functionality of Powershell make it a contender alongside the more mature .NET programming languages such as VB and C#. If you implement something in a programming language, consider whether a script could do the same. Sometimes a script-based solution will be better and more flexible.

For developers using .NET, the use of SubString and LastIndexOf shows that many .NET methods and types are available for us in Powershell.

If you wonder how I formatted the Powershell code above, I downloaded the PowerShellPack utility from here:

Powershell pack

After you have downloaded and installed PowerShellPack, follow this guide:

Copy selected text in Powershell ISE to colored Html


[Screenshot: the script running inside Windows Powershell ISE]

Windows Powershell ISE has an editor and a command line below, where Powershell scripts can be executed and Powershell itself can be used. To start the PowerShell script, I type .\WebClient.ps1, since I saved the script above in a file called WebClient.ps1.

Users of Bash and other shell scripting languages will feel at home in Powershell, because several familiar Unix command names are available as aliases. For example, ls and the DIR command known from MS-DOS and CMD are both aliases for the same Powershell cmdlet, Get-ChildItem. More aliases can be defined, so that Unix developers can manage Windows clients and servers and still keep a familiar Unix feel. This concludes our introductory tour of Powershell. I would like to say happy coding, but it is perhaps more correct to say happy (Powershell) scripting?

Tuesday 15 January 2013

Early exit of parallel loops

This short article presents different techniques for exiting parallel loops early.

Cancelling a PLINQ loop


The following code demonstrates how to cancel a PLINQ loop.


            //Create a cancellation token source 
            var cts = new CancellationTokenSource();
 
            var nums = Enumerable.Range(0, 100);

            //Support cancellation token source 
            var result = nums.AsParallel().WithCancellation(cts.Token).Select(n => Math.Pow(n, 2));

            //create a task that enumerates the PLINQ query above 
            var enumTask = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (var r in result)
                    {
                        Console.WriteLine("Got result: {0}", r);
                        Thread.Sleep(100); //slow things a bit down 
                    }
                }
                catch (OperationCanceledException oce)
                {
                    Console.WriteLine("Caught exception of type: {0}, message: {1}", oce.GetType().Name, oce.Message);
                }
            }); 
          
            //create a cancelling task 
            var cancellingTask = Task.Factory.StartNew(() =>
            {
                Thread.Sleep(500);
                cts.Cancel(); 
            });

            //Wait for both tasks 
            Task.WaitAll(enumTask, cancellingTask);  

            //Wait for user input before exiting 
            Console.WriteLine("Press any key to continue ..");
            Console.ReadKey(); 

Note that cancelling the token source with the Cancel method does not mean that no iterations will be executed afterwards. This depends on the number of iterations already started in parallel. In PLINQ, use the WithCancellation method to supply the cancellation token that cancels the query while it is being enumerated (executed). Note also that cancelling a PLINQ query with a cancellation token causes an OperationCanceledException to be thrown. This must be caught in a try-catch block.

Breaking a Parallel For or Parallel Foreach loop

It is possible to break out of a parallel loop using the Break method of the ParallelLoopState object passed to the loop body. This means we must use the overloads of Parallel.For and Parallel.ForEach (on the Parallel class in TPL) whose body accepts a ParallelLoopState. Let's investigate the Break() method in a Parallel.ForEach loop:

            var nums = Enumerable.Range(0, 10000);

            var parallelLoopResult = Parallel.ForEach(nums,
                (int n, ParallelLoopState state) =>
            {
                var item = Math.Pow(n, 2);
                Console.WriteLine("Got item {0}", item);
                //Signal that iterations after the current one need not run
                if (item > 1000)
                    state.Break();
            });

            Console.WriteLine(parallelLoopResult.IsCompleted);
            Console.WriteLine(parallelLoopResult.LowestBreakIteration.Value);

            //Wait for user input before exiting 
            Console.WriteLine("Press any key to continue ..");
            Console.ReadKey(); 

In the code above, note that the lambda receives not only the current item, as usual in Parallel.ForEach, but also the ParallelLoopState. In addition, both Parallel.For and Parallel.ForEach return a ParallelLoopResult. From this result it is possible to tell whether the loop ran to completion (IsCompleted) and to read the LowestBreakIteration.
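The following is a minimal sketch, not part of the original example, of how the ParallelLoopResult can be inspected after Break(). It relies on the documented behaviour that iterations with a lower index than the breaking one are still run. The BreakDemo class, the executed dictionary and the threshold 32 are hypothetical names and values chosen for illustration only.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

static class BreakDemo
{
    // Hypothetical helper: runs a loop that calls Break() from index breakAt onwards
    // and records in 'executed' which iterations actually ran.
    public static ParallelLoopResult Run(int breakAt, ConcurrentDictionary<int, bool> executed)
    {
        return Parallel.For(0, 10000, (int n, ParallelLoopState state) =>
        {
            executed[n] = true;
            if (n >= breakAt)
                state.Break();
        });
    }
}

class Program
{
    static void Main()
    {
        var executed = new ConcurrentDictionary<int, bool>();
        var result = BreakDemo.Run(32, executed);

        // Break() was called, so LowestBreakIteration has a value
        long lowest = result.LowestBreakIteration.Value;

        // Every iteration below the lowest break index should have been executed
        bool allLowerRan = Enumerable.Range(0, (int)lowest).All(i => executed.ContainsKey(i));

        Console.WriteLine("IsCompleted: {0}", result.IsCompleted);
        Console.WriteLine("LowestBreakIteration: {0}", lowest);
        Console.WriteLine("All lower iterations ran: {0}", allLowerRan);
        Console.WriteLine("Iterations executed in total: {0}", executed.Count);
    }
}
```

The total number of executed iterations will typically vary from run to run, since iterations above the break index may already have started on other threads when Break() is called.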

Stopping a parallel loop


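Let's also look at stopping a Parallel.For loop with the Stop method. Where Break still lets iterations with a lower index than the current one run, Stop asks the loop to cease as soon as possible.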

            var parallelLoopResult = Parallel.For(0, 10000, (int n, ParallelLoopState state) =>
            {
                var item = Math.Pow(n, 2);
                Console.WriteLine("Got item {0}", item);
                if (item > 1000000)
                    state.Stop();
            });

            Console.WriteLine(parallelLoopResult.IsCompleted);
            Console.WriteLine(parallelLoopResult.LowestBreakIteration.HasValue);

            //Wait for user input before exiting 
            Console.WriteLine("Press any key to continue ..");
            Console.ReadKey(); 

To summarize, a PLINQ query can be cancelled by passing a CancellationToken from a CancellationTokenSource to the WithCancellation method. To exit a Parallel.For or Parallel.ForEach loop early, use the Stop() or Break() method of the ParallelLoopState. The ParallelLoopResult returned from the loop contains information about its execution: IsCompleted and LowestBreakIteration. The latter is a nullable long (long?) and has a value only if Break was called. The combinations can be read as follows:
  • IsCompleted is true and LowestBreakIteration has no value: the loop ran to completion and neither Stop nor Break was called.
  • IsCompleted is false and LowestBreakIteration has no value: Stop was called.
  • IsCompleted is false and LowestBreakIteration has a value: Break was called.
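The rules in this summary can be sketched as a small helper. This is only an illustration under the assumption of a console program: LoopOutcome.Describe is a hypothetical name and not part of TPL, and the three loops in Main exist only to produce one result of each kind.

```csharp
using System;
using System.Threading.Tasks;

static class LoopOutcome
{
    // Hypothetical helper (not a TPL API): maps a ParallelLoopResult to a label
    // using IsCompleted and LowestBreakIteration as described in the summary.
    public static string Describe(ParallelLoopResult result)
    {
        if (result.IsCompleted)
            return "Completed";
        return result.LowestBreakIteration.HasValue ? "Break" : "Stop";
    }
}

class Program
{
    static void Main()
    {
        // Runs all iterations without Break or Stop
        var completed = Parallel.For(0, 100, (int i, ParallelLoopState state) => { });

        // Calls Break from iteration 10 onwards
        var broken = Parallel.For(0, 100, (int i, ParallelLoopState state) =>
        {
            if (i >= 10) state.Break();
        });

        // Calls Stop from iteration 10 onwards
        var stopped = Parallel.For(0, 100, (int i, ParallelLoopState state) =>
        {
            if (i >= 10) state.Stop();
        });

        Console.WriteLine(LoopOutcome.Describe(completed));
        Console.WriteLine(LoopOutcome.Describe(broken));
        Console.WriteLine(LoopOutcome.Describe(stopped));
    }
}
```

Note that a single loop should use either Break or Stop, not both, which is why the sketch uses separate loops.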

Saturday 12 January 2013

ConcurrentQueue in TPL

This article will present some simple code to make use of the ConcurrentQueue collection that lives in the System.Collections.Concurrent namespace in TPL. The collections available in this namespace are the following:
  • ConcurrentBag
  • ConcurrentQueue
  • ConcurrentStack
  • ConcurrentDictionary
  • BlockingCollection
This article will focus on the ConcurrentQueue collection. This class is much like a Queue, but it is also thread safe and therefore supports concurrent access, which the ordinary Queue collection does not. To enqueue items in the ConcurrentQueue, one uses the Enqueue method. To dequeue items, one uses the TryDequeue method. If this method returns true, the out parameter contains the object removed from the ConcurrentQueue. Note also that, like the ordinary Queue, items are added and removed in First-In First-Out (FIFO) order. The next code shows how to use the ConcurrentQueue:

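    using System;
    using System.Collections.Concurrent;
    using System.Linq;
    using System.Threading.Tasks;

    class Program
    {
        static void Main(string[] args)
        {
            ConcurrentQueue<int> queue = new ConcurrentQueue<int>();
            foreach (var i in Enumerable.Range(0, 1000))
            {
                queue.Enqueue(i);
            }

            //Enumerating a ConcurrentQueue works on a snapshot,
            //so dequeuing inside the loop is safe
            Parallel.ForEach(queue, new ParallelOptions {
                MaxDegreeOfParallelism = System.Environment.ProcessorCount },
                q =>
            {
                //q is the snapshot element; the item dequeued here may be a different one
                int currentItem;
                if (queue.TryDequeue(out currentItem))
                    Console.WriteLine("Got item: {0}", currentItem);
            });

            Console.WriteLine("Press any key to continue ...");
            Console.ReadKey();
        }
    }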

The code above uses the Enqueue and TryDequeue methods of the ConcurrentQueue class. Note the out parameter and the fact that TryDequeue returns a boolean flag that shows whether the dequeueing was successful. There is also a method called TryPeek, which returns the item at the front of the queue without removing it. ConcurrentDictionary is also commonly used, and BlockingCollection is often used to implement Producer-Consumer patterns in a parallel world. ConcurrentStack and ConcurrentBag are useful in many situations as well. Of all the concurrent classes, ConcurrentBag looks the simplest to use. The nice part of these classes is that we do not have to think much about locking access to them, since this is already implemented. In fact, there is no need to lock access to these collections at all. One only has to understand each collection and call the correct methods to interact with it, since synchronization of parallel access is already handled inside the collection.
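As a complement, here is a minimal sketch of TryPeek and of draining the queue from several worker tasks without any explicit locking. It is an illustration only: the QueueDemo class, the Drain method and the worker count of 4 are hypothetical choices, not something prescribed by the article's example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class QueueDemo
{
    // Hypothetical helper: drains the queue with several worker tasks
    // and returns how many items were dequeued in total.
    public static int Drain(ConcurrentQueue<int> queue, int workerCount)
    {
        int taken = 0;
        var workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Factory.StartNew(() =>
            {
                int item;
                // Each worker keeps dequeuing until the queue reports it is empty
                while (queue.TryDequeue(out item))
                    Interlocked.Increment(ref taken);
            }))
            .ToArray();

        Task.WaitAll(workers);
        return taken;
    }
}

class Program
{
    static void Main()
    {
        var queue = new ConcurrentQueue<int>(Enumerable.Range(0, 1000));

        // TryPeek reads the front item without removing it
        int front;
        if (queue.TryPeek(out front))
            Console.WriteLine("Front item: {0}, items in queue: {1}", front, queue.Count);

        int taken = QueueDemo.Drain(queue, 4);
        Console.WriteLine("Dequeued {0} items, queue empty: {1}", taken, queue.IsEmpty);
    }
}
```

The counter is updated with Interlocked.Increment because it is an ordinary int shared between tasks; only the queue itself is thread safe.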