Thursday, 25 September 2014

Introductory Java 8 Lambda and collection handling

As a disclaimer, I have coded C# for several years now and am familiar with .NET technology, but my experience with Java 8 is more limited, so this article is just introductory level Java 8 Lambda and collection handling. I will mainly present code examples for handling collections and using lambda expressions in Java 8, using the Eclipse Luna IDE. I also downloaded Jdk 8 to get the necessary Java Development Kit and Java 8 runtime required. The code presented here is heavily based on the tutorial Oracle provides on their websites: Lambda Expressions in Java Let's look at some code right away, defining some classes and enums:

import java.time.LocalDate;
import java.time.chrono.IsoChronology;
import java.util.ArrayList;
import java.util.List;

public class Person {

 String name;  
 Sex gender; 
 LocalDate birthday; 
 String emailAddress; 

 Person(String nameArg, LocalDate birthdayArg, Sex genderArg, String emailArg){
  name = nameArg;
  birthday = birthdayArg; 
  gender = genderArg; 
  emailAddress = emailArg;   
 }
 
 public int getAge(){
  return birthday.until(IsoChronology.INSTANCE.dateNow()).getYears();  
 }
 
 public void printPerson(){
  System.out.println(name + ", " + this.getAge());   
 }
 
 public Sex getGender(){
  return gender;
 }
 
 public String getName(){
  return name;
 }
 
 public String getEmailAddress(){
  return emailAddress;
 }
 
 public LocalDate getBirthday(){
  return birthday;
 }
 
 public static int compareByAge(Person a, Person b){
  return a.birthday.compareTo(b.birthday); 
 }
 
 public static List createRoster(){
  List roster = new ArrayList(); 
  roster.add(new Person(
    "Fred", 
    IsoChronology.INSTANCE.date(1991, 6, 20),
    Sex.MALE,
    "fred@example.com")); 
  roster.add(new Person(
    "Jane", 
    IsoChronology.INSTANCE.date(1990, 7, 15),
    Sex.FEMALE,
    "jane@example.com")); 
  roster.add(new Person(
    "George", 
    IsoChronology.INSTANCE.date(1993, 8, 13),
    Sex.MALE,
    "george@example.com")); 
  roster.add(new Person(
    "Bob", 
    IsoChronology.INSTANCE.date(2000, 9, 12),
    Sex.MALE,
    "bob@example.com"));   
  return roster;
 }
 

}



public class SimplePerson {

 public SimplePerson(String nameArg, int ageArg) {
  name = nameArg;
  age = ageArg;
 }
 
 String name;
 int age;
 
 public String getName(){
  return name;
 }
 
 public int getAge(){
  return age;
 }
 
}


public enum Sex {
 
 MALE,
 
 FEMALE

}



public interface ICheckPerson {
 
 boolean test(Person p);

}




The following code makes use of these definitions and demonstrates use of lambda and collections in Java 8:

import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collector;
import java.util.stream.Collectors;


public class RosterTest {
 
 //Approach 1: Create methods that Search for Persons that match one characteristics
 
 public static void printPersonsOlderThan(List<Person> roster, int age){
  for (Person p : roster){
   if (p.getAge() >= age){
    p.printPerson();
   }
  }  
 }

 
 //Approach 2: Create more generalized search methods 
 
 public static void printPersonsWithinAgeRange(
   List<Person> roster, int low, int high){
  for (Person p : roster){
   if (low <= p.getAge() && p.getAge() < high){
    p.printPerson();
   }
  }
 }
 
 
 //Approach 3: Specify search criteria code in a local class
 //Approach 4: Specify Search criteria code in an anonymous class
 //Approach 5: Specify search criteria code with a lambda expression
 public static void printPersons(
   List<Person> roster, ICheckPerson tester){
  for (Person p : roster){
   if (tester.test(p)){
    p.printPerson();       
   }
  }
 }
 
 
 //Approach 6: Use standard functional interfaces with lambda expressions
 public static void processPersonsWithPredicate(
   List<Person> roster,
   Predicate<Person> tester){
  for (Person p : roster){
   if (tester.test(p)){
    p.printPerson();
   }
  }  
 }
 
 
 //Approach 7, second example
 
 public static void processPersonsWithFunction(
   List<Person> roster,
   Predicate<Person> tester,
   Function<Person, String> mapper,
   Consumer<String> block){
  for (Person p : roster){
   if (tester.test(p)){
    String data = mapper.apply(p);
    block.accept(data); 
   }
  }
 }
 
 
 //Approach 8: Use generics more extensively
 
 public static <X,Y> void processElements(
   Iterable<X> source,
   Predicate<X> tester,
   Function<X,Y> mapper,
   Consumer<Y> block) {
  for (X p : source){
   if (tester.test(p)){
    Y data = mapper.apply(p);
    block.accept(data);
   }   
  }
 }
 
 
 
 public static void main(String... args){
  
  List<Person> roster = Person.createRoster(); 
  
  for (Person p : roster){
   p.printPerson();
  }
  
  //Approach 1: Create methods that Search for Persons that match one characteristic
  
  System.out.println("Persons older than 20:"); 
     printPersonsOlderThan(roster, 20);
     System.out.println();
     
     //Approach 2 : Create more generalized search methods 
     
     System.out.println("Persons between the ages of 14 and 30:"); 
     printPersonsWithinAgeRange(roster, 14, 30);
     System.out.println();
     
     //Approach 3 : Specify search criteria code in local class 
     
     System.out.println("Persons who are eligible for Selective Service:"); 
     
     class CheckPersonEligibleForSelectiveService implements ICheckPerson{

   @Override
   public boolean test(Person p) {
    return p.getGender() == Sex.MALE
      && p.getAge() >= 18 
      && p.getAge() <= 25;
   }      
     }
     
     printPersons(roster, new CheckPersonEligibleForSelectiveService());
     
     System.out.println();
     
     //Approach 4: Specify search Criteria code in an Anonymous class 
     
     System.out.println("Persons who are eligible for Selective Service " + 
      "(anonymous class)");
     
     printPersons(roster, new ICheckPerson() {   
   public boolean test(Person p) {
    return p.getGender() == Sex.MALE
      && p.getAge() >= 18
      && p.getAge() <= 25;
   }
     }); 
  
     //Approach 6: Use standard functional interfaces with lambda expressions 
     
     System.out.println("Persons who are eligible for Selective Service " + 
     "(with Predicate parameter):"); 
     
     processPersonsWithPredicate(roster,
       p -> p.getGender() == Sex.MALE
         && p.getAge() >= 18
         && p.getAge() <= 25
         );
     
     //Approach 7, second example 
     
     System.out.println();
     
     processPersonsWithFunction(roster, p -> 
     p.getGender() == Sex.MALE
     && p.getAge() >= 18 
     && p.getAge() <= 25, 
     p -> p.getEmailAddress(), 
     email -> System.out.println(email));
     
     System.out.println();
     
     //Approach 8: Use generics more extensively 
     
     System.out.println("Persons who are eligible for Selective Service " + 
     "(generic version):"); 
     processElements(roster, p -> 
     p.getGender() == Sex.MALE
     && p.getAge() >= 18 
     && p.getAge() <= 25, 
     p -> p.getEmailAddress(), 
     email -> System.out.println(email));
     
     //Approach 9: Bulk data operations that accept lambda expressions as parameters 
     
     System.out.println("Persons who are eligible for Selective Service " + 
      "(with bulk data operations):"); 
     
     roster.stream().filter(
       p -> p.getGender() == Sex.MALE
       && p.getAge() >= 18
       && p.getAge() <= 25)
       .map(p -> p.getEmailAddress())
       .forEach(email -> System.out.println(email));
     
     
     //Approach 10: Creating a new collection from a query 
     
     System.out.println("Persons who are eligible for selective Service " + 
     "(with collect to create new collection"); 
     
     List<Person> selectiveServicePersons = roster.stream().filter(
       p -> p.getGender() == Sex.MALE
       && p.getAge() >= 18 
       && p.getAge() <= 25)
       .collect(Collectors.<Person>toList()); 
     
     for(Person p: selectiveServicePersons){
      p.printPerson();
     }     
        
     
     System.out.println();
     
     //Approach 11: Using map to transform a collection to another type of collection    
     
     List<SimplePerson> personsTransformed = (List<SimplePerson>) roster.stream()
       .map(p->new SimplePerson(p.getName(), p.getAge()))
    .collect(Collectors.<SimplePerson>toList()); 
     
     for(SimplePerson p: personsTransformed){
      System.out.println(p.getName() + ", " + p.getAge()); 
     }
     
     System.out.println();
     
     //Approach 12: Using sorting for Approach 11 
     
     List<SimplePerson> personsTransformedSorted = (List<SimplePerson>) roster.stream()       
       .map(p->new SimplePerson(p.getName(), p.getAge()))
    .collect(Collectors.<SimplePerson>toList()); 
     
     class CustomComparator implements Comparator<SimplePerson>{
      @Override
      public int compare(SimplePerson o1, SimplePerson o2) {
       return o1.getAge() - o2.getAge();
      };
     }
     
     Collections.sort(personsTransformedSorted, new CustomComparator());
     
     for(SimplePerson p: personsTransformedSorted){
      System.out.println(p.getName() + ", " + p.getAge()); 
     }
     
     System.out.println();
     
     //Approach 13: Grouping sorting by gender 
     
     Map<Sex, List<Person>> mapByGenderMap = 
       roster.stream().collect(Collectors.groupingBy(Person::getGender));
     
     for (Map.Entry<Sex, List<Person>> entry : mapByGenderMap.entrySet()){
     
      System.out.println("Persons that are: " + entry.getKey());
      for (Person person : entry.getValue()) {
       person.printPerson();
      }      
     }  

     System.out.println();
     
     //Approach 14: Comparator that is implemented on the fly 
     List<Person> personsSortedByNameList = roster.stream()
       .sorted(new Comparator<Person>() {
        public int compare(Person x, Person y){
         return x.getName().compareTo(y.getName());
        }
    }).collect(Collectors.toList());
     
     System.out.println("Persons sorted by name");
     for (Person person : personsSortedByNameList){
      person.printPerson();
     }  

   //Approach 15: Comparator that is implemented on the fly using a lambda expression 
   List<Person> personsSortedByNameThroughLambdaList  =  roster.stream()
     .sorted((x,y)-> x.getName()
     .compareTo(y.getName())).collect(Collectors.toList()); 
     
    System.out.println("Persons sorted by name through lambda:");
     
    for (Person person : personsSortedByNameThroughLambdaList)
    {
      person.printPerson();
    }        
          
 } 
   

}


First perspectives of the Java 8 support

From looking through the sample code, one can see that Java 8 has got support for most of what LINQ offers, there are also some major drawbacks or shortcomings that are clear. I am not experienced with Java 8, so perhaps some of my considerations here are either wrong or inprecise, but overall the syntax of Java 8 is much more verbose and less fluent than that of LINQ. Also, there are differences between Java 8 and C#. The Consumer concept of Java is great and looks like a missing part in C#. While Java 8 uses ->, C# uses =>, and the rest of the functional syntax is very similar. Runnable also looks like Action of C#. The anonymous class concept of Java is however flawed compared to C# true anonymous functions. To do projections and a requirement to create a new class definition or interface definition is not very flexible. Instantiating interfaces is supported in Java and not in C#, this is however a positive feature of Java that C# lacks. Java supports also default implementations and static implementations in interfaces, which is very practical in many cases - C# does not support these default methods inside interfaces. To do grouping and sorting requires implementing comparators and groupings go via maps. All in all, C# and LINQ is so more flexible and the chaining syntax makes complicated functional programming easy. In Java 8, much functional programming is possible - in fact most functional programming that's supported in LINQ is also supported in Java 8 I guess - it is just so verbose, fragmented and indirect to do basic things such as sorting. I also like the true anonymous functions of C# using thew new { } syntax, which Java 8 seems to lack. The dynamic type inference that the "var" keyword of C# supports seems also to lack when working with Java 8 - you often have to help the compiler by doing specific casts. Again, there might be some misunderstandings here, but all in all Java 8 functional programmings seems "harder" than using C# and LINQ, but I guess this will improve as Java 8 now has got functional programming support. Also, Scala and Clojure are similar languages that are alternatives, where functional programming have been supported for years and have matured. It looks like Java's "marriage" with Oracle has halted progression and while Java 8 has evolved, there are still many parts of the language that can be improved to support functional programming. However, much about Java 8 is positive and it is to be expected that more C# developers (and VB?) will be attracted to Java since it now supports functional programming. But beware, C# programmers - here be dragons! There will be a learning curve and that compact syntax of C# you have learned to love is not the same experience when doing Java 8 functional programming!

Friday, 18 July 2014

Implementing a fast dictionary for English words using a trie datastructure or radix tree in C#


The following article will present source code for implementing a fast dictionary for English words using a trie datastructure. The trie datastructure is basically a tree with fixed number of children, which in this case is kept as an array of Node instances. The trie is a tree of Node instances and will describe "paths", when resolving words. The application is itself a WPF application and will require .NET 4 or newer to execute. The following source code is the class Node, which is a specific node of the trie datastructure:

using System.Collections.Generic;

namespace AutoCompleteDictionary
{
    
    public class Node
    {

        private readonly Node[] children = new Node[26];

        public IEnumerable<KeyValuePair<Node, char>> AssignedChildren
        {
            get
            {
                for (int i = 0; i < 26; i++)
                {
                    if (children[i] != null)
                        yield return new KeyValuePair<Node, char>(children[i], (char)('a' + i));    
                }
            }
        }

        public Node GetOrCreate(char c)
        {
            Node child = this[c];
            if (child == null)
                child = this[c] = new Node();
            return child; 
        }

        public Node this[char c]
        {
            get { return children[c - 'a']; }
            set { children[c - 'a'] = value; }
        }

        public bool IsWordTerminator { get; set; }

    }

}

The Node class has a fixed sized Node array called children. The readonly property AssignedChildren returns which Node or letters (chars) are present at the current Node or "level", i.e. char position of the words that are mapped into the trie datastructure. In addition, the yield keyword is used here together with an iterator, which is implemented also in this class. The Node class will work with lowercase English letters, but could be adjusted for other alphabets as well. For a Norwegian alphabet, 29 letter would be used as the size of the children to accept the additional three Norwegian vowels, for example. Next, the code for the trie data structure is presented. It is itself a class.

using System.Collections.Generic;

namespace AutoCompleteDictionary
{
    
    public class Trie
    {

        private readonly Node root = new Node();

        public Node NodeForWord(string word, bool createPath)
        {
            Node current = root;

            foreach (char c in word)
            {
                if (createPath)
                    current = current.GetOrCreate(c);
                else
                    current = current[c];

                if (current == null)
                    return null;
            }

            return current;
        }

        public void AddNodeForWord(string word)
        {
            Node node = NodeForWord(word, true);
            node.IsWordTerminator = true; 
        }

        public bool ContainsWord(string word)
        {
            Node node = NodeForWord(word, false);
            return node != null && node.IsWordTerminator; 
        }

        public List PrefixedWords(string prefix)
        {
            var prefixedWords = new List<string>();
            Node node = NodeForWord(prefix, false);
            if (node == null)
                return prefixedWords;

            PrefixedWordsAux(prefix, node, prefixedWords);
            return prefixedWords; 
        }

        private void PrefixedWordsAux(string word, Node node, List<string> prefixedWords)
        {
            if (node.IsWordTerminator)
                prefixedWords.Add(word); 

            foreach (var child in node.AssignedChildren)
            {
                PrefixedWordsAux(word + child.Value, child.Key, prefixedWords); 
            }
        }

    }
}

The trie class or data structure will make use of Node instances as the nodes of its tree datastructure. It adds nodes with the AddNodeForWord method, which also sets the flag IsWordTerminator to true for the specific node. It will loop through the letters or chars of the passed in word or string and build up the Node subtree for the word. It is possible that "paths" are already registered. The method Containsword will not add new nodes but make use of the trie datastructure or Node tree and look if one for a given word ended up with a Node which is not null and that the Node has a flag IsWordTerminator which is true. The method PrefixedWords uses recursion to find "paths" or words that is, in the trie or Node tree that can be reached from the Node that the typed word matches, if any. The recursion will visit the entire subtree of the trie or Node subtree below the Node that matches the typed word, so that it is possible that the calculation will take some considerable type, if the user is not limited to typing at least some letters or chars for the prefix. In the application, three letters is set as a default minimum amount of letters to type. There are about 440,000 letters in this English dictionary. It is not complete, but there is a considerable amount. However, the time to get the matching words with the inputted prefix will for three letters or above typically take a very few milliseconds.
The next class is a simple view model that is used in the WPF client making use of the code above. The WPF xaml view sets the DataContext property to an instance of this view model to have a simple MVVM scenario, or in this case View-ViewModel scenario, as the Model is trivially the view model itself.

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.ComponentModel;
using System.Diagnostics;
using System.IO;
using System.Linq; 

namespace AutoCompleteDictionary
{
    
    public class AutoCompleteViewModel : INotifyPropertyChanged
    {

        private Trie trie = new Trie(); 

        private int prefixMinSize;
        public int PrefixMinSize
        {
            get { return prefixMinSize; }
            set
            {
                if (prefixMinSize != value)
                {
                    prefixMinSize = value;
                    RaisePropertyChanged("PrefixMinSize"); 
                }
            }
        }

        private string calculationInfo;
        public string CalculationInfo
        {
            get { return calculationInfo; }
            set
            {
                if (calculationInfo != value)
                {
                    calculationInfo = value;
                    RaisePropertyChanged("CalculationInfo");
                }
            }
        }

        private string inputWord;
        public string InputWord
        {
            get { return inputWord; }
            set
            {
                if (inputWord != value)
                {
                    value = value.ToLower(); 
                    inputWord = string.Join("", value.ToCharArray().Where(c => (int)c >= (int)'a' && (int)c <= (int)'z').ToArray()) ;
                    RaisePropertyChanged("InputWord");
                }
            }
        }

        public ObservableCollection<string> PrefixList { get; set; } 

        public void RaisePropertyChanged(string propertyName)
        {
            if (PropertyChanged != null)
                PropertyChanged(this, new PropertyChangedEventArgs(propertyName)); 
        }


        #region INotifyPropertyChanged Members

        public event PropertyChangedEventHandler PropertyChanged;

        #endregion

        public AutoCompleteViewModel()
        {
            PrefixMinSize = 3;
            InputWord = "Adv";
            PrefixList = new ObservableCollection<string>();
            ProcessDictionaryList();
        }

        private void ProcessDictionaryList()
        {
            foreach (var word in File.ReadLines("english-words"))
            {
                trie.AddNodeForWord(word);
            }
        }

        public void SetPrefixList()
        {
            Stopwatch stopWatch = Stopwatch.StartNew(); 
            PrefixList.Clear();
            var prefixes = GetPrefixList();
            
            foreach (var prefix in prefixes)
            {
                PrefixList.Add(prefix); 
            }
            //RaisePropertyChanged("PrefixList"); 

            CalculationInfo = string.Format("Retrieved {0} words prefixed with {1}. Operation took {2} ms", PrefixList.Count, inputWord, stopWatch.ElapsedMilliseconds); 
        }

        private List GetPrefixList()
        {
            if (InputWord.Length >= PrefixMinSize)
            {
                var wordsStartingWithInputWord = trie.PrefixedWords(InputWord.ToLower());
                return wordsStartingWithInputWord;
            }
            return new List<string>(); 
        }
    }
}


If the English dictionary looks like a nice little toy, it is possible to download the program. A compiled version is in the folder bin\Debug if you want to run the program without compiling the source code. Requires Visual Studio 2012 or newer. Download the English dictionary WPF client presented above using a trie or radix tree data structure here

Implementing a fast point structure in C# for large-scale comparison checks and searches

The code below is extracted from Part I of the Pluralsight course "Making .NET Applications faster", discussed and presented below. Implementing a fast structure for 2D points in C# requires using a struct instead of a class, since this is value-based and not reference type, i.e making use of the stack and not the heap and avoiding expensive header fields of objects. In addition, it is necessary to:
  • Override the Equals method inherited from System.Object
  • Implement a method called Equals that returns true and has one input parameter, another instance of the same struct
  • Mark the struct with the generic IEquatable interface, IEquatable<PointV5>
  • Implement the operators == and != to make use of the Equals method receiving an instance of the struct
  • Implement GetHashCode, using Jon Skeet's advice of creating a weighted sum multiplied by prime numbers and the struct's fields
Implementing this equality regime will reduce overall size in memory about 4x and increase speed 4x to 10x for large scale scenarios. In the example code testing this code, 10 million 2D point structs of type PointV5 was tested. Most modern games will avoid reference types of course and stick to value types, i.e. structs. Since most of us are application developers, make sure if you create a LOT of objects, consider switching from class to struct type(s), if possible. Often, it is not needed to use classes, a struct will be sufficient (and faster and lighter). Check out the course page here: Pluralsight: Making .NET applications faster Pluralsight is a great course and IT resource for IT professionals!

using System;


    struct PointV5 : IEquatable
    {
        public int X;
        public int Y;

        public override bool Equals(object obj)
        {
            if (!(obj is PointV5)) return false;
            PointV5 other = (PointV5)obj;
            return X == other.X && Y == other.Y;
        }

        public bool Equals(PointV5 other)
        {
            return X == other.X && Y == other.Y;
        }

        public static bool operator ==(PointV5 a, PointV5 b)
        {
            return a.Equals(b);
        }

        public static bool operator !=(PointV5 a, PointV5 b)
        {
            return !a.Equals(b);
        }

        public override int GetHashCode()
        {
            // 19 and 29 are primes, and this doesn't assume anything about
            // the distribution of X and Y.
            // Also see http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode
            int hash = 19;
            hash = hash * 29 + X;
            hash = hash * 29 + Y;
            return hash;
        }
    }


Actually, a simple class is in fact faster in this scenario, according to the testing I did. Consider this class:

public class PointV0
{

        public int X { get; set; }

        public int Y { get; set; }

}

However, the price on pays here is higher memory overhead, as each point will have to be stored on the heap and have the object header fields and method table pointer fields, i.e. taking up more memory.

Elementary class
        Average time per lookup: 85,70ms
        Garbage collections: 0
Naked struct
        Average time per lookup: 435,10ms
        Garbage collections: 1018
With Equals override
        Average time per lookup: 248,70ms
        Garbage collections: 510
With Equals overload
        Average time per lookup: 239,50ms
        Garbage collections: 510
With IEquatable
        Average time per lookup: 168,60ms
        Garbage collections: 0
All bells and whistles
        Average time per lookup: 170,00ms
        Garbage collections: 0
Press any key to continue ..

I cannot conclude from these results that structs always are faster than classes, but it will always be more memory overhead to resort to classes instead of structs.. It looks though, that in this example, a simple class was the fastest choice!