Wednesday, June 18, 2014

Coding Books

I love reading books... Plain and simple... In particular, if I want to get up to speed on something, or even just for a refresher... Books are a great way to go.  The problem is, there are countless books... So it's tough to find the one worth diving into.  (Reading a book is a large time commitment, after all!)

So I'm going to reserve this page as a placeholder for the books I've read... Now, I'm not saying these are the best, but for the most part, I think they were pretty helpful!  Happy reading!

Book List


  • C# In Depth, Second Edition by Jon Skeet
    • Fantastically simple and yet informative... I reference this text constantly and find that my programming skill increases just by owning it.  Anyone who uses StackOverflow will recognize the name.  He covers several techniques and features of C# and gives many overviews of the evolving language.
    • I read the second edition; the third is now out.  I haven't read it yet, but I plan to.
    • I also follow Jon's blog... And would say if you're not already, you should.
  • Async in C# 5.0 by Alex Davies
    • This was a very quick read that gave good insight into how the Async/Await keywords were implemented, why they are useful, and when to use them.  When I read the book, the keywords were new and a bit foreign.  I felt up to speed and ready to use them after reading the text.
    • If you still feel like Async/Await is a bit strange, then this is definitely a book to check out... However, if you are comfortable with the .NET team offering you a tool that magically makes your code asynchronous... then you can probably get by without reading it.

  • Software Architecture: Foundations, Theory and Practice by Richard N. Taylor; Nenad Medvidovic; Eric M. Dashofy
    • I read this book for a masters class, and we only covered the first 7 chapters.
    • The book is DEFINITELY a textbook, and I found many sections VERY slow.
    • The book has very little code and mostly emphasizes the importance of strong management...
    • I can't say I would recommend it to a coder, as it's more for someone who wants to manage coders.

  • Java Concurrency in Practice by Brian Goetz; Tim Peierls; Joshua Bloch; Joseph Bowbeer; David Holmes; Doug Lea
    • I REALLY enjoyed this book.  It offered several strong insights and I think is a good read even if you don't plan on programming in Java.
    • It presents several ways of thinking in parallel, and several pitfalls that coders fall for.
    • I keep this book as reference, and plan to read it again soon just for a refresher.
  • Data Structures and Algorithms Using Java by William McAllister
    • This book was fantastic! It presented great detail but never went overboard.  It's definitely just a refresher if you're already familiar with basic data structures and their corresponding algorithms, but still worth a read.
    • Covered linked lists, queues, stacks, hashes, graphs, sorting and more. (All pretty basic, but great to reference from time to time)
    • I find myself coming back to this book often...
  • Hadoop for Dummies by Dirk deRoos
    • This is the book I'm CURRENTLY reading and am on chapter 11 of 19.
    • While I kind of hate the name, I think the book has presented quite a bit of basic knowledge about Hadoop.
    • The book seems a bit more oriented toward the IT and maintenance side, but still presents the usage.
    • Mostly gives overviews of the different Apache projects that are linked to Hadoop and their intentions.
    • I don't think I could finish this book and claim to be able to use Hadoop... But I think it's still worth reading, as it is giving me a well rounded view of the Hadoop project and its history.

More to Come!

As I said before, I will continue to add to this list as I read books! If you have any recommendations, I would love to hear them! Happy reading!


Tuesday, June 17, 2014

Less Comments, More Functions

Comments are vital to every software package.  I'm sure at some point we've all come across some code where we would have liked to know what the original author (or perhaps our previous self) was intending...  At that point, we immediately look around for a comment or two.

However, what if the comment is out of date?  When this happens, I think the comment is quite destructive and should be deleted.  So why does this happen?  I think we've all been in a situation where we are changing code and while we're concerned about what the compiler thinks and the end results... we neglect the comments that were included with the code.  Perhaps we have every intention to go back and fix them... but never find the time...

I know I run into this more often than I would like...  To prevent it, I have adopted a style that strives to be self-documenting.  I write several little functions, each with a single intention and a descriptive name.  I find that coders will change the name of a method or function before they change a bunch of comments.  I've also found that coders prefer to look at code!  In fact, if you write code that is simple and concise, it's likely that most coders will be able to read your intentions far more easily than they could decipher a grammatically challenged comment riddled with misspelled words.
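To make the idea concrete, here's a small sketch.  The scenario (a bulk-discount calculation) and every name in it are hypothetical; the point is only the shape.  The first version leans on comments to explain its steps, while the second gives each step its own small, descriptively named function.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderMath
{
    // Comment-driven version: the comments carry the intent,
    // and nothing forces them to stay in sync with the code.
    public static decimal TotalWithComments(IEnumerable<decimal> prices)
    {
        // sum the prices
        decimal total = prices.Sum();
        // orders of 100 or more get 10% off
        if (total >= 100m)
            total *= 0.9m;
        return total;
    }

    // Self-documenting version: each step is a small, single-purpose,
    // descriptively named function.
    public static decimal TotalWithFunctions(IEnumerable<decimal> prices)
    {
        decimal subtotal = Subtotal(prices);
        return QualifiesForBulkDiscount(subtotal)
            ? ApplyBulkDiscount(subtotal)
            : subtotal;
    }

    private static decimal Subtotal(IEnumerable<decimal> prices)
    {
        return prices.Sum();
    }

    private static bool QualifiesForBulkDiscount(decimal subtotal)
    {
        return subtotal >= 100m;
    }

    private static decimal ApplyBulkDiscount(decimal subtotal)
    {
        return subtotal * 0.9m;
    }
}
```

Both versions compute the same result.  The bet is that when the discount rule changes, whoever changes it will rename QualifiesForBulkDiscount along with it, while a stale comment has nothing pushing it to be updated.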

This could quickly become a rant... So I'll stop here.  Feel free to post your favorite comments you've ever written/encountered... Or even just some pet peeves...

Tuesday, June 10, 2014

C# and a Functional Style

One weekend I dove headfirst into Haskell via "Learn You a Haskell for Great Good!" for nothing more than something to learn... and walked away very discouraged with most of the code I had written up to that point...  The following Monday, I went to work and noticed that the code I had previously found elegant now felt crude.  In particular, large methods that used and manipulated multiple members of the class.  Unfortunately, I didn't have the option to move from C# to Haskell or even F#... So I just remained dissatisfied.

Not all was lost, however.  Over the next few days I noticed there was hope... C# actually has many elements that allow a functional style.  In particular:


  • Lazy Evaluation and Infinite Sequences
  • Function Chaining
  • Lambda Statements
  • Higher Order Functions

Lazy Evaluation and Infinite Sequences


While C# is not lazy by default, it does have a lazy feature: the yield keyword.  A method that uses yield return is an iterator, and iterators are lazy.  Calling one doesn't run any of its body until the results are actually requested.  Let's look at an example:


/// <summary>
/// A lazy function that returns an infinite sequence of integers.
/// </summary>
/// <param name="start">The value to start at.</param>
/// <param name="incBy">The amount to increment by.</param>
/// <returns>An infinite sequence.</returns>
public static IEnumerable<int> Range(int start = 0, int incBy = 1)
{
  for (int i = start; true; i += incBy)
    yield return i;
}

static void Main(string[] args)
{
  // This code never accesses any values.
  var x = Range();
}


If you ran this in debug mode and placed a breakpoint inside Range(), you would notice that the breakpoint is never hit.  Calling Range() only creates the iterator; its body never runs because we never request any values from the infinite sequence.
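Laziness pays off once you do consume the sequence: you can ask for only as many values as you need, and the infinite loop inside Range() never runs past that point.  A small sketch (LazyDemo and FirstValues are hypothetical names, and Range() is repeated so the block stands on its own):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Same lazy, infinite sequence as Range() above.
    public static IEnumerable<int> Range(int start = 0, int incBy = 1)
    {
        for (int i = start; true; i += incBy)
            yield return i;
    }

    // Take() pulls exactly 'count' values and then stops asking;
    // ToList() forces evaluation. Without Take(), ToList() on an
    // infinite sequence would never return.
    public static List<int> FirstValues(int start, int incBy, int count)
    {
        return Range(start, incBy).Take(count).ToList();
    }
}
```

For example, FirstValues(10, 5, 3) produces 10, 15 and 20, and the loop inside Range() runs only far enough to hand out those three values.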

Function Chaining

With the addition of LINQ, my code has often turned into a series of data structure transformations.  In functional programming this is very natural; in C#, however, it would normally look something like the following:

A(B(C(D(9))))

Where A, B, C, and D are all methods.  I personally don't find this very elegant or even readable.  However, the .NET team provided a solution through extension methods.  Let's look at some actual code:


using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class Program
{
  /// <summary>
  /// Converts every line to lower case.
  /// </summary>
  /// <returns>A sequence of lower case lines.</returns>
  public static IEnumerable<String> AdjustLines(IEnumerable<String> lines)
  {
    return lines.Select(line => line.ToLower());
  }

  /// <summary>
  /// This takes three sequences and pairs up their values, position by position, as Tuples.
  /// It stops as soon as the shortest sequence runs out.
  /// </summary>
  /// <returns>A sequence of Tuples.</returns>
  public static IEnumerable<Tuple<T1, T2, T3>> Zip<T1, T2, T3>(
      this IEnumerable<T1> a, IEnumerable<T2> b, IEnumerable<T3> c)
  {
    // The using blocks dispose the enumerators (closing the files behind
    // File.ReadLines) once iteration finishes.
    using (var ea = a.GetEnumerator())
    using (var eb = b.GetEnumerator())
    using (var ec = c.GetEnumerator())
    {
      while (ea.MoveNext() && eb.MoveNext() && ec.MoveNext())
        yield return Tuple.Create(ea.Current, eb.Current, ec.Current);
    }
  }

  /// <summary>
  /// Compare each line and return only the ones that differ.
  /// </summary>
  /// <param name="values">A zipped up sequence of 2 lines and a line number.</param>
  /// <returns>Returns a sequence of differing lines.</returns>
  public static IEnumerable<Tuple<String, String, Int32>> CompareLines(
      this IEnumerable<Tuple<String, String, Int32>> values)
  {
    return values.Where(t => t.Item1 != t.Item2);
  }

  /// <summary>
  /// Verify if a number is prime.
  /// </summary>
  /// <returns>Returns true if the number is prime and false otherwise.</returns>
  public static bool IsPrime(int number)
  {
    // 0, 1 and negative numbers are not prime.
    if (number < 2)
      return false;

    // Test divisors up to and including the square root
    // (i <= number / i avoids overflowing i * i).
    for (int i = 2; i <= number / i; i++)
      if (number % i == 0)
        return false;

    return true;
  }

  /// <summary>
  /// Filter out the differing lines for only the ones that happen to have prime line numbers.
  /// </summary>
  /// <returns>A sequence of lines that have prime line numbers.</returns>
  public static IEnumerable<Tuple<String, String, Int32>> FilterForPrimeLines(
      this IEnumerable<Tuple<String, String, Int32>> values)
  {
    return values.Where(t => IsPrime(t.Item3));
  }

  /// <summary>
  /// A lazy function that returns an infinite sequence of integers.
  /// </summary>
  /// <param name="start">The value to start at.</param>
  /// <param name="incBy">The amount to increment by.</param>
  /// <returns>An infinite sequence.</returns>
  public static IEnumerable<int> Range(int start = 0, int incBy = 1)
  {
    for (int i = start; true; i += incBy)
      yield return i;
  }

  /// <summary>
  /// Prints the results of everything.
  /// </summary>
  public static void PrintResults(this IEnumerable<Tuple<String, String, Int32>> results)
  {
    foreach (var r in results)
      Console.WriteLine(String.Format("#{0} -> [A]{1} != [B]{2}", r.Item3, r.Item1, r.Item2));
  }

  static void Main(string[] args)
  {
    var a = File.ReadLines(@"C:\temp\a.txt");
    var b = File.ReadLines(@"C:\temp\b.txt");

    // Line numbers start at 1, hence Range(1).

    // Normal style
    PrintResults(FilterForPrimeLines(CompareLines(Zip(a, b, Range(1)))));

    // Functional/Pipeline style
    a.Zip(b, Range(1))
      .CompareLines()
      .FilterForPrimeLines()
      .PrintResults();
  }
}

The most vital section of code to view is the Main() method.  There we have the two styles under discussion.  First we have the normal way, where the calls are nested inside one another and have to be read from the inside out.  At the end of the statement we end up with 5 ')'s... It doesn't read well and is difficult to follow.

However, after that we have the "new" way, where we take advantage of extension methods and chain them together.  This offers a far cleaner and more readable statement.  (It also integrates better with any LINQ functionality you may use.)
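Here's the same idea at its smallest: the A(B(C(D(9)))) expression from earlier, with A, B, C, and D written as hypothetical extension methods on int (their bodies are arbitrary placeholders).  Both forms call exactly the same methods in the same order; the chained form just reads left to right in the order things happen.

```csharp
using System;

public static class Steps
{
    // Hypothetical placeholder transformations.
    public static int D(this int x) { return x + 1; }   // 9  -> 10
    public static int C(this int x) { return x * 2; }   // 10 -> 20
    public static int B(this int x) { return x - 3; }   // 20 -> 17
    public static int A(this int x) { return x * x; }   // 17 -> 289

    // Nested: read from the inside out.
    public static int Nested(int seed) { return A(B(C(D(seed)))); }

    // Chained: read left to right, in execution order.
    public static int Chained(int seed) { return seed.D().C().B().A(); }
}
```

Both Nested(9) and Chained(9) return 289.  The only thing the this keyword changes is how the call can be written; the compiler turns seed.D() into an ordinary static call to D(seed).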

Lambda Statements

Lambda statements are well documented all over the internet, but are still worth mentioning.  Lambda statements are HEAVILY used in functional programming and offer developers the option to make and pass very simple to very complex functions on the fly.  If you have ever used LINQ, then you have used them...
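A few quick examples, since lambdas show up all over the code above: a lambda stored in a Func<> delegate, expression lambdas passed straight to LINQ, and a statement lambda (one with a full body in braces).  The class and member names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LambdaDemo
{
    // An expression lambda stored in a delegate variable.
    public static readonly Func<int, int> Square = x => x * x;

    // Expression lambdas passed directly to LINQ operators
    // (Select receives the Square delegate defined above).
    public static List<int> SquaresOfEvens(IEnumerable<int> numbers)
    {
        return numbers.Where(n => n % 2 == 0).Select(Square).ToList();
    }

    // A statement lambda: a block body in braces, still created on the fly.
    public static readonly Func<string, string> Shout = s =>
    {
        var trimmed = s.Trim();
        return trimmed.ToUpperInvariant() + "!";
    };
}
```

SquaresOfEvens(new[] { 1, 2, 3, 4 }) gives 4 and 16, and Shout("  hi ") gives "HI!".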

Higher Order Functions

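Higher-order functions (functions that take or return other functions) are easy to write in C# thanks to lambda statements and the Func<> delegates, and they in turn make techniques like currying and partial function application possible.  Jon Skeet has an excellent post worth checking out on currying vs. partial function application.  I don't plan to repeat his work, as I don't think I can do a better job...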
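Without repeating Jon's explanation, here's a minimal sketch of the difference in C#.  Curry, Partial and Twice are hypothetical helper names, not framework methods.  Currying turns a two-argument function into a chain of one-argument functions, while partial application fixes some arguments up front and returns a function of the rest.

```csharp
using System;

public static class FunctionalHelpers
{
    // Currying: (a, b) => result  becomes  a => (b => result).
    public static Func<T1, Func<T2, TResult>> Curry<T1, T2, TResult>(
        Func<T1, T2, TResult> f)
    {
        return a => b => f(a, b);
    }

    // Partial application: fix the first argument,
    // get back a function of the second.
    public static Func<T2, TResult> Partial<T1, T2, TResult>(
        Func<T1, T2, TResult> f, T1 first)
    {
        return b => f(first, b);
    }

    // A plain higher-order function: takes a function, returns a new one
    // that applies it twice.
    public static Func<T, T> Twice<T>(Func<T, T> f)
    {
        return x => f(f(x));
    }
}
```

With Func<int, int, int> add = (x, y) => x + y, both Curry(add)(10) and Partial(add, 10) give a one-argument function that adds 10.  The difference shows up with three or more arguments: currying always produces one-argument steps, while partial application can fix any number of arguments at once.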

Summary

Functional programming is possible (though not required) in C#.  Programming in such a style can offer several benefits and is worth considering.  At the very least, it can help a developer to think differently and approach an algorithm from a different light.

Wednesday, January 1, 2014

Yield over Lists

Synopsis

Creating a large collection by adding elements to a List<T> just so you can iterate over it is inefficient: it can cause large memory footprints and even lead to an OutOfMemoryException.  So why do so many programmers do it?  The normal response to pointing this out is: "What alternative do I have?"  My response is always: Yield

Headaches with List<T>

If you went and told somebody you were going to take an array and re-size it several times until it is an unduly large object... you probably wouldn't make it through the explanation before you realized how taxing a process that would be!  And yet... we see this code frequently:

static IEnumerable<long> buildUsingList()
{
  var someList = new List<long>();

  for (int i = 0; i < Iterations; i++)
    someList.Add(i);

  return someList;
}


It looks harmless enough, but let's look at some of the problems with this technique.

List<T> uses an array to store all its elements.  This means that each time we add one more entry than the array can hold, the list has to allocate a new, larger array (in practice, about double the size) and copy every existing element into it.  An array has to be one contiguous block of memory, so it can't simply grow in place.  If the array is small enough, no big deal.  In fact, it can probably stay in the CPU cache.  However, once you add enough items, the array no longer fits in cache, and working with it means going out to RAM.

This has a few implications:
  • Memory Blocks
    • Each time the list grows, the runtime has to go and find a new, larger block, which costs time.
    • Once it finds a new block, every element has to be copied from the current block to the new location, and the old block is left behind for the garbage collector.
  • Cache Misses and Paging
    • Once the array is too large to fit in the CPU cache, reading and writing it means going out to RAM, and under memory pressure the OS may even page parts of it out to disk.
    • The CPU can't directly operate on data that is not in cache, so data in RAM has to be loaded into cache first.  Each cache miss results in extra clock cycles.
  • Large Object Heap (LOH)
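    • An object of 85,000 bytes or more is allocated on the Large Object Heap.  So let's say that Iterations is 10880 (85 * 1024 / 8, since each long is 8 bytes).  The list's backing array would then be at least 87 KB, so it lands on the LOH.  The LOH is only collected during expensive full (generation 2) garbage collections, so do this enough and the garbage collector will bring your software to a crawl.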
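You can watch the resizing happen through List<T>.Capacity.  This sketch (CapacityGrowth and CapacitiesWhileAdding are hypothetical names) records each distinct capacity while adding longs.  In the .NET implementations I'm aware of, an empty list jumps to a capacity of 4 on the first Add and doubles each time it fills, so the 10,880 items from the example above end up in a 16,384-long (128 KB) array.  The growth policy is an implementation detail rather than a documented guarantee.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityGrowth
{
    // Adds 'count' longs to a List<long> and records every capacity the list
    // moves through. Each new capacity means a new backing array was
    // allocated and the existing elements were copied into it.
    public static List<int> CapacitiesWhileAdding(int count)
    {
        var list = new List<long>();
        var capacities = new List<int>();

        for (long i = 0; i < count; i++)
        {
            list.Add(i);

            // Record the capacity only when it changes.
            if (capacities.Count == 0 || capacities[capacities.Count - 1] != list.Capacity)
                capacities.Add(list.Capacity);
        }

        return capacities;
    }
}
```

For 10,880 items the list passes through 13 capacities (4, 8, 16, ... 16384), which means 13 array allocations, 12 of which become garbage along the way.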

Benchmarks

Let's look at some benchmark results to prove to ourselves that there is actually a benefit to using Yield. (NOTE: Yield benefits enormously from the optimize flag!  Be sure you have it on if you benchmark/use yield.)

First, let's present the yield replacement for the buildUsingList() function listed above:

static IEnumerable<long> buildUsingYield()
{
  for (long i = 0; i < Iterations; i++)
    yield return i;
}


Now let's look at the function that will call either buildUsingList() or buildUsingYield().

static void Main(string[] args)
{
  var watch = System.Diagnostics.Stopwatch.StartNew();

  var total = 0L;

  for (int i = 0; i < 100000; i++)
    total += getTotal(buildUsingList()); // NOTE: This will be commented out for the yield test.
    //total += getTotal(buildUsingYield()); // NOTE: This will be un-commented for the yield test.

  watch.Stop();

  Console.WriteLine(total);
  Console.WriteLine(watch.ElapsedMilliseconds);
  Console.Read();
}


Notice I actually do something with the results (getTotal(...) and Console.WriteLine(total)) to ensure the compiler doesn't remove the parts we care about.
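The snippets above don't show getTotal() or where Iterations is defined, so here's a self-contained version of the benchmark for anyone who wants to try it.  It assumes getTotal() simply sums the sequence, and it passes the iteration count and repetition count as parameters instead of using a field; both are my assumptions, not the original code.  Timings will vary by machine and runtime, so treat anything it prints as your own measurement.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class YieldBenchmark
{
    // Builds the whole collection up front.
    public static IEnumerable<long> BuildUsingList(int iterations)
    {
        var someList = new List<long>();
        for (int i = 0; i < iterations; i++)
            someList.Add(i);
        return someList;
    }

    // Produces each value only when the caller asks for it.
    public static IEnumerable<long> BuildUsingYield(int iterations)
    {
        for (long i = 0; i < iterations; i++)
            yield return i;
    }

    // Assumed stand-in for the post's getTotal(): consumes every value.
    public static long GetTotal(IEnumerable<long> values)
    {
        return values.Sum();
    }

    // Runs one strategy 'repetitions' times.
    // Returns (grand total, elapsed milliseconds).
    public static Tuple<long, long> Run(Func<int, IEnumerable<long>> build, int iterations, int repetitions)
    {
        var watch = Stopwatch.StartNew();
        var total = 0L;
        for (int r = 0; r < repetitions; r++)
            total += GetTotal(build(iterations));
        watch.Stop();
        return Tuple.Create(total, watch.ElapsedMilliseconds);
    }
}
```

Something like YieldBenchmark.Run(YieldBenchmark.BuildUsingList, 1000, 100000) versus YieldBenchmark.Run(YieldBenchmark.BuildUsingYield, 1000, 100000), run in a Release build, mirrors the setup described above.  The totals must match; only the times should differ.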

Now for the results (all times are in milliseconds):

Iterations    List       Yield
100           169        133
1000          1548       1265
10000         15157      12576
100000        171437     123682

Right away we can see that Yield has an advantage.  There are two reasons for that:
  1. Memory Footprint - As we discussed before, using yield avoids building several (100000) chunks of memory.  This lets the garbage collector off the hook, along with the allocator that would have had to find the memory.  In fact, for 100000 iterations each list's backing array lands on the LOH, which gives yield an even bigger advantage.
  2. Half the Iterations - Notice that the list version builds the whole collection and then goes through it a second time to total it.  Because yield is lazy, each value is produced only when getTotal() asks for it, so the data is walked once instead of twice.
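The second point is easiest to see by logging the order of work.  In this sketch (EvaluationOrder and its methods are hypothetical names), the list version does all of its producing before any consuming, while the yield version alternates: each value is produced and then consumed before the next one is made.

```csharp
using System;
using System.Collections.Generic;

public static class EvaluationOrder
{
    // Eager: runs its whole loop as soon as it is called.
    static IEnumerable<int> ProduceIntoList(int count, List<string> log)
    {
        var list = new List<int>();
        for (int i = 0; i < count; i++)
        {
            log.Add("produce " + i);
            list.Add(i);
        }
        return list;
    }

    // Lazy: each pass through the loop happens only when the
    // consumer asks for the next value.
    static IEnumerable<int> ProduceWithYield(int count, List<string> log)
    {
        for (int i = 0; i < count; i++)
        {
            log.Add("produce " + i);
            yield return i;
        }
    }

    static void Consume(IEnumerable<int> values, List<string> log)
    {
        foreach (var v in values)
            log.Add("consume " + v);
    }

    public static List<string> ListOrder(int count)
    {
        var log = new List<string>();
        Consume(ProduceIntoList(count, log), log);
        return log;
    }

    public static List<string> YieldOrder(int count)
    {
        var log = new List<string>();
        Consume(ProduceWithYield(count, log), log);
        return log;
    }
}
```

ListOrder(2) logs produce 0, produce 1, consume 0, consume 1 (two separate passes), while YieldOrder(2) logs produce 0, consume 0, produce 1, consume 1 (one interleaved pass, with nothing stored in between).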

Summary

If you have a situation where you want to build a collection of objects just to iterate through each one, use yield.  You will get performance benefits and use less memory.  Yield won't help you, however, if you need a collection that you can alter.  We'll look at what to use in that situation in a later post.

Wednesday, December 25, 2013

"Using" Statements with Proxies

Summary

Every software developer who does more than simply churn out keywords and semicolons strives to create maintainable code.  In an effort to do so, we all have to take a step back and decide how to handle certain complexities.  A common construct that I find sloppy is the try-finally statement.  I find that it does not grow well.

I think examples explain this best... Here, we're going to use a ReaderWriterLock.  This object interests me in particular, because it lacks the dedicated keyword that a normal lock enjoys (i.e. lock).

Normal way to use a ReaderWriterLock

var reader = new System.Threading.ReaderWriterLock();

// Acquire before the try: if this times out it throws, and the finally
// block must not release a lock we never got.
reader.AcquireReaderLock(TimeSpan.FromSeconds(1));

try
{
  // Do Work...
  // Blah blah blah
  // Foos and bars all over the place...
}
finally
{
  reader.ReleaseReaderLock();
}


While this works as expected, I find it hideous.  We have to instantiate the lock in a different scope than we use it, and the ReleaseReaderLock() call has to live in a different scope than the AcquireReaderLock() call.  It just doesn't flow...

Nice to have...

Wouldn't it be nice if we were provided something like the lock keyword?  Then we could write something like the following:

var reader = new System.Threading.ReaderWriterLock();

lockReader(reader, TimeSpan.FromSeconds(1))
{
  // Do Work...
  // Blah blah blah
  // Foos and bars all over the place...
}


Way more concise!  Alas, no such beast exists.  So, why don't we create our own version!

Concise = Proxy + "Using" Statement

While we don't have a lockReader keyword available, we do have using.  So, I propose we create a wrapper class for the ReaderWriterLock that hands out a proxy whenever you acquire a reader or writer lock.  The proxy class will implement IDisposable, and its Dispose() method will release the lock.  Simple enough?

Now, for some code:

First, the proxy class:

using System;
using System.Threading;

public class RWLockProxy : IDisposable
{
  private bool _readLock;
  private ReaderWriterLock _rwLock;

  private RWLockProxy()
  {
  }

  public static RWLockProxy AcquireReader(ReaderWriterLock rwLock, TimeSpan timeout)
  {
    // Throws if the timeout expires, so no proxy (and no release) is created.
    rwLock.AcquireReaderLock(timeout);
    return new RWLockProxy { _rwLock = rwLock, _readLock = true };
  }

  public static RWLockProxy AcquireWriter(ReaderWriterLock rwLock, TimeSpan timeout)
  {
    rwLock.AcquireWriterLock(timeout);
    return new RWLockProxy { _rwLock = rwLock, _readLock = false };
  }

  public void Dispose()
  {
    // Guard against releasing twice if Dispose() is called more than once.
    if (_rwLock == null)
      return;

    if (_readLock)
      _rwLock.ReleaseReaderLock();
    else
      _rwLock.ReleaseWriterLock();

    _rwLock = null;
  }
}


Nothing too complicated here... Notice that I put all the ReaderWriterLock logic inside the proxy class.  I do this to keep everything together and to not spread logic out across multiple classes.  I find that it is easier to read and maintain.

Now for the wrapper class:

public class RWLock
{
  private readonly ReaderWriterLock _lock = new ReaderWriterLock();

  public RWLockProxy AcquireReader(TimeSpan timeout)
  {
    return RWLockProxy.AcquireReader(_lock, timeout);
  }

  public RWLockProxy AcquireWriter(TimeSpan timeout)
  {
    return RWLockProxy.AcquireWriter(_lock, timeout);
  }
}


Even simpler than the proxy!  Notice that we don't even provide the option to release the lock.  The proxy takes care of it.

And finally, here is the code using it:

var reader = new RWLock();

using(reader.AcquireReader(TimeSpan.FromSeconds(1)))
{
  // Do Work...
  // Blah blah blah
  // Foos and bars all over the place...
}


Just as concise as our non-existent lockReader!  Notice that we don't even care about the name of the proxy object.  We just let the using statement take it and dispose of it when we leave the scope.
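The real payoff is that a using statement compiles down to a try-finally, so Dispose(), and therefore the release, runs even when the body throws.  Here's a small, self-contained sketch of that.  LockReleaser and UsingDemo are hypothetical names; LockReleaser is a stripped-down cousin of RWLockProxy (reader locks only) so the block compiles on its own.  IsReaderLockHeld is a real ReaderWriterLock property that reports whether the current thread holds a reader lock.

```csharp
using System;
using System.Threading;

// Hypothetical minimal proxy: acquires a reader lock, releases it on Dispose().
public sealed class LockReleaser : IDisposable
{
    private ReaderWriterLock _rwLock;

    public LockReleaser(ReaderWriterLock rwLock, TimeSpan timeout)
    {
        rwLock.AcquireReaderLock(timeout);
        _rwLock = rwLock;
    }

    public void Dispose()
    {
        // Release at most once, even if Dispose() is called twice.
        if (_rwLock != null)
        {
            _rwLock.ReleaseReaderLock();
            _rwLock = null;
        }
    }
}

public static class UsingDemo
{
    // Returns whether the reader lock is still held after the using block
    // has been exited by an exception.
    public static bool LockHeldAfterException(ReaderWriterLock rwLock)
    {
        try
        {
            // Roughly what the compiler generates for this using statement:
            //   var proxy = new LockReleaser(rwLock, timeout);
            //   try { ...body... } finally { if (proxy != null) proxy.Dispose(); }
            using (new LockReleaser(rwLock, TimeSpan.FromSeconds(1)))
            {
                throw new InvalidOperationException("Something went wrong mid-work.");
            }
        }
        catch (InvalidOperationException)
        {
            // Swallowed only for the demo.
        }

        return rwLock.IsReaderLockHeld;
    }
}
```

LockHeldAfterException(new ReaderWriterLock()) returns false: the exception escaped the using block, but the lock was released on the way out, exactly as the hand-written try-finally would have done.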

Conclusion

So there you have it... in a nutshell, we have used a proxy class, a wrapper class and the using statement to create something very similar to the lock statement for the ReaderWriterLock and completely avoid a try-finally statement.  The code is easy to read, and should be easier to maintain.

Saturday, December 21, 2013

About Me

Who am I...

I am a Software Engineer at Lockheed Martin Missiles and Fire Control.  I am the software lead for the Telemetry group.  I have a beautiful wife and two small boys who all inspire me. 


Schooling

I received my BS in Electrical Engineering from the University of Oklahoma (OU).  (Which was the last time I did anything that resembled Electrical Engineering...)  I am now starting my MS in Software Engineering at Southern Methodist University (SMU).


Software Upbringing

Throughout my career I have gone through the normal paradigm progression:
  • Imperative Programming in C.
  • Grew to appreciate library development with Object Oriented Programming in C++.
  • The desktop application world led me to WinForms/WPF and C#.
  • And now I am dabbling in the elegance of functional programming with F# and Haskell.

Software Fervor

I am jubilant when I build a data structure for a complex problem, excited when I convert a synchronous process into a streamlined asynchronous system, and fulfilled when a benchmark matches my predictions.

This blog will center around C# and my different ideas and design structures.  I encourage everyone to love and hate my ideas.  In everything, I do this hoping to learn.  If I present something that is misguided, inaccurate, or just plain wrong... then don't hesitate to say so!


Where else online...

You can find/contact me on LinkedIn and StackOverflow.  I try to stay active on both.  And if you're interested, I have an online resume out there as well.