Andy Crouch - Code, Technology & Obfuscation ...

Incorrect Use Of .Net Garbage Collector

Dustbin On Misty Street

Photo: Unsplash

I received a Pull Request a few days ago which included the following 2 lines repeated in 2 methods in a class.

GC.Collect();
GC.WaitForPendingFinalizers();

Now if ever a couple of lines would make me sit up and take notice it would be these two.

Some context. The code I was reviewing was a console application which was to run as a web job on Azure. The aim of the job was to join two sets of data and pick out the largest value for each month. The results were then persisted back to the database.

I started to read through the code. Most of it was boilerplate for how we build web jobs. The crux of the whole application was in a service processing class. When I got to the bulk of the service class I found something like:

var dataList = GetDataToBeProcessed();
while (dataList.Count > 0)
{    
  var subDataList = dataList.Take(500);
  RunCalculations(db, subList);
  
  // some other code .....
  
  GC.Collect();     
  GC.WaitForPendingFinalizers(); 
}

I peeked into RunCalculations() and found:

var entity	= _unitOfWork.GetRepository<Entity>().GetByKey(object.EntityId);
var fromDate = new DateTime(model.Year, model.Month, 1);
var toDate = new DateTime(model.Year, model.Month, DateTime.DaysInMonth(model.Year, model.Month));
model.ValueToBeSet = entity.CalculateValue(fromDate, toDate, _unitOfWork).Value;
GC.Collect();
GC.WaitForPendingFinalizers();

(As an aside I hate the use of nullable fields in all but a very select set of circumstances. I will write them down one day).

I decided to run the application and see what it would do. The logic was not far off the spec but I wanted to see how fast it would process our data. Locally it would be touching about 300,000 records. So I set it off. I waited and I read some emails and realized it had still not run. Tldr; I left it for an hour. One of the things that I like to add in Web Jobs is a lot of console writing/logging. It really helps debug issues once on Azure. There was none in this initial version of the application so I was wondering if it was even running still.

I killed off the app and re-read the code and instantly saw one of the issues. During the loop of the RunCalculations() method the developer was using a UnitOfWork class to pull back an Entity record on each iteration. That meant for each iteration we had a database call.

It was at this point I got the developer to look at the code with me. I started by asking them to walk me through the code. I wanted to get an idea of how they saw the code and how it executed. When they were finished I asked how long it took to run on their machine.

“Oh, the first time it will take a long time as there is a lot of data to process. I had problems running it on my machine so I decided to add code to make it process faster. The GC calls free up memory and the Take command batches the processing into smaller pieces.”

Now I had an insight into why the code was written in such a way. After discussing the approach they had taken I asked the developer if they knew how many database calls their code made. They did not. Out came SQL profiler and we run the job again on my machine. They saw straight away the issue.

The number of database calls was only one of many issues. The query used to return the bulk of the data was not filtering on any criteria. They were bringing back all of the data in the database. They had added logic to entity classes (which by choice we keep anaemic).

We spent a significant amount of time pair programming to refactor the service class. When doing exercises like this with a developer I lead with questions the whole way. The 5 Whys method comes in handy in this situation.

“Why do You have GC calls in your code?” - “Because the memory gets eaten up by the objects required for the calculations”. “Why do the calculations have so many objects” - “Because the calculation needs data and logic”. etc …

After refactoring chunks of code as we progressed through the questions our initial method ended up looking like:

processingDataList = processingDataFactory.GetPendingDataToBeProcessed();
RunCalculationsOn(processingDataList);
var dataToImport = processingDataList.Except(_failedItems.Keys.ToList());
tableGateway.ImportData(dataToImport.ToList());

and the RunCalculations() Method ended up looking like:

Parallel.ForEach(processingDataList, (dataItemToProcess) =>
{
  try
  {
    var fromDate = new DateTime(dataItemToProcess.Year, dataItemToProcess.Month, 1);
    var toDate = new DateTime(dataItemToProcess.Year, dataItemToProcess.Month, DateTime.DaysInMonth(dataItemToProcess.Year, dataItemToProcess.Month));
    
    using (var calculation = new Calculation())
    {
      var result = calculation.RunFor(dataItemToProcess.MeterId, fromDate, toDate);
      
      if(result.HasValue)
          dataItemToProcess.MaxDemand = result.Value;
      else
          _failedMeterMaxDemand.Add(dataItemToProcess, new Exception("There was no data to be processed"));
    }
  }
  catch (Exception ex)
  {
    _failedMeterMaxDemand.Add(dataItemToProcess, ex);
  }
}

The orchestration method in the processing service is now nice and clean and has all behaviour injected (omitted for brevity) into it. It gets, processes and imports the data. Simple, clean logic. The RunCalcuaitos() method is also much improved. The calculations have been removed from the entity and are wrapped in their own, disposable, class. With a parallel ForEach loop, the data needed to be processed access prior to the processing rather than during it and no GC calls.

With the improvements made a rerun of the application resulted in all records being processed and imported into the database in under 1 minute. The developer’s machine never once looked like it was struggling and there was no need for manual GC collection.

If you’d like to discuss any of my thoughts here then as always please contact me via twitter or email.

String Calculator Kata

Orange Ball Of Wool

Photo: Philip Estrada - Unsplash

I wrote recently about using Kata’s regularly to practice and improve your skills. I have been using them with a junior developer to support his training plan. The latest kata we have added to his practice is the String Calculator kata from Roy Osherove.

The premise of the kata is really straightforward. There are 9 steps to fully complete the kata with a time limit of 30 minutes. The aim is to define a method that accepts a string of numbers and returns an integer containing their sum. As the kata goes on you have to improve the method to handle delimiters of varying lengths.

I have created a C# based solution for the kata which can be found on Github. I attempted to commit as I implemented each test or point of the kata. If you read back through the commit history you can see the approach I took.

The important aspect of this kata is to following the steps without reading ahead. Implement your tests and only the code that you need to pass the test. You can and should refactor as you go but only to ensure you are doing the minimal amount to pass the next test.

If you’d like to discuss anything relating to the kata or my solution then as always please contact me via twitter or email.

Moto G Bluetooth Issue

Motorola G

Photo: Andy Crouch

The Moto G is an amazing phone. I do not own one but my wife has owned every model up to the 4th gen. For around £150 you get a very reasonable phone which does exactly what my sons and my phones do for a lot more money. The camera is very usable, the memory expandable and games play really well. So the last 5 years or so have been plain sailing with zero support from me needed.

Until this weekend!

Google Play’s, Play Protect, notified her that it had found a harmful Bluetooth sharing app. Having lived with me for 20 years she knew that it needed to be take seriously and hit the Deactivate button. She then carried on without a thought about it. That was until she tried to connect to the car’s Bluetooth later that day.

Play Protect had actually flagged the Lenovo installed Bluetooth Sharing app as harmful. That took me the best part of an hour to find out from some serious Googling about. The solution is to

  • Open the Settings app.
  • Got to Applications
  • From the menu on the top right of the screen select “Reset Application Settings”
  • Then reboot your phone.

This will solve the issue. I am led to believe that many Moto G users have reported this to Google so it should not happen again.

If you’re having a simular issue and this solution does not work then feel free to reach out via twitter or email and I will see if I can help.

The Power Of .Net Generics

Man Cutting Cloth From Template

Photo: Dương Trần Quốc - Unsplash

Generics are a genuinely useful and powerful tool in .Net. Sometimes developers see part of the potential but not the whole story. This was the case in a recent Pull Request I reviewed.

The Pull Request in question related to some cleaning up that a developer had undertaken within our user management pages. These are standard lists of different types of user which can be filtered, exported to excel etc. The original developer of these pages had done a great Ctrl-C, Crtl-V for the first 2 different types. The developer of the Pull Request I was reviewing had set about to clean this up and used generics to do so. So far so good. The problem came when I noticed a raft of switch statements throughout two of the methods in the revised service. For example:

public MemoryStream GenerateExcelFile<T>(string searchString) where T : IUserManagementModel, new()
{
    var obj = GetUsersQuery<T>(searchString: searchString, page: 0, countPerPage: 0, sort: null, sortAsc: true);
	
    switch (obj.UserType)
    {
        case UserType.Customer:
            return _excelExportService.GenerateExcelFile(obj.UserMembershipList.Select(x => x as CustomerModel).ToList());
        case UserType.Supplier:
            return _excelExportService.GenerateExcelFile(obj.UserMembershipList.Select(x => x as SupplierModel).ToList());
        case UserType.Admin:
            return _excelExportService.GenerateExcelFile(obj.UserMembershipList.Select(x => x as UserProfileModel).ToList());
    }

    throw new Exception($"Unexpected UserType: {obj.UserType}");
}

When I see a switch statement like this I have two clear thoughts. One, this does not allow for easy further extension or modification and two, there is actually just one line of code that needs executing on different models so lets write the code to do that.

public MemoryStream GenerateExcelFile<T>(string searchString) where T : class, IUserManagementModel, new()
{
    var userQuery = GetUsersQuery<T>(searchString: searchString, page: 0, countPerPage: 0, sort: null, sortAsc: true);
    
    return _excelExportService.GenerateExcelFile<T>(userQuery.UserMembershipList.Select(x => (T)x).ToList());
}

This is a nice solution that allows for further IUserManagementModel based objects to be exported to Excel without this code needing to change. This is a real world example of Bertrand Meyer’s Open/Closed principle that I mentioned in my last post.

As I continued the review I came across the issue further as shown in the GetUsersQuery() method show below:

private UserQueryResult<T> GetUsersQuery<T>(string searchString, int page, int countPerPage, string sort, bool sortAsc) where T : IUserManagementModel, new()
{
    IQueryable<GenericUserObject> query;
    UserType userType;
    
    if (typeof(T) == typeof(CustomerModel))
    {
        userType = UserType.Customer;
        query = (from c in _context.Customer
                 where c.Deleted == false
                 select new GenericUserObject { Id = c.Id, Email = c.Email, FullName = c.Name, Name = c.Name });
    }
    else if (typeof(T) == typeof(SupplierModel))
    {
        userType = UserType.Supplier;
        query = (from s in _context.Supplier
                 where s.Deleted == false
                 select new GenericUserObject { Id = s.Id, Email = s.Email, FullName = s.Name, Name = s.Name });
    }
    else if (typeof(T) == typeof(UserProfileModel))
    {
        userType = UserType.Admin;
        query = (from a in _context.Supplier
                 where a.Deleted == false
                 select new GenericUserObject { Id = a.Id, Email = a.Email, FullName = a.Name, Name = a.Name });
    }
    else
    {
        throw new Exception($"Unknown Type: {typeof(T)}");
    }

    // some other code ...
}

Once again the multiple switch statements were replicating code on objects that were all based on IUserManagementModel. It seemed obvious that the UserType and query values could actually be returned from the models. So that’s what we did:

private UserQueryResult<T> GetUsersQuery<T>(string searchString, int page, int countPerPage, string sort, bool sortAsc) where T : IUserManagementModel, new()
{
    IUserManagementModel model = new T();
    UserType userType = model.UserType();
    IQueryable<GenericUserObject> query = model.GetModelQueryFor(_context);

    // some other code ...
}

This was achieved by modifying the IUserManagementModel interface:

IQueryable<GenericUserObject> GetModelQueryFor(DbContext dbContext);
IOrderedQueryable<GenericUserObject> ApplySort(IQueryable<GenericUserObject> query, string sortBy, bool asc);

(The ApplySort() method follows the same approach as GetModelQueryFor() to remove other code in the class that started off with a switch statement handling sorting the query result).

By the time we’d finished pairing up, the developer and myself were happy with the results. We’d removed the switch statements and realised more of the power of generics. That power is released by ensuring that you push all data and behaviour down to your abstractions. That way, your generic code can be written to interact with the abstracted type and not the instance of the objects.

If you’d like to discuss any of my thoughts here then as always please contact me via twitter or email.

A Clean Fizzbuzz Solution

Water Splash

Photo: Unsplash

Fizzbuzz is an interesting little test based on a maths game used in some schools. Used to teach division, the Wikipedia entry states:

“Fizz buzz is a group word game for children to teach them about division.[1] Players take turns to count incrementally, replacing any number divisible by three with the word “fizz”, and any number divisible by five with the word “buzz”.”

Developers have used this game for many years to test interview candidates. It is also used for general programming and TDD practice (know as Kata’s). The profession has slightly altered the game and the version generally used is:

“Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.”

I have been using the Fizzbuzz problem as a Kata for one of my junior developers. They are very junior. Their training is focusing not only on language and framework features but also agile and clean approaches to coding. A lot of developers use Fizzbuzz to improve their TDD skills. Given this, I’ve always been quite surprised by the basic implementations found online.

My developer spent a couple of weeks creating slightly different versions. At each training session, we would discuss the approach taken and I would suggest changes. The developer was interested to see what my solution might look like. So I sat down and created what I consider to be an acceptable solution to the problem. This is very much a C# solution so please bear this in mind.

The full solution can be viewed on Github.

A lot of the solutions I see implement the logic in some fixed manner. For example:

public void DoFizzBuzz()
{
    for (int i = 1; i <= 100; i++)
    {
        bool fizz = i % 3 == 0;
        bool buzz = i % 5 == 0;
        if (fizz && buzz)
            Console.WriteLine ("FizzBuzz");
        else if (fizz)
            Console.WriteLine ("Fizz");
        else if (buzz)
            Console.WriteLine ("Buzz");
        else
            Console.WriteLine (i);
    }
}

It would be hard to test this code and no way to extend it without changing the implementation. If we had to change the logic for printing “Buzz” in place of numbers divisible by 4 this code would need to change. This is a clear violation of Bertrand Meyer’s Open/Closed principle. A clear explanation of the principle can be found here. To adhere to the principle I took a simple approach using an abstraction and reflection.

My FizzbuzzGenerator class was straightforward to write following a test first approach:

public class FizzbuzzGenerator
{
    List<INumberParser> _numberParsers;

    public List<string> GenerateResultsBetween(int rangeStart, int rangeEnd)
    {
        var resultSet = new List<string>();
        var inputRange = Enumerable.Range(rangeStart, rangeEnd).ToList();
        
        inputRange.ForEach(i =>
        {
            INumberParser numberParser 
                = Parsers.First(p => i % p.Divisor == 0);

            resultSet.Add(numberParser.Parse(i));
        });
        
        return resultSet;
    }
    
    private List<INumberParser> Parsers
    {
        get
        {
            if (_numberParsers != null)
                return _numberParsers;
        
            var interfaceType = typeof(INumberParser);
            return 
                _numberParsers 
                   = AppDomain.CurrentDomain.GetAssemblies()
                         .SelectMany(x => x.GetTypes())
                         .Where(x => interfaceType.IsAssignableFrom(x) && !x.IsInterface && !x.IsAbstract)
                         .Select(x => Activator.CreateInstance(x))
                         .Cast<INumberParser>()
                         .OrderByDescending(p => p.Divisor)
                         .ToList();
        }
    }
}

I defined an INumberParser interface to outline the behaviour of any object that could parse an integer to a result string. This means that I have multiple INumberParser based objects to implement the Fizzbuzz rules. My generator class is then just used to find all of the instances of INumberParser and process each number in the range specified before adding the result of the rule to the resultSet list.

Each rule is implemented as follows:

public class BuzzNumberParser : INumberParser
{
    public int Divisor => 5;
    
    public string Parse(int number)
    {
        if (number % Divisor == 0)
            return "Buzz";
        
        return number.ToString();
    }
}

This is straight forward to read. We specify the Divisor that the rule will handle and provide a mechanism to parse an input number and return the result of the rule.

I do appreciate I could actually implement a base class that inherits INuberParser and reduce the parse code down to one instance but in this case I was trying to show agility of code to the developer.

The full solution can be found here.

The benefits of a solution along these lines is clear. Firstly, this code can be easily tested and was the result of using a Test First approach. Secondly, any extensions to the rules or changes of logic are restricted to each INumberParser based rule class. The Generator class never has to change. If I now need to handle a rule that prints “burp” for any number divisible by 50 then I define a new INumberParser based class and implement the logic. I build and run the program and it will work. A single place to update our logic. There are no messy if or switch statements.

The last thing I would say about this solution is that it didn’t take very long to create. Therefore, I would question a reasonable developer not being able to create a solution similar to this when given 20 minutes to code. If I ever used FizzBuzz as an interview question I would certainly rank a candidate higher for implementing a clean, agile solution like this as this is the approach we follow and the standard I’d expect within our codebase.

If you’d like to discuss any of my thoughts here then as always please contact me via twitter or email.