
Debugging The Right Problem

[Image: Man With Head In Hands. Photo: Unsplash]

“Continuous effort - not strength or intelligence - is the key to unlocking our potential.” - Winston Churchill

Sometimes it helps to have experience.

A developer was having an issue with a SQL query running over a large data set. The query was straightforward and, on their machine, it ran in under a second. On a similarly sized data set in the Staging environment, it took 50 seconds. That’s quite the difference.

The developer had looked at all the obvious things. They had reviewed the execution plan and tested the code that executed the query. They talked me through all that they had done and, to be honest, it was the exact approach I would have taken.

Yet the one thing that stood out was that the query was running against a new Staging database on Azure. So I suggested that we look at its configuration. The developer was as surprised as me to see that the database had been restored on a Basic Azure pricing tier. Thus, the number of DTUs available for processing the query was limited to 5. We scaled the database up to a Standard tier with a DTU allocation that matched Production, and the query then performed exactly as expected.
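
Had we checked the database tier first, we would have saved ourselves some time. As a rough illustration, here is a minimal C# sketch that reads the edition and service objective of an Azure SQL database (the connection string is a placeholder):

using System;
using System.Data.SqlClient;

// Placeholder connection string - point this at the Staging database.
var connectionString = "Server=tcp:yourserver.database.windows.net;Database=Staging;...";

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    // sys.database_service_objectives reports the edition and service
    // objective (Basic, S0, S1 and so on) of the current database.
    var sql = "SELECT edition, service_objective FROM sys.database_service_objectives;";

    using (var command = new SqlCommand(sql, connection))
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            Console.WriteLine($"{reader["edition"]} / {reader["service_objective"]}");
        }
    }
}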

Sometimes when looking at issues it is easy to assume it is the code that is at fault. It’s the most logical explanation. Although I would argue that if you have tests based on specifications, your code should rarely be the issue. Many a developer has been tricked by a SQL query that isn’t quite thought through but which works on a sample data set, only to fail in production. But the cause can be an environment issue, a setting, or something you would not immediately think of. This is where talking the issue through with someone will save you a lot of time. We stress to our less experienced developers that they should seek help with any issue they have struggled with for more than 15 minutes. Talking the issue through with someone (or a rubber duck) results in a faster solution.

If you’d like to discuss experiences of difficult problems you’ve had to debug message me via twitter or email.

Modern Data Exchange

[Image: Chimney Smoking In Country Setting. Photo: Unsplash]

It’s 2017 and many of the partners that I deal with are still exposing their data via FTP. This makes me sad!

That (s)FTP is still a mainstream method of exchanging data demonstrates laziness on both sides of the connection.

From a supplier side, you are saying you do not trust your customers. Or you are saying your system is not designed with collaboration in mind. If trust is an issue and you are generating files for a customer to collect, what is different about providing that data via an API endpoint? If you do not want customers accessing your application then you can segment an API. You are already running and maintaining an FTP server; why not put that effort into something that makes the customer feel enabled?

(s)FTP is not as secure as HTTPS. In the majority of cases where I have used it, (s)FTP delivers files that are then processed. That means that, during processing, you could suffer undesirable behaviour from malicious content. You have no way of verifying that what was uploaded to the (s)FTP server has not been tampered with. (You are taking precautions, right? I mean, you are not loading data straight from an FTP file into your database?!) Using a secure, encrypted API endpoint reduces those risks.
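
If you are stuck receiving files, the least you can do is verify them before processing. Here is a minimal sketch, assuming (hypothetically) that the supplier publishes a SHA-256 checksum alongside each file; the paths are placeholders:

using System;
using System.IO;
using System.Security.Cryptography;

// Placeholder paths for the downloaded file and its published checksum.
var filePath = @"C:\Downloads\partner-data.csv";
var publishedHash = File.ReadAllText(filePath + ".sha256").Trim();

using (var sha256 = SHA256.Create())
using (var stream = File.OpenRead(filePath))
{
    var computedHash = BitConverter.ToString(sha256.ComputeHash(stream))
                                   .Replace("-", string.Empty)
                                   .ToLowerInvariant();

    if (!computedHash.Equals(publishedHash, StringComparison.OrdinalIgnoreCase))
        throw new InvalidDataException("Checksum mismatch - do not process this file.");
}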

From a customer side, why are we putting up with it? I am building modern, fast software. Why should I settle for working with companies that insist on using such old (40 years) technology? Why should I carry the burden of polling a server for updated files? Why would I choose this over a company that provides mechanisms to notify me of changes (webhooks etc.)? Why do I want to have to process files at all? I was exchanging XML back in 2005, and JSON simplifies things even more.

Here is a real example from a company that I deal with. They provide their data as CSV files. When we met to talk technical details they asked what version of the file I wanted. “Erm, isn’t there only one CSV file format?” You’d think! It turns out that the data is provided to utility companies, each of which insists that their data is formatted in a particular order: same data, just a different order. This has led my partner to maintain over 90 versions of the same file, each containing the same data in a different order. Now imagine how easy it would be to handle this via an API.
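
With a JSON API there would be one representation and field order would be irrelevant, as each consumer simply reads the fields it needs. A minimal sketch using Json.NET (the field names here are illustrative, not my partner’s actual schema):

using System;
using Newtonsoft.Json;

// One shape serves every consumer; clients pick out the fields they
// care about and the order they appear in does not matter.
var json = JsonConvert.SerializeObject(new
{
    meterId = "E17-0042",
    readAt = DateTime.UtcNow,
    value = 341.5m
});

// json => {"meterId":"E17-0042","readAt":"...","value":341.5}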

If you have similar frustrations message me via twitter or email to vent.

Code Naming Tips

[Image: Open Books. Photo: Patrick Tomasso - Unsplash]

Code style is like fashion style. It is very opinionated and covers a spectrum of good to bad taste with ease. Ask 10 developers how to name things in code and you will likely get 30 answers.

The main thing to remember, as I often state here, is that you will read code many more times than you write it. So it figures that if you use well-defined and clear names in your code, it will be much easier to understand when you come back to read it.

Below are a handful of suggestions that I try to adhere to in my own code and which I guide my team towards. I do not suggest this is the perfect approach, but I have refined it over the years across many projects.

1 - If your language has namespaces, use them to segment your code.

An odd opinion to share first, but let me explain. Your variable and object names should be short and descriptive. What they should not do is repeat the namespace. Nor should they be tied to a feature by name or be used to group related objects. I have seen namespaces ignored many times in projects, which leads to code like:

namespace Foo
{
    public class FooReader { }
    public class FooParser { }
    public class FooCalculatorService { }
    public class FooDatabaseService { }
}

There is no benefit to prefixing Foo to each class name. Use the namespace functionality of your language to segment and organise your objects.
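
For contrast, here is the same code with the namespace doing the grouping:

namespace Foo
{
    public class Reader { }
    public class Parser { }
    public class CalculatorService { }
    public class DatabaseService { }
}

// At the call site, the namespace supplies the context:
var reader = new Foo.Reader();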

2 - Names should be as descriptive and short as possible.

A bit of an oxymoron there but bear with me!

var o = new NamespaceFeatureFunctionPattern();

// lots of code

if(o.SomeProperty == true)
{
    // do something
}

Variations of the above code are very common. Short variable names and abbreviations were the norm before I started programming, driven by a raft of reasons including memory constraints and screen real estate. But this is 2017 and I am typing this on a Full HD screen on a machine with 32GB of RAM. Those reasons are no longer valid (in most cases). The trouble with very short or abbreviated names is that they make the code really hard to read. If you use lots of o, a, j, i type variable names then very soon you won’t be able to determine which relates to which type. There are times when short and abbreviated names make complete sense, such as counters in loops or indexes.

At the other end of the spectrum are the really, really long variable names. I worked with a dev once who was so descriptive you would have to scroll to read the whole variable name. I am not joking! This is almost as bad as short names, as it makes the code hard to follow and format, although at least you do know what the variable relates to.

So how can we improve the code above? First, we need a short and descriptive name for o. We also need to name our object better.

using Myproject.Facades;

var gcalFacade = new GmailCalendarFacade();

// lots of code

if (gcalFacade.SomeProperty == true)
{
    // do something
}

As per point 1, do not include the namespace in your names. I have added a using directive in the code above and named the object GmailCalendarFacade. It is clear what the object is, and the name indicates the pattern it is based on. The variable is then a shortened name that clearly states what it is referencing. The exact shortening you choose can vary depending on the objects you are referencing together.

3 - Code should read like prose.

Given we read code so much more than write it, it makes sense to make it as easy to read as possible. So, write your code like you would write a story.

A great test for this is to debug some code with a non-developer. Get a business-side colleague to sit with you and run through your code to see if they can follow it. If you have written it clearly enough that they can, then you have well-named and maintainable code.
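
As a small illustration (the names are invented for the example), compare how close well-named code can get to a sentence:

// Reads almost like prose: if the order is ready and paid for, dispatch it.
if (order.IsReadyToDispatch && order.IsPaidInFull)
{
    dispatcher.Send(order);
}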

4 - It is OK to negate a condition in a name

Some guides state that you should not use names such as IsNot or DoesNot. I do not agree; with such names you are still aiming for point 3 above.

When did your favourite author last use != in a book?
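
A hypothetical example to show the difference:

// The negated name reads like prose:
if (account.IsNotVerified)
{
    SendVerificationReminder();
}

// The bare negation operator is easy to miss when scanning:
if (!account.IsVerified)
{
    SendVerificationReminder();
}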

5 - Use your language’s style guide as a guide

All languages have style guides, usually created by the community. Some guides are stricter than others (yes, PEP 8, I am talking about you). These guides offer specific naming and formatting advice for your language. Sometimes a language has more than one community-written guide. Pick one, stick to it and remember that consistency is key. This is especially true in teams.

I have written this based on my experience of many projects built using .Net, Python and Ruby. I am sure the advice might not carry over to some languages. Functional languages, especially, have different naming conventions and may favour shorter, more maths-based names. The important thing is to ensure your code is clear and concise and follows an agreed standard.

If you’d like to discuss my thoughts here then message me via twitter or email.

Incorrect Use Of .Net Garbage Collector

[Image: Dustbin On Misty Street. Photo: Unsplash]

I received a Pull Request a few days ago which included the following two lines repeated in two methods of a class.

GC.Collect();                    // force a full garbage collection now
GC.WaitForPendingFinalizers();   // block until the finalizer queue has emptied

Now if ever a couple of lines would make me sit up and take notice it would be these two.

Some context. The code I was reviewing was a console application which was to run as a web job on Azure. The aim of the job was to join two sets of data and pick out the largest value for each month. The results were then persisted back to the database.

I started to read through the code. Most of it was boilerplate for how we build web jobs. The crux of the whole application was in a service processing class. When I got to the bulk of the service class I found something like:

var dataList = GetDataToBeProcessed();
while (dataList.Count > 0)
{
  var subDataList = dataList.Take(500);
  RunCalculations(db, subDataList);

  // some other code .....

  GC.Collect();
  GC.WaitForPendingFinalizers();
}

I peeked into RunCalculations() and found:

var entity = _unitOfWork.GetRepository<Entity>().GetByKey(model.EntityId);
var fromDate = new DateTime(model.Year, model.Month, 1);
var toDate = new DateTime(model.Year, model.Month, DateTime.DaysInMonth(model.Year, model.Month));
model.ValueToBeSet = entity.CalculateValue(fromDate, toDate, _unitOfWork).Value;
GC.Collect();
GC.WaitForPendingFinalizers();

(As an aside, I hate the use of nullable fields in all but a very select set of circumstances. I will write those circumstances down one day.)

I decided to run the application and see what it would do. The logic was not far off the spec, but I wanted to see how fast it would process our data. Locally it would be touching about 300,000 records. So I set it off. I waited, read some emails and realised it had still not finished. Tl;dr: I left it for an hour. One of the things that I like to add in web jobs is a lot of console writing/logging, as it really helps debug issues once on Azure. There was none in this initial version of the application, so I was left wondering whether it was even still running.

I killed off the app, re-read the code and instantly saw one of the issues. During the loop in the RunCalculations() method, the developer was using a UnitOfWork class to pull back an Entity record on each iteration. That meant a database call for every single iteration.

It was at this point I got the developer to look at the code with me. I started by asking them to walk me through the code. I wanted to get an idea of how they saw the code and how it executed. When they were finished I asked how long it took to run on their machine.

“Oh, the first time it will take a long time as there is a lot of data to process. I had problems running it on my machine so I decided to add code to make it process faster. The GC calls free up memory and the Take command batches the processing into smaller pieces.”

Now I had an insight into why the code was written this way. After discussing the approach they had taken, I asked the developer if they knew how many database calls their code made. They did not. Out came SQL Profiler and we ran the job again on my machine. They saw the issue straight away.

The number of database calls was only one of many issues. The query used to return the bulk of the data was not filtering on any criteria, so they were bringing back all of the data in the database. They had also added logic to the entity classes (which, by choice, we keep anaemic).

We spent a significant amount of time pair programming to refactor the service class. When doing exercises like this with a developer I lead with questions the whole way. The 5 Whys method comes in handy in this situation.

“Why do you have GC calls in your code?” - “Because the memory gets eaten up by the objects required for the calculations.” “Why do the calculations need so many objects?” - “Because the calculation needs data and logic.” And so on …

After refactoring chunks of code as we progressed through the questions our initial method ended up looking like:

processingDataList = processingDataFactory.GetPendingDataToBeProcessed();
RunCalculationsOn(processingDataList);
var dataToImport = processingDataList.Except(_failedItems.Keys.ToList());
tableGateway.ImportData(dataToImport.ToList());

and the RunCalculations() method ended up looking like:

Parallel.ForEach(processingDataList, (dataItemToProcess) =>
{
  // Note: _failedItems must support concurrent writes here - a plain
  // Dictionary.Add is not thread-safe inside Parallel.ForEach.
  try
  {
    var fromDate = new DateTime(dataItemToProcess.Year, dataItemToProcess.Month, 1);
    var toDate = new DateTime(dataItemToProcess.Year, dataItemToProcess.Month, DateTime.DaysInMonth(dataItemToProcess.Year, dataItemToProcess.Month));

    using (var calculation = new Calculation())
    {
      var result = calculation.RunFor(dataItemToProcess.MeterId, fromDate, toDate);

      if (result.HasValue)
          dataItemToProcess.MaxDemand = result.Value;
      else
          _failedItems.Add(dataItemToProcess, new Exception("There was no data to be processed"));
    }
  }
  catch (Exception ex)
  {
    _failedItems.Add(dataItemToProcess, ex);
  }
});

The orchestration method in the processing service is now nice and clean and has all behaviour injected into it (omitted for brevity). It gets, processes and imports the data. Simple, clean logic. The RunCalculations() method is also much improved. The calculations have been removed from the entity and are wrapped in their own disposable class. With a parallel ForEach loop, the data needed for processing is fetched before the loop rather than during it, and there are no GC calls.

With the improvements made, a rerun of the application resulted in all records being processed and imported into the database in under a minute. The developer’s machine never once looked like it was struggling, and there was no need for manual GC collection.

If you’d like to discuss any of my thoughts here then as always please contact me via twitter or email.

String Calculator Kata

[Image: Orange Ball Of Wool. Photo: Philip Estrada - Unsplash]

I wrote recently about using katas regularly to practise and improve your skills. I have been using them with a junior developer to support his training plan. The latest kata we have added to his practice is the String Calculator kata from Roy Osherove.

The premise of the kata is really straightforward. There are 9 steps to complete the kata fully, with a time limit of 30 minutes. The aim is to define a method that accepts a string of numbers and returns an integer containing their sum. As the kata goes on, you have to improve the method to handle delimiters of varying lengths.
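
To give a flavour, here is a minimal sketch of roughly where the first few steps lead (my own naming; the kata itself drives this out one test at a time):

using System;
using System.Linq;

public static class StringCalculator
{
    // Early steps of the kata: an empty string sums to 0; otherwise
    // sum the comma-separated numbers.
    public static int Add(string numbers)
    {
        if (string.IsNullOrEmpty(numbers))
            return 0;

        return numbers.Split(',')
                      .Select(int.Parse)
                      .Sum();
    }
}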

I have created a C# based solution for the kata, which can be found on GitHub. I attempted to commit as I implemented each test or point of the kata, so if you read back through the commit history you can see the approach I took.

The important aspect of this kata is to follow the steps without reading ahead. Implement your tests and only the code that you need to pass each test. You can and should refactor as you go, but only to ensure you are doing the minimal amount to pass the next test.

If you’d like to discuss anything relating to the kata or my solution then as always please contact me via twitter or email.