Refactoring Ids_from_emails in Ruby

ruby

As part of the development of UpsideOS, I refactor old but functional code on a rolling basis. As a lone developer the trade-offs between ‘ship it’ and ‘do it right’ are even more pressing than ever, so there is plenty to refactor. Here is the thought process I went through while refactoring one of the functions in UpsideOs.

We let users add new users to a team by specifying their email address. We need the ids of the user records to create the association. Here is a static method that takes a mixed array of integer user ids and email addresses and returns an array of integer user ids. The User class is a Rails ActiveRecord class.

def self.ids_from_emails(array)
  if array
    new_array = []
    array.delete("")
    array.each do |id|
      if id.is_a? String and id.to_i == 0
        new_array.push User.find_by_email(id).id if User.find_by_email(id)
      else
        new_array.push id.to_i if id.to_i and id.to_i != 0
      end
    end
    new_array
  end
end

reek identifies two code smells:

ids_from_emails calls User.find_by_email(id) twice (DuplicateMethodCall)
ids_from_emails calls id.to_i 4 times (DuplicateMethodCall)

We can ‘fix’ these duplicate method call code smells using temporary variables:

def self.ids_from_emails(array)
  if array
    new_array = []
    array.delete("")
    array.each do |id|
      id_int = id.to_i
      if id.is_a? String and id_int == 0
        temporary_user = User.find_by_email(id)
        new_array.push temporary_user.id if temporary_user
      else
        new_array.push id_int if id_int and id_int != 0
      end
    end
    new_array
  end
end

This fixes the code smells in reek, but introduces a new TooManyStatements smell. The tests still pass. But the is not improved. The code smells pointed to deeper issues.

The method is written in a procedural, imperative style. It is 14 lines long. Moving towards a declarative style could improve it. The is_a? call in 6th line stands out; explicit type checking often indicates deep issues, though in a mixed array of integers and strings it may be unavoidable.

First, skip checking if array is nil by setting it to a default empty array. If set to [], the array.each method will not iterate over anything and the proper, empty new_array will be returned. Whether the method is ever called with nil as the argument is another problem, but outside the scope here.

def self.ids_from_emails(array)
  array = array || []
  new_array = []
  array.delete("")
  array.each do |id|
    #...
  end
  new_array
end

Creating a new_array and returning it explicitly is imperative as is calling .each. Since each returns the array it was called on, it can only do useful work by side effects. We can instead use map, which returns an array of block results.

However if the iterator block fails to find a record, nil will be added to the returned array. The array.delete("") line was also added to address this, so we can simplify things by just removing nil values before we return. Ruby’s array#compact does nicely:

def self.ids_from_emails(array)
  array = array || []
  array = array.map do |id|
    #...
  end
  array.compact
end

That inner block is now more than half of the method, suggesting we could refactor it out into a new method. Whether or not this makes sense depends on whether we expect this method to change much in the future, but we can try it:

def self.ids_from_emails(array)
  array = array || []
  array = array.map{|value| id_from_email_or_id(value)}.compact
end

def id_from_email_or_id(value)
  if value.is_a? String and value.to_i == 0
    User.find_by_email(value).id if User.find_by_email(value)
  else
    value.to_i if value.to_i and value.to_i != 0
  end
end

The tests still pass, but the newly created method is ugly. It still has a call to is_a?, which is sub-optimal. There are in fact three cases that have to be handled here: either an id is passed, in which case it must be converted to an integer, or an email is passed and the User is in the database, in which case we should look up their id and include it, or some string that is not an integer id or a valid email address is put in in which case we should return nil.

The sets of valid ids and valid email addresses are mutually exclusive. Hence:

def self.ids_from_emails(array)
  array = array || []
  array = array.map{|value| id_from_email_or_id(value)}.compact
end

def id_from_email_or_id(value)
  (User.find_by_email(value).id rescue nil) || (User.find(value).id rescue nil)
end

This final refactor has no code smells in reek and is closer to idiomatic ruby. The style is more declarative than imperative, and the result is easier to read. Using rescue so heavily may not be quite as efficient as the old imperative style, but until speed becomes an issue, the maintainability benefits are probably worth it. The method has additionally been reduced from 14 lines to 3 between two methods, an (admittedly subjective) aesthetic improvement.


I'm looking for better ways to build software businesses. Find out if I find something.