Organizations are reasonably good at weighing the costs and the benefits of potential reorganizations. Most will do a careful analysis, receive comments from stakeholders, subject the analysis to an independent review, and allow an accountable governing body to make a decision.
On the other hand, organizations rarely understand the cost of the decision-making process itself. Not in dollars, cents, and hours, but in lost reputation. This essay is a cautionary tale about the hidden costs of proposed reorganizations.
In 2011 and 2012, Dr. Cammy Abernathy, the Dean of the University of Florida’s College of Engineering, was faced with budget cuts. She proposed that, in order to save about one million dollars per year, the Department of Computer & Information Science & Engineering (“the computer science department”) be merged with the Department of Electrical Engineering. This would entail cutting some nontenured staff and, in some iterations of the proposal, eliminating the PhD in Computer Science and other CS research positions.
At no point did Dean Abernathy propose cutting the undergraduate computer science program or eliminating computer science entirely. To her credit, many of the sub-disciplines that would be in CISE at other schools are already in EE at UF - in particular, UF’s highly successful Machine Intelligence Lab is part of Electrical Engineering, not Computer Science.
There was a strong public outcry against this move. As is often the case, in the process of rallying support, the story changed from “merge CISE into EE” to “Dean Abernathy is going to cut computer science”. Undergraduate students posted Facebook statuses about having to find a new major (at no time did anybody ever consider forcing current CS undergraduates to find a new major). Things were heated.
Fine. There’s a whole story about the cost of that outcry and the face the College of Engineering’s administration lost in the eyes of its faculty. But now, years after the fact, the costs continue.
Recently Ars Technica ran an article titled “To address tech’s diversity woes, start with the vanishing Comp Sci classroom” about the decline of computer science classes in secondary education. The focus of the article was the Advanced Placement Computer Science exam, which is offered by fewer schools than it used to be. The article discussed this in the context of the diversity crisis in tech. A good article all around.
However, as part of the article, the author stated that “Colleges including the University of Florida and Albion have cut their programs in the last few years.” Given the sentence immediately preceding it, it’s clear that the author believes the University of Florida - a top-50 school nationally, and top-100 globally - no longer has a computer science program at all.
And, conservatively, 25,000 people active in the tech industry have read that article, and that sentence, without batting an eyelash.
Authors make mistakes - this is not the fault of the author. Readers are generally right to trust authors on reputable sites like Ars Technica.
This - the fact that tens of thousands of technologists now think UF doesn’t do computer science - is just a cost of the proposal made years ago.
I’ve been lusting after the Acer C720 for some time now. I obtained one the day after Thanksgiving - $150, shipped, from Amazon - and have now had a chance to reflect on how I use it.
For context, I have a home-built desktop assembled in May 2014 and a ThinkPad X220T purchased in July 2011. I run Linux full time on all my machines - currently, Xubuntu 14.10.
The ThinkPad has served me well, and I have come to appreciate its fantastic keyboard and TrackPoint mouse. However, after building my desktop, I use the ThinkPad much less - really, only for taking the occasional note in class, reading documents, and sending email.
For that, the ThinkPad is much heavier than necessary. It also has only mediocre battery life (about 4 hours at 50% brightness, with an old battery) and throws off more heat and noise than I’d like. It also represents more of an investment than I care to travel with - I took my ThinkPad to Finland for Frozen Rails 2014 and always had to be conscious of where I left it in the hostel, on the bus, etc., as replacing it would cost on the order of a thousand dollars.
The Chromebook addresses all of these issues, though with significant trade-offs. The C720 has an 11.6” screen (only marginally smaller than the X220T’s 12.5” screen), about 8 hours of battery life with Linux installed at normal brightness, and only cost me $150, making it much less stressful to travel with. On the other hand, it has a tiny 16GB hard drive, just 2GB of RAM, and limited ports.
I’m happy to report that getting Xubuntu set up on the Chromebook was reasonably easy. I haven’t removed the write-protect screw to make it boot into Linux full time yet, as it is still under Acer’s warranty, but I haven’t been bothered by having to hit Ctrl+L at boot to bypass Chrome OS and boot the other operating system. The only catch was that getting the trackpad to work required upgrading to kernel 3.17 - have a mouse on hand when you’re setting it up.
In terms of day-to-day use, working on the Chromebook required some adjustments, but after adjusting, it has been a pleasant experience.
My major complaint is the keyboard, which is very poor, especially when adjusting from a ThinkPad or a mechanical desktop keyboard. The keys have very little travel, little resistance, and, as with all chiclet keyboards, require bottoming out to register a key press. Not only are the function keys not labeled as such, but some are missing: only F1 through F10 are available (with the wrong labels), and making them wider means they aren’t in the locations you expect - when trying to hit F4 I consistently hit F3, when trying to hit F7 I hit F5, and it’s impossible to hit F11 and F12 at all, as they’re missing entirely.
This might not be an issue for you, but it has required some changes to my workflow and my keyboard shortcuts.
Then there’s the lack of hard drive space. I haven’t had an issue with it yet - but I have kept my installed programs to a bare minimum. I have Xubuntu, but have removed the office suite tools. I installed Chrome, but removed several other programs I rarely use. I don’t keep my entire development directory on this machine - just one project, with one Ruby version, a minimal set of gems, and a handful of documents.
I did try using nitrous.io, but found it to be far too slow and clunky for my needs. It’s a great product - I wish them the best - but if all I’m going to use it for is a box with vim + ruby in the cloud, I’ll just set it up myself.
As such, I use SSH a lot. I keep my home desktop running nearly 24/7, with dynamic DNS set up, so I can access it from the classroom, the library, the bus, or an airport lounge. I run tmux with vim and a handful of other command-line tools, so latency is barely an issue. And hard drive space and RAM are the last of my worries on the desktop box. For large documents I only need occasionally, I scp them to a temporary folder as needed.
The only other change that I’ve had to make is keeping fewer tabs open. Instead of 10+ tabs on each of four workspaces on my desktop, I run one workspace, with one browser, usually with just two or three tabs. This has yet to prove to be an issue for me. I think it might even be beneficial as it makes it harder to become distracted - opening reddit either requires admitting defeat and closing my productive tabs, or risking running out of RAM and the machine becoming unresponsive.
Perhaps what is most frustrating is just how tantalizingly close this machine is to the perfect machine, along with the knowledge that I’ll probably never get what I want. Extend the battery life to 12 hours, put in a good keyboard, and add a built-in 3G/LTE modem, and this laptop would be exactly what I’ve always wanted. Sad, then, to realize that keyboards are getting worse and that I haven’t seen an LTE modem in a laptop in years.
Final verdict: Great purchase, though I can’t tell you if Chrome OS is any good.
Dealing with text from the wild is often a painful experience. Even though Ruby has great libraries like Nokogiri, which can help you parse XML and HTML without losing your mind, they can’t save you from some subtle string encoding issues that occasionally crop up.
On a current project, I’ve been scraping a bunch of data from a variety of websites that offer similar products to each other. Each product has a code that looks something like AAA 0213; i.e. three or four letters, a space, and then four numbers. This is easy to pull out with a ruby regular expression - just use string.scan(/([A-Z]+\s[0-9]+)/), stick the result in the database, and go.
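For instance, here’s a quick sketch of that extraction - the sample text and variable names are made up for illustration:

```ruby
text = "New this week: KAA 1011 and AAB 0213 are back in stock."

# scan returns one array per capture group, so flatten the result
codes = text.scan(/([A-Z]+\s[0-9]+)/).flatten
# => ["KAA 1011", "AAB 0213"]
```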
Eventually, in the application’s search component, we ran into problems - although I could clearly see record KAA 1011 in the product listing and in the database, it wouldn’t turn up in search.
To debug the problem, I opened up a REPL with pry and confirmed the bug. Even though the product.code string appeared indistinguishable from the string that I created by typing out ‘K - A - A - spacebar - 1 - 0 - 1 - 1’, the two were not equal. Ruby’s .ord method, which returns the numerical value of a single-character string, helped diagnose the problem.
With .ord, I confirmed that product.code[3].ord was 160, while "KAA 1011"[3].ord was 32. A quick search confirmed that the Unicode character with integer value 160 is in fact the “no-break space”, explaining how they were indistinguishable.
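A sketch of what that pry session looked like (the values here are reconstructed for illustration):

```ruby
product.code                 # => "KAA 1011" - looks identical...
product.code == "KAA 1011"   # => false     - ...but isn't

product.code[3].ord          # => 160 (U+00A0, the no-break space)
"KAA 1011"[3].ord            # => 32  (an ordinary space)
```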
The fix from that point was trivial - in the pre-processing filters, replace all non-breaking spaces with regular, breaking spaces - but this issue would have been far harder to debug without Ruby’s String#ord method.
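The replacement itself is a one-liner; here’s a minimal sketch of such a pre-processing step (the method name is my own):

```ruby
# Swap U+00A0 no-break spaces for plain ASCII spaces before saving.
def normalize_code(raw)
  raw.gsub("\u00A0", " ")
end

normalize_code("KAA\u00A01011") # => "KAA 1011"
```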
Sinatra is a great choice for building out simple JSON APIs. Combined with Ember.js, it is now my preferred stack for building web apps, as opposed to Ruby on Rails. However, there aren’t as many established, ‘standard’ libraries for Sinatra as there are for Rails - there is no Sinatra equivalent of devise. So, how shall we secure a Sinatra API?
Note that I’m not a security expert and this is probably not best practice. Please treat what I’m doing here as a learning experiment.
First, a word on security: if every request between the application and the outside world is going to be routed over a secure connection using TLS (formerly SSL), then we can simplify our authentication design significantly. For many APIs using TLS, just using the bog-standard HTTP Basic Authentication is totally acceptable. Note that HTTP Basic Auth just sends a “username” and “password” in the HTTP headers without any encryption, on every request.
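For reference, here is a minimal sketch of Basic Auth in Sinatra, using Rack’s built-in middleware - the hardcoded credentials are purely for illustration:

```ruby
require 'sinatra/base'

class BasicApp < Sinatra::Base
  # Rack prompts for credentials and runs this block on every request.
  use Rack::Auth::Basic, "Restricted Area" do |username, password|
    # A real app would look the user up and compare a bcrypt hash here.
    username == 'admin' && password == 'secret'
  end

  get '/' do
    'authenticated'
  end
end
```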
So, let’s assume that we’re going to use TLS when we deploy the application. We could just use HTTP Basic Auth - this is well documented. But there are certain oddities of Basic Auth, including the way that browsers often interact with it. Plus, with Basic Auth, the user has to send their password to the server with every request, and we have to do a password comparison on every request. This isn’t fun, especially if we’re using a computationally expensive password hashing algorithm, as we should. Instead, let’s use a simple token scheme:
1. The user sends their username and password to a /login action.
2. The server generates and stores a very large random token, and returns it to the user.
3. The user sends that token with future requests to the server.
There are several major advantages here. First, the user doesn’t repeatedly send their password - a stolen or sniffed token is still a concern, but we can just expire it. We can also change our token scheme easily - issuing multiple tokens to the same user, revoking all of a user’s tokens, issuing different tokens with different rights, and so on - as our application’s requirements change. If we want, we can issue tokens with different expiry times - say, a 120-day token for a user who runs our app on a kiosk, and a 15-minute token when the user logs in from a new IP. Finally, when we receive a token we don’t have to do any password hashing or user lookup by email or username, which speeds things up.
First, a basic user model - I won’t cover how the user is persisted; just assume that it is. To secure the password, we can use the bcrypt-ruby gem.
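Here is a minimal sketch of such a model, following the pattern from the bcrypt-ruby README - the attribute names are my assumptions:

```ruby
require 'bcrypt'

class User
  include BCrypt

  # email, password_hash, and token are assumed attributes; persistence
  # is left to whatever model layer you prefer.
  attr_accessor :email, :password_hash, :token

  # Reading the password returns a BCrypt::Password object, which
  # compares equal to the plain-text string it was created from.
  def password
    @password ||= Password.new(password_hash)
  end

  def password=(new_password)
    @password = Password.create(new_password)
    self.password_hash = @password
  end
end
```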
The Ruby BCrypt gem is great because it lets us do password comparison between regular strings and the encoded hash automagically. So, our login check can be as simple as:
```ruby
class App < Sinatra::Base
  before do
    begin
      if request.body.read(1)
        request.body.rewind
        @request_payload = JSON.parse request.body.read, { symbolize_names: true }
      end
    rescue JSON::ParserError => e
      request.body.rewind
      puts "The body #{request.body.read} was not JSON"
    end
  end

  post '/login' do
    params = @request_payload[:user]
    user = User.find(email: params[:email])

    if user.password == params[:password] # compare the hash to the string; magic
      # log the user in
    else
      # tell the user they aren't logged in
    end
  end
end
```
The before do block is just some magic to parse a JSON request body - replace it as you desire if you’d rather accept non-JSON requests, auth details in headers, or something else. If the user is logged in, we want to generate a token, so let’s do that:
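A minimal sketch of that generator, assuming a token attribute and a save method on the model:

```ruby
require 'securerandom'

class User
  def generate_token!
    self.token = SecureRandom.urlsafe_base64(64) # long, random, URL-safe
    save # persistence is assumed, as with the rest of the model
  end
end
```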
The SecureRandom.urlsafe_base64 method just generates a nice, long, random, URL-safe string. There is a similar base64 method that doesn’t guarantee URL-safety; for web applications, there is no real disadvantage to making the token URL-safe. Whether you make a generate_token! method that does the saving itself or defer persistence to the controller is up to you.
Now, we can finish out our controller method:
```ruby
class App < Sinatra::Base
  post '/login' do
    params = @request_payload[:user]
    user = User.find(email: params[:email])

    if user.password == params[:password] # compare the hash to the string; magic
      user.generate_token!
      { token: user.token }.to_json # make sure you give the user the token
    else
      # tell the user they aren't logged in
    end
  end
end
```
Now we need to be able to protect API routes, restricting them to logged-in users. All we need to do is check whether the token the user sends us is valid. Here’s an example:
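This is a sketch of one such protected route - the route name and the convention of sending the token in the JSON body (already parsed by the before block above) are my assumptions:

```ruby
class App < Sinatra::Base
  post '/protected' do
    user = User.find(token: @request_payload[:token])
    halt 403 unless user # halt stops the route handler right here

    # user is now our authenticated record; do something useful with it
    { message: "hello, #{user.email}" }.to_json
  end
end
```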
If the user has a valid token, we’ll be able to pull up their record, set it as the current user as appropriate, and do something. If not, we halt and send them a 403 response - by calling halt, Sinatra makes sure nothing more gets executed in that route handler.
From here, you can add features as desired - accept the token in an HTTP header instead of in the request body (good for when clients send pre-flight requests), expire tokens automatically based on their age, create a delete action that removes tokens, allow users to have multiple tokens, associate certain access rights with a token, and so on, all within the same basic framework. Just make sure you put your web service behind TLS!
While working on a recent project, I ran into an interesting problem: how can I make a regular expression match either the end-of-line, or another character?
The context is this: I have a list of things whose values I want, separated by a certain character but not terminated by it. For example, consider the string “a thing I want to match; another thing I want to match; a third thing I want to match”. How can we use .scan to get the array [["a thing I want to match"], ["another thing I want to match"], ["a third thing I want to match"]]?
If you’re still learning regular expressions, or aren’t familiar with the subtleties of Ruby regular expressions, you might try the following, intending [;$] to match either ; or the end of line, usually represented by $.
```ruby
string.scan(/(.+?)[;$]/)
```
However, within a character class, $ always matches the literal character $ - see the first sentence of the first answer on this Stack Overflow question. This means the above regex will only match the first two targeted strings.
How to solve this? As far as I can tell, there isn’t an easy way to do it within the regular expression - the easiest way around it is to use another Ruby string method, .split. Just call string.split(";"), then use a regex on each of the split strings to do more filtering if need be.
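A quick sketch of that approach, using the example string from above:

```ruby
string = "a thing I want to match; another thing I want to match; a third thing I want to match"

# strip removes the space left behind after each semicolon
string.split(";").map(&:strip)
# => ["a thing I want to match",
#     "another thing I want to match",
#     "a third thing I want to match"]
```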