Setting Up Rails on AWS Elastic Beanstalk

rails

Background

Moving a project from an exploration and prototyping stage to a full product development stage requires re-evaluating technology decisions. One of my client projects is undergoing this process right now - moving from a small demo product to gain insight into the problem and show to investors to a going concern with dedicated staff and engineering resources.

Previously, we’d built a ruby on rails application with an ember-cli app embedded inside it using ember-cli-rails. The application was deployed to heroku, which provided some great benefits - very easy deployment (even though ember-cli-rails requires adding a second buildpack), very easy scaling, and pretty reasonable prices.

But increased engineering resources means re-evaluating the stack. We expected that at the scale we needed, heroku would be fairly expensive. The application does a serious amount of back-end processing and requires a hefty database, meaning that we were looking at $200/month for Heroku Postgres tier 1, plus more than $250/month for five 2x professional dynos, before we even implemented the web tier. And if we fail in certain aspects of our engineering effort, we’re going to need to patch it with caching - looking at between $70 and $165/month for Memcachier (which we had had very good experiences with!) was not good.

Additionally, we’d like to gain flexibility on how we process some events and handle our back end technology stack. Although I love ruby and ruby on rails, ruby doesn’t offer the best tools for the heavy duty data processing and machine learning we want to work towards. Pretty much anything we want to deploy can be deployed on heroku, either using the JVM or Python buildpacks, but getting the auto-scaling and batch processing we think we might need would be a lot of effort.

So, we decided to re-evaluate working with AWS Elastic Beanstalk. Elastic Beanstalk is ultimately a fairly thin wrapper around a whole suite of Amazon Web Service tools - anything you can do with Elastic Beanstalk you could do without it, but perhaps not as easily. It also has a command line tool set to allow heroku-esque git deployments, though they are fundamentally not the same as pushing to heroku’s embedded git repository.

Prerequisites

You’ll need an Amazon Web Services account (you can do everything here except connect the service to a domain using Route 53 using the free tier, but the Route 53 bill should be less than $1/m). Additionally, you’ll want to install the amazon web services command line toolkit.

To install the AWS CLI, you can follow the most up-to-date AWS CLI installation instructions. The long and the short of it is (on unix): $ sudo pip install awscli. You do need to use sudo because of some of the binaries that the installer provides.

Then install the Elastic Beanstalk CLI - $ sudo pip install awsebcli. You’ll need to set up your credentials from AWS; run aws configure and enter an AWS access key and secret key.

Unfortunately, the old version of the elastic beanstalk CLI tool gained a lot of traction, and so the documentation that relies entirely on the old tool is still frequently first in the search results. If you end up on an AWS documentation page and the last modified date is 2010, it probably isn’t accurate anymore. I suggest using Google’s convenient “in the past year” filter on your searches to avoid going down the wrong path.

For Elastic Beanstalk, there is a distinction between applications and environments. Applications include multiple environments. Environments can either be web environments, or worker environments - much like heroku’s web / worker dyno distinction. You can have extra environments for testing, staging, etc., which mirror or differ in settings from your primary production environment appropriately.

In addition, the application/environment distinction along with Route 53 (AWS’s DNS setup tool) makes a swapping-based deployment process easy. Instead of having a static ‘staging’ environment and an alternate ‘production’ environment, you can create two ‘production’ environments - say production-1 and production-2. Using Route 53, you can route your web traffic to production-1, and use production-2 as a staging environment. When it comes time to deploy, instead of re-doing the deployment process in production-1, just change your Route 53 to make your web traffic point at production-2 (which, due to the way Route 53 works, is very, very fast). Leave production-1 running - it’ll automatically scale down to minimal resources, and if you need to rollback, instead of having to roll back a deployment, just direct your traffic back at production-1!

Of course, this setup does mean that you’ll be incurring charges for a minimum of two sets of production web servers, but if you’re a reasonably large application that could be worth it.

Setting Up Rails on Elastic Beanstalk

First, create a demo Rails app, and go ahead and add some custom route that will generate a response. The default “welcome” page is not served correctly in production, so you will need something other than that. Go ahead and create a git repository and commit the code to the git repository.
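For example, a minimal custom route and controller might look like the sketch below. The `PagesController` name and `home` action are purely illustrative (nothing in the Rails generators produces them), and `render text:` is the Rails 4-era way to return a bare string response:

```ruby
# config/routes.rb - replace the default welcome page with a real route
Rails.application.routes.draw do
  root 'pages#home'
end

# app/controllers/pages_controller.rb
class PagesController < ApplicationController
  def home
    # A trivial response, just so production has something to serve
    render text: 'Hello from Elastic Beanstalk!'
  end
end
```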

You can actually create your elastic beanstalk application and environment entirely from the command line, but I do not suggest doing so - there are a lot of parameters to set and for your first setup, browsing through them is a good idea.

In the AWS console, select Elastic Beanstalk and “create new application”. Frustratingly, when I did so it did not give me an opportunity to create an application and instead immediately created “My First Elastic Beanstalk Application”. You can click through, delete it, and then go back to “Create a New Application”.

After you’ve created the application, go ahead and create an environment. The application will ask if you want to choose an IAM (identity access management) role; just click through and one will be created for you.

Now choose your configuration (Ruby). If this is a production environment, you’ll want to pick “Load balancing, auto scaling” as your environment type, instead of “Single Instance”. Note that creating a “Load balancing, auto scaling” environment will incur charges from creating an Elastic Load Balancer (about $15/m). Single instance environments are appropriate for testing and the occasional specialized job server.

On the next screen, go ahead and choose “Sample Application”. The default Batch Size is fine. Click next. AWS suggests that you name environments as application-environment, for example, myproject-production.

AWS will now ask if you want to create an RDS DB instance. If you don’t have one already, go ahead and make one, though it will frustratingly default to Magnetic storage and you’ll need to change it later from the RDS console if you want an SSD-backed database.

Finally, choose an instance type. For testing out the system, t2.micro is great. All of the other defaults are fine; you can enter your email address if you want to receive updates about the environment health (great for production).

Click through, configure your database if necessary (username and password are the important parts) and finally confirm everything. Creating a new Elastic Beanstalk application takes quite a while - easily ten minutes, in my experience, especially if you are creating a database. Don’t fear, though, once it’s created deployments are quick and easy.

Once it is set up and the environment says that it is “Healthy” on the elastic beanstalk dashboard, you can go ahead and upload your application. To do so, go to your application directory, and run eb init. Then run eb deploy - it’ll take the current commit, zip it up, upload the application to S3, and then deploy it to all of your instances. Once that’s done (the first time can take some time), run eb open to view your application live. Done!

Except, if you used a newly generated Rails app, you’ll probably get an error. To view the logs, run eb logs. The error message should be something about secret_key_base. If you run your rails app with -e production, you should see the same error locally. If so, go ahead and set the SECRET_KEY_BASE environment variable by editing the environment from the AWS console. Select the environment, then “Configuration” on the left hand side, then the “Software Configuration” card (should be fourth down, after ‘notifications’) and then enter the environment variable.
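As for what value to use: `rake secret` prints a suitable one, and under the hood (in Rails 4, at least) it is just a long random hex string, which you can also generate directly:

```ruby
require 'securerandom'

# Generate a value suitable for SECRET_KEY_BASE: a 128-character
# hex string, the same shape `rake secret` produces.
secret = SecureRandom.hex(64)
puts secret
puts secret.length  # 128
```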

That’s it. To deploy again, run eb deploy.

A Word on Instance Types

After several rounds of releasing new and improved instance types at ever decreasing prices, making sense of the AWS instance options is not straightforward. Here’s my current understanding:

All of the ’t’ instances (t1.micro, t2.micro, etc.) are burstable - they can’t maintain full power for extended periods of time, but can run at full power for short bursts. This should make them good for hobby projects and for test environments where sustained speed isn’t necessary.

The number in the instance name is its generation - all else equal, always prefer later generations, as they are cheaper and more powerful. The later generations don’t always have all of the options, though; e.g. there is no m3.small instance type. This means that m3.medium is the smallest instance type that should be used in production for business critical web applications. The m3.medium instance costs about $35/month to run, with about a 30% discount for making a one year commitment, and has more RAM than a Heroku dyno, so it should be a competitive option.

You can only set one instance type for each Elastic Beanstalk environment, because instances are designed to be totally expendable - they’ll crash or be pulled down pretty regularly, and it’s up to the scaling manager to take care of bringing up new ones. This means that the size of your instances impacts the granularity of your scaling - if you’re running $0.25/hr instances and decide to scale, it’s going to cost an extra $0.25/hr.

However, creating environments is very cheap - there’s really no overhead to do so. So go ahead and create test, staging, etc environments to your heart’s content, and use different instance types for different environments. In theory, you could even host the same application on two different subdomains (maybe app1.example.com and app2.example.com) if you really needed heterogeneous instance types, and perform some simple in-app load balancing to spread traffic across them.

Connecting to a Database and Connecting Your Service to a Domain

I’ll cover connecting to a database and connecting the web service to a domain in a future post.

Caching Ruby Methods With Memoization

ruby

Have you ever been writing a piece of code, and had to do a lot of computing in one method? For whatever reason, you have a method that takes a long time to execute, or takes just a little bit of time, but needs to be called frequently with the same argument. Perhaps you run into this situation when you need to access the network in the body of the method. Or when calling the method requires instantiating a heavy library object, like an instance of a PDF reading library.

Memoization is the technical term for “caching the result of a method”. Instead of re-running the method body every time the method is called, we instead check to see if the result has already been calculated, and if so, just return that result. If we have not yet calculated the result, then we run the method to do the calculation and store the result for future use before returning it. This way we only need to spend the time to do the long-running operation once.

You can implement a memoization pattern for a zero-argument function easily in Ruby, using the same pattern that you could use in most any language like Java, Python, or even C:

class YourAwesomeClass
  def a_long_running_method
    return @cached_value if @cached_value
    @cached_value = do_some_stuff #This might be many lines!
    @cached_value
  end
end

Simple enough! This is the most literal translation of the recipe we described earlier - check to see if we calculated the value, return it if so, if not, calculate and store, then return.
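To see the caching in action, here is a quick sketch, with a hypothetical `do_some_stuff` that counts how many times it actually runs:

```ruby
class YourAwesomeClass
  attr_reader :call_count

  def initialize
    @call_count = 0
  end

  def a_long_running_method
    return @cached_value if @cached_value
    @cached_value = do_some_stuff
  end

  private

  # Stand-in for the expensive work; counts its invocations.
  def do_some_stuff
    @call_count += 1
    42
  end
end

obj = YourAwesomeClass.new
obj.a_long_running_method  # => 42, computed
obj.a_long_running_method  # => 42, served from @cached_value
obj.call_count             # => 1
```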

Of course, there is a beautiful Ruby idiom we can use as well:

class YourAwesomeClass
  def a_long_running_method
    @cached_value ||= do_some_stuff
  end

  private # You can probably hide do_some_stuff
    def do_some_stuff
      # some stuff
    end
end

This relies on a few facts about ruby. First, every expression has a value, and methods return the value of their last expression by default. Hence, a_long_running_method will return the result of @cached_value ||= do_some_stuff. Second, ruby uses short-circuiting evaluation for ||, so if @cached_value is truthy (i.e. neither nil nor false), then @cached_value ||= do_some_stuff skips calling do_some_stuff entirely and evaluates to @cached_value. And third, the pattern relies on ruby’s convenience syntax that makes @cached_value ||= do_some_stuff roughly equivalent to @cached_value || (@cached_value = do_some_stuff), where the assignment on the right side evaluates to the value that was assigned.

It turns out the @cached_value ||= do_some_stuff pattern is so elegant and common, and works so often, that Ruby on Rails, the most popular Ruby framework, actually removed its custom memoization module, ActiveSupport::Memoizable, in 2011. The pull request inspired some interesting discussion, and points out some major problems with the @var ||= pattern.

Most important to note is that sometimes you want nil to be an appropriate value for @cached_value, and the @var ||= pattern does not allow this. It will still return nil if do_some_stuff returns nil, but it requires running do_some_stuff every time, which might not be a good trade off. If you wanted to explicitly allow nil values, you would need to use a more verbose pattern like so:

class YourAwesomeClass
  def a_long_running_method
    return @cached_value if defined?(@cached_value)
    @cached_value = do_some_stuff
  end
end

This way, we can check if @cached_value has ever been assigned - to nil or otherwise - instead of just checking if it is truthy.
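Here is a sketch of the difference, using a hypothetical `do_lookup` that legitimately returns nil and counts how often it runs:

```ruby
class LookupExample
  attr_reader :call_count

  def initialize
    @call_count = 0
  end

  # Looks memoized, but re-runs the body whenever the result is nil:
  def with_or_equals
    @not_found ||= do_lookup
  end

  # Caches nil correctly by checking definedness instead of truthiness:
  def with_defined
    return @found if defined?(@found)
    @found = do_lookup
  end

  private

  def do_lookup
    @call_count += 1
    nil # a legitimate result, e.g. "no matching record"
  end
end

obj = LookupExample.new
obj.with_or_equals  # runs do_lookup
obj.with_or_equals  # runs do_lookup AGAIN - @not_found is still nil
obj.call_count      # => 2

obj = LookupExample.new
obj.with_defined    # runs do_lookup once
obj.with_defined    # cached, even though the value is nil
obj.call_count      # => 1
```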

There are some other gotchas, in particular with how Hashes work - if you want to pass a parameter to the method and use it as a key into a cached Hash, you’ll need a couple more lines of code. But usually, @var ||= is sufficient.
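One way that per-argument caching might look is sketched below. The `UserLookup` class and `slow_find` method are invented for illustration; the point is that `Hash#key?` lets you cache nil results per argument, which `@cache[id] ||= ...` could not:

```ruby
class UserLookup
  attr_reader :calls

  def initialize
    @cache = {}
    @calls = 0
  end

  # One cached result per argument, keyed on id.
  def find(id)
    return @cache[id] if @cache.key?(id)
    @cache[id] = slow_find(id)
  end

  private

  def slow_find(id)
    @calls += 1
    id * 2 # stand-in for a database or network call
  end
end

lookup = UserLookup.new
lookup.find(3) # => 6, computed
lookup.find(3) # => 6, cache hit
lookup.find(4) # => 8, computed for the new argument
lookup.calls   # => 2
```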

Luckily, if you do need to do some more complex memoization, there is a gem called Memoist that extracts all the lost behavior from ActiveSupport::Memoizable and gives it back to you. With Memoist, you can pretty much ignore how the memoization happens and just trust the gem to do its job, using this pattern:

require 'memoist'

class Person
  extend Memoist

  def a_long_running_method
    do_some_stuff
  end
  memoize :a_long_running_method
end

Super simple! If you’re writing a lot of code that requires memoization, be sure to check it out - and watch out for those nil values when you’re using @var ||=!

Unpaid Internships Aren't Just Immoral - They Hurt Your Business

business

I have long been morally opposed to unpaid internships. People should be paid for their work, period. But I argue that unpaid internships also hurt businesses that don’t even use them.

The Department of Labor has very clear-cut rules on unpaid interns. 1 Rules 3 and 4 are too often violated - unpaid interns are displacing employees or providing an immediate advantage to their employer, for free.

The rule here is fairly simple: If an employer benefits immediately from the work of an intern or uses them to displace an employee, the intern must be paid. This rule is as clear cut as rules can be. And I think that it is a clear moral issue: if you believe that labor has value, you should support these rules.

But unpaid internships have negative impacts beyond just depriving a junior employee of income. So long as there are some bad employers who don’t pay their interns around, good employers who do will be hurt.

Unpaid Internships Hurt Income Polarization and Diversity

Unpaid internships aren’t an option for young workers without support for living expenses. As such, they act as a filter: does your family have a high income? If so, here’s some work experience. If not, too bad! 2

Later, good employers in the same industry are presented with a choice between already-advantaged candidates with more experience, and less-advantaged candidates with less experience. The good employer must make an active effort to correct for this filter. If they just hire the most experienced candidate, they will end up hiring the advantaged candidates first.

This exacerbates income polarization by giving jobs to the already advantaged. And, if family income and any diversity metric are correlated3, it will reduce the good employer’s workforce diversity.

All that is required for this to happen is for the bad employer to exist and the good employer to not make a heroic effort to correct for the actions of their competitors4.

Unpaid Internships can Drive Up Low-Experience Labor Costs

Or at least, they might. I don’t think that there is a way to settle this argument without a fairly complex empirical study, but consider this argument.

Some of the ‘bad employers’ who aren’t paying the interns they should be paying can’t pay the interns they are hiring. These employers are implicitly compensating their interns in other ways - “resume lines” 5, fun activities, training. If these employers were removed from the market, it would increase the supply of low-experience workers in the market, as those interns seek other opportunities - paid ones. As supply of very low-experience workers increases, it should decrease the total cost of hiring those workers, unless their total cost is already floored by the minimum wage.

I suspect that in many instances, the market-clearing hourly pay of these otherwise-unpaid interns is below the federal minimum wage. But the total cost to the employer might not be. If interns are placing a high value on the experience - on things like the catered lunches, computers, and cachet of working for a cool company that are so commonly used as compensation in the technology industry - then the total cost of compensating them may be far higher than the hourly minimum wage.

Increasing the supply of paid low-experience interns can still decrease their total cost to employers.

Unpaid internships are bad for your business

Even if they occur at other companies. They provide filters you likely will forget to correct for, reducing diversity and exacerbating income polarization. And if you’re not working at a cool startup, they might increase your labor costs. Don’t abide them.


  1. US Department of Labor

  2. This is not an original argument, but I cannot remember who first introduced it to me.

  3. They are.

  4. There is an argument that unpaid internships are better than negative-pay grad school, which may replace them; I am sympathetic but it is out of scope here.

  5. Anyone have a bid-ask spread on a resume line?

A Month With Betterment

investing

The title of this post is a lie - I have actually only been using Betterment for two weeks. But it feels like a month, because in just two weeks Betterment has become my new favorite financial product, period.

For the unfamiliar - Betterment is a ‘robo advisor’, that is, a fully automated financial advisor. It is designed to compete against the financial advisors and private wealth managers that advise clients on what stocks (or really, mutual funds and ETFs) to buy. Because everything is online, Betterment offers fees that are significantly cheaper than a traditional financial advisor’s, and can take on clients with much smaller portfolios.

The ‘catch’, if you consider it one, is that Betterment puts everyone into basically the same portfolio, only varying the proportion of different assets to reflect your particular risk tolerance. But, the portfolio it puts you in is very efficient, backed up by solid academic research and lots of consideration by experts - chances are that it’s about as good as any retail financial advisor can possibly do.

I actually approached Betterment not as someone looking to save money on advising fees, but from the opposite place - as someone who had previously done their own investing and wanted to pay someone else to take care of it. I have had a self-service brokerage account (through Fidelity, then Interactive Brokers, then Fidelity again) for years; I (will soon have) a graduate degree in Finance, and am as financially literate as anyone. To the extent that any puny human brain can grasp the power of compound interest and efficient investing, I have grasped.

Even so, Betterment has totally changed how I manage my personal finances, and for the better1. Although I have always known how important investing regularly was, until I had the ability to simply set up an automatic deposit every payday - and know that it’d be invested in an optimal portfolio in a tax efficient way automatically - I never really did. Not having to log into a brokerage account with a horrible interface and manually calculate how many shares of each fund I wanted to buy is a huge plus.

Perhaps more importantly, because of the small sums I am investing, Betterment lets me do something Fidelity never could - diversify. My portfolio is small enough that a 5% allocation is often less than a single share. With Betterment, “I” (really, Betterment) can buy fractional shares effortlessly and cheaply. This boosts the efficiency of the portfolio and avoids the problem I have often had of a couple hundred un-invested dollars just lying in an account, not earning anything at all. And they really do let you do fractional shares - one of my Betterment goals currently has $2.03 invested in LQD - or 0.02 shares!

The crowning achievement, though, is just how motivating Betterment’s “Advice” interface is. Plug in time horizons, differing immediate contributions, and various monthly or bi-weekly auto deposits to see how much money you can expect to have under different scenarios.

Seeing that adding a mere additional $10/month to my IRA will work out, even in the worst case scenario, to an extra $22,000 at retirement - and nearly $100,000 in the best case - truly motivates me to save in a way that nothing else can. Since signing up, I’ve reduced my spending on restaurant dining and other nice-to-haves markedly and have never felt better.

In short, I can’t recommend Betterment (or their competitor, Wealthfront, who I have not tried myself because of their minimum investment requirements) enough. Completely worth the 0.33% per year fee charged for accounts under $10,000, much less the 0.15% charged on large accounts!


  1. Writing that felt cheesy, but props to the team that named the company.

Talking About Commuter Cycling in Interviews

business

When people think of a “cyclist”, they most likely imagine someone like Lance Armstrong: an athlete who races bicycles. They might alternatively imagine one of the thousands of recreational cyclists who cycle quickly for exercise.

I on the other hand am a commuter cyclist 1. I usually wear button down shirts and slacks (with one leg rolled up); not spandex. Depending on the route, I average between 10 and 15 miles per hour - nothing compared to the 30+ that competitive triathletes manage.

Being a commuter cyclist provides great answers to some common interview questions. Many tech firms look for folks who can work in a “fast paced environment” and deal with “high stakes situations”. They need people with “attention to detail”. If the company is a startup, they might use some combination of adjectives like ‘scrappy’ or ‘underdog’ to gently remind you that nobody knows who they are.

When was the last time I did all that? This morning, when I perched myself upon a 25 lb steel frame and played a game called ‘cycling’.

The other players are cars and trucks weighing between 80 and 800 times as much as my bike. Fast paced? The speed limit for the other players on certain parts of my route is 45 miles per hour. Low visibility? Half the drivers forget I exist. If they do remember I’m lucky to be tolerated: at least once per week a driver shouts at me to ride on the sidewalk if there is no bike lane, regardless of the law.

As for attention to detail, bikes are vulnerable to road imperfections that a car would never even notice. Having to dodge rocks as small as a quarter while maneuvering between pickup trucks should certainly qualify.

Finally, it is high stakes: The chance of me surviving a collision is so depressingly low that I consciously avoid looking it up.

And all of this is just a random weekday morning, before I set about my real work. The inherent exercise (cardiovascular exercise can boost creative thinking for 2-3 hours after the exercise) and eco-friendly nature of the transportation are just added bonuses.

If you have stories of commuter cycling, I’d love to hear them: twitter @jamescgibson.


  1. In part because I do not own an automobile in a car-oriented city.