Learning Us a Haskell for Great Learnings

haskell

I recently reconnected with a friend from high school, Vrushank Vora. Vrushank had been trying to teach himself Haskell, and suggested that we collaborate on teaching ourselves Haskell over the next few months.

I thought this was a great idea. I’ve tried Haskell before, but have never really gotten deep into it. I’m not quite sure why this is - I’ve written plenty of functional-style code in Ruby and Clojure - but perhaps being forced to write in a functional style stymied me more than I’d like to admit. It is, after all, very easy to fall back into an imperative style, if there’s nothing preventing you from doing so.

Here are some quick notes on what we did and what we learned. To prepare, we read chapters 2 through 5 of Learn You a Haskell for Great Good! Without much of a formal plan, we decided to try some exercises from Code Wars. We only managed to complete the screener that Vrushank needed to do to sign up, plus one other exercise.

First discovery: you access the fields of a type defined with record syntax using functions that correspond to the field names. That is, if you have data Person = Person { name :: String } as your type, and a Person that you have bound to, say, person, you would access the name of the person with name person. This was a little counter-intuitive to me coming from an Object Oriented / Message Passing background, but is really quite a clean way to do it.

We then did an exercise on writing a function to determine if a string is an “Isogram”, that is, if all the letters in the string are unique. This one took us a little longer.

I will say that the red-green-refactor strategy worked wonderfully for us. Our first version of the function was something to the effect of:

isIsogram :: String -> Bool
isIsogram xs = if null xs
                 then True
                 else not (head xs `elem` tail xs) && isIsogram (tail xs)

But, we ended up refactoring the code to be fairly beautiful, something to the effect of:

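import Data.Char (toUpper)

-- Upper-case every character so the comparison ignores case.
caps :: String -> String
caps = map toUpper

-- A string is an isogram when no letter repeats, ignoring case.
isIsogram :: String -> Bool
isIsogram "" = True
isIsogram (x:xs) = toUpper x `notElem` caps xs && isIsogram xs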

There was an alternative solution that did not define a caps function. One of the questions I have is - which would be “better” Haskell? Is it better to define lots of named functions, or fewer?

Start Using Squish in Rails

rails

Just a quick tip - how many times have you found yourself cleaning up strings (maybe user input?) by doing things like this:

string = string.gsub(/\s+/, " ").strip

and the like? You want to replace consecutive whitespace with single whitespace and remove beginning and trailing whitespace.

If you’re using Rails, try squish. Note that squish also collapses newlines into single spaces, so it may not be appropriate for every scenario, but it’s generally a very useful function.
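To make the behavior concrete, here is a minimal plain-Ruby sketch of roughly what squish does. The helper name squish_like is hypothetical and not part of Rails; in a Rails app you would simply call squish on the string. The [[:space:]] class is used on the assumption that Unicode whitespace should be collapsed as well.

```ruby
# Hypothetical helper (not part of Rails): approximates String#squish
# without loading ActiveSupport.
def squish_like(string)
  # Collapse every run of whitespace (spaces, tabs, newlines) into one space,
  # then trim leading and trailing whitespace.
  string.gsub(/[[:space:]]+/, " ").strip
end

puts squish_like("  hello \n\n  world\t ")  # prints "hello world"
```

Note how the two newlines in the input end up as a single space, which is exactly the case to watch out for when line breaks are meaningful.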

Homoiconicity [in Rails]

programming, rails

For the past year or so, I’ve been working full time on helping students save money on college tuition with Quottly. Quottly is a Rails application, and far from my first - I think my first Rails app was on Rails 3.0, built working off of a book written for 2.1, and now we’ll shortly be upgrading to Rails 5.

One thing that never used to bother me about Ruby and Rails now does: the lack of, depending on your perspective, homoiconicity or idempotent database operations.

Here’s the situation. Traditional applications have a sharp divide between data and code. Things that go in databases are data. Things that the programmer writes, and that are stored in your version control system, are code. This works pretty well for most applications. I have a users table in my database, and if I need different code to work with some subset of users, I just give them an appropriate class in my class hierarchy. When the user’s detail changes, I just update the appropriate database row.

Quottly, on the other hand, is sort of an odd beast. There are lots of objects that, in some senses, straddle the line between data and code.

Let’s take a university as an example. Quottly matches college students with the best classes for them across all schools, so we have to store some information about each university that we work with in our database.

What is a university? Parts of it - for example, let’s say the current price of a credit hour, are clearly data. But this “data” has some interesting properties:

First, there can only ever be one instance of a university. There is only one University of Florida, and it is uniquely named; if I ever have two instances of the University of Florida in my database, something is very wrong. And the existence of the University of Florida has nothing to do with its existence in my database; it is, in some ways, a property of the world.

Second, there are bits of things that look very much like code associated with each university. For example, the University of Florida has a rule that you must complete the last 30 credits of your degree at UF. Is that data, or code? I can convert it into “pure data” by making some sort of LastCreditsRule class, putting a row in my database with 30 as the value for “number of credits”, and associating it with the database row that’s associated with UF. But, I could equally (and in many ways, more easily) define that rule with a small bit of code - in effect, that LastCreditsRule is a Verb in what should properly be a Kingdom of Nouns.

I can combine this with a whole lot of wrappers - or some nonstandard active record modifications - to ensure that the create and update operations for the University model in Rails both correspond to what I would describe colloquially as ‘create-or-update-as-appropriate’.
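As a rough illustration of what 'create-or-update-as-appropriate' means, here is a plain-Ruby sketch that uses an in-memory hash as a stand-in for the universities table. The UniversityRegistry class, its method names, and the attribute values are all hypothetical placeholders; a real Rails version would go through the University model instead.

```ruby
# Hypothetical in-memory stand-in for the universities table,
# keyed by the university's globally unique name.
class UniversityRegistry
  def initialize
    @rows = {}
  end

  # Create the row if the name is new; otherwise merge the new attributes
  # into the existing row. Calling this twice never produces a duplicate.
  def upsert(name, attributes)
    @rows[name] = (@rows[name] || {}).merge(attributes)
  end

  def fetch(name)
    @rows.fetch(name)
  end

  def count
    @rows.size
  end
end

registry = UniversityRegistry.new
registry.upsert("University of Florida", credit_hour_price: 200)   # placeholder price
registry.upsert("University of Florida", last_credits_required: 30)
puts registry.count  # prints 1
```

The point of the sketch is only that the operation is keyed on the unique name, so repeating it is harmless.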

If I wanted to, I could equally implement the University of Florida functionality by making a University of Florida class, and defining a bit of code in it that would implement the last-30-credits rule. In this case, it’d be pure code - the University of Florida would be a singleton class, information about which is only stored in our version control system, and which would require a redeploy to fix.
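The two options described above can be sketched side by side in plain Ruby. This is only an illustration under simplifying assumptions: a Struct stands in for the database-backed LastCreditsRule, and the UniversityOfFlorida class, the method names, and the single-number argument are hypothetical.

```ruby
# Option 1: the rule as data. In a Rails app the 30 would live in a database
# row; a Struct stands in for that row here.
LastCreditsRule = Struct.new(:number_of_credits) do
  # final_credits_at_school: how many of the student's last credits
  # were taken at the university itself.
  def satisfied_by?(final_credits_at_school)
    final_credits_at_school >= number_of_credits
  end
end

# Option 2: the rule as code. The university is a singleton-style class,
# and changing the rule means changing this file and redeploying.
class UniversityOfFlorida
  def self.last_credits_satisfied?(final_credits_at_school)
    final_credits_at_school >= 30
  end
end

rule = LastCreditsRule.new(30)
puts rule.satisfied_by?(32)                            # prints true
puts UniversityOfFlorida.last_credits_satisfied?(24)   # prints false
```

Both versions answer the same question; the difference is whether the number 30 lives in a row you can update at runtime or in a file under version control.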

Neither of these solutions is very appealing. The first, the option in which we treat the university as pure data, involves creating (potentially) a whole lot of Verb classes (things ending in Rule or Policy), which are a code smell. There’s going to be a lot of dumb classes floating around, and that whole class hierarchy will be hard to maintain.

The second option is just not very scalable - if we expand to all 3,000 universities in the US, I’ll end up with 3,000 files in my /models/universities folder. But that solution does have the nice property of letting me easily grab the object I want by its globally unique name whenever I want, which can be convenient.

There is a third, halfway option, in which Ruby code is shoved into a database - making it into data - but then is eval()’d out of the database to implement the -Rule classes. This has the advantage of making the Rule class reasonable, constrained, and somewhat maintainable, as the class won’t need to change, but has other disadvantages (security issues being one of the major ones, in addition to inelegance, unreliability, and difficulty of testing).

I believe that (rare) situations like this are where homoiconic languages show their value. Homoiconic languages “allow all code in the language to be accessed and transformed as data, using the same representation” - that is, to the programmer, there’s no difference between data and code. The languages in the lisp family are the only example I know of.

In Lisp, I think I could implement this (at first) with a simple file that defines some structs that represent universities, eventually replacing that with some sort of macro that can fetch the university from a database or hash table lazily - I am, sadly, not well versed enough in lisp to understand how that would work, exactly, but I believe it could be done.

This would seem to me to give the best of both worlds. Since there is no difference in representation between code and data, no decision need be made about what is code and what is data; the different code-like and data-like aspects of the objects may be put into their proper storing places appropriately and easily (and that decision can be changed later without much fanfare).

I am interested in which other situations homoiconic languages would have obvious value. I believe that anything that involves diverse business rules would be a prime candidate - perhaps medical billing systems? - as ‘rules’ naturally fall on the line between code and data. Or, if you have an idea on how to address this situation in ruby/rails, I’d love to hear it! If you have an example, ping me on twitter - @jamescgibson.

Using pdf2htmlEX on Heroku

heroku

Want to use the awesome pdf2htmlEX on Heroku? You’re not alone. For Quottly, we do quite a bit of PDF processing - turns out, a lot of colleges and universities like to publish information in PDF format. We always try to use the pdf-reader ruby gem if we can, since it’s easy to deploy and maintain, but sometimes pdf-reader just doesn’t have enough power for what we’re trying to do.

We recently got pdf2htmlEX running on our Heroku app. Here’s how.

apt buildpack

pdf2htmlEX is distributed either as source or as a Linux package. To install the Debian package for pdf2htmlEX on Heroku, we first added heroku-buildpack-apt to our application’s buildpacks.

Some old sources (including the README.md on heroku-buildpack-apt) will indicate that the best way to do this is to create a .buildpacks file in your project. However, Heroku now recommends adding the buildpacks from the command line, and/or using an app.json for reproducible deploys.

We added the following to our app.json:

  ...
  "buildpacks": [
    {
      "url": "https://github.com/heroku/heroku-buildpack-ruby.git"
    },
    {
      "url": "https://github.com/ddollar/heroku-buildpack-apt.git"
    }
  ]
  ...

Then, add an Aptfile for heroku-buildpack-apt to pull from. Each line in the Aptfile is either the name of an apt package, in which case the package will be installed from the standard source archives available on Heroku, or is a link to a specific .deb package.

Either by running apt show on the pdf2htmlEX package, or by referencing this Stack Overflow post, you might come up with the following dependency list:

libc6
libcairo2
libfontforge1
libfreetype6
libpoppler44
libgcc1
libstdc++6
ttfautohint
https://launchpad.net/~coolwanglu/+archive/ubuntu/pdf2htmlex/+files/pdf2htmlex_0.12-1~git201411121058r1a6ec-0ubuntu1~trusty1_amd64.deb

It’s worth noting that since listing the .deb on its own line installs it without automatically resolving dependencies, you will not receive a build error in the event that pdf2htmlEX installs but is unusable. The only way to confirm that pdf2htmlEX is installed correctly is to:

$ heroku run bash --app YOURAPP
$ pdf2htmlEX --version

and confirm that the output is correct.
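If you would rather not check by hand after every deploy, the same check can be sketched in Ruby and run from a console or at boot. This is an assumption-laden sketch: the helper name binary_usable? is hypothetical, and it assumes pdf2htmlEX --version exits with status 0 when the binary and its shared libraries load correctly.

```ruby
# Returns true only when the command can be started and exits with status 0.
# Kernel#system returns nil when the command cannot be executed at all and
# false when it exits non-zero (for example, when a shared library is missing).
def binary_usable?(command, *args)
  system(command, *args, out: File::NULL, err: File::NULL) == true
end

unless binary_usable?("pdf2htmlEX", "--version")
  warn "pdf2htmlEX is missing or cannot load its shared libraries"
end
```

This only tells you that the binary starts; it says nothing about whether a particular PDF converts correctly.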

After deploying with the Aptfile above, you will likely run into an error about a missing libpoppler57.so. I believe this is because the .deb file that is listed was built against a different libpoppler than the one that is installed here - in this case, libpoppler57 vs libpoppler44.

To fix, let’s just replace the libpoppler44 reference with an explicit reference to the correct .deb file - I found this by looking up libpoppler on the Ubuntu archive website:

libc6
libfontforge1
libgcc1
libjs-pdf
libstdc++6
http://mirrors.kernel.org/ubuntu/pool/main/p/poppler/libpoppler57_0.38.0-0ubuntu1_amd64.deb
https://launchpad.net/~coolwanglu/+archive/ubuntu/pdf2htmlex/+files/pdf2htmlex_0.12-1~git201411121058r1a6ec-0ubuntu1~trusty1_amd64.deb
ttfautohint

This should resolve the libpoppler error. However, after deploying this, I still ran into the same problem listed on that Stack Overflow post:

pdf2htmlEX: /app/.apt/usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by pdf2htmlEX)
pdf2htmlEX: /app/.apt/usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by pdf2htmlEX)
pdf2htmlEX: /app/.apt/usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /app/.apt/usr/lib/x86_64-linux-gnu/libpoppler.so.57)

The issue here is that the version of libstdc++6 being installed doesn’t include GLIBCXX_3.4.20 - we just need a newer version of libstdc++6. A quick upgrade:

libc6
libfontforge1
libgcc1
libjs-pdf
http://mirrors.kernel.org/ubuntu/pool/main/g/gcc-5/libstdc++6_5.3.1-5ubuntu2_amd64.deb
http://mirrors.kernel.org/ubuntu/pool/main/p/poppler/libpoppler57_0.38.0-0ubuntu1_amd64.deb
https://launchpad.net/~coolwanglu/+archive/ubuntu/pdf2htmlex/+files/pdf2htmlex_0.12-1~git201411121058r1a6ec-0ubuntu1~trusty1_amd64.deb
ttfautohint

And this should work!

A few caveats: I’m not entirely familiar with how linking on mirrors.kernel.org works, so I believe it is possible that these links may break some time in the future. Additionally, I would feel more comfortable if every one of the dependencies were locked down to a specific .deb - I’m concerned that a version bump on e.g. libgcc1 may break this build.

However, I think that it shouldn’t be too terribly difficult to cross that bridge if and when it arises - all that is needed is to determine which version of libgcc1 is installed on a working system, and then link directly to that .deb.

Happy deploying!

Thinking About Investing in Dimes Per Day

investing

Everyone - especially young people - could probably do well to invest more of their income. One of the reasons that it is hard to get the motivation to invest is that the outcomes are not clear, and the pace of progress is oftentimes hard to gauge.

After all, if you check your portfolio regularly, you might see swings in its value of hundreds of dollars every day, even though, on average, over the course of years, you can be confident that your portfolio will become more valuable.

And, what is the goal? Many of my friends who are the most thrifty in general - who have to make the fewest life changes to begin investing - are the ones who see the point the least. They don’t value material things, but they do value experiences and freedom. So what is the point of amassing a large portfolio?

Every $1000 you invest gives you one dime per day

That’s how I’ve started thinking about it. If I invest $1,000, I can then (99 times out of 100) count on being able to spend one dime per day, in current money, forever, without reducing my portfolio balance.

How? To withdraw one dime per day, my portfolio must on average earn $36.50 per year - a return of just 3.65%.

With an aggressive but diversified portfolio of stocks and bonds and reasonable inflation - say, 70% stocks earning on average 7.0%, 30% bonds earning on average 3%, and 2% inflation, the math works out:

Your 70% allocation to stocks - or $700 - will on average earn 7.0%, or ($700 * .07) = $49.00 per year. Your $300 of bonds will earn on average ($300 * .03) = $9.00 per year, for a total of $58.00 per year.

You need to increase the balance of your portfolio by 2% - or $20 - each year to counteract the effects of inflation, leaving you $38 per year to spend.

$38 per year is just a hair over a dime per day, or $36.50 per year. So thinking in terms of a dime per day is both easy and suitably conservative.
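The arithmetic above can be checked with a few lines of Ruby. This is just a worked version of the post's own numbers; the function name spendable_per_year is made up for illustration, and the return and inflation rates are the assumptions stated above, not predictions. Rational is used so the percentages come out exact.

```ruby
# Yearly amount you can spend while still growing the balance with inflation.
# Defaults are the assumptions from the text: 70% stocks at 7%, 30% bonds
# at 3%, and 2% inflation.
def spendable_per_year(principal,
                       stock_share: Rational(70, 100),
                       stock_return: Rational(7, 100),
                       bond_return: Rational(3, 100),
                       inflation: Rational(2, 100))
  stocks   = principal * stock_share
  bonds    = principal - stocks
  earnings = stocks * stock_return + bonds * bond_return
  earnings - principal * inflation
end

yearly = spendable_per_year(1_000)
puts format("$%.2f per year", yearly.to_f)         # prints "$38.00 per year"
puts format("$%.3f per day", (yearly / 365).to_f)  # prints "$0.104 per day"
```

Changing any of the keyword arguments shows how sensitive the dime-per-day rule is to the assumed returns.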

Of course, there’s more to this - if you actually live on a 3.65% withdrawal rate, you’ll run out of money in 30 years about 1 out of every 100 times - but in general, it’s a safe way to think about things.

Backing things out

A dime per day isn’t much, or doesn’t seem like much. But thinking about things starting with needs, not wants, makes it clear how much it can be.

What is the bare minimum to live? There are lots of things that make life liveable - but ultimately the only things you must have are food and water.

Buying in bulk, cornmeal is about $0.15 per 200 calories. Pasta is about $0.20 per 200 calories. Rice and beans, which together provide all the essential amino acids for dietary protein, if prepared yourself, are in approximately the same range - say $0.15 per 200 calories. Canola oil, a healthy fat, is less than $0.10 per 200 calories.

Net, you should be able to eat a relatively healthy 1600 calorie per day diet (certainly not enough to thrive, but enough to live) for about $2.00 per day. Twenty dimes.

Hence, if you can get $20,000 into an investment account, you will never, ever need to go hungry.

That’s a powerful statement. All that is required to never be hungry in your life is saving $20,000 - quite a reasonable sum.

This extends to all the other aspects of life as well. In low income neighborhoods of most non-major cities, you can find rooms for rent for as little as $300/m, or 100 dimes per day, and in many cities you can actually buy a (granted, low quality) house for as little as $50,000 - which, with a 30 year mortgage, will cost you about $300/m as well.

If you can get $100,000 invested, you will never be homeless.

$20,000 for food and $100,000 for housing - plus, say, another $0.50 per day for odds and ends - and you can be assured that you will never be destitute if you can get $125,000 together.
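Working backwards from daily needs to a target balance is the same rule in reverse: every dime per day costs $1,000 invested. A small Ruby sketch, with a hypothetical helper name and daily costs given in cents to keep the math in integers; odds and ends are taken here as five dimes per day, which is what the remaining $5,000 of the $125,000 total buys.

```ruby
DOLLARS_INVESTED_PER_DIME = 1_000

# Dollars you need invested to cover a daily cost, rounding up to a whole dime.
def nest_egg_for(cents_per_day)
  dimes_per_day = (cents_per_day + 9) / 10   # integer ceiling division
  dimes_per_day * DOLLARS_INVESTED_PER_DIME
end

food    = nest_egg_for(200)    # $2.00 per day
housing = nest_egg_for(1_000)  # $10.00 per day, about $300 per month
extras  = nest_egg_for(50)     # $0.50 per day for odds and ends

puts food + housing + extras   # prints 125000
```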

That’s powerful, because investing $125,000 is eminently doable. If you have a college degree, you should be able to find a job that will pay at least $30,000 per year - and if you simply continue to live as you did in college, which was likely on less than $10,000 per year, you can save $15,000 per year and, with the investment returns assumed above, have $125,000 in just 7 years.

And every extra $1000 you save - which is only 50 hours of work at $20 / hour, a pretty reasonable rate for freelance work - increases your standard of living by $0.10 per day, forever.

And all of this is without leaving the US - there are many countries where you can have a significantly better quality of life for this price of about $12.50 per day. Check out Millenial Moola’s post on this topic.