Materialized Views Part 2 - MVs in Rails

rails

In Supercharge your app with materialized views, we discussed some of the benefits of using materialized views in a web application, and some of the situations in which a Materialized View may be a good choice.

My framework of choice is Rails. Rails’ default ORM, ActiveRecord, is very good, but it does not provide any real integration for views, materialized or otherwise. This becomes an issue, as views can only be dropped and re-defined - not incrementally changed with add_column or the like - thereby requiring us as developers to manually ensure that we’re correctly defining the view each time we wish to update it.

Originally, this post was going to be a discussion of some of the ways of managing this complexity, but I have since discovered scenic, a gem by thoughtbot designed specifically for managing materialized views. I recommend you check it out for all of your view management needs.

Supercharge Your App With Materialized Views

rails

Materialized views a very powerful PostgreSQL feature that can allow you to dramatically speed up complex reads. Since they aren’t often used in Rails, here are some notes on when to use Materialized views and how to effectively use them

On Views & Materialized Views

A database view is a result of a stored query. You can read from a view just like one would read from a table, and perform additional filtering on the results of the view. They’re particularly useful when the stored query is very complex, for example, when the query is the result of several complex JOINs. Evey time that a view is read from, the query that defines it is performed.

A materialized view is a view that has been executed and has its results stored on disk, effectively caching the results of the query. Because the results of the query are written to disk, reading from and querying a materialized view is just as fast as doing a straight read on a single table - scary fast. For complex database operations, this can be a significant speedup.

The cost of that increase in read performance is that the materialized view does not automatically change with changes underlying data. A materialized view must be periodically “refreshed” to bring it up to date - essentially, performing the query that defines the view again, and storing the results. Depending on the complexity of the query that defines the view, it could be practical to refresh the view frequently, even upon every write to the underlying table, but it is not done automatically.

A short example

Here’s a quick example. Let’s say that we have a website that allows customers to book cruises.

What makes up a cruise? Well, a cruise is a specific customer renting a specific cabin, which has a type, during a specific sailing of a ship, which itself is from a specific port. Maybe our schema looks something like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
cabins
  - id
  - cabin_number
  - ship_id
  - cabin_type_id

cabin_types
  - id
  - price_per_day

sailings
  - id
  - ship_id
  - departure_port_id
  - embark_date
  - return_date
  - price_adjustment_factor

ships
  - id
  - name

ports
  - id
  - name
  - port_fee

cabin_sailings
  - cabin_id
  - sailing_id
  - booked

We want to show the user CabinSailings that are not booked, but allow them to search by criteria that’s stored through a lot of JOINs. To allow the user to search by the name of the port, for example, we’re going to need to do something like:

1
2
3
SELECT * FROM cabin_sailings
  JOINS sailings ON sailings.id = cabin_sailings.sailing_id
  JOINS ports ON ports.id = sailings.departure_port_id

Not too bad, but then we’re going to need to select by the ship as well -

1
2
3
4
5
SELECT * FROM cabin_sailings
  JOINS sailings ON sailings.id = cabin_sailings.sailing_id
  JOINS ports ON ports.id = sailings.departure_port_id
  JOINS cabins ON cabins.id = cabin_sailings.cabin_id
  JOINS ships ON cabins.ship_id = ships.id

Oh, and while we’re at it, let the user select only, say, four day cruises, and sort by price:

1
2
3
4
5
6
7
8
9
10
11
12
SELECT
  cabin_sailings.*,
  (sailings.return_date - sailings.embark_date) AS duration,
  (duration *  cabin_type.price_per_day) * sailings.price_adjustment_factor + ports.port_fee AS price,
FROM cabin_sailings
  JOINS sailings ON sailings.id = cabin_sailings.sailing_id
  JOINS ports ON ports.id = sailings.departure_port_id
  JOINS cabins ON cabins.id = cabin_sailings.cabin_id
  JOINS ships ON cabins.ship_id = ships.id
  JOINS cabin_types ON cabins.cabin_type_id = cabin_types.id
WHERE duration = ?
ORDER BY price DESC

That’s not too bad, but you can quickly see how this is getting a little out of hand. Putting this much SQL into any traditionally define model is going to be unwieldy.

And as our application gets more complex, and our data set grows, doing these very complex JOINs will rapidly consume significant database power. Soon, we’ll be taking ten seconds to display results to the user - every time they search.

This is the perfect situation for a (Materialized) view. We can keep the complexity of the SQL that defines the concept we’re interested in - let’s call it a “Listing” for now - out of the application, and in the database where it belongs by using a view. And by using a materialized view, we can keep the queries super speedy, at the cost of having to put a bit more work into keeping them up to date.

Next Up

Next up: Some tips and tricks for using materialized views with Ruby on Rails.

Inline Percent Operator in Ruby

ruby

A quick tip, which I hadn’t discovered even after several years of using ruby professionally - you can use the % operator as an infix operator to perform string formatting.

1
"%d%d%d-%d%d%d%d" % [8, 6, 7, 5, 3, 0, 9] == "867-5309"

It follows all the same formatting codes as the regular string format functions.

Dell D3100 Dock by DisplayLink With XPS 13 Developer Edition

computers

I own a 2015 Dell XPS 13 Developer Edition, which is a truly fantastic laptop. A quick product review: I recently purchased the Dell USB 3.0 Triple Display UltraHD Universal Dock (D3100) in order to help me use my laptop more effectively as a desktop replacement at work.

I bought the dock mostly because of the promise of being able to use 2 monitors with my laptop. The dock worked fairly well and it looked elegant. It replaced an Amazon Basics 7 port USB 3.0 that I had previously bought.

However, getting monitors set up was not easy. I attached two, year old 1080p monitors. They were, eventually, auto detected, but I had to reboot the computer several times to get them working, even after installing the drivers from DisplayLink.

But the worst part - using the monitors used 50% of my entire CPU speed. It seems that the display-over-usb-3.0 works by compressing video for sending on the CPU; in any case, I obviously could not afford to constantly use one full CPU core just for my monitors - and monitors that caused the mouse to flicker on the laptop screen, at that.

This is an issue that has been reported widely, so DisplayLink should have gotten a fix out already - and apparently, it might be ameliorated on more recent versions of Ubuntu. But the bottom line is that the Dell D3100 does not work out of the box with the XPS 13 Developer Edition - would not recommend.

Learning Us a Haskell for Great Learnings

haskell

I recently reconnected with a friend from high school, Vrushank Vora. Vrushank had been trying to teach himself Haskell, and suggested that we collaborate on teaching ourselves Haskell over the next few months.

I thought this was a great idea. I’ve tried Haskell before, but have never really gotten deep into it. I’m not quite sure why this is - I’ve written plenty of functional-style code in Ruby and Clojure - but perhaps being forced to write in a functional style stymied me more than I’d like to admit. It is, after all, very easy to fall back into an imperative style, if there’s nothing preventing you from doing so.

Here’s the quick notes of what we did, and what we learned. To prepare, we read chapters 2 through 5 of Learn you a Haskell for Great Good. Without much of a formal plan, we decided to try some exercises from Code Wars. We ended up only being able to complete the screener that Vrushank needed to do to sign up and one other exercise.

First discovery: you access the fields of a record-syntax defined type with functions that corresponds to the field names. That is, if you have data Person = { name: String } as your type, and a Person that you have bound to, say, person, you would access the name of the person with name person. This was a little counter-intuitive to me coming from an Object Oriented / Message Passing background, but is really quite a clean way to do it.

We then did an exercise on writing a function to determine if a string is an “Isogram”, that is, if all the letters in the string are unique. This one took us a little longer.

I will say that the red-green-refactor strategy worked wonderfully for us. Our first version of the function was something to the effect of:

1
2
3
4
5
isIsogram :: String -> Bool
isIsogram xs = if null xs
                 True
               else
                 not ( (head xs) `elem` (tail xs) ) && isIsogram ( tail xs )

But, we ended up refactoring the code to be fairly beautiful, something to the effect of:

1
2
3
4
5
6
caps :: String -> String
caps = map toUpper

isIsogram :: String -> Bool
isIsogram "" = True
isIsogram (x:xs) = (toUpper x) `notElem` (caps xs) && isIsogram xs

There was an alternative solution that did not define a caps function. One of the questions I have is - which would be “better” Haskell? Is it better to define lots of named functions, or fewer?