Property-based testing in Ruby
For the past year or so I have slowly been dipping my feet into the vast functional programming seas. From taking the awesome Coursera offerings from Typesafe to slowly working through Rúnar Bjarnason’s and Paul Chiusano’s Functional Programming in Scala, my mind has been expanding proportionally to the time I dedicate to learning its ways. It has been incredibly rewarding and humbling.
One such reward has been coming into direct touch with property-based testing. This technique, first developed by QuickCheck in Haskell-land, spins automated testing on its head: instead of codifying what is proper behavior by asserting that the outputs for given inputs match what is expected, the tester establishes logical properties about what should happen and lets the tool generate loads of inputs to check if they hold. If something goes wrong, the tool will then try to find the smallest test input that breaks the property (falsifies it), a process called shrinking; if it can’t find anything, you can sigh with relief and think about what to scrutinize next.
Having a QuickCheck-like tool at your disposal can be incredibly powerful. The more complex the software or the algorithm, the greater the likelihood of your carefully curated unit and integration tests having blind spots. Basho, for instance, have written about the stark realization that their worker pool library was full of subtle bugs by using QuickCheck for Erlang, and you can find other instances of how the technique helped make better software.
I don’t know about you, but when I come in contact with stuff like that I immediately think of how improved parts of my day job would be if I just could apply it. Considering that my daily duties are conducted in Ruby, I felt it was time I explored the subject in that realm.
A contrived setup that hopefully shows how it can work out
Let’s say we’ve decided to implement our own linked list class in Ruby. We would probably start our implementation with something like this:
require "singleton"
class Nil
include Singleton
def empty?; true; end
def to_s
"Nil"
end
end
class Cons
def initialize(head, tail = Nil.instance)
@head = head
@tail = tail
end
attr_reader :head, :tail
def empty?; false; end
def to_s
"(#{head} . #{tail.to_s})"
end
end
Using that very convenient API, we can build lists:
>> l = Cons.new(1, Cons.new(2, Cons.new(3, Nil.instance)))
>> l.to_s # => "(1 . (2 . (3 . Nil)))"
We know that, in a linked list, adding to the head is O(1), while appending to the end is O(n). So we build algorithms that respect its efficiency guarantees. However, when we, say, map this list into another list, it results in the following situation:
def do_something_amazing(list, acc = Nil.instance)
super_value = list.head * 1337
if list.tail.empty?
acc
else
do_something_amazing(list.tail, List.new(super_value, acc))
end
end
>> do_something(l).to_s # => "(4011 . (2674 . (1337 . Nil)))"
Processing things from head to tail means the list ends up reversed. It’s common, then, to reverse it back when we’re done processing, to preserve the order an external user would expect. Let’s add a reverse
method to a List
helper module:
module List
def self.reverse(list, acc = Nil.instance)
if list.empty?
acc
else
reverse(list.tail, Cons.new(list.head, acc))
end
end
end
So when we try to reverse what was created in do_something_amazing
, we get what we need:
List.reverse(do_something_amazing(l)).to_s # => "(1337 . (2674 . (4011 . Nil)))"
Awesome. I think this is enough for us to start exploring properties. If you’re getting bored, take a sip of coffee and come back when you’re ready. There’s a few cool tricks below the fold.
Testing the old way
Being the good developers we are, we are covering that code with tests:
class ListTest < MiniTest::Test
def test_reversing_lists
assert_equal "(3 . (2 . (1 . Nil)))",
List.reverse(Cons.new(1, Cons.new(2, Cons.new(3)))).to_s
assert_equal "(9 . (400 . (321 . (1 . (10 . Nil)))))",
List.reverse(Cons.new(10, Cons.new(1, Cons.new(321, Cons.new(400, Cons.new(9)))))).to_s
assert_equal "Nil", List.reverse(Nil.instance).to_s
assert_equal "(1 . Nil)", List.reverse(Cons.new(1)).to_s
end
end
We’re pretty confident that’s enough, even if it was kind of boring to do manually. That amount of testing would let us go home and sleep soundly.
Testing the QuickCheck way
First, we’ll need something like QuickCheck in Ruby. The best, most idiomatic, most maintained, least-Monad-y thing I have found is Rantly. It has both primitive value generation built-in and property testing with shrinking. We’ll skip over the basic API and go straight to defining a property to check if my algorithm is really bullet-proof. To aid in the creation of lists from Arrays, we’ll add a new helper:
module List
# ...
def self.from_values(*values)
values.reverse.inject(Nil.instance) { |ls, v| Cons.new(v, ls) }
end
end
To check that it works, let’s change the existing tests and see if they still pass:
class ListTest < MiniTest::Test
def test_reversing_lists
assert_equal "(3 . (2 . (1 . Nil)))",
List.reverse(List.from_values(1, 2, 3)).to_s
assert_equal "(9 . (400 . (321 . (1 . (10 . Nil)))))",
List.reverse(List.from_values(10, 1, 321, 400, 9)).to_s
assert_equal "Nil", List.reverse(Nil.instance).to_s
assert_equal "(1 . Nil)", List.reverse(List.from_values(1)).to_s
end
end
Run options: --seed 48889
# Running:
.
Finished in 0.001256s, 796.1783 runs/s, 3184.7134 assertions/s.
1 runs, 4 assertions, 0 failures, 0 errors, 0 skips
Great. Now to the newfangled thing. As I mentioned before, writing a property to check requires us to think differently than we would with regular unit tests. Your formulation should state something logical, something that does not rely on specific inputs. Following that guideline, we can reason about reversing lists in the following manner:
# ...
def test_reversing_by_property
property {
length = range(0, 1_000_000)
List.from_values(array(length) { integer })
}.check { |list|
assert_equal list.to_s, List.reverse(List.reverse(list)).to_s
}
end
# ...
The meat is in the check
block. Determining that a list has been reversed correctly requires us to check if reversing it again gets us back to the original list. To seed our check, we build a property
block that creats an array with a random length between 0 and 1_000_000, itself filled with random integers. Let’s run the tests again:
$ bundle exec ruby list.rb
Run options: --seed 17130
# Running:
.
..........
success: 100 tests
.
Finished in 121.969127s, 0.0164 runs/s, 0.8527 assertions/s.
2 runs, 104 assertions, 0 failures, 0 errors, 0 skips
It took a while (we wanted to be thorough, with those million-item arrays), but we’re pretty sure it works. I’m a believer and I’m stoked; when I look at you, however, I see a face that says “look, it’s cool and all, but isn’t it kind of worthless? The tests we had were telling us the same thing, and we only needed the power of our minds to generate the correct inputs. Why go through so much trouble?”
Well, what about those times when ours minds fail us?
Catching a bug with Rantly
Let’s say you’re excited about building your own data structures and want to wrap that linked list inside a very inefficient Set. You mutter to yourself that you should make sure items are not inserted twice, which for now seems to be the main difference between Sets and Lists as storage containers.
You build a little more structure into what you already have, adding a prepend
method and inlining reverse
into a List
base class:
class List
def to_s
raise "Don't use this directly, fool"
end
def empty?; true; end
def prepend(v)
Cons.new(v, self)
end
def reverse(acc = Nil.instance)
if empty?
acc
else
tail.reverse(Cons.new(head, acc))
end
end
def self.from_values(*values)
values.reverse.inject(Nil.instance) { |ls, v| Cons.new(v, ls) }
end
end
class Nil < List
include Singleton
def to_s
"Nil"
end
end
class Cons < List
def initialize(head, tail = Nil.instance)
@head = head
@tail = tail
end
attr_reader :head, :tail
def empty?; false; end
def to_s
"(#{head} . #{tail.to_s})"
end
end
To check if an item exists, you add a contains?
method:
class List
# ..
def contains?(v); false; end
# ..
end
class Cons < List
# ..
def contains?(v)
head == v || tail.contains?(v)
end
# ..
end
Then you write your immutable Set and matching tests:
class DumbSet
def initialize(storage = Nil.instance)
@storage = storage
end
attr_reader :storage
private :storage
def push(v)
if !storage.contains?(v)
DumbSet.new(storage.prepend(v))
else
self
end
end
alias_method :<<, :push
def contains?(v)
storage.contains?(v)
end
def to_a
values = []
list = storage
until list.empty?
values << list.head
list = list.tail
end
values
end
end
class DumbSetTest < Minitest::Test
def setup
@s = (((DumbSet.new << 1) << 2) << 3)
end
attr_reader :s
def test_contains
assert s.contains?(3)
assert s.contains?(2)
assert s.contains?(1)
assert !s.contains?(4)
end
def test_uniqueness
assert_equal [-32, 1, 2, 3], (s << -32 << -32 << -32).to_a.sort
end
end
And because I spotted you writing new code and yelled “HEY USE RANTLY IT’S SO COOL YIPEE”, you add some property tests:
class DumbSetTest < Minitest::Test
# ...
def test_contains_property
property {
array(range(0, 100)) { integer }
}.check { |vs|
s = vs.inject(DumbSet.new) { |ds, v| ds << v }
assert vs.all? { |v| s.contains?(v) }
}
end
def test_uniqueness_property
property {
array(range(0, 100)) { integer }
}.check { |vs|
ns = vs.inject(DumbSet.new) { |ds, v| ds << v }
rs = vs.inject(ns) { |ds, v| ds << v }
assert_equal vs.sort, ns.to_a.sort
}
end
# ...
end
It looks good:
$ bundle exec ruby set_test.rb
Run options: --seed 15625
# Running:
..........
success: 100 tests
..
..........
success: 100 tests
..
Finished in 0.119717s, 33.4121 runs/s, 1720.7247 assertions/s.
4 runs, 206 assertions, 0 failures, 0 errors, 0 skips
You then implement the removal of items:
class DumbSet
# ...
def delete(v)
ls = storage
ns = DumbSet.new
while !ls.empty?
if ls.head != v
ns = ns << v
end
ls = ls.tail
end
ns
end
# ...
end
class DumbSetTest < Minitest::TestCase
# ...
def test_delete
os = (((DumbSet.new << 1) << 2) << 3)
ns = os.delete(1337)
assert_equal [1, 2, 3], ns.to_a.sort
ns = os.delete(3)
assert_equal [1, 2], ns.to_a.sort
ns = ns.delete(2)
assert_equal [1], ns.to_a.sort
ns = (ns << 432).delete(1)
assert_equal [432], ns.to_a.sort
ns = ns.delete(432)
assert_equal [], ns.to_a.sort
end
# ...
end
Your tests pass, but this time you don’t listen to me about adding another property. You’re just not that convinced they’re worth their salt, and it looks good enough to ship with all the tests you’ve added. The Pokémon Collecting app you work on can benefit from it right now, instead of 20 minutes from now. To production it goes.
Time goes by, and you’ve forgotten about me and our little adventure. Your system is humming along and moved on to maintenance mode. Days have been kind of slow, so you decide to add an optimization you’ve read about in Hacker News, detailing how a node.js program got a 10x speedup. You modify your delete method accordingly:
# ...
def delete(v)
ls = storage
tmp = DumbSet.new
while !ls.empty?
if (ls.head != v) && (ls.head < 1500) # secret performance trick
tmp = tmp << ls.head
end
ls = ls.tail
end
tmp
end
# ...
CI still reports all green.
A few days later, you receive a report from a User telling she deleted their Pokémon with power level 3, but her Pokémons with levels 4013, 1551 and 20000 disappeared. Your first line of defense — your tests — have not caught any issues. Sweating bullets and drowning in emails from stakeholders and other Pokémon fiends, you’re about to collapse.
And then you remember: what about trying to express a property to see if it holds?
# We'll add at most 10 unique items and then delete the first
# 2. If there's anything wrong, this will blow up.
def test_delete_property
property {
array(10) { range(0, 5000) }.uniq
}.check { |values|
os = values.inject(DumbSet.new) { |s, v| s << v }
ds = values[0..1].inject(os) { |s, v| s.delete(v) }
assert_equal (values.size - 2), ds.to_a.size
}
end
You run it and it explodes:
$ bundle exec ruby set_test.rb
Run options: --seed 46455
# Running:
..........
success: 100 tests
..
failure: 0 tests, on:
[384, 437, 120, 718, 1850, 4579, 3178, 4191, 533, 2669]
F
..........
success: 100 tests
...
Finished in 0.093858s, 63.9264 runs/s, 2248.0769 assertions/s.
1) Failure:
DumbSetTest#test_delete_property [set_test.rb:69]:
Expected: 8
Actual: 2
6 runs, 211 assertions, 1 failures, 0 errors, 0 skips
What? How come you’ve only got 2 when you expected 8? Well, there must be something wrong with delete, after all. Let’s take that array and try it on an pry session to see what happens:
[1] pry(main)> values = [384, 437, 120, 718, 1850, 4579, 3178, 4191, 533, 2669]
=> [384, 437, 120, 718, 1850, 4579, 3178, 4191, 533, 2669]
[2] pry(main)> os = values.inject(DumbSet.new) { |s, v| s << v }
=> #<DumbSet...>
[3] pry(main)> values[0..1].inject(os) { |s, v| s.delete(v) }.to_a
=> [718, 533]
Wait a minute! Should delete also remove everything that’s over 1000-ish? Is there anything in the code that stipulates such a thing? Maybe that node.js optimization was not so great after all. Let’s remove it and run the tests:
$ bundle exec ruby set_test.rb
Run options: --seed 2727
# Running:
.
..........
success: 100 tests
..
..........
success: 100 tests
.
..........
success: 100 tests
..
Finished in 0.099329s, 60.4053 runs/s, 3120.9415 assertions/s.
6 runs, 310 assertions, 0 failures, 0 errors, 0 skips
Voilà: properties have saved the day, and you’ve learned not to trust Hacker News bravado ever again.
Is using Rantly the same as using QuickCheck or ScalaCheck?
Sort of. For one, you have to write your own generators every time you want something other than basic types, while both QuickCheck and ScalaCheck can figure out a lot by themselves. This can make expressing what you mean a lot easier, and you don’t spend time debugging your property
blocks in search of mistakes. That said, writing a generator for your own types requires only that you instantiate them in the property
blocks with other auto-generated values.
Shrinking is not as good in Rantly. It works ok a lot of the time, but it could be improved. On the surface, from skimming the algorithms used in ScalaCheck and Rantly, it doesn’t seem that different, but over that side of the line the patterns in minimization seem easier to spot.
There’s also no mechanism to test stateful code. ScalaCheck has Commands to help in modeling state changes, and I’m aware PropEr and QuickCheck for Erlang also provide something in that direction.
One minor thing is that integration with RSpec and MiniTest could be improved. Its output pollutes the test run, and on large suites it becomes hard to know the relationship between a failed property and a failed test. It should be easy to fix for anyone motivated. On that note, there’s no ready-made extension for MiniTest (although adding one is trivial enough that I’m sending a PR to fix it).
Final considerations
I hope I have proven, even if with a craptastic example, that property-testing can aid you in writing better Ruby code. Our community is great when it comes to using tests both as a design and as a verification tool, and QuickCheck (via Rantly) is a new way of thinking about them. You should keep your TDD/BDD to carve out your objects and responsibilities, but also add property checks where suitable to strengthen your confidence in the system.