This blog post was originally published in the Orbs Engineering blog in 2018.
My previous post described how the Go reference implementation of the Orbs Network blockchain protocol used Hexagonal Architecture to facilitate upfront design, while allowing the inner design of the system’s components to emerge via TDD. We left off with a promise to discuss Component Tests and to elaborate on how Hexagonal Architecture helps in dealing with concurrency and flakiness. In this post, we will discuss the component testing strategy we used at Orbs.
Martin Fowler says that
“A component test limits the scope of the exercised software to a portion of the system under test… using test doubles to isolate the code under test from other components.”
Given that our system was composed of bounded contexts defined via Protobuf, it made a lot of sense to use Component Tests as a major driver in our TDD workflow. This meant that the tests told the story of each component’s behavior, given an initial state (before the test) and the expected outcomes. When we started expanding the system, following the completion of our Walking Skeleton, each member of the development team had ownership of a specific component and could start implementing it using Outside-In TDD, by writing component tests.
Let’s use a time machine to walk through the history of one of the component tests I wrote at Orbs — adding a new transaction into the system.
Naive Implementation
// Set up the Gossip test double and the transaction pool under test.
gossip := &gossiptopics.MockTransactionRelay{}
gossip.When("RegisterTransactionRelayHandler", mock.Any).Return()
txpool := transactionpool.NewTransactionPool(gossip, log)
tx := test.TransferTransaction().Build()

// Expect the new transaction to be forwarded exactly once.
gossip.When("BroadcastForwardedTransactions", &gossiptopics.ForwardedTransactionsInput{
	&gossipmessages.ForwardedTransactionsMessage{
		SignedTransactions: []*protocol.SignedTransaction{tx},
	},
}).Return(&gossiptopics.EmptyOutput{}, nil).Times(1)

txpool.AddNewTransaction(&services.AddNewTransactionInput{tx})

Expect(gossip).To(test.ExecuteAsPlanned())
Above is the first test I wrote for adding a new transaction. It uses a mocking library to set up a test double for our collaborator (the Gossip service, which propagates transactions to other nodes in the system), then invokes the SUT and asserts that the expected side effects (mock invocations) took place.
I also used the Ginkgo testing framework for running our test suite. The implementation is rather naive: add the transaction to a queue, wrap it in some boilerplate and pass it to the Gossip service. Since generating a valid transaction is complex and requires a lot of setup, I used the Test Object Builder pattern, assuming that it would pay off soon when we added further test cases.
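To make the Test Object Builder idea concrete, here is a minimal sketch. It is not the Orbs builder: the Transaction type, its fields, the validation rule and the WithAmount method are all hypothetical simplifications, chosen only to show how a builder starts from valid defaults and lets each test override the one detail it cares about.

```go
package main

import "fmt"

// Transaction is a hypothetical, simplified stand-in for a signed transaction.
type Transaction struct {
	Sender    string
	Recipient string
	Amount    uint64
	Signature string
}

// IsValid applies illustrative rules only; real validation is far more involved.
func (tx Transaction) IsValid() bool {
	return tx.Sender != "" && tx.Recipient != "" && tx.Amount > 0 && tx.Signature != ""
}

// TransferBuilder starts from a valid transaction and lets each test
// override only the fields it cares about.
type TransferBuilder struct {
	tx Transaction
}

// TransferTransaction returns a builder preloaded with valid defaults.
func TransferTransaction() *TransferBuilder {
	return &TransferBuilder{tx: Transaction{
		Sender:    "alice",
		Recipient: "bob",
		Amount:    10,
		Signature: "valid-signature",
	}}
}

// WithAmount overrides the transferred amount (hypothetical method).
func (b *TransferBuilder) WithAmount(amount uint64) *TransferBuilder {
	b.tx.Amount = amount
	return b
}

// WithInvalidContent corrupts the transaction so that validation fails.
func (b *TransferBuilder) WithInvalidContent() *TransferBuilder {
	b.tx.Signature = ""
	return b
}

// Build returns a copy of the configured transaction.
func (b *TransferBuilder) Build() Transaction {
	return b.tx
}

func main() {
	valid := TransferTransaction().Build()
	invalid := TransferTransaction().WithInvalidContent().Build()
	fmt.Println(valid.IsValid(), invalid.IsValid())
}
```

The point of the design is that a test reads as "a transfer transaction, but with invalid content", and the knowledge of what makes a transaction valid lives in one place.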
Next, I added another test case; we don’t want to process invalid transactions, so we verify that they are not propagated to other nodes via gossip:
gossip := &gossiptopics.MockTransactionRelay{}
gossip.When("RegisterTransactionRelayHandler", mock.Any).Return()
txpool := transactionpool.NewTransactionPool(gossip, log)
tx := test.TransferTransaction().WithInvalidContent().Build()
gossip.When("BroadcastForwardedTransactions", mock.Any).
	Return(&gossiptopics.EmptyOutput{}, nil).Times(0)
txpool.AddNewTransaction(&services.AddNewTransactionInput{tx})
Expect(gossip).To(test.ExecuteAsPlanned())
Test Harness
Note that the two tests are very similar, differing only in the expected behavior of the mocked gossip service. Specifically, the code for setting up the service is identical in both test cases. So, taking a leap of faith and hoping that ignoring the rule of three would eventually pay off, I introduced a test driver (dubbed “harness” across the Orbs codebase) that abstracts away service initialization and mock setup. I also decided to move from Ginkgo to Testify, because it let us run tests directly from our IDE of choice (GoLand) and receive a visual PASS/FAIL report for every test, and because it has better reporting in general.
func TestForwardsANewValidTransaction(t *testing.T) {
	h := newHarness()
	tx := builders.TransferTransaction().Build()
	h.expectTransactionToBeForwarded(tx)
	err := h.addNewTransaction(tx)
	require.NoError(t, err, "a valid tx was not added to pool")
	require.NoError(t, h.verifyMocks())
}

func TestDoesNotForwardInvalidTransactions(t *testing.T) {
	h := newHarness()
	tx := builders.TransferTransaction().
		WithInvalidContent().Build()
	h.expectNoTransactionsToBeForwarded()
	err := h.addNewTransaction(tx)
	require.Error(t, err, "an invalid tx was added to the pool")
	require.NoError(t, h.verifyMocks())
}
By adding the harness, I started introducing the component test’s own semantic layer of abstraction. It helped in several ways: 1) by reducing clutter (for instance, hiding away Protobuf-related boilerplate); 2) by expressing the semantic intent in a concise way (compare the calls to expectTransactionToBeForwarded and expectNoTransactionsToBeForwarded with the messy setup code of the previous example); and 3) by abstracting away implementation details, making the test more resilient to changes (if we, for instance, choose to change our mocking library).
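The harness itself is not shown in this post, so here is a minimal sketch of what such a test driver might look like. Everything in it is an assumption made for illustration: the Tx and Gossip types, the toy pool, and the hand-rolled recording double (standing in for the mocking library) are hypothetical. Only the harness method names mirror the ones used in the tests above.

```go
package main

import (
	"errors"
	"fmt"
)

// Tx and Gossip are hypothetical, simplified stand-ins for the real types.
type Tx struct {
	ID    string
	Valid bool
}

type Gossip interface {
	Broadcast(tx Tx) error
}

// pool is a toy system under test: it rejects invalid or duplicate
// transactions and forwards everything else.
type pool struct {
	gossip Gossip
	seen   map[string]bool
}

func (p *pool) Add(tx Tx) error {
	if !tx.Valid {
		return errors.New("invalid transaction")
	}
	if p.seen[tx.ID] {
		return errors.New("transaction already in pool")
	}
	p.seen[tx.ID] = true
	return p.gossip.Broadcast(tx)
}

// recordingGossip is a hand-rolled test double that records every broadcast.
type recordingGossip struct {
	forwarded []Tx
}

func (g *recordingGossip) Broadcast(tx Tx) error {
	g.forwarded = append(g.forwarded, tx)
	return nil
}

// harness hides wiring and expectation bookkeeping behind intent-revealing names.
type harness struct {
	gossip   *recordingGossip
	pool     *pool
	expected []string // IDs of transactions expected to be forwarded, in order
}

func newHarness() *harness {
	g := &recordingGossip{}
	return &harness{gossip: g, pool: &pool{gossip: g, seen: map[string]bool{}}}
}

func (h *harness) expectTransactionToBeForwarded(tx Tx) {
	h.expected = append(h.expected, tx.ID)
}

func (h *harness) expectNoTransactionsToBeForwarded() {
	h.expected = nil
}

func (h *harness) addNewTransaction(tx Tx) error {
	return h.pool.Add(tx)
}

// verifyMocks compares what was broadcast against what the test expected.
func (h *harness) verifyMocks() error {
	if len(h.gossip.forwarded) != len(h.expected) {
		return fmt.Errorf("expected %d forwarded transactions, got %d",
			len(h.expected), len(h.gossip.forwarded))
	}
	for i, tx := range h.gossip.forwarded {
		if tx.ID != h.expected[i] {
			return fmt.Errorf("unexpected forwarded transaction %q", tx.ID)
		}
	}
	return nil
}

func main() {
	h := newHarness()
	tx := Tx{ID: "tx-1", Valid: true}
	h.expectTransactionToBeForwarded(tx)
	if err := h.addNewTransaction(tx); err != nil {
		fmt.Println("unexpected error:", err)
	}
	fmt.Println("mocks verified:", h.verifyMocks() == nil)
}
```

Because the tests only talk to the harness, swapping the recording double for a mocking library (or the reverse) would change this file and nothing else.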
And indeed, it quickly paid off when I added the next test case:
func TestDoesNotAddTheSameTransactionTwice(t *testing.T) {
	h := newHarness()
	tx := builders.TransferTransaction().Build()
	h.ignoringForwardMessages()
	h.addNewTransaction(tx)
	err := h.addNewTransaction(tx)
	require.Error(t, err, "a tx was added twice to the pool")
}
Note that I added another mock setup function to the harness; since I covered the scenarios for transferring (or not) transactions via gossip, for the next test cases we can just instruct the mock to ignore all calls to this method. This aids in making the test more stable, as changes in implementation only affect those tests that are concerned with said implementation detail.
func (h *harness) ignoringForwardMessages() {
	h.gossip.When("BroadcastForwardedTransactions", mock.Any).
		Return(&gossiptopics.EmptyOutput{}, nil).AtLeast(0)
}
Next, I dealt with a variant of the previous test case: If a transaction that was already committed is sent again, we return the transaction receipt rather than an error. To aid with that, I added a new method to the test harness, reportTransactionAsCommitted, which invokes the complex logic instructing the transaction pool to mark a transaction as committed.
func TestReturnsReceiptForACommittedTransaction(t *testing.T) {
	h := newHarness()
	tx := builders.TransferTransaction().Build()
	h.ignoringForwardMessages()
	h.addNewTransaction(tx)
	h.reportTransactionAsCommitted(tx)
	receipt, err := h.txpool.AddNewTransaction(
		&services.AddNewTransactionInput{
			SignedTransaction: tx,
		})
	require.NoError(t, err, "a committed tx was wrongly rejected")
	require.Equal(t,
		protocol.TRANSACTION_STATUS_COMMITTED,
		receipt.TransactionStatus,
		"expected transaction status to be committed")
	require.Equal(t,
		tx.Hash(),
		receipt.TransactionReceipt.Txhash(),
		"expected transaction receipt to contain transaction hash")
}
Further test cases added new complexity and required further changes to the harness and, in a few cases, to previous tests. You can see the current version of the transaction pool component tests here.
Component Test Bloat
If you take a look at the source code for the transaction pool component tests, you might feel that there are too many component tests; and indeed, at the time of writing this post, there were 27 component tests for the different methods exposed by the transaction pool. You might think to yourself, “Component tests cover a large scope of the system. Why didn’t you push most of the cases into unit tests?”
I think that the important number is not the number of component tests, but rather their level of complexity and the time it takes them to run. Since all of these tests run inside the inner hexagon and incur no IO costs, they are extremely fast, running in about 120ms for the entire suite on my machine. And since I used the test harness to abstract away a lot of the complexity, and did not repeat the same assertions in multiple tests, the maintenance load of these tests is greatly reduced. Going back to the testing matrix from my previous post, we define component tests as medium-scoped and fast. As long as they stay that way, there’s no real reason to break them down into unit tests.
When, then, do you introduce unit tests? I have a few heuristics:
Introduce unit tests when behavior branches a lot in a narrow area, which would otherwise result in a lot of very similar component tests. Validation logic is a good example.
As a variant of the previous point, introduce unit tests when there are a lot of implementation details surrounding a narrow area. In the Orbs case, the pending transaction pool is a good example, as it’s built upon a composite data structure (the building blocks of which should always be kept in sync), deals with synchronization and has some configurable limits that are too specific to test in the component tests.
Introduce unit tests when there’s some temporal logic (scheduling / timer-related) or concurrency, entangled with business logic that can be made referentially transparent. In the unit test, only test the logic, without timers or concurrency, creating tight and concise test cases that drive functional code with no side effects. Then, in an integration test, only test the temporal / concurrent logic, mocking out the business logic. This test might include sleeps or busy waits, or be repeated multiple times to help iron out any flakiness resulting from race conditions.
And finally, introduce unit tests as a means to extract reusable components that are shared across system modules. In many cases, these components will exhibit additional motivations for extracting unit tests (any of the previous heuristics), which might be a hint that we are indeed extracting the right unit of behavior. In Orbs’ case, we created a unit test for the BlockTracker in order to deal with problems pertaining to synchronization and temporal logic, then found out a couple of days later that the same unit of behavior could be reused across the entire system. That was when we moved it out of the specific component where it was initially written and into a shared package.
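The third heuristic, separating temporal logic from business logic, can be sketched as follows. This is an illustration under assumptions rather than Orbs code: pendingTx, expiredIDs, runEvery, the at helper and exampleTTL are hypothetical names. The business rule receives the current time as an argument, so a unit test can drive it with fixed timestamps, while the ticker loop knows nothing about transactions and would be covered by a separate test.

```go
package main

import (
	"fmt"
	"time"
)

// pendingTx is a hypothetical pending-pool entry.
type pendingTx struct {
	ID      string
	AddedAt time.Time
}

// exampleTTL is an illustrative expiration window, not a real Orbs setting.
const exampleTTL = 30 * time.Second

// expiredIDs is pure: the clock reading is passed in explicitly, so the
// result depends only on the arguments and no timer is needed to test it.
func expiredIDs(pending []pendingTx, now time.Time, ttl time.Duration) []string {
	var expired []string
	for _, tx := range pending {
		if now.Sub(tx.AddedAt) > ttl {
			expired = append(expired, tx.ID)
		}
	}
	return expired
}

// at builds a fixed timestamp so that examples and tests are deterministic.
func at(seconds int64) time.Time {
	return time.Unix(seconds, 0)
}

// runEvery is the temporal shell: it only decides when to call work, and
// would be tested separately with the business logic mocked out.
func runEvery(interval time.Duration, stop <-chan struct{}, work func(now time.Time)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case now := <-ticker.C:
			work(now)
		case <-stop:
			return
		}
	}
}

func main() {
	pending := []pendingTx{
		{ID: "old", AddedAt: at(0)},
		{ID: "fresh", AddedAt: at(50)},
	}
	// Only "old" is more than exampleTTL older than the supplied clock reading.
	fmt.Println(expiredIDs(pending, at(60), exampleTTL))
}
```

In production the two halves would be wired together by passing a closure that calls expiredIDs into runEvery; the unit tests never need to sleep.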
What about IO?
The transaction pool is an in-memory component with no persistence. You might be wondering how component tests dealing with persistent data, such as the block storage, are kept fast and concise. If these components need to deal with the file system or network, surely we’ll have to pay the penalty of IO operations, resulting in slower tests, and toppling the entire approach.
The answer is, of course, no. Going back to my previous post once again, you’ll note that I introduced, as part of writing the walking skeleton, a series of adapters that abstract away all IO concerns, and provided in-memory/in-process implementations of these adapters. When testing those components, such as the block storage, that interact with the outside world (via the adapters), we simply provide the in-memory implementations, keeping the test fast and simple. We can even add test-specific logic to the in-memory implementations, allowing us, for instance, to wait — in the block storage component test — until a block with a specific transaction has been committed.
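As a rough sketch of that idea, here is what an in-memory adapter with a test-specific wait hook could look like. The Block type, the BlockPersistence interface, the WaitForTransaction method and exampleTimeout are hypothetical names invented for this example; the real Orbs adapters have a different shape.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Block and BlockPersistence are hypothetical, simplified stand-ins.
type Block struct {
	Height uint64
	TxIDs  []string
}

// BlockPersistence is the port; a production adapter would write to disk.
type BlockPersistence interface {
	WriteBlock(b Block) error
}

// exampleTimeout is an illustrative upper bound for waiting in tests.
const exampleTimeout = 200 * time.Millisecond

// InMemoryBlockPersistence is the in-process adapter used by component tests.
type InMemoryBlockPersistence struct {
	mu      sync.Mutex
	blocks  []Block
	changed chan struct{} // closed and replaced on every write
}

// Compile-time check that the adapter satisfies the port.
var _ BlockPersistence = (*InMemoryBlockPersistence)(nil)

func NewInMemoryBlockPersistence() *InMemoryBlockPersistence {
	return &InMemoryBlockPersistence{changed: make(chan struct{})}
}

func (p *InMemoryBlockPersistence) WriteBlock(b Block) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.blocks = append(p.blocks, b)
	close(p.changed) // wake up every waiter
	p.changed = make(chan struct{})
	return nil
}

// contains must be called with p.mu held.
func (p *InMemoryBlockPersistence) contains(txID string) bool {
	for _, b := range p.blocks {
		for _, id := range b.TxIDs {
			if id == txID {
				return true
			}
		}
	}
	return false
}

// WaitForTransaction is test-only logic: it blocks until a block containing
// txID has been written, or reports false once the timeout elapses.
func (p *InMemoryBlockPersistence) WaitForTransaction(txID string, timeout time.Duration) bool {
	deadline := time.After(timeout)
	for {
		p.mu.Lock()
		found := p.contains(txID)
		changed := p.changed // captured under the lock, so no write is missed
		p.mu.Unlock()
		if found {
			return true
		}
		select {
		case <-changed:
		case <-deadline:
			return false
		}
	}
}

func main() {
	store := NewInMemoryBlockPersistence()
	go func() {
		_ = store.WriteBlock(Block{Height: 1, TxIDs: []string{"tx-1"}})
	}()
	fmt.Println("committed:", store.WaitForTransaction("tx-1", exampleTimeout))
}
```

The wait hook replaces arbitrary sleeps in the test: the test resumes as soon as the block is written, and the timeout only matters when something is actually broken.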
Our thinking was that if and when we transition to microservices for some or all of the components of the system, we would strive to leave the current component tests as inner-component tests. If you think of the component as a nested hexagon (inside the bigger hexagon of the Orbs node), these are inner-hexagon tests and so do not concern themselves with running a separate process or communicating via RPC. This would leave the microservice layer untested, requiring the addition of a very thin layer of outer-component tests, which start the microservice in another process and communicate with the component via RPC.
Summary
Component tests served as a major driver of the TDD process at Orbs, comprising more than 50 percent of the entire test suite. If kept devoid of such concerns as IO, synchronization, or temporal behavior, they can be as quick as your run-of-the-mill unit test, and in many cases provide better feedback for outside-in development than solitary unit tests. The important figure is not how many component tests you have, but rather how long it takes them to run and how fragile they are.