Testing tabular data with hamcrest in Java

While writing functional tests, I often had to check the data in various tables. Tables turn up on web pages, in databases, and even in Excel files. In every case I had to verify that their contents matched the expected data, that is, the data created by the test scenario.

This post is about how to write such checks with the hamcrest library, and why.

At first, everything was simple.

The checks looked like this: in the "Salary" column, row 15 should contain "million". For this I had a method such as assertCellByColumnAndRowNumber, which was duplicated here and there with enviable regularity.

Then things got a little more complicated: rows had to be identified not by number but by primary key. In the “Salary” column, the row whose “Name” is “Vasily Ivanovich” should contain “million”. No problem: the method assertCellByColumnAndPrimaryKey was born, and of course it was born in more than one place.

Then came tables with a composite primary key. The methods that worked with a composite key took even more arguments, and the code became harder to understand.

What finished me off was a case where I had to check the following: for rows where “Status” is “OK”, the “Type” column contains the values “A”, “B”, “C” in order from top to bottom.
I could have written yet another method with a very long name and a pile of parameters, but I began to realize that it would not stop there: the methods would keep multiplying, and the tests would become harder and harder to write and maintain.

So I decided to use hamcrest to express any condition on a table in a uniform way and finally get rid of the pile of methods performing various checks.

I managed to write the check for the Type column like this:
assertThat(
	table,
	column("Type", contains("A", "B", "C")).where(cell("Status", is("OK")))
);

Now more about how this works.

A table ( table ) is represented as a collection of rows ( class Table implements Collection<Row> ).
To write checks against such a table, I created hamcrest matchers that specify a condition for a row or for the entire table.
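
As a rough illustration of that representation, here is a minimal sketch of a row and a table in plain Java. The names Row, with, get and column are assumptions made for this example, not the actual implementation, and extending ArrayList is simply the shortest way to get a working collection of rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row type: maps a column name to the value of the cell in that column.
final class Row {
    private final Map<String, Object> cells = new LinkedHashMap<>();

    // Fluent setter so that a row can be built in one expression.
    Row with(String column, Object value) {
        cells.put(column, value);
        return this;
    }

    // Returns the cell value; fails loudly on a misspelled column name.
    Object get(String column) {
        if (!cells.containsKey(column)) {
            throw new IllegalArgumentException("No such column: " + column);
        }
        return cells.get(column);
    }
}

// Hypothetical table type: an ordinary collection of rows.
final class Table extends ArrayList<Row> {
    // All values of one column, from top to bottom.
    List<Object> column(String name) {
        List<Object> values = new ArrayList<>();
        for (Row row : this) {
            values.add(row.get(name));
        }
        return values;
    }
}
```

Because such a table is an ordinary collection, anything that works on collections also works on it, which is what the matchers below rely on.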

So far, the following matchers have been enough for me:
  1. CellMatcher sets a condition on one cell of a row.
    For example:
    cell("Id", greaterThan(0))
    matches a row if its “Id” column contains a value greater than 0.
    To create such a matcher, you specify the column name and any “standard” hamcrest matcher that will check the value in that column.

    Using this matcher together with the standard collection matchers, you can write conditions on the entire table.

    For example, using the library matcher everyItem, which checks each element of a collection (here, each row of the table) against a given rule, we can write this condition:
    in each row of the table, the value of Id is greater than zero and Time is not empty (not null):
    everyItem(both(cell("Id", greaterThan(0))).and(cell("Time", notNullValue())))
    And after adding the functionality of the standard CombinableMatcher to CellMatcher, the same condition can be written even more simply, without the word both:
    everyItem(cell("Id", greaterThan(0)).and(cell("Time", notNullValue())))
    

  2. FilterMatcher filters the table using one matcher and then applies a second matcher to the remaining rows.

    The first matcher (the filter) is a CellMatcher or a combination of several CellMatchers.
    Using FilterMatcher, you can rewrite the previous example like this:
    where(cell("Id",greaterThan(0)),everyItem(cell("Time",notNullValue())))
    
    In this case we check that Time is not empty (not null) for all rows where Id > 0. Unlike in the previous example, Time may be empty where Id is 0 or negative.

  3. ColumnMatcher sets a condition on all values of one column of the table.
    For example:
    column("Action", contains("Active", "Pause", "Active", "Closed"))
    
    sets the condition that the "Action" column contains, in order, the values "Active", "Pause", "Active", "Closed".
    Instead of the standard library matcher contains, you can use any other collection matcher (the column is presented as a one-dimensional collection of objects), such as containsInAnyOrder, hasItem, and others.

    Of course, you can add a filter to such conditions:
    column("Action", contains("Active", "Closed")).where(cell("Id",greaterThan(2)))
    
    This checks that for rows with Id greater than 2, the Action column contains, in order, the values "Active", "Closed".

    ColumnMatcher also makes it possible to use aggregating matchers to elegantly check conditions on the sum, minimum, or maximum of a column. For example:
    column("Salary", sum(is(100000))).where(cell("Type",is("fulltime")))
    
    checks the total salary of full-time employees.

  4. ColumnsMatcher lets you cut several columns out of the table and set a condition on the resulting two-dimensional array of data. For example:
    sliced(byColumns("Action", "Time"),
    	contains(row("Pause", "12:00"),
    	  	  row("Active", "12:30"),
    		  row("Closed", "14:00")))
    		.where(<some condition>)
    
    Here, having selected only the “Action” and “Time” columns from the whole, probably very large, table, we check that they contain exactly the specified values.

  5. Because the table is a standard collection, we can set a condition on the number of its rows, for example not(empty()) or iterableWithSize(lessThan(10)), using standard hamcrest matchers instead of reinventing the wheel.
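
All of the matchers above follow the same pattern: take a condition and return a larger condition. The sketch below imitates that composition without any dependencies, so java.util.function.Predicate stands in for a hamcrest Matcher, a row is a plain Map from column name to value, and a table is a List of rows. Every name in it (TableChecks, everyRow, and the signatures of cell, where, column and sum) is an assumption made for illustration, not the code described in this post. A real hamcrest matcher would also describe itself and its mismatches, which a Predicate cannot do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Dependency-free sketch: Predicate plays the role of a hamcrest Matcher.
final class TableChecks {
    private TableChecks() {}

    // cell("Id", condition): true for a row whose named cell satisfies the condition.
    static Predicate<Map<String, Object>> cell(String column, Predicate<Object> condition) {
        return row -> condition.test(row.get(column));
    }

    // everyRow(condition): true if each row of the table satisfies the row condition
    // (an empty table passes, as with any "for all" check).
    static Predicate<List<Map<String, Object>>> everyRow(Predicate<Map<String, Object>> rowCondition) {
        return table -> table.stream().allMatch(rowCondition);
    }

    // where(filter, check): keep only the rows passing the filter, then check what remains.
    static Predicate<List<Map<String, Object>>> where(
            Predicate<Map<String, Object>> filter,
            Predicate<List<Map<String, Object>>> check) {
        return table -> {
            List<Map<String, Object>> kept = new ArrayList<>();
            for (Map<String, Object> row : table) {
                if (filter.test(row)) {
                    kept.add(row);
                }
            }
            return check.test(kept);
        };
    }

    // column("Salary", condition): apply a condition to one column's values, top to bottom.
    static Predicate<List<Map<String, Object>>> column(String name, Predicate<List<Object>> condition) {
        return table -> {
            List<Object> values = new ArrayList<>();
            for (Map<String, Object> row : table) {
                values.add(row.get(name));
            }
            return condition.test(values);
        };
    }

    // sum(condition): add up a numeric column and check the total.
    static Predicate<List<Object>> sum(Predicate<Long> condition) {
        return values -> {
            long total = 0;
            for (Object value : values) {
                total += ((Number) value).longValue();
            }
            return condition.test(total);
        };
    }
}
```

With these helpers, the filter example would read where(cell("Id", id -> ((Integer) id) > 0), everyRow(cell("Time", t -> t != null))).test(table), and row conditions can be joined with Predicate's own and method in the same way the cell matchers are joined with and above.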

Writing the matchers took a day and a half and was a very interesting exercise in itself, one that served as excellent practice in design patterns. I had to make many architectural and design decisions, starting with the question of how best to represent a table.

It was probably the densest day and a half I have had in terms of architectural decisions per line of code; the code itself came to less than 15 kilobytes.

On the whole this turned out to be a separate mini-project, with several refactoring cycles and design changes that became necessary as I wrote and applied the matchers. I even wrote a small set of unit tests, using TDD in practice for the first time in my life.

I thought this could be a great example for beginners studying TDD (and other practices), or a good practical task for the interviews I have to conduct, as a way to gauge the architectural and design abilities of candidates, who often already know the answer to the question of why manhole covers are round. (Just kidding, I never ask that one. Do you?)

Conclusion:
The table matchers described here allowed me to:
  • express every table check I have come across in my tests,
  • get rid of code duplication,
  • make the code shorter and easier to understand,
  • make error messages clearer, with a more precise indication of the rows or cells that caused the problem,
  • automatically get a log describing all the checks, which we use as part of the “BDD vice versa” approach, about which I will write next time.