Testing in Python
Writing good unit tests in Python with ease — Part 2
Part 2: Testing basics with pytest
This article series is aimed at people working in the public sector (like me), academics and students, people new to their careers in the private sector and anyone else for whom testing code is not yet second nature. When I refer to “tests” or “testing” in this article the context is constrained to unit and component/modular tests. My experience is mostly derived from building analytical pipelines in Python using pandas and pyspark, and testing with pytest.
The series started as a written summary of a presentation given to the Office for National Statistics on Feb 26 2021. There is also a GitHub repo with some more advanced test parameterisation examples, in addition to other resources.
Here is a list of the articles in this series:
- Part 1: Why you should write unit tests
- Part 2: The basics of testing with pytest (this article)
- Part 3: Testing workflow tips
Introduction
So you’ve made it through the opinionated stuff in Part 1. Now I want to cover a few basics: a series of rules to live by that can help streamline your decision-making process and reduce much of the burden of writing tests.
As with all best practice guidance, it is merely “a guide” and requires you to apply your own judgement to the use case in front of you. If any of these tips resonate with your own thoughts, or if you have any disagreements, please leave a comment!
The anatomy of a test
For those who wouldn’t be able to point out a test in a police line-up, here’s a basic example assuming the pytest layout:
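A minimal sketch of what this looks like (the body of to_snake_case is just illustrative):

```python
def to_snake_case(text: str) -> str:
    """Illustrative implementation: convert a space-separated string to snake_case."""
    return text.strip().lower().replace(" ", "_")


def test_to_snake_case_converts_spaces_to_underscores():
    # A simple, valid input: the positive test case.
    test_input = "Hello World"

    # Compute the result using the function under test.
    result = to_snake_case(test_input)

    # Check that the result matches our expectation.
    assert result == "hello_world"
```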
- We have our function definition to_snake_case that we want to test
- The test function name is prefixed with test_
- The test input is a simple valid input to our function (a positive test case)
- We compute the result using our function
- And we check that the result matches our expectations using assert
And that’s it!
Structure
- Your tests should live in a directory at the top level of your project
- Tests are then grouped into modules in the test directory
- The test directory is usually a mirror image of your source directory (see the example layout below)
- Tests can be further grouped using test classes (but don’t have to be if using pytest)
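An illustrative layout (the package and module names here are made up) might look like this:

```
project/
├── src/
│   └── mypackage/
│       ├── cleaning.py
│       └── indices.py
└── tests/
    ├── conftest.py
    ├── test_cleaning.py
    └── test_indices.py
```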
Naming
- Test module names should start with “test_” (or end with “_test”)
- So should test function names
- The name should be descriptive enough so that you know exactly what the test is testing
- Don’t worry if your test function names are long — the point above is more important
- The aim is to make it as readable as possible in your pytest output
Function naming
- They should be long and descriptive
- Include the name of the function under test
Class and test method naming
- Name the test class after the function under test
- Use CamelCase as you would with classes elsewhere in Python, e.g. class TestMyFunction
- No need to repeat the function name in the test methods, as it comes through from the class name in the pytest output (see the sketch below)
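Putting that together, a sketch might look like this (the import path is hypothetical, reusing the to_snake_case example from earlier):

```python
from mypackage.cleaning import to_snake_case  # hypothetical module path


class TestToSnakeCase:
    """Grouped tests for to_snake_case."""

    def test_converts_spaces_to_underscores(self):
        assert to_snake_case("Hello World") == "hello_world"

    def test_strips_leading_and_trailing_whitespace(self):
        assert to_snake_case("  Hello World  ") == "hello_world"
```

In the pytest output these appear as TestToSnakeCase::test_converts_spaces_to_underscores and so on, so the function under test is still obvious without repeating its name.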
Parameterised test naming
Parameterisation is covered later in the article, but when it comes to naming my parameterised tests the main thing for me is to avoid repetition.
Assert your authority
The main component of any test is the assert statement, which is used to test whether some condition is True or False. Unlike the unittest framework, which requires the use of different assertion methods for different use cases (e.g. assertTrue, assertIsNot, assertIn), pytest makes use of the built-in keyword assert for universal testing of native Python objects — giving you back a few valuable bytes of brain memory… which I will now fill with the winners of Best Visual Effects at the Oscars between 1972 and 1982.
pytest generates a comprehensive output from the assert statement that gives you the context to see exactly why a test is failing, which you can then fix as quickly as possible. This is really great for native Python objects, but what if you’re using objects from external libraries? Some libraries provide their own assertion functions that offer a way to test for object equality and provide a helpful output when they are not equal:
- pandas provides assert_frame_equal and assert_series_equal (see the sketch below)
- For pyspark I use chispa and its assert_df_equality function
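For example, a minimal pandas sketch might look like this (the data here are made up):

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def test_totals_per_group():
    result = pd.DataFrame({"group": ["a", "b"], "total": [3, 7]})
    expected = pd.DataFrame({"group": ["a", "b"], "total": [3, 7]})

    # Raises an AssertionError with a detailed description of the mismatch if the frames differ.
    assert_frame_equal(result, expected)
```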
These assertion functions are usually just a combination of multiple assert statements about each of the relevant properties of the object, and tend to provide some customisation on what is being tested through the passed arguments, so be sure to have a read of the documentation before you use them.
The pytest documentation says that:
For unit tests, each test function or method should have one and only one assert statement.
But I do find myself occasionally adding another assert if I determine that it’s necessary and still falls within the boundary of my test description. I know — sue me.
How do you know what to test?
So the basics of test structure are locked in, but what should we actually test for? To start, we can think about:
- Positive testing
- Edge cases
- Coverage
- Negative testing
Positive testing
Testing that the function returns the output we expect without throwing any exceptions.
If your functions under test are executing one single unit of logic, then with a little creativity it shouldn’t be difficult to contrive a simple scenario for which you can calculate an expected output. You can do this in your head. You may wish to do it on paper or Excel if there are some calculations to do.
If it’s a critical or vulnerable piece of code that you’re testing, it might be worth asking someone else to derive the expected output to reduce the risk of mistakes. A test that passes for the wrong reasons is one of the sneakier bugs out there (after all, it passes the tests!), so ensuring there are no mistakes in the expected output never goes amiss.
Edge Cases
An uncommon or odd occurrence.
Edge cases are unexpected things you should try to expect. They usually occur at some upper or lower limits, hence the name, but they can come in a number of varieties. You have two strategies for dealing with edge cases: foresight and hindsight.
Foresight: have a sit down and think about all the different and strange scenarios that the function will have to handle. If working with data, think about what peculiarities might occur within the data for different data types and how your function will deal with them. When working with a collection of number values, this might be:
- What happens when some of the values are zero or NaN?
- What happens if all the values are zero or NaN?
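A sketch of a test for that second question might look like this (mean_ignoring_nan is a hypothetical function under test):

```python
import numpy as np
import pandas as pd


def mean_ignoring_nan(values: pd.Series) -> float:
    """Hypothetical function under test: the mean of a series, skipping NaNs."""
    return float(values.mean())


def test_mean_ignoring_nan_when_all_values_are_nan():
    test_input = pd.Series([np.nan, np.nan, np.nan])

    result = mean_ignoring_nan(test_input)

    # pandas returns NaN for the mean of an all-NaN series, so that is what we expect back.
    assert np.isnan(result)
```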
Practising writing tests and catching bugs is the best way to develop your foresight.
Hindsight: you will inevitably discover a bug in your test or production environments. When you’re writing a fix for those, make sure to write a test too so you can be sure that you don’t introduce the same bug again. Do the same with any lines of code that aim to preemptively deal with a bug (foresight). Write a test to go with it.
Never allow the same bug to bite you twice.
Steve Maguire
Finally, each test should test one edge case and one edge case only. If you pass in a pandas data frame that contains several different edge cases in the data and the test then fails, you won’t know which edge case caused the failure! Split each edge case out into its own test.
Coverage
In its simplest terms, coverage is a percentage score for how many lines of your code are executed by the tests. This score is brought down by untested functions and by not testing all routes of any control flow that you may have within your program, i.e. if/else statements.
The main point I want to cover here is that if you have control flow in your functions governed by parameters that are either a boolean switch (True/False) or one of a set of options, then you must test all relevant combinations of these parameters! The definition of relevant here is up to you, but at a minimum make sure that your tests execute every line within your function at least once.
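As a quick sketch, a function with a boolean switch needs at least one test per branch (fill_missing here is a hypothetical function):

```python
import numpy as np
import pandas as pd


def fill_missing(values: pd.Series, use_median: bool = False) -> pd.Series:
    """Hypothetical function with a boolean switch: fill NaNs with the mean or the median."""
    fill_value = values.median() if use_median else values.mean()
    return values.fillna(fill_value)


def test_fill_missing_uses_mean_by_default():
    result = fill_missing(pd.Series([1.0, 2.0, 9.0, np.nan]))
    assert result.iloc[3] == 4.0  # mean of 1, 2 and 9


def test_fill_missing_uses_median_when_requested():
    result = fill_missing(pd.Series([1.0, 2.0, 9.0, np.nan]), use_median=True)
    assert result.iloc[3] == 2.0  # median of 1, 2 and 9
```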
Once you’re familiar with testing, 100% coverage should be the aim, but it’s not the be all and end all. A suite of bad tests can still achieve 100% coverage.
Negative testing
Negative testing is about putting yourself in the users’ shoes and imagining all the weird and wonderful ways that a user might misuse your functions or program. Some questions to ask yourself might be:
- How might users break things? (they are particularly good at this)
- What mistakes might they make when using a function?
- Are there parts of the documentation the user might misinterpret?
- What error message would you like to see to help you do things correctly?
Getting in the mindset of negative testing is a great way to begin improving your design choices. By putting the user first, you will write clearer code and develop a much more usable API.
Remember however, that not everyone has your prior knowledge. You may be the author of the package after all, and you’re at risk of making assumptions on the experience of your users where perhaps you shouldn’t be. A good way around this is to get other people involved. Find some people that haven’t been toiling over this program for the past 6 months and see how they fare with using it. It won’t take long before they point something out that can be improved.
Practically, negative testing involves writing a test to intentionally break something. What you’re testing for is that your function handles things in the way you expect. Typically this is testing that an exception has been raised. You can assert that a function has raised an exception using pytest.raises().
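A minimal sketch, similar to the example given in the pytest docs:

```python
import pytest


def raise_system_exit():
    raise SystemExit(1)


def test_raises_system_exit():
    # The test passes only if SystemExit is raised inside the with block.
    with pytest.raises(SystemExit):
        raise_system_exit()
```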
Negative tests complement positive tests in a test suite by showing a developer how functions are not supposed to be used. They add to the richness of your documentation and help developers understand the intentions behind your API.
Priorities
We may not always get as much time as we’d like to test everything (despite how much we try to make time), so sometimes you will have to prioritise what you test.
- Test the most complex or vulnerable parts of your code first
- Focus on the realistic use cases
- Don’t attempt to test every possible input and type of input
Don’t focus on built-in functions or methods, or functionality from a package that is already sufficiently tested. Instead focus on where you are using sufficiently tested functions and methods in some combination to create some new untested behaviour.
Test Data
Tests typically need some input data and expected output data. So where does this come from? And where do you put your data? Here are some of my top tips on test data:
#1 Only use the minimum amount of test data required to properly satisfy the test case. For tests involving pandas DataFrames this may only be a few rows, if that.
#2 Keep the test data close to the test. Tests should be readable. Ideally a user should be able to see the input data, expected output data and the test body all on one screen. And if not on one screen, then with the minimum amount of scrolling necessary. The closest the data can get to the test is in the test body itself. The next closest is in a list of parametrised cases. If your test data are in pytest fixtures, then position them as close to the test as you can. For example, if your tests are grouped by a class, then have the fixtures in the same grouping above the test (unless the fixture is shared by a number of different test class groupings: in that case, use your own judgement based on this rule).
#3 For generalised functions, generalise the test data. This may mean stripping out the context of the original problem that it was created to solve. What’s important is to retain the same data types, and ensure the features that will prove your test case are present in the test dataset. If your function represents a logical building block that you think could be reused elsewhere in the code, or if you feel it could have value for others, then generalising it is the way to go. It’s a good habit to get into. By generalising I mean extract the variables as parameters and give those parameters general names. Try to avoid expecting implicit class attributes without making the user explicitly pass them (e.g. pandas.DataFrame column names).
#4 Hard code your data where possible. This relates to point #2. Where I generally make an exception to hard coding data is if the input data for my test is too long, meaning I have to scroll to see the entire dataset in my editor. At that point I’m losing the readability benefits of having the test data in the script. This typically occurs when I’m doing component testing, as the inputs generally need to be more complex.
#5 If your input is a data frame, only include the columns you need. This relates to points #1 and #3, but I thought it was worth isolating given the frequency with which this occurs. For libraries that introduce a data frame object (e.g. pandas/pyspark), your input test data only needs to include the columns of data that are actually needed for the test case to be proven. There’s no need to synthesise additional columns just because they exist in the context of your original problem. If they’re not actually used and consumed by your function under test, they don’t need to be in the test data.
How to format your test data
When my input data is in a data frame format, I tend to lay out my data as a list of tuples like below. What I aim for is to get it as close as possible to how it might be printed in a pretty format — a readable format.
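Something along these lines (the data and the import path are made up for illustration; create_dataframe is described below):

```python
import pytest

from tests.conftest import create_dataframe  # hypothetical import path


@pytest.fixture
def input_df():
    """Input data laid out as a list of tuples, with the header row first."""
    return create_dataframe([
        ("shop",   "item",   "sales"),
        ("shop_1", "apples", 10),
        ("shop_1", "pears",  5),
        ("shop_2", "apples", 7),
    ])
```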
One small thing I’m doing here to aid readability is the use of the create_dataframe function, which I usually keep in my conftest.py and import into my test script when I need it. All it does is call the pandas.DataFrame constructor but uses the first row as the column headers. It’s a small change, but having my column headers at the head of my data rather than at the foot of it does wonders for readability.
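Based on that description, a minimal version of create_dataframe might look like this:

```python
import pandas as pd


def create_dataframe(rows):
    """Create a pandas DataFrame from a list of tuples, using the first tuple as the column headers."""
    header, *data = rows
    return pd.DataFrame(data, columns=list(header))
```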
It is also sometimes beneficial to align data into neat columns by adding additional white space. It can add to readability and make editing easier if your development environment supports multiple cursor editing and column box selection/cutting/pasting. Doing this may break some PEP8 conventions, but in my opinion you have more of a license to break some of those rules in test scripts. This suggestion is a matter of preference though, as you can end up spending extra effort re-aligning things while your test data is still being written.
Data sources
Getting the right data for a test can feel like a bit of a mental effort when you’re starting out, but if you follow my tips on test data then you will find all it takes is a little creativity and knowledge of the function (for unit tests at least).
For component tests, often the series of logical steps taken is too complex to be able to conjure expected output from mental calculations. You may need a tool to help you, such as Excel. I often create functions that perform different statistical equations, so Excel is a good tool for hashing out a solution in a step-by-step and visual way while generating some expected output data. This way it can be checked and validated by someone else, usually the person or team setting the requirements. Better yet, I get them to create it for me so I can concentrate on what I do best. If you are setting your own requirements, then get a second set of eyes to validate the implementation of methods used to generate your data — you’ll almost certainly make some mistakes.
Fake data generators
Fake data generators are useful when you’re less concerned about any particular patterns in the test data and just want some data that’s the right type and the right format. Or if you’re after a lot of test data.
One Python package for this is mimesis. This is lifted straight from the docs:
“The fake data could be used to populate a testing database, create fake API endpoints, create JSON and XML files of arbitrary structure, anonymize data taken from production and etc.”
So fake data is mostly useful when you’re looking to “mock” a call to some external system such as a database or API. Mocking is a separate topic in itself and won’t be covered any further in this article.
An alternative package for fake data in Python is faker.
When generating randomised fake data, make sure to use a random seed so that the generation of your fake data can be reproduced.
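For example, with faker a reproducible setup might look something like this:

```python
from faker import Faker

Faker.seed(1234)  # fix the seed so the generated data is reproducible across runs
fake = Faker()

print(fake.name())
print(fake.address())
```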
Using fixtures
Fixtures are a construct within the pytest package to help initialise test functions. They are a form of dependency injection and can be used to set up the operating environment, services, state or data that a test function depends on. From the docs: “they provide a fixed baseline so that tests execute reliably and produce consistent, repeatable, results.”
You may have noticed two sections ago, when I was demonstrating how I format test data, that I used a fixture. Fixtures are special functions that are collected by the pytest runner, and they typically contain some data or object that is then injected into the test at the time that test is executed. They are defined with a decorator, like so:
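A minimal sketch of a fixture definition and a test that uses it (the data are made up):

```python
import pytest


@pytest.fixture
def survey_responses():
    """A small, hard-coded dataset injected into any test that requests it by name."""
    return {"yes": 12, "no": 8}


def test_total_responses(survey_responses):
    assert sum(survey_responses.values()) == 20
```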
When should you use a fixture? Typically if you have some setup, such as a dataset or a service that you want to re-use several times. I sometimes use fixtures for input test datasets that are only used once, but only if there are a number of additional setup steps to get the data in the right format that I want to remove from the main test body for readability. Increasingly though, I’m using fixture functions as a way to manipulate input data in repeatable ways.
Sharing fixtures across test scripts
A convenient way to do this in pytest is to use the conftest.py file. It sits at the top level of your test directory (although lower-level conftest.py files can be created with greater precedence), and you should put in it all the fixtures that you want to share throughout your test suite. These fixtures are then automatically available to pass in as arguments (dependencies) for your tests in any script, without the need to explicitly import them. Pretty neat!
Fixture scope
When you run a test suite with pytest, it handles the setup and teardown of each fixture internally. The default scope is to set up and tear down for each test function that uses the fixture. Changing the scope of a fixture allows you to specify that a fixture is shared by a group of tests; in other words, the fixture is only set up once and torn down once for those tests.
It’s specified by passing a value to the scope argument within the fixture decorator. The options are fairly self-explanatory: {“function”, “class”, “module”, “session”}.
Changing the scope is particularly useful for resource-intensive setup and where state is not a concern. A good example would be sharing a SparkSession across all tests using the “session” scope, as creating a new one for each test will significantly slow down your test suite.
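A sketch of a session-scoped SparkSession fixture might look like this (the builder options are just an example):

```python
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark_session():
    """Create a single SparkSession for the whole test run and stop it afterwards."""
    spark = (
        SparkSession.builder
        .master("local[2]")
        .appName("test-suite")
        .getOrCreate()
    )
    yield spark
    spark.stop()
```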
Sometimes the state of the fixture is of concern though — consider this scenario. Your fixture initialises an empty database. The scope is set to “class”. The first test within the class adds some rows to the database. The next test checks whether the database is empty, and we are expecting it to assert that it is an empty database as was specified in the fixture. However, the state of that fixture was changed in the first test — rows were added — and now the database is not empty for the second test and that test fails. This is a scenario you should bear in mind when changing scope and grouping your tests.
Fixture functions
When a test is initialised with a fixture dependency, it is actually the return value of that fixture that is being requested when the test executes. The best way to illustrate this is with a quick example:
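A sketch of such an example:

```python
import pytest


@pytest.fixture
def my_input_data():
    return [1, 2, 3]


def test_my_input_data_is_a_value_not_a_function(my_input_data):
    # The name my_input_data is bound to the fixture's return value, not the fixture function.
    assert my_input_data == [1, 2, 3]
    assert not callable(my_input_data)
```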
You can see that within the test, the name my_input_data is actually bound to the return value and is no longer callable as a function, despite that being the way we defined it. This is a quirk of the fixture decorator: return values are requested at test execution time.
Fixtures can, however, be used as functions if the return value is callable. To define a fixture function you need to define an inner function and return that. Here’s an example of one I use often to convert a pandas.DataFrame to a pyspark.sql.DataFrame:
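Something like this sketch (the exact details may differ):

```python
import pytest


@pytest.fixture
def to_spark(spark_session):
    """Return a function that converts a pandas DataFrame into a Spark DataFrame."""
    def _(df, *args, **kwargs):
        # Extra arguments are passed straight through to createDataFrame (e.g. a schema).
        return spark_session.createDataFrame(df, *args, **kwargs)
    return _
```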
You’ll notice that the fixture uses the previously defined spark_session fixture as an input. This is a cool feature of pytest that allows you to layer up fixture function behaviour in a modularised fashion. A couple of other things to note:
- The inner function is named using an underscore, as this is the convention for throwaway variables. The name we want to bind this function to is to_spark, and since pytest will do that automatically at test execution time there’s no need to think of an additional name for the inner function. Using “_” in this instance keeps things neat.
- The function takes args and kwargs so that when the fixture function is used, arguments can be passed to parameters that belong to spark_session.createDataFrame without having to explicitly redefine them.
If we put this fixture function in the conftest.py, then it can be passed in as an argument to any test and used as a function within the test body, all without explicitly importing it. I find this particularly useful as it means I have a suite of tools available at my fingertips without having to interrupt the flow of writing tests and test data.
Here’s how it would be used to convert the pandas.DataFrame from earlier to a pyspark one:
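Continuing the sketches from earlier, it might look something like this:

```python
def test_sales_data_as_spark(input_df, to_spark):
    # Convert the pandas input data from earlier into a pyspark DataFrame.
    spark_df = to_spark(input_df)

    assert spark_df.count() == 3
```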
I tend to construct pyspark test data from pandas data because it’s more familiar to me and there’s no need to define the schema unless I really have to (it involves quite a bit of boilerplate in pyspark).
Why bother using to_spark at all when we could just use spark_session.createDataFrame? Well, it’s a fair few fewer characters for something I use often, and it allows me to keep everything in snake_case, as pyspark tends to introduce some unnecessary camelCase for its class methods. This will save time in the long run, but it’s mainly to improve readability.
Test parameterisation
The pytest library has support for parameterising tests, which, put simply, is the ability to run the same test for multiple sets of inputs and outputs. It’s something I do often, but not necessarily in the documented way. Used properly, it can be a very neat and efficient way to extend your test cases for a given function.
I feel like the pytest docs leave a lot to be desired, but they are still a good place to start. I’m just going to scratch the surface here, as test parameterisation deserves its own article, one which I intend to write and then link to here once it’s finished.
Here is a simple case of using pytest.mark.parametrize:
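A sketch along these lines (round_to_digits is just an illustrative function under test):

```python
import pytest


def round_to_digits(value, digits):
    """Illustrative function under test: round a value to a given number of digits."""
    return round(value, digits)


@pytest.mark.parametrize(
    'digits,expected',
    [
        (0, 3.0),
        (1, 3.1),
        (2, 3.14),
        (3, 3.142),
    ],
)
def test_round_to_digits(digits, expected):
    assert round_to_digits(3.14159, digits) == expected
```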
Let’s break it down:
- Uses the @pytest.mark.parametrize decorator
- For the first argument, passes in the parameters to be parameterised as a comma-separated string: 'digits,expected'
- For the second argument, passes in a list of tuples which contain the parameter values for each test case
- Pass the parameter names in as arguments to the test function, and use them as you would in a usual function
The result from this will be four passing tests — each for a different number of digits.
My concern here would be that this layout is not particularly readable or Pythonic. Passing in the parameter names as a comma-separated string? No thank you. Having a list of tuples where it is challenging to see visually which parameter each value corresponds to, especially if the number of parameters grows? Uck. Naming each test is possible using the ids parameter, but then the names of the tests are not co-located with the parameters. Drat, drat, triple drat.
Luckily, parametrize_cases comes to the rescue. I won’t go into the examples here, but you can dig in for yourself in the accompanying GitHub repo. I’ll just show you the final result (in the context of some prices index work I was doing) so you can get an idea of how useful it can be. This example shows three cases for this test, but with this layout it is very easy to extend the number of test cases (in reality I have over 20 cases here).
Wrap Up
>>> Continue to: Part 3 — Testing workflow tips
- OK — now you’ve read the article, go write some tests.
- Check out the accompanying GitHub repo for some additional resources.
- If you liked something about this article or found it interesting, please leave a “clap”.
- If you disagreed with something or have something else to share then please leave a comment.
I hope these articles will be a good reference for you on your testing journey. Bonne chance!