functional-pipeline

MC706.io

2019-02-16

CI/CD, Functional Programming, Gitlab, Library, Pipes, Python, ReadTheDocs

Continuing from my last post dealing with python pipes, and usage of the reduce pipeline, today I released functional-pipeline, a library to package the reduce pipeline along with a bunch of helpers to make working on functional pipelines easier.

I have already posted at length about functional programming in python, and this library doesn't really add anything new beyond the helpers (which you can read about in the docs), so I won't go on more about that. Instead, in this post I want to go over a little bit of the process of releasing that package, and the work surrounding it.

This was the first open source library I have taken to a full release since the changelog-cli, which means it's been over a year since I last went through the rigor of open sourcing a codebase and using all of the tools available for open source. Back then my workflow looked a bit like this:

Checkout the codebase and update to the latest
Make any changes, test them, get everything running locally
Make sure tox and tests pass locally
Push changes to github and make sure travis, coveralls, landscape.io all pass and look good
Run invoke locally to use the changelog-cli to cut a release, tag it, build a wheel and publish via twine

Some of the things this didn’t really account for:

How do contributors help out or pick up the workflow
How is documentation done
The CI (travis, coveralls, landscape.io) is independent of the release and deploy
PyPI supported RST but not markdown (at the time)

So after a year of delivering software, getting intimately familiar with gitlab, docker, and other build tools, and publishing a few private repositories at work, I decided to start refining my workflow.

The first thing I addressed was whether to continue using Github + Travis or go with the one-stop-shop that is Gitlab. Over the past few years working in Gitlab, I have fallen in love with the CI configuration as code model. Travis had this as well, but having it all in the same platform and just working together is awesome. I ended up hosting the open source code on Gitlab.

Next step was to set up a proper branching strategy. Gitlab allows me to deny direct pushes and restrict merges to master. This way every change has to go through a MR, and therefore through CI, to make it into master. This not only provides a checkpoint for approval, code review, and ensuring that CI is passing; it also significantly cleans up the git history of the codebase by letting me require fast-forward merges and squash each branch before it makes it to master. I can make a ton of “commit early and often” commits while I am working on a feature, yet end up with small-book-length commit messages and a very descriptive history by the time it all gets to master.

Next I changed to only doing deploys from CI and only on tags. Like the master branch, I can restrict who is allowed to cut tags. Once a few changes have made it to master (along with the changelog changes), I cut a release branch, update the changelog and the __version__ variable in the root __init__.py, and merge the release branch into master. As soon as it builds, I cut a tag off of master, which triggers an only: - tags job in Gitlab that pulls my protected username and password for PyPI, then builds and deploys that tag as the release version. This way releases are 1:1 with tags, since creating a tag kicks off the build and deploy of a release.
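One way to keep that 1:1 relationship honest is to have the tag job refuse to deploy when the tag and __version__ disagree. The following is a minimal sketch of such a check, not something taken from the functional-pipeline repository: the helper names, the regular expression, and the assumption that tags look like 1.2.0 or v1.2.0 are all mine. In a Gitlab job the tag name would typically come from the predefined CI_COMMIT_TAG environment variable.

```python
import re
from typing import Optional

# Hypothetical pattern: matches a line such as  __version__ = "1.2.0"
VERSION_PATTERN = re.compile(
    r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", re.MULTILINE
)


def read_version(init_source: str) -> Optional[str]:
    """Extract the __version__ string from the text of an __init__.py."""
    match = VERSION_PATTERN.search(init_source)
    return match.group(1) if match else None


def tag_matches_version(tag: str, init_source: str) -> bool:
    """Return True when the git tag names the same version as __version__."""
    version = read_version(init_source)
    # Assumption: tags may carry a single leading "v", as in v1.2.0.
    bare_tag = tag[1:] if tag.startswith("v") else tag
    return version is not None and bare_tag == version


if __name__ == "__main__":
    source = '__version__ = "1.2.0"\n'
    print(tag_matches_version("v1.2.0", source))
    print(tag_matches_version("1.3.0", source))
```

A deploy script could call tag_matches_version first and exit non-zero on a mismatch, so a mistyped tag never reaches PyPI.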

One of the main things I have been wanting to upgrade about my workflow has been documentation. Previously I have just used Markdown files in the repo. Since then I have played with Sphinx, MkDocs, and a few other python documentation generation libraries. Due to how much I love markdown, I decided to go with MkDocs. I wanted to publish the docs on ReadTheDocs, and luckily they have a guide on how to do it with MkDocs.

Along with documentation, testing has previously been a tedious thing for me. I have had a long love/hate relationship with pytest, mostly due to some very poorly written tests in which pytest allowed far too many layers of abstraction. However, pytest has one of the best test discovery runners still around, and for this project, the fact that it can run doctests as part of that discovery is great.

Quick sidenote on doctest. When I was first learning python, doctest was taught to me as a quick way to prove something is working. Since then, I have almost never used it, always going with proper unittest suites and testing folders. A few years ago, while doing daily challenges, I found it nice to write the function name, its signature, and a few smoke tests in doctest as a form of mini TDD. I would then work on the problem until the tests passed. In the end, I had the tests, the documentation, and the code all in one place that was easy to read, like so:

from typing import List


def fizzbuzz_gen(n: int) -> List[str]:
    """
    Return FizzBuzz of length n. Safely return empty list for non valid inputs.

    Examples:
        >>> fizzbuzz_gen(7)
        ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7']
        >>> len(fizzbuzz_gen(20))
        20
        >>> fizzbuzz_gen(0)
        []
        >>> fizzbuzz_gen(-1)
        []
        >>> fizzbuzz_gen('a')
        []
    """
    # Reject non-integers and non-positive lengths up front.
    if not isinstance(n, int) or n <= 0:
        return []

    def _fizzbuzz(i: int) -> str:
        # Check 15 first so multiples of both 3 and 5 are not caught earlier.
        if i % 15 == 0:
            return "FizzBuzz"
        if i % 5 == 0:
            return "Buzz"
        if i % 3 == 0:
            return "Fizz"
        return str(i)

    return [_fizzbuzz(a) for a in range(1, n + 1)]


if __name__ == "__main__":
    import doctest
    doctest.testmod()

Using doctest, I can write easier to read documentation with examples, use basic TDD principles, and have it all in one place.

The self-contained nature of doctest makes it perfect for testing small pure functions. So when writing a functional pipeline, I decided to use docstring doctests everywhere. pytest can discover all doctests with a flag (--doctest-modules), so it was an easy choice.
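To illustrate that style, here is a minimal sketch of a reduce-based pipeline helper that carries its own doctests. The name run_pipeline and its behavior are assumptions made for this example; it is not the library's implementation, and it skips the (filter, fn) tuple steps the real pipeline accepts.

```python
import doctest
from functools import reduce
from typing import Any, Callable, Iterable


def run_pipeline(value: Any, steps: Iterable[Callable[[Any], Any]]) -> Any:
    """
    Thread value through each step in order using reduce.

    Examples:
        >>> run_pipeline(3, [lambda x: x + 1, str])
        '4'
        >>> run_pipeline([1, 2, 3], [])
        [1, 2, 3]
        >>> run_pipeline("a b", [str.split, len])
        2
    """
    # Each step receives the result of the previous one; value seeds the fold.
    return reduce(lambda acc, step: step(acc), steps, value)


if __name__ == "__main__":
    # Runs the examples in the docstring above and reports any mismatch.
    doctest.testmod()
```

Because the function is pure, the docstring examples serve as both its documentation and its regression tests with no fixtures or setup.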

While I had previously used doctest for unit tests, this time I discovered something new: doctest can also read your documentation. With the --doctest-glob='*.md' flag, pytest scans all of my documentation examples and makes sure they run and produce the output shown. This often needs the multiline syntax of doctest:

>>> from functional_pipeline import pipeline, not_none, lens

>>> people = [
...     { 
...         'first_name': 'John', 
...         'last_name': 'Smith', 
...         'age': 32, 
...         'employment': [
...             {'name': 'McDonalds', 'position': 'Manager'},
...             {'name': 'CSV', 'position': 'Cashier'}
...         ]
...     },
...     { 
...         'first_name': 'Jane', 
...         'last_name': 'Smith', 
...         'age': 30, 
...         'employment': [
...             {'name': 'BurgerKing', 'position': 'Manager'}
...         ]
...     },
...     {
...         'first_name': 'Billy', 
...         'last_name': 'Bob', 
...         'age': 55, 
...         'employment': [
...             {'name': 'Microsoft', 'position': 'Programmer'}
...         ]
...     },
...     {
...         'first_name': 'Jill', 
...         'last_name': 'Jones', 
...         'age': 21, 
...         'employment': []
...     },
... ]

>>> full_names_with_employment_history = pipeline(
...     people,
...     [
...         (filter, lambda x: not_none(lens('employment.0')(x))),
...         (map, lambda x: f"{x['first_name']} {x['last_name']}"),
...         list,
...     ]
... )

>>> full_names_with_employment_history
['John Smith', 'Jane Smith', 'Billy Bob']

This discovery ended up being my favorite part of the project. I could confidently demo all of my code in the documentation that gets deployed to ReadTheDocs while increasing coverage.
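The same idea can be approximated with nothing but the standard library, which helps show what is going on under the hood. This is a minimal sketch under my own assumptions: the markdown text and the check_docs helper are hypothetical, and pytest's own collection does more than this. doctest's parser ignores the surrounding prose and only picks out the >>> examples.

```python
import doctest

# Hypothetical documentation text; prose around the >>> lines is ignored.
MARKDOWN = """
# Usage

Doubling a list:

    >>> [x * 2 for x in [1, 2, 3]]
    [2, 4, 6]
"""


def check_docs(text: str, name: str = "docs.md") -> doctest.TestResults:
    """Run every >>> example found in text and return the failure counts."""
    parser = doctest.DocTestParser()
    # globs={} gives the examples a fresh, empty namespace to run in.
    test = parser.get_doctest(text, globs={}, name=name, filename=name, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    result = check_docs(MARKDOWN)
    print(result.failed, "failed out of", result.attempted)
```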

Overall I am happy with the new pipeline-driven workflow of CI and releasing this package. It means a lot less manual work for me, and releases that are much better tested and documented. I was also happy to discover that the new version of PyPI supports markdown long_descriptions, so I can just read in my README.md at deploy time.
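For reference, reading the README at deploy time can look roughly like the sketch below. The readme_kwargs helper is a name I made up for illustration; long_description and long_description_content_type are the setuptools keywords that carry a Markdown description to PyPI.

```python
from pathlib import Path
from typing import Dict


def readme_kwargs(readme_path: str = "README.md") -> Dict[str, str]:
    """Build the setup() keyword arguments that carry a Markdown README."""
    text = Path(readme_path).read_text(encoding="utf-8")
    return {
        "long_description": text,
        # Tells PyPI to render the description as Markdown rather than RST.
        "long_description_content_type": "text/markdown",
    }


# In a setup.py this would be used roughly as:
#   from setuptools import setup
#   setup(name="...", version="...", **readme_kwargs())
```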