Summing Lists of Tuples

MC706.io

2018-02-08

Code-Quality, Functional-Programming, Python

A coworker came to me with a simple python problem and asked if I could find a better solution. The problem is as follows.

The Problem

Given a list of 2 element tuples (suchas x and y coordinates), how do I sum the tuples so that they only sum the respective elements (sum(x), sum(y)).

Example:

1	[(1,2), (3,4), (5,6), (7,8)] => (16, 20)

For the following solutions, we will give that

1	tuple_list = [(1,2), (3,4), (5,6), (7,8)]

and for performance testing:

1	long_tuple_list = [[(1,2), (3,4), ..., (999, 1000)]]

All code and tests will be run in python 3.6.

Beginner Solution

So the Python101 type solution would look a little something like this:

def sum_tuples(tuples):
    x = 0
    y = 0
    for tup in tuples:
        x += tup[0]
        y += tup[1]
    return x, y
    
sum_tuples(tuple_list) // == (16,20)

Our results from %timeit with jupyter notebook as as follows:

1 2	> %timeit sum_tuples(long_tuple_list) 337 µs ± 11.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Proposed Solution

The solution my coworker came up with went something like

def sum_tuples(tuples):
    return tuple([sum(x) for x in zip(*tuples)])
    
sum_tuples(tuple_list) //  == (16,20)

This is pretty nice, its a one liner that makes good use of builtin zip and uses, sum. This performs nicely due to the list comprehension:

1 2	%timeit sum_tuples(long_tuple_list) 166 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

However there is an unnecessary list casting in here since we want to return a tuple, so we can refactor this to:

def sum_tuples(tuples):
    return tuple(sum(x) for x in zip(*tuples))
    
sum_tuples(tuple_list) //  == (16,20)

Remove the square brackets to cast directly to tuple results in the same speed but saves us two characters and an operation.

Map Solution

After falling in love with functional programming, my first gut reaction to this was to use map:

def sum_tuples(tuples):
    return tuple(map(sum, zip(*tuples)))
    
sum_tuples(tuple_list) //  == (16,20)

This is probably my favorite. One liner, all builtin keyword methods. It replaces the comprehension with map and performs a hair faster too:

1 2	%timeit sum_tuples(long_tuple_list) 158 µs ± 6.41 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Reduce Solution

In functional programming, this type of problem where you go from many to one is often handled with a reducer. Since python3, reduce is no longer a builtin keyword, but it is still available via functools:

The initial reduce with a lambda looks like:

from functools import reduce

def sum_tuples(tuples):
    return tuple(reduce(lambda x, y: map(sum, zip(x, y)), tuples))
    
sum_tuples(tuple_list) //  == (16,20)

So to me, this looks weird. I don’t like the lambda anonymous function, we are calling map anyways inside, and the whole thing is a bit hard to read. Also due to the anonymous lambda, the thing is slow:

1 2	%timeit sum_tuples(long_tuple_list) 2.33 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

But lets not count reduce dead yet, we can clean up that anonymous function with a named one:

from functools import reduce

def matrix_add(x, y):
    return tuple(map(sum, zip(x, y)))
    
def sum_tuples(tuples):
    return tuple(reduce(matrix_add, tuples)
    
sum_tuples(tuple_list) //  == (16,20)

This to me looks like the best functional approach as far as signal to noise. It defines a function on how to add two tuples together, then uses reduce to reduce the list to a single result. The code reads exactly as it does. However since reduce is no longer first class, it’s performance is pretty similar to the above results.

Numpy

To be complete, we could use numpy to do matrix addition to get our answer. *Note, numpy is huge. According to pypi, its between 4MB and 12MB depending on your operating system. This is the equvalent of killing a fly with a sledge hammer.

from numpy import array

def sum_tuples(tuples):
     return tuple(sum(array(tup) for tup in tuples))
     
sum_tuples(tuple_list) //  == (16,20)

Numpy arrays do matrix math, so adding them will result the behavior that we want. However, the cast to array comes with a cost:

1 2	%timeit sum_tuples(long_tuple_list) 6.73 ms ± 559 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

OOP Solution

Again for completeness, we can look at an OOP solution to this problem.

class Mtuple(tuple):
    def __add__(self, other):
        return Mtuple(x + y for (x,y) in zip(self, other))
        
def sum_tuples(tuples):
    mtuples = [Mtuple(tup) for tup in tuples]
    return sum(mtuples, Mtuple((0,0)))
    
sum_tuples(tuple_list) //  == (16,20)

So here we cast all of our tuples to a new class, Mtuple, which defines a __add__ function. This allows us to override the behavior of + or sum on these objects. We then can simply sum the list of Mtuple, given we give sum a starting value of Mtuple((0,0))). However we run into the same problem with casting as we did with numpy:

1 2	%timeit sum_tuples(long_tuple_list) 5.45 ms ± 429 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Conclusion

While I like how clean reduce looks, due to it no longer being “pythonic”, I would have to say the best way to solve this problem is either using a comprension or using map. Personally I think map(sum, zip(*tuple_list)) is the best because it is lazily evaluated and can be further composed.