Functional Python
Recently, for a weekend project, I built a simple Python script that looked up the current price of a cryptocurrency, looked up the current spot prices of AWS EC2 instances, and calculated whether any mining scenario using a spot fleet was cheaper than buying the coins on an exchange. For those interested in the answer: it only ever gets close, it never gets profitable.
Since the rest of the problem was rather simple, I treated the programming half as an exercise: I wanted to stretch my functional programming muscles, see how far I could push Python toward functional programming, and find out what that would look like.
Starting point
So to start out, here is what some of that original code looked like:
Calculating Profitability
For those of you who don't know, the simplified version of how to calculate the payout for a cryptocurrency is your share of the global hashrate multiplied by the block reward. The reward moves slowly over time, but using the most recent one will get you very close.
To zero in a little more on the calculation: you get paid based on the portion of your mining pool's hashrate that you contribute, and the chance that the mining pool earns the reward is the ratio of the pool's hashrate to the global hashrate. So if the global hashrate of a cryptocurrency is 100 MH/s, your mining pool has a hashrate of 10 MH/s, and you are contributing 500 KH/s, then for a cryptocurrency with an average block solve time of 2 minutes, every 2 minutes your pool has a 1/10 chance of solving the block. When that happens, you get 5% (500 KH / 10 MH) of the reward.
Multiply that out by the reward to get your expected reward per 2-minute block, then multiply again to get the reward per hour. Multiply by the current market price to get dollars per hour, which can be compared directly to spot prices for profitability.
```python
import requests
```
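The heart of that script is the payout arithmetic; here is a sketch, where the reward and price arguments are illustrative placeholders for values pulled from the APIs with `requests`:

```python
def payout_per_hour_usd(my_hashrate, global_hashrate, block_reward,
                        block_time_seconds, price_usd):
    # Your expected share of each block's reward is your fraction of the
    # global hashrate, regardless of which pool you sit in.
    blocks_per_hour = 3600 / block_time_seconds
    coins_per_hour = (my_hashrate / global_hashrate) * block_reward * blocks_per_hour
    return coins_per_hour * price_usd

# The example from above: 500 KH/s against a 100 MH/s network with
# 2-minute blocks, plus an illustrative reward and price.
payout_per_hour_usd(500e3, 100e6, 5.0, 120, 150.0)
```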
Getting Spot Prices
To get the spot prices, I used boto3. I then looked at the hardware on all of the instance types specialized for GPUs, looked up their hashrates, and multiplied those by the number of GPUs available on each box.
That looked a little bit like:
```python
from boto3 import client
```
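Building on that import, a sketch of the lookup using boto3's `describe_spot_price_history` call; the instance types and per-GPU hashrates here are illustrative placeholders, not benchmarks:

```python
# instance type: (number of GPUs, hashrate per GPU in H/s) -- illustrative.
GPU_INSTANCES = {
    "g3.4xlarge": (1, 500),
    "p2.xlarge": (1, 550),
}

def spot_prices(region="us-east-1"):
    ec2 = client("ec2", region_name=region)
    response = ec2.describe_spot_price_history(
        InstanceTypes=list(GPU_INSTANCES),
        ProductDescriptions=["Linux/UNIX"],
        MaxResults=len(GPU_INSTANCES),
    )
    # Pair each instance's spot price with the box's total hashrate.
    return [
        (float(item["SpotPrice"]),
         GPU_INSTANCES[item["InstanceType"]][0] * GPU_INSTANCES[item["InstanceType"]][1])
        for item in response["SpotPriceHistory"]
    ]
```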
Putting it together
From here the rest is pretty straightforward: take the list of prices with their paired hashrates, feed each into the payout-per-hour calculation, and see if there are any where the payout is greater than the cost.
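A sketch of that wiring, reusing the hypothetical helpers from the earlier snippets:

```python
def main():
    # (dollars per hour, total hashrate) pairs for each candidate instance.
    candidates = spot_prices()
    profitable = [
        (cost, hashrate)
        for cost, hashrate in candidates
        # Illustrative constants: global hashrate, reward, block time, price.
        if payout_per_hour_usd(hashrate, 100e6, 5.0, 120, 150.0) > cost
    ]
    print(profitable or "Never profitable.")

if __name__ == "__main__":
    main()
```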
Refactoring for Functional
At this point I had my answer: no, it was never profitable to rent hardware to mine Monero. The script was quick and dirty, so in the interest of increasing the signal-to-noise ratio in my code, and flexing my functional programming muscles, I started refactoring.
Note: after a little while, the goal shifted from increasing the signal-to-noise ratio to making the code as functional in style as possible.
Memoization
One of the pillars of functional programming is referential transparency: any function call can be replaced with the value it would have returned without affecting the end result. As a consequence, pure functions can be easily memoized: their results for specific arguments can be stored and used in place of calling the function again.
Luckily this is pretty easy to do in a number of ways in Python.
The first way is sometimes considered a bug, and it has to do with passing a mutable object as the default value of a function argument. To explore this, let's look at an example that has some unexpected results for beginners:
```python
def add_to_list(value, list=[]):
    list.append(value)
    return list

first = add_to_list(1)
print(first)

second = add_to_list(2)
print(second)
```
Now, what most beginners would expect from this is that we default the argument `list` to an empty list, so `first` should be `[1]` and `second` should be `[2]`. However, this is the output of the above:
```
[1]
[1, 2]
```
So what is happening here? It turns out that when you use a mutable data structure as a default argument, all invocations of the function share the same initial object, because default values are evaluated once, when the function is defined. That means all calls to `add_to_list` that don't specify a list share the default list.
We can use this oddity to our advantage for memoization:
```python
def do_something_hard(arg, mem={}):
    # The shared default dict acts as a cache across calls.
    if arg not in mem:
        mem[arg] = hard_work(arg)
    return mem[arg]
```
Or, more succinctly, using Python dictionaries' `setdefault`:
```python
def do_something_hard(arg, mem={}):
    # Caveat: hard_work(arg) is evaluated before setdefault consults the
    # cache, so this form avoids re-storing but not re-computing.
    return mem.setdefault(arg, hard_work(arg))
```
Now all calls with the same argument to `do_something_hard` return the cached value (though note that with `setdefault` the `hard_work(arg)` expression is still evaluated on every call, so the explicit `if` version is the one that actually skips the work). This is extremely helpful for things like API calls, since calls made seconds apart from each other don't necessarily need to be remade.
The other way to do memoization in Python is with a decorator:
```python
import collections.abc
import functools

def memoized(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # Unhashable arguments can't be dictionary keys, so fall back
        # to calling the function directly.
        if not isinstance(args, collections.abc.Hashable):
            return func(*args)
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper
```
Now, to implement our `do_something_hard` as a memoized function, we can just do:
```python
@memoized
def do_something_hard(arg):
    return hard_work(arg)
```
These two methods can be used to greatly increase the performance of the original code by not pounding the AWS and Monero APIs. Instead, we can memoize all of the values we need and run the calculations on top of the cached results.
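For instance, the fetchers can be memoized so that repeat lookups within a run are free; a sketch, using the same illustrative endpoints as before:

```python
@memoized
def get_monero_stats():
    # The first call hits the network; repeats within a run hit the cache.
    return requests.get("https://api.coinmarketcap.com/v1/ticker/monero/").json()

@memoized
def get_spot_price(instance_type):
    # One cached lookup per instance type.
    response = client("ec2").describe_spot_price_history(
        InstanceTypes=[instance_type],
        ProductDescriptions=["Linux/UNIX"],
        MaxResults=1,
    )
    return float(response["SpotPriceHistory"][0]["SpotPrice"])
```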
Simplifying API Calls
All of those API calls to get the hashrates, prices, and rewards have a lot of repeated code, and we have a call to `get_monero_stats` that needs different keys pulled out of it. So let's clean up these nested JSON lookups with a functional tool called lenses.
A `lens` is a function that, given a key path, returns a function that, given a nested structure, returns the value at that key path in the structure.
```python
from typing import Any, List

def lens(keyList: List[str]):
    def wraps(data) -> Any:
        # Walk the nested structure one key at a time.
        result = data
        for key in keyList:
            result = result[key]
        return result
    return wraps
```
This would allow us to do something like:
```python
response_json = {
    "Body": {
        "price_usd": "42.0",
    },
}

lens(["Body", "price_usd"])(response_json)  # "42.0"
```
This allows us to use a known structure, `lens`, to clean up some of that lookup code. At this point, after playing with the command line tool `jq`, I went a little further by accepting a dot-separated path instead of a list, and also allowing indexes to be passed:
```python
from typing import Any, List, NewType, Union

KeyList = NewType('KeyList', List[Union[str, int]])

def key_path(path: str) -> KeyList:
    # "Body.0.price_usd" -> ["Body", 0, "price_usd"]
    return KeyList([int(key) if key.isdigit() else key
                    for key in path.split('.')])

def lens(keyList: KeyList):
    def wraps(data) -> Any:
        result = data
        for key in keyList:
            result = result[key]
        return result
    return wraps
```
Here I added a type, `KeyList`, to make sure `lens` can't get anything but what I expect, and a function `key_path` that takes a string and returns a `KeyList`, so I can pass things like `"Body.0.price_usd"` to it if there is a list in the JSON response.
Armed with memoized functions, `lens`, and `KeyList`, I can write a generic external data function:
```python
from typing import Any, Callable

def get_keypath_from_api(keyList: KeyList, api_url: str, mem={}) -> Callable[[], Any]:
    def thunk() -> Any:
        # Memoized on the URL, so several key paths can share one response.
        if api_url not in mem:
            mem[api_url] = requests.get(api_url).json()
        return lens(keyList)(mem[api_url])
    return thunk
```
We didn't use `@memoized` here so that the memoization can be keyed off of the URL and not off of the `keyList`, allowing multiple keys to be pulled from a single API response without remaking the call.
This allows our calls to be refactored like so:
```python
current_price = get_keypath_from_api(
    key_path('0.price_usd'),
    "https://api.coinmarketcap.com/v1/ticker/monero/")
```
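And because the memo is keyed off of the URL, a second key path against the same endpoint reuses the cached response (the `price_btc` field here is illustrative):

```python
# Pulled from the same cached response; no extra HTTP call is made.
btc_price = get_keypath_from_api(
    key_path('0.price_btc'),
    "https://api.coinmarketcap.com/v1/ticker/monero/")
```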
Now each of these items can be called (like `current_price()`), and the returned value will either be retrieved from the API, or from memory if the call has already been made. Either way, we can now treat these functions as values, and we are sure that the minimum amount of work required will be done.
Python’s Partial
Let's readdress our `lens` function. Here we have a function of arity 1 that returns a callable. If we wanted a result from it right away, we would have to call it like:
```python
lens(key_path('0.price_usd'))(data)
```
However, we have the inner `wraps` function making it a bit messy. Ideally we could define the function to do what we want with all of its arguments, and then, if we pass fewer than all of the arguments, have it return a function with the given arguments partially applied.
Unlike Elm, Python does not allow this by default; however, `functools` has a function, `partial`, that does allow us to do this. Let's use it to clean up our `lens`:
```python
from functools import partial
from typing import Any

def lens(keyList: KeyList, data) -> Any:
    # Arity 2: both the key path and the data up front, no inner wraps.
    result = data
    for key in keyList:
        result = result[key]
    return result

# Fully applied: the value right away.
lens(key_path('0.price_usd'), data)

# Partially applied: a reusable lookup function.
price_lens = partial(lens, key_path('0.price_usd'))
Here we can see that we can define our functions more cleanly and use `partial` to provide partial application where we need it.
Function Composition
One of the pillars of functional programming is function composition. With pure functions, you can take a chain of nested calls and pass them to `compose` to flatten out the structure. For instance, instead of `f(g(h(1)))`, we would like to flatten this to `compose(f, g, h)(1)`. This makes it much simpler to see what is happening.
To implement a simple `compose` (`functools` doesn't actually ship one), we can build our own:
```python
from functools import reduce

def compose(*functions):
    # Fold the functions together, starting from the identity function.
    return reduce(lambda f, g: lambda x: f(g(x)),
                  functions,
                  lambda x: x)
```
Here we are reducing over the list of functions, starting from a simple identity function, `lambda x: x`, and at each step nesting the next function inside the composition built so far.
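As a quick check, borrowing `partial` from earlier and the arithmetic helpers defined later on:

```python
compose(partial(add, 2), partial(multiply, 3))(1)  # add(2, multiply(3, 1)) == 5
```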
Pipelines
One of my favorite things about Elixir and Elm is pipelines. Being able to do something like
```elixir
1
|> add(2)
|> multiply(3)
# => 9
```
makes the signal-to-noise ratio very high. It is very easy to see what is going on here. If you look carefully, a pipeline is nothing but a reversed `compose`: instead of the functions appearing outermost first, they appear innermost first.
```python
def pipeline(*functions):
    # A reversed compose: the first function listed runs first.
    return compose(*reversed(functions))
```
This makes the order in which the functions are called a little clearer, but we still get a composed function back; in Elm, we can start with a value. So let's make an `evaluate_pipeline` that allows us to pipe a value in:
```python
def evaluate_pipeline(start, *functions):
    # Pipe a starting value through the functions, first to last.
    return pipeline(*functions)(start)
```
This allows us to get pretty close to Elm:
```python
evaluate_pipeline(
    1,
    partial(add, 2),
    partial(multiply, 3),
)  # ((2 + 1) * 3) == 9
```
Helper Functions and Exceptions
Elm has a `Maybe` type, which allows it to handle errors gracefully. To get around some pesky exceptions, we can use Python's `Optional` type, which is either something or `None`. To adapt functions to it, we can write a function that catches exceptions and returns `None`:
```python
from typing import Any, Optional

def safe(func, *args) -> Optional[Any]:
    # Run func, converting any exception into None.
    try:
        return func(*args)
    except Exception:
        return None
```
This allows us to write some nice pipeline casting functions like:
```python
toFloat = partial(safe, float)  # toFloat("1.5") -> 1.5; toFloat("nope") -> None
```
Then we can put `toFloat` in a pipeline to safely cast a result to a float.
In a similar vein, we can write some helper functions for getting indexes:
```python
def index(i, iterable) -> Optional[Any]:
    # None instead of an IndexError when the index is out of range.
    return safe(lambda: iterable[i])
```
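Putting `index` and `toFloat` together in a pipeline:

```python
# Safely pull the first element of a response and cast it to a float.
evaluate_pipeline(
    ["3.50", "not a number"],
    partial(index, 0),
    toFloat,
)  # 3.5
```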
To make the math bits more functional, I defined the basic operations:
```python
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

def multiply(a, b):
    return a * b

def divide(a, b):
    return a / b
```
Moving all of our utility functions into their own package, we can convert our original script into a series of composed pipelines:
```python
current_price = get_keypath_from_api(
    key_path('0.price_usd'),
    "https://api.coinmarketcap.com/v1/ticker/monero/")
```
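The rest of the final version follows the same shape; a sketch, where `spot_prices_with_hashrates` and `payout_per_hour` are hypothetical stand-ins for the memoized helpers above:

```python
is_profitable = pipeline(
    partial(map, lambda pair: (pair[0], payout_per_hour(pair[1]))),
    partial(filter, lambda pair: pair[1] > pair[0]),
    any,
)

print(is_profitable(spot_prices_with_hashrates()))
```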
With this final version, almost all of the functions are pure, memoized, and composed into pipelines. Since everything is called via function composition and the `map` and `filter` steps return generators, the final `filter`, or things like `any`, can short-circuit evaluation.
Conclusion
I had a lot of fun pushing Python to the extreme of functional programming. Whether or not this is more readable and provides a better signal-to-noise ratio than regular procedural Python remains to be seen. Since Python doesn't easily allow for multi-threading or multiprocessing, the ability of pure functions to be computed easily across distributed workers doesn't provide much of a benefit here. Guido historically hasn't been a fan of functional programming (hence why `reduce` was relegated to `functools` in Python 3), and the language's optimizations are not focused on this paradigm.