As a one-day project during quarantine, I set out to create a predicate-building library that follows the design of Django's queryset filtering.
For those who have never used Django before, the ORM provides Python classes that shadow your database tables. To get records out of a table, you put a number of filters on the queryset, which builds a SQL statement that gets you the filtered results.
It allows for complex queries, even ones spanning relationships, like:
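A typical query of that shape, using the hypothetical `Blog`/`Entry` models from Django's own documentation (where `Entry` has a foreign key to `Blog`), might look like:

```python
from datetime import date

# Double underscores both span the relationship (blog__name follows the
# foreign key) and pick the comparison (pub_date__gte means >=).
entries = Entry.objects.filter(
    blog__name='Beatles Blog',
    pub_date__gte=date(2000, 1, 1),
)
```

One call expresses a join and two WHERE clauses, and the ORM generates the SQL for you.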
Now this makes these queries not only easy to write, but very easy to read and update,
especially relative to their SQL counterparts. The ORM is one of the reasons I stuck
with Python and Django for so long and still develop projects in them today.
Building Python Native Predicates
Previously I built functional-pipeline,
which was meant to make building Python-native pipelines for data processing easier.
In that library, I give a few examples of building a chain of filter functions
to filter down a set of data. This works nicely in Python because filter,
as of Python 3, is lazy and won't produce results until asked. This means a chain of
filter functions in Python acts the same way as the composition of those predicate
functions, and performs relatively similarly in time and memory.
After having mixed success introducing this admittedly easy-to-read but relatively
un-Pythonic syntax at work, I found us falling back on manually creating chains of filters.
They go something along the lines of:
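A minimal sketch of that style, with hypothetical sample records standing in for the real data source:

```python
from collections import namedtuple
from datetime import datetime

Record = namedtuple('Record', ['created', 'type', 'account', 'value'])

# stand-in for whatever the real data source returns
all_data = [
    Record(datetime(2020, 5, 2), 'a', '123', 10),
    Record(datetime(2020, 4, 1), 'a', '123', 20),
    Record(datetime(2020, 5, 3), 'b', '123', 30),
]

# each filter wraps the previous one; nothing runs yet because filter is lazy
filtered = filter(lambda r: r.created > datetime(2020, 5, 1), all_data)
filtered = filter(lambda r: r.type == 'a', filtered)
filtered = filter(lambda r: r.account == '123', filtered)

# the set comprehension is what finally pulls records through the chain
values = {r.value for r in filtered}
```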
Now this works. It is semi-easy to read. It is pretty clear what we are doing.
Performance-wise, it takes advantage of the laziness. There are a lot of lambdas,
which does hurt readability, but we could refactor this by putting all of the
predicates together into a single named function:
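A plain-Python approximation of that refactor (not functional-pipeline's actual API), again with hypothetical sample data so it runs on its own:

```python
from collections import namedtuple
from datetime import datetime

Record = namedtuple('Record', ['created', 'type', 'account', 'value'])

all_data = [
    Record(datetime(2020, 5, 2), 'a', '123', 10),
    Record(datetime(2020, 4, 1), 'a', '123', 20),
    Record(datetime(2020, 5, 3), 'b', '123', 30),
]

def wanted(record):
    # all of the predicates gathered into one named, testable place
    return (
        record.created > datetime(2020, 5, 1)
        and record.type == 'a'
        and record.account == '123'
    )

values = {record.value for record in filter(wanted, all_data)}
```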
So with functional pipeline, it is more lines, but the lines are organized in a
way that I feel is more conducive to reading and understanding what is going on.
However, this can seem a bit magical to beginners and scary to those who have never
worked in functional languages before. Reflecting on trying to make predicate
functions more accessible for beginners, I remembered my time using the Django ORM.
So I brainstormed one evening and decided to take a day to take a shot at building
a predicate library to make the types of filtering done above easier for beginners.
The result is predicate. It simplifies the
above to something like:
```python
from datetime import datetime

from predicate import where

all_data = get_data_from_nosql_db()
filtered = filter(
    where(
        created__gt=datetime(2020, 5, 1),
        type='a',
        account='123'
    ),
    all_data
)
values = {record.value for record in filtered}
```
The intention of predicate is to make writing complex filter predicates as easy
as it is to use the django orm.
A few other things I built into predicate:
- the ability to dig through nested data structures to do a comparison
- the ability to pass custom comparator functions
- the ability to select a combinator to combine predicate results
- the ability to pass custom combinators to define more complex logical systems
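To give a feel for what custom comparators and pluggable combinators mean, here is a toy re-implementation of a `where`-style builder. This is not predicate's actual code or API, just a sketch of the idea:

```python
from typing import Any, Callable, Iterable

def where(combine: Callable[[Iterable[bool]], bool] = all,
          **conditions: Any) -> Callable[[Any], bool]:
    """Toy predicate builder: each keyword maps a field name to either an
    expected value or a custom comparator function; `combine` is the
    pluggable combinator (all, any, or anything else with that shape)."""
    def predicate(record):
        results = []
        for field, expected in conditions.items():
            actual = (record.get(field) if isinstance(record, dict)
                      else getattr(record, field, None))
            if callable(expected):
                results.append(bool(expected(actual)))  # custom comparator
            else:
                results.append(actual == expected)      # plain equality
        return combine(results)
    return predicate

# an equality check plus a custom comparator, combined with any() instead of all()
is_interesting = where(combine=any, type='a', value=lambda v: v > 25)
```

Swapping `combine` for a different function is all it takes to change the logical system the conditions live in.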
Building Functionally
So naturally, when writing it, I wrote the library in as much of a functional style
as I could. It actually ended up turning out extremely minimal. I ended up removing most
of the if/else logic and only have one or two ternary assignments in there.
Removing comments and compressing, the whole project boils down to:
```python
from copy import deepcopy
from enum import Enum
from typing import Any, Callable, Dict, List, Optional

def kwargs_to_predicates(key_word_arguments_dict: Dict[str, Any]) -> List[Predicate]:
    return [_process_predicate(key, value)
            for key, value in key_word_arguments_dict.items()]

class Combinator(Enum):
    AND = 'and'
    OR = 'or'
    NONE = 'none'
    NAND = 'nand'

def combinator_from_string(string: str) -> Combinator:
    try:
        return Combinator(string)
    except ValueError:
        raise ValueError(f'{string} is not a valid argument for combinator.')

def safe_lens(path: List[str]) -> Callable[[Any], Optional[Any]]:
    def lens(target):
        _data = deepcopy(target)
        keypath = path[:]
        while keypath and _data is not None:
            k = keypath.pop(0)
            if isinstance(_data, dict):
                _data = _data.get(str(k), None)
            else:
                _data = getattr(_data, str(k), None)
        return _data

    return lens
```
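The lens is what powers the nested-data lookups: it walks a key path through dicts and objects and returns None instead of raising when a step is missing. Here it is again, repeated verbatim so the snippet runs on its own, with some made-up nested data:

```python
from copy import deepcopy
from typing import Any, Callable, List, Optional

def safe_lens(path: List[str]) -> Callable[[Any], Optional[Any]]:
    # same function as in the excerpt above
    def lens(target):
        _data = deepcopy(target)
        keypath = path[:]
        while keypath and _data is not None:
            k = keypath.pop(0)
            if isinstance(_data, dict):
                _data = _data.get(str(k), None)
            else:
                _data = getattr(_data, str(k), None)
        return _data
    return lens

nested = {'user': {'address': {'city': 'Denver'}}}
city = safe_lens(['user', 'address', 'city'])(nested)
missing = safe_lens(['user', 'phone', 'number'])(nested)  # no KeyError, just None
```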
Writing this project as a lightweight parser taught me a bunch of things, chief among
them that adding to a grammar to extend a language is extremely easy and powerful. Two enums and their
associated dictionary mappings control the entirety of the functionality. One
of the first enhancements I added was the NAND combinator, and it took two lines of code
to do so.
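A sketch of what a two-line extension like that looks like (a simplified mapping, not the library's exact code): one new enum member, one new dictionary entry.

```python
from enum import Enum
from typing import Callable, Dict, Iterable

class Combinator(Enum):
    AND = 'and'
    OR = 'or'
    NAND = 'nand'   # new line 1: the enum member

COMBINE: Dict[Combinator, Callable[[Iterable[bool]], bool]] = {
    Combinator.AND: all,
    Combinator.OR: any,
    Combinator.NAND: lambda results: not all(results),  # new line 2: the mapping
}
```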
Overall it was a really fun project to work on, and I hope someone finds a use for it.