predicate

MC706.io

As a one-day project during quarantine, I set out to create a predicate-building library that follows the design of Django's queryset filtering.

For those who have never used Django before, its ORM provides Python classes that mirror your database tables. To get records out of a table, you can apply a number of filters to the queryset, which will build a SQL statement to get you the filtered results.

It allows for complex queries, even spanning relationships like:

Entry.objects.filter(
    date__gte=datetime(2020, 1, 1),
    title__startswith="Python",
    author__name__last='McDevitt',
)

This makes these queries not only easy to write but also very easy to read and update, especially relative to their SQL counterparts. The ORM is one of the reasons I stuck with Python and Django for so long and still develop projects in them today.

Building Python Native Predicates

Previously I built functional-pipeline, which was meant to make building Python-native pipelines for data processing easier. In that library, I give a few examples of building a chain of filter functions to filter down a set of data. This works nicely in Python because filter, as of Python 3, is lazy and won't produce results until asked. That means a chain of filter functions acts the same way as the composition of those predicate functions, and performs similarly in both time and memory.
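
To make that equivalence concrete, here is a minimal sketch (standard library only, with made-up sample data) showing that a chain of lazy filter calls behaves the same as a single filter with the composed predicate:

records = [
    {'type': 'a', 'value': 1},
    {'type': 'b', 'value': 2},
    {'type': 'a', 'value': 30},
]

# Chain of lazy filters; nothing is evaluated until we iterate.
chained = filter(lambda r: r['type'] == 'a',
                 filter(lambda r: r['value'] > 10, records))

# A single filter with the composed predicate.
composed = filter(lambda r: r['type'] == 'a' and r['value'] > 10, records)

assert list(chained) == list(composed) == [{'type': 'a', 'value': 30}]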

After having mixed success introducing this admittedly easy-to-read but relatively un-Pythonic syntax at work, I found us falling back on manually created chains of filters. They go something along the lines of:

all_data = get_data_from_nosql_db()
windowed_before = filter(lambda x: x.get('created') < datetime(2020, 5, 1), all_data)
only_type_a = filter(lambda x: x.get('type') == 'a', windowed_before)
by_account = filter(lambda x: x.get('account') == '123', only_type_a)
values = {record['value'] for record in by_account}

Now this works. It is semi-easy to read, and it is pretty clear what we are doing. Performance-wise, it takes advantage of the laziness. There are a lot of lambdas, which does hurt readability, but we could refactor this by putting all of the predicates together into a single named function:

def account_type_a_before(record):
    return (
        record.get('account') == '123'
        and record.get('created') < datetime(2020, 5, 1)
        and record.get('type') == 'a'
    )

all_data = get_data_from_nosql_db()
values = {record['value'] for record in filter(account_type_a_before, all_data)}

And this is a really reasonable refactoring that cleans up a lot of the issues. In fact, it is probably cleaner than if we had used functional-pipeline:

from datetime import datetime
from functional_pipeline import pipeline
from operator import itemgetter

values = pipeline(
    get_data_from_nosql_db(),
    [
        (filter, lambda x: x.get('created') < datetime(2020, 5, 1)),
        (filter, lambda x: x.get('type') == 'a'),
        (filter, lambda x: x.get('account') == '123'),
        (map, itemgetter('value')),
        set,
    ]
)

So with functional-pipeline it is more lines, but the lines are organized in a way that I feel is more conducive to reading and understanding what is going on.

However, this can seem a bit magical to beginners and scary to those who have never worked in functional languages before. Reflecting on trying to make predicate functions more accessible for beginners, I remembered my time using the Django ORM.

So I brainstormed one evening and decided to take a day to try building a predicate library that would make the kinds of filtering done above easier for beginners.

The result is predicate. It simplifies the above to something like:

from datetime import datetime
from predicate import where

all_data = get_data_from_nosql_db()
filtered = filter(
    where(
        created__lt=datetime(2020, 5, 1),
        type='a',
        account='123',
    ),
    all_data,
)
values = {record['value'] for record in filtered}

The intention of predicate is to make writing complex filter predicates as easy as using the Django ORM.

A few other things I built into predicate (a quick sketch follows this list):

  • ability to dig through nested data structures to do a comparison
  • ability to pass custom comparator functions
  • ability to select a combinator to combine predicate results
  • ability to pass custom combinators to define more complex logical systems
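
For example, based on the implementation shown later in this post (the record shape here is made up), a single where call can dig into nested structures, apply a custom comparator function, and switch the combinator:

from predicate import where

# 'or' combinator: the record matches if ANY condition holds.
# address__city digs into the nested 'address' dict;
# score__func applies a custom comparator function to the looked-up value.
pred = where(
    address__city='Columbus',
    score__func=lambda s: s is not None and s % 2 == 0,
    combinator='or',
)

assert pred({'address': {'city': 'Columbus'}, 'score': 3})    # nested city lookup matches
assert not pred({'address': {'city': 'Dayton'}, 'score': 3})  # neither condition matches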

Building Functionally

So naturally, when writing it, I wrote the library in as much of a functional style as I could. It actually ended up turning out extremely minimal: I removed most of the if/else logic and only have one or two ternary assignments in there.

With comments removed and everything compressed, the whole project boils down to:

Predicate = Callable[[A], bool]

def where(**kwargs) -> Predicate:
    combinator = kwargs.pop('combinator', 'and')
    predicates = kwargs_to_predicates(kwargs)
    return combine(combinator, predicates)

def kwargs_to_predicates(key_word_arguments_dict: Dict[str, Any]) -> List[Predicate]:
    return [_process_predicate(key, value) for key, value in key_word_arguments_dict.items()]

def _process_predicate(predicate_field: str, value: Any) -> Predicate:
    attribute_accessor, comparator = _process_predicate_field(predicate_field)
    return compose(partial(flip(COMPARATORS[comparator]), value), attribute_accessor)

def _process_predicate_field(field: str) -> Tuple[Callable[[Any], Optional[Any]], Comparator]:
    parts = field.split("__")
    *init, last = parts
    maybe_comparator = comparator_from_string(last)
    return (safe_lens(init), maybe_comparator) if maybe_comparator is not None else (safe_lens(parts), Comparator.EQ)

@unique
class Comparator(Enum):
    EQ = 'eq'
    GT = 'gt'
    GTE = 'gte'
    LT = 'lt'
    LTE = 'lte'
    NE = 'ne'
    FUNC = 'func'
    CONTAINS = 'contains'
    STARTSWITH = 'startswith'
    ENDSWITH = 'endswith'
    EXACT = 'exact'
    IEXACT = 'iexact'
    IN = 'in'
    TYPE = 'type'

def comparator_from_string(string: str) -> Optional[Comparator]:
    try:
        return Comparator(string)
    except ValueError:
        return None

COMPARATORS = {
    Comparator.EQ: operator.eq,
    Comparator.GT: operator.gt,
    Comparator.GTE: operator.ge,
    Comparator.LT: operator.lt,
    Comparator.LTE: operator.le,
    Comparator.NE: operator.ne,
    Comparator.FUNC: apply_to,
    Comparator.CONTAINS: contains,
    Comparator.STARTSWITH: startswith,
    Comparator.ENDSWITH: endswith,
    Comparator.EXACT: operator.eq,
    Comparator.IEXACT: iexact,
    Comparator.IN: flip(contains),
    Comparator.TYPE: isinstance,
}

class Combinator(Enum):
    AND = 'and'
    OR = 'or'
    NONE = 'none'
    NAND = 'nand'


def combinator_from_string(string: str) -> Combinator:
    try:
        return Combinator(string)
    except ValueError:
        raise ValueError(f'{string} is not a valid argument for combinator.')


COMBINATORS: Dict[Combinator, Callable] = {
    Combinator.AND: all,
    Combinator.OR: any,
    Combinator.NONE: compose(operator.not_, any),
    Combinator.NAND: compose(operator.not_, all),
}


def combine(combinator: Union[str, Callable], predicates: List[Predicate]) -> Predicate:
    combinator_func: Callable = combinator if callable(combinator) else COMBINATORS[combinator_from_string(combinator)]

    def _inner(arg: Any) -> bool:
        return combinator_func(map(partial(apply_to, arg), predicates))

    return _inner

def safe_lens(path: List[str]) -> Callable[[Any], Optional[Any]]:
    def lens(target):
        _data = deepcopy(target)
        keypath = path[:]
        while keypath and _data is not None:
            k = keypath.pop(0)
            if isinstance(_data, dict):
                _data = _data.get(str(k), None)
            else:
                _data = getattr(_data, str(k), None)
        return _data

    return lens
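
Since combine accepts any callable as the combinator, where can take one directly as well. As a small sketch (field names made up), an "exactly one of these" combinator is just a function over the iterable of predicate results:

from predicate import where

def exactly_one(results):
    # Custom combinator: True only when exactly one predicate matched.
    return sum(results) == 1

pred = where(kind='a', account='123', combinator=exactly_one)

assert pred({'kind': 'a', 'account': '999'})      # only the kind check matches
assert not pred({'kind': 'a', 'account': '123'})  # both match, so exactly_one fails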

Writing this project as a lightweight parser taught me a few things. Adding to a grammar to extend a language is extremely easy and powerful: two enums and their associated dictionary mappings control the entirety of the functionality. As one of the first enhancements I added the NAND combinator, and it took two lines of code to do so.
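
For reference, those two lines are exactly the NAND entries visible in the listing above: one new member on the Combinator enum and one new entry in the COMBINATORS mapping.

class Combinator(Enum):
    ...
    NAND = 'nand'

COMBINATORS: Dict[Combinator, Callable] = {
    ...
    Combinator.NAND: compose(operator.not_, all),
}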

Overall, it was a really fun project to work on, and I hope someone finds use in it.