Two similar ways to check whether a list contains an odd number:

```
any(x % 2 for x in a)
any(True for x in a if x % 2)
```

Timing results with `a = [0] * 10000000` (five attempts each, times in seconds):

```
0.60 0.60 0.60 0.61 0.63 any(x % 2 for x in a)
0.36 0.36 0.36 0.37 0.37 any(True for x in a if x % 2)
```

Why is the second way almost twice as fast?

My testing code:

```
from timeit import repeat

# Build the all-even list once per timing run.
setup = 'a = [0] * 10000000'

expressions = [
    'any(x % 2 for x in a)',
    'any(True for x in a if x % 2)',
]

for expression in expressions:
    # repeat() defaults to five runs; number=1 executes the expression once per run.
    times = sorted(repeat(expression, setup, number=1))
    print(*('%.2f ' % t for t in times), expression)

·
Santiago Trujillo

The first method sends everything to `any()`, whilst the second only sends something to `any()` when there's an odd number, so `any()` has fewer elements to go through.

·

```
(x % 2 for x in a)
```

This generator produces a series of falsy values until it produces a truthy value (if it ever does), at which point `any` stops iterating the generator and returns `True`.

```
(True for x in a if x % 2)
```

This generator yields nothing for even numbers. It hands over at most one `True` value (if it finds an odd number at all), at which point `any` stops the iteration and returns `True`.

The additional back and forth of yielding to `any` and then fetching the next value from the generator in the first case accounts for the overhead.
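To make that traffic visible, here is a minimal sketch that counts how many values actually reach `any`. The helper `count_yields` and the shortened list are my own assumptions for illustration, not part of the question's benchmark.

```python
def count_yields(gen, counter):
    """Pass values through unchanged, counting each one handed to the consumer."""
    for value in gen:
        counter[0] += 1
        yield value

a = [0] * 1000  # shortened stand-in for the 10,000,000-element list

first = [0]
result_first = any(count_yields((x % 2 for x in a), first))

second = [0]
result_second = any(count_yields((True for x in a if x % 2), second))

print(result_first, first[0])    # False 1000
print(result_second, second[0])  # False 0
```

With no odd number in the list, the first generator hands over one value per element, while the second hands over none.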

·

TL;DR: The slow version has to iterate over a long sequence of false values before returning `False`. The fast version "iterates" over an empty sequence before doing the same. The difference is the time it takes to construct the long false sequence vs. the empty sequence.

Let's look at the bytecode generated for each. I've omitted the first section of each, as the two are identical. Only the code for the generators involved needs a look.

```
In [5]: dis.dis('any(x%2 for x in a)')
[...]
Disassembly of <code object <genexpr> at 0x105e860e0, file "<dis>", line 1>:
  1           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                14 (to 18)
              4 STORE_FAST               1 (x)
              6 LOAD_FAST                1 (x)
              8 LOAD_CONST               0 (2)
             10 BINARY_MODULO
             12 YIELD_VALUE
             14 POP_TOP
             16 JUMP_ABSOLUTE            2
        >>   18 LOAD_CONST               1 (None)
             20 RETURN_VALUE

In [6]: dis.dis('any(True for x in a if x % 2)')
[...]
Disassembly of <code object <genexpr> at 0x105d993a0, file "<dis>", line 1>:
  1           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                18 (to 22)
              4 STORE_FAST               1 (x)
              6 LOAD_FAST                1 (x)
              8 LOAD_CONST               0 (2)
             10 BINARY_MODULO
             12 POP_JUMP_IF_FALSE        2
             14 LOAD_CONST               1 (True)
             16 YIELD_VALUE
             18 POP_TOP
             20 JUMP_ABSOLUTE            2
        >>   22 LOAD_CONST               2 (None)
             24 RETURN_VALUE
```

Both are identical up to the `BINARY_MODULO` instruction. After that, the slower version has to yield the resulting value for `any` to consume before moving on, while the second code *immediately* moves on to the next value. So basically, the slower code has to consume a long list of false (i.e., zero) values to determine that there are no true values. The faster code only needs to consume an empty list.
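If you want to reproduce the comparison on your own interpreter, something like the following sketch should work. The helper name `genexpr_opnames` is hypothetical. Opcode names vary between Python versions: from 3.11 on, for instance, `BINARY_MODULO` and `JUMP_ABSOLUTE` no longer appear under those names, so expect the listing to differ from the one above.

```python
import dis
import types

def genexpr_opnames(source):
    """Compile an expression and return the name and opcode names of its nested genexpr."""
    outer = compile(source, "<dis>", "eval")
    # The generator expression is stored as a nested code object among the constants.
    inner = next(c for c in outer.co_consts if isinstance(c, types.CodeType))
    return inner.co_name, [ins.opname for ins in dis.get_instructions(inner)]

slow_name, slow_ops = genexpr_opnames("any(x % 2 for x in a)")
fast_name, fast_ops = genexpr_opnames("any(True for x in a if x % 2)")

print(slow_name, slow_ops)
print(fast_name, fast_ops)
```

Compiling does not evaluate the expression, so `a` does not need to exist for this to run.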

·

The previous answers somewhat assume the reader is already familiar with the syntax and generators. I'd like to explain more for people who aren't.

The snippet

```
any(x % 2 for x in a)
```

is short syntax for:

```
any((x % 2 for x in a))
```

So what's happening is that `(x % 2 for x in a)` gets evaluated and the resulting value is then given to the `any` function. Just like `print(21 * 2)` computes the value 42, which is then given to the `print` function.

The expression `(x % 2 for x in a)` is a generator expression and its result is a generator iterator. That is an object that computes and hands out its values on demand. So in this case, when asked for a value, this iterator looks at the next value from `a`, computes its remainder modulo 2 (i.e., 0 for even and 1 for odd), and hands out that value. And then it literally *waits* until it is possibly asked again for another value.

The `any` function is a *second actor* here. It gets the iterator as its argument, and then asks the iterator for more and more values, hoping for one that's true (note that 1 is true and 0 is false).

You can really think of it as two different persons interacting. The any-guy asking the iterator-guy for values. Again, note that the iterator-guy does *not* compute all values in advance. Only one at a time, whenever the any-guy asks for the next value. So it's really a back-and-forth between the two guys.
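The back-and-forth can be sketched in plain Python. `my_any` below is only a rough equivalent of the built-in, and `remainders` is a hypothetical stand-in for the generator expression that logs every request it receives.

```python
from itertools import count

def my_any(iterable):
    """Roughly what any() does, written as plain Python."""
    iterator = iter(iterable)
    while True:
        try:
            value = next(iterator)   # ask for exactly one more value
        except StopIteration:
            return False             # the iterator reported that it is done
        if value:
            return True              # stop asking as soon as one value is true

requested = []

def remainders():
    """Like (x % 2 for x in ...), but over 2, 3, 4, ... and logging each request."""
    for x in count(2):
        requested.append(x)
        yield x % 2

result = my_any(remainders())
print(result, requested)  # True [2, 3]
```

Even though the input never ends, only two values are ever computed: the any-guy stops asking once he gets a true one.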

In the case of `any(x % 2 for x in a)`, the iterator-guy, whenever asked by the any-guy for the next value, just computes one modulo value, hands it to the any-guy, and the any-guy has to judge it. Here the iterator-guy is like an incompetent junior developer, involving the manager for every single number, somewhat forcing them to hardcore micro-manage.

In the case of `any(True for x in a if x % 2)`, the iterator-guy, whenever asked by the any-guy for the next value, doesn't mindlessly hand over just the modulo values. Instead, *this* iterator-guy judges the values himself, and only hands something over to the manager when there's something worthy to hand over. Namely, only when he discovers an odd value (in which case he doesn't hand over `0` or `1`, but `True`). Here the iterator-guy is like a competent senior developer doing all the work, and the manager can totally lay back and chill (and at the end of the day still take all the credit).

It should be clear that the second way is much more efficient, as they don't needlessly communicate for every ... single ... input number. Especially since your input `a = [0] * 10000000` doesn't contain *any* odd numbers. The junior developer reports ten million zeros to the manager, who has to judge all of them, with a constant back-and-forth between them for every zero. The senior developer judges everything himself and reports *nothing* to his manager. (Well, ok, both developers additionally report at the end that they're done, at which point the manager concludes `False` as the result of the whole `any(...)` expression.)
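For readers who find generator functions easier to read than generator expressions, the two developers can be written out like this. The function names `junior` and `senior` are just labels borrowed from the analogy, and the sample list is made up.

```python
def junior(a):
    """Equivalent of (x % 2 for x in a): reports every remainder."""
    for x in a:
        yield x % 2

def senior(a):
    """Equivalent of (True for x in a if x % 2): reports only odd numbers."""
    for x in a:
        if x % 2:
            yield True

sample = [0, 0, 3, 0, 5]
print(list(junior(sample)))  # [0, 0, 1, 0, 1]
print(list(senior(sample)))  # [True, True]
print(any(junior([0] * 5)), any(senior([0] * 5)))  # False False
```

Both give `any` the same final answer; they differ only in how many values they hand over along the way.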

·

The number of truthiness checks is not the actual reason, because the faster version contains an `if` clause, which in turn performs the equivalent of calling `bool()`. That check is merely done "in advance" in the faster case. So in both cases Python has to go through **all values** and check the truthiness of **all of them**.

The procedure shown in chepner's answer is indeed the answer to the question. Let's find the point at which the next item of the `for` loop can be requested:

In the faster case, it is right after `BINARY_MODULO`, **but** the `POP_JUMP_IF_FALSE` instruction has to do a little bit of work to check the **truthiness** (`if` calls `bool()`), while the slower version doesn't check anything at that point. So far that's (-1) point for the faster version. BUT the slower version has to go through three steps before it can request the next item: `YIELD_VALUE`, `POP_TOP`, `JUMP_ABSOLUTE`. So (-3) for the slower version... Those three steps cause the overhead because they cannot be skipped.

In other words, the faster version only does the "checking" before it can request the next item, whereas the slower version has to do "checking + those steps". Again, both of them check the truthiness of all values.

The answer is the overhead of yielding.

·