I was wondering if there is an easy solution to the following problem. I want to keep every element of a list occurring after the point where a condition first becomes true. The condition here is that a value is greater than 18: remove everything before the first such value, but keep everything from it onward. Example:

Input:

```
p = [4,9,10,4,20,13,29,3,39]
```

Expected output:

```
p = [20,13,29,3,39]
```

I know that you can filter over the entire list through

```
[x for x in p if x>18]
```

But I want to stop this operation once the first value above 18 is found, and then include the rest of the values regardless of whether they satisfy the condition or not. It seems like an easy problem, but I haven't found a solution yet.

Santiago Trujillo

You can use `itertools.dropwhile`:

```
from itertools import dropwhile
p = [4,9,10,4,20,13,29,3,39]
p = dropwhile(lambda x: x <= 18, p)
print(*p) # 20 13 29 3 39
```

In my opinion, this is arguably the easiest-to-read version. It also corresponds to a common pattern in other functional programming languages, such as `dropWhile (<=18) p` in Haskell and `p.dropWhile(_ <= 18)` in Scala.
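One thing to keep in mind: `dropwhile` returns a lazy iterator, not a list, so wrap it in `list()` if you need to index, slice, or reuse the result. A minimal sketch:

```python
from itertools import dropwhile

p = [4, 9, 10, 4, 20, 13, 29, 3, 39]

# dropwhile yields a lazy iterator; materialize it if you need a list
result = list(dropwhile(lambda x: x <= 18, p))
print(result)  # [20, 13, 29, 3, 39]
```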

Alternatively, using the walrus operator (only available in Python 3.8+):

```
exceeded = False
p = [x for x in p if (exceeded := exceeded or x > 18)]
print(p) # [20, 13, 29, 3, 39]
```

But my guess is that some people don't like this style. In that case, one can use an explicit `for` loop (ilkkachu's suggestion):

```
for i, x in enumerate(p):
    if x > 18:
        output = p[i:]
        break
else:
    output = []  # alternatively, just put output = [] before the for loop
```

Santiago Trujillo

You could use `enumerate` and list slicing in a generator expression with `next`:

```
out = next((p[i:] for i, item in enumerate(p) if item > 18), [])
```

Output:

```
[20, 13, 29, 3, 39]
```
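A nice property of this approach is the explicit default: the second argument to `next` is returned when no element satisfies the condition, so an all-small list yields `[]` instead of raising `StopIteration`. A quick sketch:

```python
p = [4, 9, 10, 4, 20, 13, 29, 3, 39]
print(next((p[i:] for i, item in enumerate(p) if item > 18), []))  # [20, 13, 29, 3, 39]

# with no qualifying element, the default [] is returned
q = [1, 2, 3]
print(next((q[i:] for i, item in enumerate(q) if item > 18), []))  # []
```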

In terms of time, it depends on the data structure. The plots below show the timing differences among the answers here for various lengths of `p`.

If the original data is a list, then using a generator expression with `next` (or a loop that breaks once the condition is satisfied) is generally the fastest:

```
import perfplot
import numpy as np
import pandas as pd
import random
from itertools import dropwhile

def it_dropwhile(p):
    return list(dropwhile(lambda x: x <= 18, p))

def walrus(p):
    exceeded = False
    return [x for x in p if (exceeded := exceeded or x > 18)]

def explicit_loop(p):
    for i, x in enumerate(p):
        if x > 18:
            output = p[i:]
            break
    else:
        output = []
    return output

def genexpr_next(p):
    return next((p[i:] for i, item in enumerate(p) if item > 18), [])

def np_argmax(p):
    return p[(np.array(p) > 18).argmax():]

def pd_idxmax(p):
    s = pd.Series(p)
    return s[s.gt(18).idxmax():]

perfplot.show(
    setup=lambda n: random.choices(range(0, 15), k=10*n) + random.choices(range(-20, 30), k=10*n),
    kernels=[it_dropwhile, walrus, explicit_loop, genexpr_next, np_argmax, pd_idxmax],
    labels=['it_dropwhile', 'walrus', 'explicit_loop', 'genexpr_next', 'np_argmax', 'pd_idxmax'],
    n_range=[2 ** k for k in range(17)],
    equality_check=np.allclose,
    xlabel='~10*n',
)
```

But if the initial data are ndarray objects, then the vectorized operations in numpy or pandas are better for large arrays:

```
perfplot.show(
    setup=lambda n: np.hstack([np.random.randint(0, 15, 10*n), np.random.randint(-20, 30, 10*n)]),
    kernels=[it_dropwhile, walrus, explicit_loop, genexpr_next, np_argmax, pd_idxmax],
    labels=['it_dropwhile', 'walrus', 'explicit_loop', 'genexpr_next', 'np_argmax', 'pd_idxmax'],
    n_range=[2 ** k for k in range(17)],
    equality_check=np.allclose,
    xlabel='~10*n',
)
```

Santiago Trujillo

Great solutions here, just wanted to demonstrate how to do it with numpy:

```
>>> import numpy as np
>>> p[(np.array(p) > 18).argmax():]
[20, 13, 29, 3, 39]
```
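One caveat worth adding (my note, not part of the original answer): `argmax` on an all-False boolean mask returns 0, so if no element exceeds 18 this slice silently returns the whole list. Guarding with `any()` is one way around that:

```python
import numpy as np

def after_first_gt(p, threshold=18):
    # argmax of an all-False mask is 0, which would slice from the start,
    # so check any() first and return an empty list in that case
    mask = np.array(p) > threshold
    return p[mask.argmax():] if mask.any() else []

print(after_first_gt([4, 9, 10, 4, 20, 13, 29, 3, 39]))  # [20, 13, 29, 3, 39]
print(after_first_gt([1, 2, 3]))  # []
```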

Since there are a lot of nice answers here, I decided to run some simple benchmarks. The first uses the OP's sample array (`[4,9,10,4,20,13,29,3,39]`) of length 9. The second uses a randomly generated array of length 20,000, where the first half is between 0 and 15 and the second half is between -20 and 30 (so that the split doesn't occur right in the center).

Using the OP's data (array of length 9):

```
%timeit enke()
650 ns ± 15.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit j1lee1()
546 ns ± 4.22 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit j1lee2()
551 ns ± 19 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit j2lee3()
536 ns ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit richardec()
2.08 µs ± 16 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

Using an array of length 20,000 (20 thousand):

```
%timeit enke()
1.5 ms ± 34.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit j1lee1()
1.95 ms ± 43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit j1lee2()
2.1 ms ± 53.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit j2lee3()
2.33 ms ± 96.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit richardec()
13.3 µs ± 461 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

Code to generate second array:

```
p = np.hstack([np.random.randint(0,15,10000),np.random.randint(-20,30,10000)])
```

So, for the small case, numpy is a slug and not needed. But for the large case, numpy is almost 100x faster and the way to go! :)

Santiago Trujillo

I noticed the OP mentioned under an answer that `p` is actually a Pandas DataFrame. Here is a method of filtering out all elements up to the first instance of a number greater than 18 using Pandas:

```
import pandas as pd
df = pd.DataFrame([4,9,10,4,20,13,29,3,39])
df = df[df[0].gt(18).idxmax():]
print(df)
```

Outputs:

```
    0
4  20
5  13
6  29
7   3
8  39
```

Note: I'm blind to the actual structure of your DataFrame so I just used exactly what was given.
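The same edge case as with numpy's `argmax` applies here (my note, not part of the original answer): `idxmax` on an all-False mask returns the first index label, so a DataFrame with no value above 18 would come back unfiltered. A guarded sketch:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3])
mask = df[0].gt(18)
# idxmax of an all-False mask is the first index label, which would
# return the whole frame; guard with any() to get an empty slice instead
result = df[mask.idxmax():] if mask.any() else df.iloc[0:0]
print(result.empty)  # True
```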

Santiago Trujillo