I am trying to efficiently compute a summation of a summation in Python:

WolframAlpha is able to compute it too a high n value: sum of sum.

I have two approaches: a *for* loop method and an np.sum method. I thought the np.sum approach would be faster. However, they are the same until a large n, after which the np.sum has overflow errors and gives the wrong result.

I am trying to find the fastest way to compute this sum.

```
import numpy as np
import time
def summation(start,end,func):
sum=0
for i in range(start,end+1):
sum+=func(i)
return sum
def x(y):
return y
def x2(y):
return y**2
def mysum(y):
return x2(y)*summation(0, y, x)
n=100
# method #1
start=time.time()
summation(0,n,mysum)
print('Slow method:',time.time()-start)
# method #2
start=time.time()
w=np.arange(0,n+1)
(w**2*np.cumsum(w)).sum()
print('Fast method:',time.time()-start)
```

·
Santiago Trujillo

Here's a very fast way:

```
result = ((((12 * n + 45) * n + 50) * n + 15) * n - 2) * n // 120
```

How I got there:

- Rewrite the inner sum as the well-known
`x*(x+1)//2`

. So the whole thing becomes`sum(x**2 * x*(x+1)//2 for x in range(n+1))`

. - Rewrite to
`sum(x**4 + x**3 for x in range(n+1)) // 2`

. - Look up formulas for
`sum(x**4)`

and`sum(x**3)`

. - Simplify the resulting mess to
`(12*n**5 + 45*n**4 + 50*n**3 + 15*n**2 - 2*n) // 120`

. - Horner it.

Another way to derive it if after steps 1. and 2. you know it's a polynomial of degree 5:

- Compute six values with a naive implementation.
- Compute the polynomial from the six equations with six unknowns (the polynomial coefficients). I did it similarly to this, but my matrix
`A`

is left-right mirrored compared to that, and I called my y-vector`b`

.

Code:

```
from fractions import Fraction
import math
from functools import reduce
def naive(n):
return sum(x**2 * sum(range(x+1)) for x in range(n+1))
def lcm(ints):
return reduce(lambda r, i: r * i // math.gcd(r, i), ints)
def polynomial(xys):
xs, ys = zip(*xys)
n = len(xs)
A = [[Fraction(x**i) for i in range(n)] for x in xs]
b = list(ys)
for _ in range(2):
for i0 in range(n):
for i in range(i0 + 1, n):
f = A[i][i0] / A[i0][i0]
for j in range(i0, n):
A[i][j] -= f * A[i0][j]
b[i] -= f * b[i0]
A = [row[::-1] for row in A[::-1]]
b.reverse()
coeffs = [b[i] / A[i][i] for i in range(n)]
denominator = lcm(c.denominator for c in coeffs)
coeffs = [int(c * denominator) for c in coeffs]
horner = str(coeffs[-1])
for c in coeffs[-2::-1]:
horner += ' * n'
if c:
horner = f"({horner} {'+' if c > 0 else '-'} {abs(c)})"
return f'{horner} // {denominator}'
print(polynomial((x, naive(x)) for x in range(6)))
```

Output (Try it online!):

```
((((12 * n + 45) * n + 50) * n + 15) * n - 2) * n // 120
```

·
Santiago Trujillo
Denunciar

(fastest methods, 3 and 4, are at the end)

In a fast NumPy method you need to specify `dtype=np.object`

so that NumPy does not convert Python `int`

to its own dtypes (`np.int64`

or others). It will now give you correct results (checked it up to N=100000).

```
# method #2
start=time.time()
w=np.arange(0, n+1, dtype=np.object)
result2 = (w**2*np.cumsum(w)).sum()
print('Fast method:', time.time()-start)
```

Your fast solution is significantly faster than the slow one. Yes, for large N's, but already at N=100 it is like 8 times faster:

```
start=time.time()
for i in range(100):
result1 = summation(0, n, mysum)
print('Slow method:', time.time()-start)
# method #2
start=time.time()
for i in range(100):
w=np.arange(0, n+1, dtype=np.object)
result2 = (w**2*np.cumsum(w)).sum()
print('Fast method:', time.time()-start)
```

```
Slow method: 0.06906533241271973
Fast method: 0.008007287979125977
```

EDIT: Even faster method (by KellyBundy, the Pumpkin) is by using pure python. Turns out NumPy has no advantage here, because it has no vectorized code for `np.objects`

.

```
# method #3
import itertools
start=time.time()
for i in range(100):
result3 = sum(x*x * ysum for x, ysum in enumerate(itertools.accumulate(range(n+1))))
print('Faster, pure python:', (time.time()-start))
```

```
Faster, pure python: 0.0009944438934326172
```

EDIT2: Forss noticed that numpy fast method can be optimized by using `x*x`

instead of `x**2`

. For `N > 200`

it is faster than pure Python method. For `N < 200`

it is slower than pure Python method (the exact value of boundary may depend on machine, on mine it was 200, its best to check it yourself):

```
# method #4
start=time.time()
for i in range(100):
w = np.arange(0, n+1, dtype=np.object)
result2 = (w*w*np.cumsum(w)).sum()
print('Fast method x*x:', time.time()-start)
```

·
Santiago Trujillo
Denunciar

Comparing Python with WolframAlpha like that is unfair, since Wolfram will simplify the equation before computing.

Fortunately, the Python ecosystem knows no limits, so you can use SymPy:

```
from sympy import summation
from sympy import symbols
n, x, y = symbols("n,x,y")
eq = summation(x ** 2 * summation(y, (y, 0, x)), (x, 0, n))
eq.evalf(subs={"n": 1000})
```

It will compute the expected result almost instantly: `100375416791650`

. This is because SymPy simplifies the equation for you, just like Wolfram does. See the value of `eq`

:

@Kelly Bundy's answer is awesome, but if you are like me and use a calculator to compute `2 + 2`

, then you will love SymPy ❤. As you can see, it gets you to the same results with just 3 lines of code and is a solution that would also work for other more complex cases.

·
Santiago Trujillo
Denunciar