I am trying to efficiently compute a summation of a summation in Python:
WolframAlpha is able to compute it too a high n value: sum of sum.
I have two approaches: a for loop method and an np.sum method. I thought the np.sum approach would be faster. However, they are the same until a large n, after which the np.sum has overflow errors and gives the wrong result.
I am trying to find the fastest way to compute this sum.
import numpy as np
import time
def summation(start,end,func):
sum=0
for i in range(start,end+1):
sum+=func(i)
return sum
def x(y):
return y
def x2(y):
return y**2
def mysum(y):
return x2(y)*summation(0, y, x)
n=100
# method #1
start=time.time()
summation(0,n,mysum)
print('Slow method:',time.time()-start)
# method #2
start=time.time()
w=np.arange(0,n+1)
(w**2*np.cumsum(w)).sum()
print('Fast method:',time.time()-start)
Here's a very fast way:
result = ((((12 * n + 45) * n + 50) * n + 15) * n - 2) * n // 120
How I got there:
x*(x+1)//2
. So the whole thing becomes sum(x**2 * x*(x+1)//2 for x in range(n+1))
.sum(x**4 + x**3 for x in range(n+1)) // 2
.sum(x**4)
and sum(x**3)
.(12*n**5 + 45*n**4 + 50*n**3 + 15*n**2 - 2*n) // 120
.Another way to derive it if after steps 1. and 2. you know it's a polynomial of degree 5:
A
is left-right mirrored compared to that, and I called my y-vector b
.Code:
from fractions import Fraction
import math
from functools import reduce
def naive(n):
return sum(x**2 * sum(range(x+1)) for x in range(n+1))
def lcm(ints):
return reduce(lambda r, i: r * i // math.gcd(r, i), ints)
def polynomial(xys):
xs, ys = zip(*xys)
n = len(xs)
A = [[Fraction(x**i) for i in range(n)] for x in xs]
b = list(ys)
for _ in range(2):
for i0 in range(n):
for i in range(i0 + 1, n):
f = A[i][i0] / A[i0][i0]
for j in range(i0, n):
A[i][j] -= f * A[i0][j]
b[i] -= f * b[i0]
A = [row[::-1] for row in A[::-1]]
b.reverse()
coeffs = [b[i] / A[i][i] for i in range(n)]
denominator = lcm(c.denominator for c in coeffs)
coeffs = [int(c * denominator) for c in coeffs]
horner = str(coeffs[-1])
for c in coeffs[-2::-1]:
horner += ' * n'
if c:
horner = f"({horner} {'+' if c > 0 else '-'} {abs(c)})"
return f'{horner} // {denominator}'
print(polynomial((x, naive(x)) for x in range(6)))
Output (Try it online!):
((((12 * n + 45) * n + 50) * n + 15) * n - 2) * n // 120
(fastest methods, 3 and 4, are at the end)
In a fast NumPy method you need to specify dtype=np.object
so that NumPy does not convert Python int
to its own dtypes (np.int64
or others). It will now give you correct results (checked it up to N=100000).
# method #2
start=time.time()
w=np.arange(0, n+1, dtype=np.object)
result2 = (w**2*np.cumsum(w)).sum()
print('Fast method:', time.time()-start)
Your fast solution is significantly faster than the slow one. Yes, for large N's, but already at N=100 it is like 8 times faster:
start=time.time()
for i in range(100):
result1 = summation(0, n, mysum)
print('Slow method:', time.time()-start)
# method #2
start=time.time()
for i in range(100):
w=np.arange(0, n+1, dtype=np.object)
result2 = (w**2*np.cumsum(w)).sum()
print('Fast method:', time.time()-start)
Slow method: 0.06906533241271973
Fast method: 0.008007287979125977
EDIT: Even faster method (by KellyBundy, the Pumpkin) is by using pure python. Turns out NumPy has no advantage here, because it has no vectorized code for np.objects
.
# method #3
import itertools
start=time.time()
for i in range(100):
result3 = sum(x*x * ysum for x, ysum in enumerate(itertools.accumulate(range(n+1))))
print('Faster, pure python:', (time.time()-start))
Faster, pure python: 0.0009944438934326172
EDIT2: Forss noticed that numpy fast method can be optimized by using x*x
instead of x**2
. For N > 200
it is faster than pure Python method. For N < 200
it is slower than pure Python method (the exact value of boundary may depend on machine, on mine it was 200, its best to check it yourself):
# method #4
start=time.time()
for i in range(100):
w = np.arange(0, n+1, dtype=np.object)
result2 = (w*w*np.cumsum(w)).sum()
print('Fast method x*x:', time.time()-start)
Comparing Python with WolframAlpha like that is unfair, since Wolfram will simplify the equation before computing.
Fortunately, the Python ecosystem knows no limits, so you can use SymPy:
from sympy import summation
from sympy import symbols
n, x, y = symbols("n,x,y")
eq = summation(x ** 2 * summation(y, (y, 0, x)), (x, 0, n))
eq.evalf(subs={"n": 1000})
It will compute the expected result almost instantly: 100375416791650
. This is because SymPy simplifies the equation for you, just like Wolfram does. See the value of eq
:
@Kelly Bundy's answer is awesome, but if you are like me and use a calculator to compute 2 + 2
, then you will love SymPy ❤. As you can see, it gets you to the same results with just 3 lines of code and is a solution that would also work for other more complex cases.