The right way to loop in Python (codes included)
Introduction
Since Python by itself is slow, it becomes important to know the nitty-gritty of the different components of our code in order to code efficiently. In this post, we will look at the most common ways to loop in Python using a simple summing example. We will also compute the memory profile of each approach to see which is the most memory efficient when analyzing huge datasets.
Key idea — push the loop out of pure Python. Every iteration of a hand-written while or for loop runs inside the Python interpreter, which is what makes it slow. The two fast options below both move the repeated work into compiled C: the builtin sum() iterates in C, and numpy operates on a whole array at once (vectorization). The catch is memory — numpy materializes the full array, so it is the fastest and the hungriest. For summing a range of numbers, the right question is not “for vs while” but “can I avoid the explicit loop altogether?”
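As a quick sanity check on the idea above, the following sketch confirms that a hand-written loop, the builtin, and numpy all arrive at the same value. The function name loop_sum and the small n are illustrative choices and are not part of the benchmarks in this post; the explicit int64 dtype is also an assumption, made so the result does not depend on the platform's default integer size.

```python
import numpy as np


def loop_sum(n):
    """Hand-written loop: every iteration runs in the Python interpreter."""
    total = 0
    for i in range(n):
        total += i
    return total


n = 1000  # small illustrative size, not the benchmark size used in this post

via_loop = loop_sum(n)                                  # explicit loop
via_builtin = sum(range(n))                             # iteration happens in C
via_numpy = int(np.sum(np.arange(n, dtype=np.int64)))   # vectorized over a full array

# All three routes should agree on the result.
assert via_loop == via_builtin == via_numpy
print(via_loop)  # 499500
```

The three calls differ only in where the repeated work happens, which is what the timings and memory profiles in the rest of the post measure.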
The while loop
import timeit
import numpy as np
nval = 1000000
# usual while loop
# (note: this adds 1 per pass, so it returns n; use `sumval += i` to match sum(range(n)))
def while_loop(n=nval):
    i, sumval = 0, 0
    while i < n:
        sumval += 1
        i += 1
    return sumval
if __name__ == "__main__":
    print(
        f"while_loop: {timeit.timeit(while_loop, number=10):.6f}s")
This returns while_loop: 0.727578s. We can also profile the memory usage of this function.
import timeit
import numpy as np
from memory_profiler import profile
nval = 1000000
# usual while loop
@profile(precision=4)
def while_loop(n=nval):
    i, sumval = 0, 0
    while i < n:
        sumval += 1
        i += 1
    return sumval
if __name__ == "__main__":
    while_loop()
This returns:
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    10  25.8984 MiB  25.8984 MiB           1   @profile(precision=4)
    11                                         def while_loop(n=nval):
    12  25.8984 MiB   0.0000 MiB           1       i, sumval = 0, 0
    13  25.9727 MiB   0.0000 MiB     1000001       while i < n:
    14  25.9727 MiB   0.0625 MiB     1000000           sumval += 1
    15  25.9727 MiB   0.0117 MiB     1000000           i += 1
    16
    17  25.9727 MiB   0.0000 MiB           1       return sumval
In total, the while loop added about 0.0743 MiB of memory for the above task.
On memory_profiler in 2026. The @profile decorator comes from the memory_profiler package (pip install memory_profiler). It still installs and runs on current Python, but development is quiet — the last release, 0.61.0, dates to November 2022. If you only need peak/current memory of a block and want a maintained, dependency-free option, Python’s standard-library tracemalloc is a good alternative. The line-by-line report below is still memory_profiler’s strength.
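For readers who want to try the standard-library route mentioned above, here is a minimal sketch using tracemalloc around the builtin-sum function from this post. It reports current and peak traced memory for the block as a whole rather than line by line. Note that tracemalloc counts allocations made through Python's memory allocators, whereas memory_profiler samples the memory of the whole process by default, so the numbers from the two tools are not directly comparable.

```python
import tracemalloc


def builtinsum(n=1_000_000):
    """Same builtin-sum function as in this post."""
    return sum(range(n))


tracemalloc.start()                               # begin tracing allocations
result = builtinsum()
current, peak = tracemalloc.get_traced_memory()   # both values are in bytes
tracemalloc.stop()                                # stop tracing and free the traces

print(f"result  = {result}")
print(f"current = {current / 2**20:.4f} MiB")
print(f"peak    = {peak / 2**20:.4f} MiB")
```

Swapping builtinsum for any of the other functions in this post gives a rough block-level comparison without installing anything.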
The for loop
import timeit
import numpy as np
nval = 1000000
# usual for loop
def for_loop(n=nval):
    sumval = 0
    for i in range(n):
        sumval += i
    return sumval
if __name__ == "__main__":
    print(
        f"for_loop: {timeit.timeit(for_loop, number=10):.6f}s")
This returns for_loop: 0.490051s. Now, we do the memory profiling of this function.
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    22  25.9922 MiB  25.9922 MiB           1   @profile(precision=4)
    23                                         def for_loop(n=nval):
    24  25.9922 MiB   0.0000 MiB           1       sumval = 0
    25  26.0273 MiB   0.0117 MiB     1000001       for i in range(n):
    26  26.0273 MiB   0.0234 MiB     1000000           sumval += i
    27  26.0273 MiB   0.0000 MiB           1       return sumval
In total, the for loop added about 0.0351 MiB of memory for the above task.
The builtin Python function
import timeit
import numpy as np
nval = 1000000
# using built in sum
def builtinsum(n=nval):
    return sum(range(n))
if __name__ == "__main__":
    print(
        f"builtinsum: {timeit.timeit(builtinsum, number=10):.6f}s")
This returns builtinsum: 0.175238s.
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    46  25.8867 MiB  25.8867 MiB           1   @profile(precision=4)
    47                                         def builtinsum(n=nval):
    48  25.8906 MiB   0.0039 MiB           1       return sum(range(n))
In total, the builtin sum() version added about 0.0039 MiB of memory for the above task.
The numpy function
import timeit
import numpy as np
nval = 1000000
# using numpy sum
def numpysum(n=nval):
    return np.sum(np.arange(n))
if __name__ == "__main__":
    print(
        f"numpysum: {timeit.timeit(numpysum, number=10):.6f}s")
This returns numpysum: 0.017640s.
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    53  25.9766 MiB  25.9766 MiB           1   @profile(precision=4)
    54                                         def numpysum(n=nval):
    55  33.6172 MiB   7.6406 MiB           1       return np.sum(np.arange(n))
In total, the numpy-based function added about 7.6406 MiB of memory for the above task.
Conclusions
Please note that these run times and memory figures will differ from system to system, but the ratios between the different methods should stay broadly similar.
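To see how the ratios look on your own machine, a sketch along the following lines can help. It is not the exact script used for the numbers in this post: the helper best_time, the smaller n, and the use of timeit.repeat with the minimum taken over several runs are illustrative choices made here to reduce noise.

```python
import timeit

import numpy as np

n = 100_000  # smaller than the post's 1,000,000 so the sketch runs quickly


def for_loop():
    total = 0
    for i in range(n):
        total += i
    return total


def builtinsum():
    return sum(range(n))


def numpysum():
    return np.sum(np.arange(n))


def best_time(func, repeat=5, number=10):
    """Hypothetical helper: best total time over `repeat` runs of `number` calls."""
    return min(timeit.repeat(func, repeat=repeat, number=number))


timings = {f.__name__: best_time(f) for f in (for_loop, builtinsum, numpysum)}
baseline = timings["for_loop"]

# Report each method's time and its speed-up relative to the for loop.
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f}s ({baseline / seconds:.1f}x vs for_loop)")
```

Taking the minimum rather than the mean is a common convention for micro-benchmarks, since slower runs are usually caused by other activity on the machine rather than by the code itself.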
We found that numpy is the fastest (0.017640s) and the while loop is the slowest (0.727578s). The while loop is slow because each step of the task runs in the Python interpreter. numpy's core is written in C and works on the whole array at once, so it runs much faster.
In terms of memory usage, numpy is the worst: it took ~7.6 MiB. In contrast, the builtin sum(range(n)) is the most memory efficient, because it does not store all the data in memory but consumes the values one at a time.
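The 7.6406 MiB increment in the numpy profiler report is close to what the array itself should occupy. The sketch below checks this, assuming 64-bit integers; the explicit dtype is an assumption added here, since the default integer type of np.arange depends on the platform and NumPy version.

```python
import sys

import numpy as np

n = 1_000_000

arr = np.arange(n, dtype=np.int64)  # 8 bytes per element, all stored at once
lazy = range(n)                     # stores only start, stop and step

# Size of the array's data buffer, converted from bytes to MiB.
print(f"array buffer: {arr.nbytes / 2**20:.4f} MiB")  # 7.6294 MiB
# Size of the range object itself, which does not grow with n.
print(f"range object: {sys.getsizeof(lazy)} bytes")
```

The small remainder between this figure and the profiler's increment is plausibly bookkeeping overhead, though that part is a guess rather than something measured here.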
If we compare the while and for loops, the for loop is faster and also more memory efficient. Hence, the for loop should be our first choice (and usually is) unless we do not know the total number of iterations in advance.
Quick check: You need to sum a very large range and your machine is tight on RAM (not on time). Which option is the best fit?
- np.sum(np.arange(n)): it is the fastest
- sum(range(n)): fast and the most memory-efficient
- A hand-written while loop: it uses the least memory
- A hand-written for loop: it is always the safest choice
Recap
- A hand-written while/for loop runs every step in the Python interpreter, so it is the slowest way to aggregate numbers.
- The builtin sum(range(n)) iterates in C: it was ~4× faster than the while loop here and used the least memory because it never stores the whole sequence.
- numpy was the fastest (~40× the while loop) but built the entire array in memory, making it the hungriest; it is a great fit when you are time-bound and the array fits in RAM.
- Between the two hand-written loops, the for loop beat the while loop on both speed and memory, so reach for the for loop unless you genuinely do not know the iteration count in advance.
- Rule of thumb: avoid the explicit loop for numeric aggregation; pick builtin vs vectorized based on whether you are memory-bound or time-bound.
Where to go next
- Introduction to scientific computing using NumPy — how vectorization replaces loops for array math.
- Speed up your codes by parallel computing in Python — the next lever when a single core is not enough.
- Handling huge data files with pandas — memory-aware processing of real datasets.