The right way to loop in Python (codes included)

Utpal Kumar   6 minute read      

Introduction

Because pure Python is slow, it is important to understand how the different pieces of our code perform so that we can write efficient code. In this post, we look at the most common ways to loop in Python using a simple summing example. We also profile the memory usage of each approach to see which is the most memory-efficient for analyzing huge datasets.

Key idea — push the loop out of pure Python. Every iteration of a hand-written while or for loop runs inside the Python interpreter, which is what makes it slow. The two fast options below both move the repeated work into compiled C: the builtin sum() iterates in C, and numpy operates on a whole array at once (vectorization). The catch is memory — numpy materializes the full array, so it is the fastest and the hungriest. For summing a range of numbers, the right question is not “for vs while” but “can I avoid the explicit loop altogether?”
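As a quick sanity check of that idea, the sketch below uses a small n to confirm that an explicit loop and the builtin sum() give the same answer. The closed-form check n(n - 1)/2 is our own addition for verification and is not one of the methods benchmarked in this post; the name loop_sum is likewise only illustrative.

```python
def loop_sum(n):
    """Sum 0 + 1 + ... + (n - 1) with an explicit for loop."""
    total = 0
    for i in range(n):
        total += i
    return total


n = 1000
via_loop = loop_sum(n)          # every step runs in the interpreter
via_builtin = sum(range(n))     # iteration happens inside the builtin
via_formula = n * (n - 1) // 2  # closed form, used here only as a check

print(via_loop, via_builtin, via_formula)  # 499500 499500 499500
```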

The right way to loop in Python

The while loop

import timeit
import numpy as np

nval = 1000000

# usual while loop


def while_loop(n=nval):
    i, sumval = 0, 0
    while i < n:
        sumval += 1
        i += 1

    return sumval

if __name__ == "__main__":

    print(
        f"while_loop: {timeit.timeit(while_loop, number = 10):.6f}s")
   

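This prints while_loop: 0.727578s, the total time for 10 calls. Note that the loop body adds 1 rather than i, so this function returns n instead of 0 + 1 + … + (n − 1) as the other examples do; the timing still reflects one addition and one counter update per iteration. We can also profile the memory usage of this function.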

import timeit
import numpy as np
from memory_profiler import profile

nval = 1000000

# usual while loop
@profile(precision=4)
def while_loop(n=nval):
    i, sumval = 0, 0
    while i < n:
        sumval += 1
        i += 1

    return sumval


if __name__ == "__main__":
    while_loop()
  

This returns:

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    10  25.8984 MiB  25.8984 MiB           1   @profile(precision=4)
    11                                         def while_loop(n=nval):
    12  25.8984 MiB   0.0000 MiB           1       i, sumval = 0, 0
    13  25.9727 MiB   0.0000 MiB     1000001       while i < n:
    14  25.9727 MiB   0.0625 MiB     1000000           sumval += 1
    15  25.9727 MiB   0.0117 MiB     1000000           i += 1
    16                                         
    17  25.9727 MiB   0.0000 MiB           1       return sumval

In total, the while loop added 0.0743 MiB of memory (25.9727 − 25.8984 MiB) for the above task.

On memory_profiler in 2026. The @profile decorator comes from the memory_profiler package (pip install memory_profiler). It still installs and runs on current Python, but development is quiet — the last release, 0.61.0, dates to November 2022. If you only need peak/current memory of a block and want a maintained, dependency-free option, Python’s standard-library tracemalloc is a good alternative. The line-by-line report below is still memory_profiler’s strength.
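For readers who want to try the tracemalloc route, here is a minimal sketch. The helper peak_memory and the two functions lazy_sum and eager_sum are hypothetical names of our own, not part of this post's benchmarks. Keep in mind that tracemalloc counts memory allocated by Python while tracing is on, whereas memory_profiler reports the memory of the whole process, so the numbers are not expected to match the tables in this post.

```python
import tracemalloc


def peak_memory(func, *args):
    """Run func(*args) and return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def lazy_sum(n):
    # range() yields one number at a time; nothing is stored.
    return sum(range(n))


def eager_sum(n):
    # list() materializes every number before summing.
    return sum(list(range(n)))


if __name__ == "__main__":
    for func in (lazy_sum, eager_sum):
        result, peak = peak_memory(func, 1_000_000)
        print(f"{func.__name__}: result={result}, peak={peak / 2**20:.4f} MiB")
```

Because the helper starts and stops tracing itself, it should not be nested inside code that already relies on tracemalloc.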

The for loop

import timeit
import numpy as np

nval = 1000000


# usual for loop


def for_loop(n=nval):
    sumval = 0
    for i in range(n):
        sumval += i
    return sumval

if __name__ == "__main__":

    print(
        f"for_loop: {timeit.timeit(for_loop, number = 10):.6f}s")

This prints for_loop: 0.490051s. Decorating the function with @profile(precision=4), as we did for the while loop, gives the following memory profile:

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    22  25.9922 MiB  25.9922 MiB           1   @profile(precision=4)
    23                                         def for_loop(n=nval):
    24  25.9922 MiB   0.0000 MiB           1       sumval = 0
    25  26.0273 MiB   0.0117 MiB     1000001       for i in range(n):
    26  26.0273 MiB   0.0234 MiB     1000000           sumval += i
    27  26.0273 MiB   0.0000 MiB           1       return sumval

In total, the for loop added 0.0351 MiB of memory (26.0273 − 25.9922 MiB) for the above task.

The builtin Python function

import timeit
import numpy as np

nval = 1000000


# using built in sum
def builtinsum(n=nval):
    return sum(range(n))

if __name__ == "__main__":

    print(
        f"builtinsum: {timeit.timeit(builtinsum, number = 10):.6f}s")

This prints builtinsum: 0.175238s. Its memory profile is:

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    46  25.8867 MiB  25.8867 MiB           1   @profile(precision=4)
    47                                         def builtinsum(n=nval):
    48  25.8906 MiB   0.0039 MiB           1       return sum(range(n))

In total, the builtin sum() version added 0.0039 MiB of memory for the above task.
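One way to see why the increment is so small is to look at the size of the range object itself. The sketch below is our own illustration using the standard-library sys.getsizeof; it only reports the size of the container object, not of the integers it refers to.

```python
import sys

small = range(10)
large = range(1_000_000)

# A range object stores only start, stop and step, so its size does not
# depend on how many numbers it will produce.
print(sys.getsizeof(small), sys.getsizeof(large))

# Materializing the same numbers as a list needs one pointer per element
# (plus the int objects themselves, which getsizeof does not count).
as_list = list(large)
print(sys.getsizeof(as_list))
```

The two range objects should report the same size, while the list grows with the number of elements.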

The numpy function

import timeit
import numpy as np

nval = 1000000


# using numpy sum
def numpysum(n=nval):
    return np.sum(np.arange(n))

if __name__ == "__main__":

    print(
        f"numpysum: {timeit.timeit(numpysum, number = 10):.6f}s")

This prints numpysum: 0.017640s. Its memory profile is:

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    53  25.9766 MiB  25.9766 MiB           1   @profile(precision=4)
    54                                         def numpysum(n=nval):
    55  33.6172 MiB   7.6406 MiB           1       return np.sum(np.arange(n))

In total, the numpy-based function added 7.6406 MiB of memory for the above task.
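That figure is close to what a back-of-the-envelope estimate predicts. The sketch below assumes the array holds 64-bit integers (8 bytes each), which is the usual default on 64-bit platforms but is an assumption on our part; on a given machine, np.arange(n).nbytes would report the actual value.

```python
n = 1_000_000
bytes_per_item = 8  # assumption: 64-bit integers (int64)

array_bytes = n * bytes_per_item
array_mib = array_bytes / 2**20  # 1 MiB = 1,048,576 bytes

print(f"{array_bytes} bytes = {array_mib:.4f} MiB")  # 8000000 bytes = 7.6294 MiB
```

The measured increment is slightly larger than this estimate, which is plausible since the profiler sees every allocation made during the call, not just the array data.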

Conclusions

Figure: Speed and memory for four ways to sum 0…1,000,000 (10 runs each), from this post's own benchmarks.

Method   Time (10 runs)   Memory increment
while    0.728 s          0.074 MiB
for      0.490 s          0.035 MiB
sum()    0.175 s          0.004 MiB
np.sum   0.018 s          7.64 MiB

Fastest is not always cheapest: numpy wins on speed (about 40× faster than the while loop) but pays in memory; the builtin sum() is the balanced pick, fast and the lightest on memory.

Please note that the absolute run times and memory figures will differ from system to system, but the ratios between the methods should stay broadly similar.
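To check the ratios on your own machine, a small harness along these lines may help. It is a sketch rather than the exact script behind the numbers in this post: the compare helper and its parameters are our own, the while loop here adds i so that all three functions return the same sum, and numpy is left out so that the example needs only the standard library.

```python
import timeit


def while_loop(n):
    i, total = 0, 0
    while i < n:
        total += i
        i += 1
    return total


def for_loop(n):
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    return sum(range(n))


def compare(funcs, n=100_000, number=5, repeat=3):
    """Return {function name: best total time in seconds for `number` calls}."""
    best = {}
    for func in funcs:
        # timeit.repeat returns one total per repeat; the minimum is the
        # least noisy estimate of how fast the code can run.
        times = timeit.repeat(lambda: func(n), number=number, repeat=repeat)
        best[func.__name__] = min(times)
    return best


if __name__ == "__main__":
    timings = compare([while_loop, for_loop, builtin_sum])
    baseline = timings["while_loop"]
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.6f}s ({baseline / seconds:.1f}x vs while_loop)")
```

Reporting each method relative to the while loop makes results from different machines easier to compare than raw seconds.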

We found that numpy was the fastest (0.017640s) and the while loop the slowest (0.727578s). The while loop is slow because every step (the comparison, the addition and the counter update) is executed by the Python interpreter. The core of numpy is written in C, so the same work runs in compiled code and is much faster.

In terms of memory usage, numpy is the worst: it needed about 7.6 MiB. In contrast, the builtin sum(range(n)) is the most memory-efficient because it never holds all the numbers in memory at once; it consumes them one at a time.

Comparing the while and for loops, the for loop is both faster and more memory-efficient. Hence, among hand-written loops, the for loop should be our first choice (and usually is) unless we do not know the number of iterations in advance.
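The case where the number of iterations is not known up front looks like the following sketch. The function halvings_until_below is a hypothetical example of our own, chosen only because its stopping point depends on the input value.

```python
def halvings_until_below(value, threshold=1.0):
    """Count how many times value must be halved to drop below threshold."""
    steps = 0
    # The number of iterations depends on the data, so there is no range()
    # to iterate over; a while loop states the stopping condition directly.
    while value >= threshold:
        value /= 2
        steps += 1
    return steps


print(halvings_until_below(1000.0))  # 1000 / 2**10 = 0.9766 < 1, so 10
```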

Quick check: You need to sum a very large range and your machine is tight on RAM (not on time). Which option is the best fit?

  • np.sum(np.arange(n)) — it is the fastest
  • sum(range(n)) — fast and the most memory-efficient
  • A hand-written while loop — it uses the least memory
  • A hand-written for loop — it is always the safest choice

Recap

  • A hand-written while/for loop runs every step in the Python interpreter, so it is the slowest way to aggregate numbers.
  • The builtin sum(range(n)) iterates in C: it was ~4× faster than the while loop here and used the least memory because it never stores the whole sequence.
  • numpy was the fastest (~40× the while loop) but built the entire array in memory, making it the hungriest — great when you are time-bound and the array fits in RAM.
  • Between the two hand-written loops, for beat while on both speed and memory, so reach for for unless you genuinely do not know the iteration count in advance.
  • Rule of thumb: avoid the explicit loop for numeric aggregation; pick builtin vs vectorized based on whether you are memory-bound or time-bound.
