Understanding Docker: A Beginner’s Guide for Geophysics Students

Utpal Kumar | 4 minute read | DATASCIENCE | December 01, 2024

Docker packages your code, dependencies, and runtime settings into a portable unit called a container. For geophysics students, that means running the same workflow across your laptop, a lab machine, and a cloud server without the classic “but it works on my machine” headache. This post introduces the core Docker concepts and shows a practical setup for scientific analysis.

The one mental model

Three words, one pipeline:

Dockerfile (the recipe) → build → Image (the read-only template) → run → Container (the running instance).

An image is frozen and shareable; a container is a live process started from it. Get this distinction and the rest of Docker falls into place.
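As a rough analogy only (this is not how Docker is implemented), the image/container split resembles a frozen template and the separate objects created from it. The sketch below models that in plain Python; the `Image` and `Container` classes and the `run` function are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical counter standing in for container IDs


@dataclass(frozen=True)
class Image:
    """Read-only template: frozen, so it cannot be modified after the 'build'."""
    name: str
    tag: str


@dataclass
class Container:
    """A live instance started from an image, with its own mutable state."""
    image: Image
    container_id: int = field(default_factory=lambda: next(_ids))
    files_written: list = field(default_factory=list)


def run(image: Image) -> Container:
    """Analogous to `docker run`: every call yields a new container."""
    return Container(image=image)


image = Image("geophysics-lab", "1.0")       # built once
containers = [run(image) for _ in range(2)]  # started twice

# Changes made in one container do not touch the image or the other container.
containers[0].files_written.append("trace_plot.png")
print(len({c.image for c in containers}), "image,", len(containers), "containers")
```

Both containers point at the same unchanged image, which is why an image can be shared while containers are treated as disposable.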

Why Docker is useful in geophysics

Geophysics workflows often depend on many tools and libraries (for example numpy, scipy, matplotlib, obspy, and system packages). Installing these by hand on every system is fragile: a version mismatch on one machine can break a script that runs fine on another.

Docker helps by:

Keeping environments reproducible.
Making collaboration easier.
Simplifying deployment on servers or HPC gateways.
Reducing dependency conflicts.

Core concepts

The five core objects: a Dockerfile builds an image, which runs as a container; registries share images, and volumes keep your data on the host.

Image: a read-only template that contains your application and dependencies.
Container: a running instance of an image.
Dockerfile: a text file with instructions to build an image.
Volume: lets containers read/write persistent data outside the container lifecycle.
Registry: stores and distributes images (for example Docker Hub).

Check your understanding

You build geophysics-lab:1.0 once and then docker run it three times. How many images and containers is that?

Install Docker

Install Docker Desktop (Windows/macOS) or Docker Engine (Linux), then verify:

docker --version
docker run hello-world

If hello-world runs successfully, your setup is ready.
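If you would rather check the installation from a script (for example at the top of a lab notebook), a minimal Python sketch is shown here. The helper name `docker_version` is made up for this post; it uses only the standard library and the `docker --version` command from above. It confirms that the command-line client is present, not that containers can actually start, so `docker run hello-world` remains the full test.

```python
import shutil
import subprocess
from typing import Optional


def docker_version() -> Optional[str]:
    """Return the output of `docker --version`, or None if Docker is unavailable."""
    if shutil.which("docker") is None:  # docker executable is not on PATH
        return None
    try:
        result = subprocess.run(
            ["docker", "--version"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


version = docker_version()
print(version if version else "Docker CLI not found; install it first.")
```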

Basic commands you should know

# Pull an image
docker pull python:3.11-slim

# List local images
docker images

# Run a temporary interactive container (--rm deletes it on exit)
docker run --rm -it python:3.11-slim bash

# List running containers
docker ps

# List all containers
docker ps -a

# Remove a container
docker rm <container_id>
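The listing commands can also be scripted. The sketch below assumes the `--format` option of `docker images` with the Go template `{{.Repository}}:{{.Tag}}`, which prints one `name:tag` per line. `parse_image_list` and `list_local_images` are hypothetical helper names, and the sample text is invented so the parsing step can be tried without Docker.

```python
import subprocess


def parse_image_list(output: str) -> list[str]:
    """Turn `docker images --format ...` output into a list of name:tag strings."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_local_images() -> list[str]:
    """Ask the Docker CLI for local images (needs Docker installed and running)."""
    result = subprocess.run(
        ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_image_list(result.stdout)


# Parsing demonstrated on sample text, so this part runs even without Docker.
sample = "python:3.11-slim\ngeophysics-lab:1.0\n"
print(parse_image_list(sample))
```

Calling `list_local_images()` on a machine with Docker running should return the same kind of list for your real images.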

Build your first geophysics image

Create a folder and add this Dockerfile:

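# Start from a small official Python base image
FROM python:3.11-slim

# Later instructions and the default shell start in /workspace
WORKDIR /workspace

# Compiler toolchain in case a Python package must build from source;
# removing the apt lists afterwards keeps the image smaller
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Scientific Python stack (pin versions here for reproducibility)
RUN pip install --no-cache-dir numpy scipy matplotlib obspy pandas

# Default command when docker run is given no other command
CMD ["python", "--version"]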

Build the image:

docker build -t geophysics-lab:1.0 .

Run it interactively:

docker run --rm -it geophysics-lab:1.0 bash

Work with local data using volumes

Mount your current directory into the container (a bind mount, set up with -v) so scripts and data stay on your machine:

docker run --rm -it \
  -v "$(pwd)":/workspace \
  -w /workspace \
  geophysics-lab:1.0 \
  python your_script.py

The container sees your current directory at /workspace, so the script runs inside the image's environment while its inputs and outputs stay on your machine.
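When the same run command is reused across many scripts, it can help to assemble it programmatically. This is a minimal sketch, assuming the image name and mount point used above; `docker_run_args` is a hypothetical helper, not part of any Docker library. The `-it` flags are left out because a command launched from a script is usually not interactive.

```python
from pathlib import Path


def docker_run_args(script: str, host_dir: str = ".",
                    image: str = "geophysics-lab:1.0",
                    mount_point: str = "/workspace") -> list[str]:
    """Build the argument list for `docker run` with a host directory mounted."""
    # Use an absolute path so -v is not mistaken for a named volume.
    host_path = Path(host_dir).resolve()
    return [
        "docker", "run", "--rm",
        "-v", f"{host_path}:{mount_point}",  # host directory -> container directory
        "-w", mount_point,                   # start in the mounted directory
        image,
        "python", script,
    ]


args = docker_run_args("your_script.py")
print(" ".join(args))
# To actually launch it: subprocess.run(args, check=True)
```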

Check your understanding

Why mount your seismic data as a volume instead of copying it into the image?

Example: quick seismic trace plot with ObsPy

Create plot_trace.py:

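from obspy import read
import matplotlib.pyplot as plt

# Read the miniSEED file into a Stream and take its first Trace
st = read("example.mseed")
tr = st[0]

# Plot amplitude against time in seconds since the start of the trace
plt.figure(figsize=(10, 3))
plt.plot(tr.times(), tr.data, lw=0.9)
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.title(f"{tr.id} | fs={tr.stats.sampling_rate} Hz")
plt.tight_layout()

# Save the figure to the current working directory
plt.savefig("trace_plot.png", dpi=150)
print("Saved: trace_plot.png")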

Run:

docker run --rm -it \
  -v "$(pwd)":/workspace \
  -w /workspace \
  geophysics-lab:1.0 \
  python plot_trace.py

Because the current directory is mounted into the container, example.mseed is read from your machine and trace_plot.png is written straight back to it. The container is disposable; your data isn’t.

Best practices for students

Pin versions for critical libraries.
Keep Dockerfiles small and readable.
Use .dockerignore to avoid copying large unnecessary files.
Mount data as volumes instead of baking raw datasets into images.
Tag images with meaningful versions (1.0, 1.1, 2024-12).
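One way to find out which versions to pin is to ask the environment you already validated. The sketch below uses the standard-library `importlib.metadata` to print `name==version` lines for the packages installed in the Dockerfile above. `pin_lines` is a hypothetical helper, and packages that are not installed are simply skipped.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_lines(packages: list[str]) -> list[str]:
    """Return requirements-style pins (name==version) for installed packages."""
    pins = []
    for name in packages:
        try:
            pins.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            continue  # not installed in this environment, so nothing to pin
    return pins


# Run inside the container to capture the versions you actually tested with.
for line in pin_lines(["numpy", "scipy", "matplotlib", "obspy", "pandas"]):
    print(line)
```

The printed lines can be copied into the `pip install` step of the Dockerfile so that a later rebuild installs the same versions.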

Common pitfalls

Permission mismatches between host and container files.
Large image sizes due to unnecessary packages.
Forgetting to persist outputs using mounted directories.
Using latest tags everywhere, which hurts reproducibility.
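For the permission pitfall, a common workaround on Linux hosts is to run the container under your own user and group ID with Docker's `--user` option, so files written to the mounted directory are not owned by root. A hedged sketch follows: `user_flag` is a hypothetical helper, `os.getuid` and `os.getgid` do not exist on Windows, and some tools inside the container may additionally expect a writable home directory.

```python
import os


def user_flag() -> list[str]:
    """Return ["--user", "UID:GID"] for the current host user, or [] if unsupported."""
    if not hasattr(os, "getuid"):  # e.g. on Windows
        return []
    return ["--user", f"{os.getuid()}:{os.getgid()}"]


# These arguments would be placed after `docker run`, before the image name.
print(user_flag())
```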

The latest trap: latest isn’t a “newest” guarantee — it’s just the default tag, and whatever it points to can change under you. For reproducible science, pin an explicit tag (geophysics-lab:1.0, python:3.11-slim) so a rerun next year uses the same environment you validated today.
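A quick way to catch unpinned references before they reach a Dockerfile or run script is a small check like the one below. It is a simplified sketch, not a full parser for Docker image references, and `is_pinned` is a hypothetical helper. It treats a reference as pinned if it carries a digest or an explicit tag other than `latest`.

```python
def is_pinned(image_ref: str) -> bool:
    """True if the reference has a digest or an explicit tag other than 'latest'."""
    if "@" in image_ref:  # digest form, e.g. name@sha256:...
        return True
    last_part = image_ref.rsplit("/", 1)[-1]  # ignore any registry host:port prefix
    if ":" not in last_part:
        return False  # no tag given, so Docker falls back to 'latest'
    tag = last_part.rsplit(":", 1)[1]
    return tag != "" and tag != "latest"


for ref in ["python:3.11-slim", "geophysics-lab:1.0", "python", "python:latest"]:
    print(ref, "->", "pinned" if is_pinned(ref) else "NOT pinned")
```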

Recap

Without scrolling up — can you trace the pipeline? Docker gives you:

A Dockerfile recipe that builds into a read-only image,
which you run as one or many disposable containers,
sharing images through a registry and keeping data on the host with volumes,
all pinned to explicit tags so the environment is reproducible.

Once you containerize your analysis environment, the same workflow runs consistently on your laptop, a lab machine, or the cloud — one of the fastest ways to make student projects reproducible and shareable.

Where to go next

Docker’s official getting-started guide — images, containers, and Compose.
ObsPy documentation — the seismology toolkit used in the example.
Related post: The Impact of Cloud Computing on Geophysical and Seismological Research, where these containers go to scale.

Disclaimer of liability

The information provided by Earth Inversion is made available for educational purposes only.

Whilst we endeavor to keep the information up-to-date and correct, Earth Inversion makes no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the website or the information, products, services or related graphics content on the website for any purpose.

UNDER NO CIRCUMSTANCE SHALL WE HAVE ANY LIABILITY TO YOU FOR ANY LOSS OR DAMAGE OF ANY KIND INCURRED AS A RESULT OF THE USE OF THE SITE OR RELIANCE ON ANY INFORMATION PROVIDED ON THE SITE. ANY RELIANCE YOU PLACE ON SUCH MATERIAL IS THEREFORE STRICTLY AT YOUR OWN RISK.
