Understanding Docker: A Beginner’s Guide for Geophysics Students
Docker packages your code, dependencies, and runtime settings into a portable unit called a container. For geophysics students, that means running the same workflow across your laptop, a lab machine, and a cloud server without the classic “but it works on my machine” headache. This post introduces the core Docker concepts and shows a practical setup for scientific analysis.
The one mental model
Three words, one pipeline:
Dockerfile (the recipe) → build → Image (the read-only template) → run → Container (the running instance).
An image is frozen and shareable; a container is a live process started from it. Get this distinction and the rest of Docker falls into place.
Why Docker is useful in geophysics
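Geophysics workflows often depend on many tools and libraries (for example, numpy, scipy, matplotlib, obspy, and assorted system packages). Installing and updating these by hand on every machine is fragile: a version that differs between systems can break a script or change its results.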
Docker helps by:
- Keeping environments reproducible.
- Making collaboration easier.
- Simplifying deployment on servers or HPC gateways.
- Reducing dependency conflicts.
Core concepts
- Image: a read-only template that contains your application and its dependencies.
- Container: a running instance of an image.
- Dockerfile: a text file with the instructions for building an image.
- Volume: storage that lets containers read and write data that persists beyond the container lifecycle.
- Registry: a service (for example, Docker Hub) that stores and distributes images.
You build geophysics-lab:1.0 once and then docker run it three times. How many images and containers is that?
Install Docker
Install Docker Desktop (Windows/macOS) or Docker Engine (Linux), then verify:
docker --version
docker run hello-world
If hello-world runs successfully, your setup is ready.
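If you would rather run this check from a script (for example at the top of a lab notebook), the same verification can be sketched in Python with only the standard library. The function name docker_version is made up for this illustration, and the sketch only confirms that the docker command line tool responds, not that the Docker daemon is running.

```python
import shutil
import subprocess


def docker_version(executable="docker"):
    """Return the output of `docker --version`, or None if Docker is unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None  # executable is not on PATH
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = docker_version()
    print(version if version else "Docker not found on PATH")
```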
Basic commands you should know
# Pull an image
docker pull python:3.11-slim
# List local images
docker images
# Run a temporary container
docker run --rm -it python:3.11-slim bash
# List running containers
docker ps
# List all containers
docker ps -a
# Remove a container
docker rm <container_id>
Build your first geophysics image
Create a folder and add this Dockerfile:
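# Start from a small official Python base image with an explicit tag
FROM python:3.11-slim
# Later instructions, and the container itself, start in /workspace
WORKDIR /workspace
# Compiler toolchain for Python packages that build native extensions
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
    && rm -rf /var/lib/apt/lists/*
# Scientific Python stack; --no-cache-dir keeps pip's download cache out of the image
RUN pip install --no-cache-dir numpy scipy matplotlib obspy pandas
# Default command when docker run is given no other command
CMD ["python", "--version"]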
Build the image:
docker build -t geophysics-lab:1.0 .
Run it interactively:
docker run --rm -it geophysics-lab:1.0 bash
Work with local data using volumes
Bind-mount your current directory into the container with -v so that scripts and data stay on your machine:
docker run --rm -it \
-v "$(pwd)":/workspace \
-w /workspace \
geophysics-lab:1.0 \
python your_script.py
The image supplies the environment, while your scripts, data, and outputs stay on the host and survive after the container is removed.
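When you launch many runs from a driver script, it can help to assemble the same docker run invocation programmatically. The following is a minimal sketch under a few assumptions: the helper name docker_run_command is invented for this post, the -it flags are dropped because a scripted run has no interactive terminal, and the host path is made absolute so that -v is unambiguously treated as a bind mount.

```python
import shlex
from pathlib import Path


def docker_run_command(image, command, host_dir=".", container_dir="/workspace"):
    """Build a `docker run` argument list that bind-mounts host_dir."""
    host_path = Path(host_dir).resolve()  # absolute path, so -v is a bind mount
    return [
        "docker", "run", "--rm",
        "-v", f"{host_path}:{container_dir}",
        "-w", container_dir,
        image,
        *command,
    ]


args = docker_run_command("geophysics-lab:1.0", ["python", "your_script.py"])
print(shlex.join(args))  # shell-quoted form you can paste into a terminal
# To execute it from Python: subprocess.run(args, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when a directory name contains spaces.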
Why mount your seismic data as a volume instead of copying it into the image?
Example: quick seismic trace plot with ObsPy
Create plot_trace.py:
from obspy import read
import matplotlib.pyplot as plt
st = read("example.mseed")
tr = st[0]
plt.figure(figsize=(10, 3))
plt.plot(tr.times(), tr.data, lw=0.9)
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.title(f"{tr.id} | fs={tr.stats.sampling_rate} Hz")
plt.tight_layout()
plt.savefig("trace_plot.png", dpi=150)
print("Saved: trace_plot.png")
Run:
docker run --rm -it \
-v "$(pwd)":/workspace \
-w /workspace \
geophysics-lab:1.0 \
python plot_trace.py
Because the current directory is mounted as a volume, example.mseed is read from your machine and
trace_plot.png is written straight back to it — the container is disposable, your data isn’t.
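The example assumes you already have a file called example.mseed. If you do not, one option is to synthesize a test trace. The sketch below is only an illustration: the network, station, and channel codes are placeholders rather than a real station, the 5 Hz sine wave and 100 Hz sampling rate are arbitrary choices, and the function names are invented here. The writer function needs numpy and obspy, which the image built above installs.

```python
import math


def sine_samples(freq_hz=5.0, sampling_rate=100.0, duration_s=10.0):
    """Return a list of samples of a unit-amplitude sine wave."""
    n = int(round(sampling_rate * duration_s))
    return [math.sin(2.0 * math.pi * freq_hz * i / sampling_rate) for i in range(n)]


def write_example_mseed(path="example.mseed", sampling_rate=100.0):
    """Write a synthetic one-trace MiniSEED file (needs numpy and obspy)."""
    # Imported here so sine_samples() works without the scientific stack
    import numpy as np
    from obspy import Stream, Trace

    data = np.array(sine_samples(sampling_rate=sampling_rate), dtype=np.float32)
    header = {
        "network": "XX",  # placeholder codes, not a real station
        "station": "TEST",
        "channel": "BHZ",
        "sampling_rate": sampling_rate,
    }
    Stream([Trace(data=data, header=header)]).write(path, format="MSEED")
    return path
```

Calling write_example_mseed() from a script run inside the container, with the current directory mounted as above, should leave example.mseed in your local folder, ready for plot_trace.py.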
Best practices for students
- Pin versions for critical libraries.
- Keep Dockerfiles small and readable.
- Use .dockerignore to avoid copying large unnecessary files.
- Mount data as volumes instead of baking raw datasets into images.
- Tag images with meaningful versions (1.0, 1.1, 2024-12).
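For the first point, one way to find out which versions to pin is to ask the environment you have already validated. The sketch below uses the standard library's importlib.metadata to print name==version lines; the function name pinned_requirements is invented for this illustration, and pip freeze gives a similar (longer) listing.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return 'name==version' lines for installed packages; note missing ones."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"# {name}: not installed")
    return lines


if __name__ == "__main__":
    wanted = ["numpy", "scipy", "matplotlib", "obspy", "pandas"]
    for line in pinned_requirements(wanted):
        print(line)
```

Run inside the container, the output can be saved as a requirements file and installed from the Dockerfile in place of the unpinned pip install line.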
Common pitfalls
- Permission mismatches between host and container files.
- Large image sizes due to unnecessary packages.
- Forgetting to persist outputs using mounted directories.
- Using latest tags everywhere, which hurts reproducibility.
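On the permission pitfall: on a Linux host, a container runs as root unless told otherwise, so files it writes into a bind-mounted directory can end up owned by root. A common remedy is the --user option of docker run. The sketch below builds that option from your own user and group IDs; the helper name user_flag is invented here, and the default lookup assumes a POSIX host (os.getuid does not exist on Windows).

```python
import os


def user_flag(uid=None, gid=None):
    """Return ['--user', 'UID:GID'] for docker run, defaulting to the current user."""
    if uid is None:
        uid = os.getuid()  # POSIX only; not available on Windows
    if gid is None:
        gid = os.getgid()
    return ["--user", f"{uid}:{gid}"]


print(" ".join(user_flag(1000, 1000)))  # --user 1000:1000
```

The resulting pair goes before the image name in a docker run command. Note that a process running under an arbitrary user ID may not be able to write outside the mounted directory, which is usually what you want for analysis scripts.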
The latest trap: latest isn’t a “newest” guarantee — it’s just the default tag, and whatever
it points to can change under you. For reproducible science, pin an explicit tag (geophysics-lab:1.0,
python:3.11-slim) so a rerun next year uses the same environment you validated today.
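A small check can catch unpinned image names before they reach a script or a lab handout. The following is a simplified sketch, not a full parser for image references: the function name is_pinned is invented here, and it only distinguishes "digest or explicit non-latest tag" from everything else.

```python
def is_pinned(image_ref):
    """Return True if image_ref has a digest or an explicit tag other than 'latest'."""
    if "@" in image_ref:
        return True  # digest reference such as name@sha256:...
    last_part = image_ref.rsplit("/", 1)[-1]  # ignore any registry host:port
    if ":" not in last_part:
        return False  # no tag, so Docker falls back to 'latest'
    tag = last_part.rsplit(":", 1)[1]
    return tag != "latest"


for ref in ["python:3.11-slim", "python", "python:latest", "geophysics-lab:1.0"]:
    print(f"{ref:22} pinned={is_pinned(ref)}")
```

An explicit tag can still be re-pushed by its publisher, so a digest reference is the stricter form of pinning when exact reproducibility matters.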
Recap
Without scrolling up — can you trace the pipeline? Docker gives you:
- A Dockerfile recipe that builds into a read-only image,
- which you run as one or many disposable containers,
- sharing images through a registry and keeping data on the host with volumes,
- all pinned to explicit tags so the environment is reproducible.
Once you containerize your analysis environment, the same workflow runs consistently on your laptop, a lab machine, or the cloud — one of the fastest ways to make student projects reproducible and shareable.
Where to go next
- Docker’s official getting-started guide — images, containers, and Compose.
- ObsPy documentation — the seismology toolkit used in the example.
- Related post here: The Impact of Cloud Computing on Geophysical and Seismological Research — where these containers go to scale.