macOS File Indexing Using Python

Utpal Kumar

A Python-based solution for indexing and searching files on a macOS system using SQLite, FAISS, and semantic search.

The tool indexes every file on a specified volume, stores each file's metadata in an SQLite database, and supports fast searching by semantic similarity and fuzzy matching.
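
As a rough illustration of the indexing step, the minimal sketch below walks a directory tree and writes one row of metadata per file into SQLite. It is not the code from the repository: the function name `index_volume`, the table name `files`, and its columns are hypothetical, and the file kind is approximated here by the file extension rather than by content inspection.

```python
import os
import sqlite3
from datetime import datetime


def index_volume(root, db_path):
    """Walk `root` and store per-file metadata in an SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               path TEXT PRIMARY KEY,
               kind TEXT,
               size INTEGER,
               volume TEXT,
               modified TEXT
           )"""
    )
    # Assumption: the volume name is the last component of the root path,
    # e.g. "QSIS_DISK" for "/Volumes/QSIS_DISK".
    volume = os.path.basename(os.path.normpath(root))
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                st = os.stat(full_path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            # Stand-in for a real file-kind lookup: use the extension only.
            kind = os.path.splitext(name)[1].lower() or "unknown"
            modified = datetime.fromtimestamp(st.st_mtime).strftime("%Y-%m-%d %H:%M:%S")
            rows.append((full_path, kind, st.st_size, volume, modified))
    with conn:  # commits the inserts as a single transaction
        conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)", rows)
    count = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    conn.close()
    return count
```

Calling `index_volume("/Volumes/QSIS_DISK", "file_index.db")` would return the number of rows in the table. Keeping the database outside the indexed volume avoids indexing the database file itself.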

Download the script here: https://github.com/earthinversion/macos-file-indexing

Features

  • Index all files on a specified macOS volume.
  • Store metadata (file path, file kind, size, volume name, modified time) in SQLite.
  • Perform semantic search using FAISS and Sentence Transformers.
  • Use fuzzy matching to find similar filenames.
  • Show formatted search results in a human-readable table.
  • Improve speed using FAISS caching.
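
The semantic search feature listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, the model name `all-MiniLM-L6-v2` is an assumed choice (the post does not say which Sentence Transformers model is used), and a flat inner-product index over normalized embeddings is only one of several possible FAISS setups.

```python
def build_index(names, model_name="all-MiniLM-L6-v2"):
    """Embed filenames and put them in a FAISS inner-product index."""
    # Heavy imports are deferred so the pure-Python helper below works without them.
    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # Normalized vectors make the inner product equal to cosine similarity.
    vectors = model.encode(
        names, convert_to_numpy=True, normalize_embeddings=True
    ).astype("float32")
    index = faiss.IndexFlatIP(vectors.shape[1])
    index.add(vectors)
    return model, index


def hits_to_names(ids, scores, names):
    """Map FAISS result ids to filenames, dropping the -1 ids FAISS uses for empty slots."""
    return [(names[i], float(s)) for i, s in zip(ids, scores) if i >= 0]


def semantic_search(query, model, index, names, k=5):
    """Return up to k (filename, similarity) pairs for the query."""
    query_vec = model.encode(
        [query], convert_to_numpy=True, normalize_embeddings=True
    ).astype("float32")
    scores, ids = index.search(query_vec, k)
    return hits_to_names(ids[0], scores[0], names)
```

To avoid re-embedding every filename on each run, an index built this way can be saved with `faiss.write_index(index, path)` and reloaded with `faiss.read_index(path)`, which is one plausible reading of the caching feature.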

Installation

Install Dependencies

pip install faiss-cpu sentence-transformers fuzzywuzzy pandas tabulate tqdm numpy python-Levenshtein

Clone the Repository

git clone https://github.com/earthinversion/macos-file-indexing.git
cd macos-file-indexing

Set Up the Database

  • Edit the configuration in the config.yaml file.
  • Run the following command to build the indexing database:
python file_indexer.py

Searching for Files

  • To search for a file, use:
python search_files.py

Example output:

Do you want to rebuild the search cache? (yes/no): no
Enter filename to search: location_info.yaml
Exact match not found. Suggested files:
+---+-------------------------------------------------------------------+------------------------------------+--------------+-----------+---------------------+
|   |                               Path                                |             File Kind              | Size (bytes) |  Volume   |    Modified Time    |
+---+-------------------------------------------------------------------+------------------------------------+--------------+-----------+---------------------+
| 0 | /Volumes/QSIS_DISK/event_data_download_waveform_api/._config.yaml | AppleDouble encoded Macintosh file |   4.00 KB    | QSIS_DISK | 2025-01-26 14:18:00 |
| 1 |          /Volumes/QSIS_DISK/QSIS-Server-run/run_info.yml          |             ASCII text             |    101 B     | QSIS_DISK | 2022-06-27 23:55:45 |
| 2 |     /Volumes/QSIS_DISK/qsis-server-inspect/data/run_info.yml      |             ASCII text             |   2.46 KB    | QSIS_DISK | 2023-03-18 02:08:30 |
| 3 |           /Volumes/QSIS_DISK/line-bot-qsis/config.yaml            |             ASCII text             |    140 B     | QSIS_DISK | 2023-01-14 17:21:30 |
| 4 |  /Volumes/QSIS_DISK/event_data_download_waveform_api/config.yaml  |      Unicode text, UTF-8 text      |    511 B     | QSIS_DISK | 2025-01-25 12:51:35 |
+---+-------------------------------------------------------------------+------------------------------------+--------------+-----------+---------------------+

Best fuzzy match:
+---+----------------------------------------------------------+------------+--------------+-----------+---------------------+
|   |                           Path                           | File Kind  | Size (bytes) |  Volume   |    Modified Time    |
+---+----------------------------------------------------------+------------+--------------+-----------+---------------------+
| 0 | /Volumes/QSIS_DISK/qsis-server-inspect/location_info.yml | ASCII text |   1.10 KB    | QSIS_DISK | 2023-04-07 22:32:39 |
+---+----------------------------------------------------------+------------+--------------+-----------+---------------------+

Enter filename to search: wpa_supplicant.conf
Exact match found:
+---+----------------------------------------+------------+--------------+-----------+---------------------+
|   |                  Path                  | File Kind  | Size (bytes) |  Volume   |    Modified Time    |
+---+----------------------------------------+------------+--------------+-----------+---------------------+
| 0 | /Volumes/QSIS_DISK/wpa_supplicant.conf | ASCII text |    161 B     | QSIS_DISK | 2022-03-30 20:18:02 |
+---+----------------------------------------+------------+--------------+-----------+---------------------+
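
The sizes in the tables above appear in human-readable units (B, KB). A minimal sketch of such formatting is given below, assuming 1024-byte units and two decimal places; both are assumptions inferred from the sample output, and `format_size` is a hypothetical helper name rather than a function from the script.

```python
def format_size(num_bytes):
    """Format a byte count as a short human-readable string (1024-based units)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value is below 1024, or at the largest unit.
        if size < 1024 or unit == units[-1]:
            if unit == "B":
                return f"{int(size)} B"  # whole bytes need no decimals
            return f"{size:.2f} {unit}"
        size /= 1024
```

For instance, `format_size(101)` gives `101 B` and `format_size(4096)` gives `4.00 KB`, matching the style of the Size column.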

For each query, the script prints the exact match if one exists. Otherwise, it prints a list of suggested files followed by the best fuzzy match. Every result includes the file path and its metadata.
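
The exact-then-fuzzy lookup order seen in the output can be sketched as below. To keep the example dependency-free, it uses `difflib` from the Python standard library as a stand-in for fuzzywuzzy, so the scores and ranking will not necessarily match the script's. The function name `lookup` and the cutoff value are hypothetical.

```python
import difflib


def lookup(query, names, n=5, cutoff=0.6):
    """Return ("exact", [query]) on an exact hit, else ("fuzzy", close matches, best first)."""
    if query in names:
        return "exact", [query]
    # get_close_matches returns at most n names scoring at least `cutoff`,
    # sorted from most to least similar.
    return "fuzzy", difflib.get_close_matches(query, names, n=n, cutoff=cutoff)


names = ["config.yaml", "run_info.yml", "location_info.yml", "wpa_supplicant.conf"]
kind, matches = lookup("location_info.yaml", names)
print(kind, matches[0])
```

Here the query `location_info.yaml` has no exact match in the list, so the closest name, `location_info.yml`, comes back first, while a query such as `wpa_supplicant.conf` is returned as an exact hit.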

Disclaimer of liability

The information provided by Earth Inversion is made available for educational purposes only.

Whilst we endeavor to keep the information up-to-date and correct, Earth Inversion makes no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the website or the information, products, services or related graphics content on the website for any purpose.

UNDER NO CIRCUMSTANCE SHALL WE HAVE ANY LIABILITY TO YOU FOR ANY LOSS OR DAMAGE OF ANY KIND INCURRED AS A RESULT OF THE USE OF THE SITE OR RELIANCE ON ANY INFORMATION PROVIDED ON THE SITE. ANY RELIANCE YOU PLACE ON SUCH MATERIAL IS THEREFORE STRICTLY AT YOUR OWN RISK.