Linux server disk is full: how to diagnose and clean it up
What to check when df shows 100%, which cleanup targets are safe, and how the mttrly disk-full recipe runs the same triage from chat or an AI IDE.
Direct answer
When a Linux server runs out of disk space, first find what is consuming it: df -h for the filesystem, df -i for inodes, du for the biggest directories, docker system df for images and volumes. Clean safe targets first — apt cache, old journal logs, stale temp files, unused Docker images — and never delete application data blindly. mttrly automates this triage with its disk-full diagnostic recipe; cleanup playbooks that change server state run only after explicit human approval.
What you see
Disk pressure shows up indirectly at first — failed writes, crashing services, silent log loss — and then everything stops at once. Typical signals:
"ENOSPC: No space left on device" in application logs or during npm install / pip install
df -h shows 100% (or 99%) used on / or /var
Database refuses writes: "could not write to file: No space left on device"
Services fail to restart, logs stop updating, deploys fail mid-way
df -h shows free space but writes still fail — check df -i: the filesystem is out of inodes
Monitoring alert: disk usage above threshold (mttrly raises disk_pressure and inode_exhaustion alerts)
How to fix it manually
The standard SSH triage, in the order that finds the problem fastest. The diagnostic commands are read-only; only the cleanup steps (vacuum, clean, prune, truncate) change state.
Confirm which filesystem is full
Check usage per mount point. A full /var behaves differently from a full /.
df -h
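To make this check scriptable, the df output can be filtered for mounts above a threshold. A minimal sketch, assuming POSIX-format output from df -P and device names and mount points without spaces; the function name full_mounts and the 90% default are illustrative and not part of any tool mentioned here.

```shell
# Print "mountpoint usage%" for every filesystem at or above a threshold.
# Reads `df -P` output on stdin; the threshold defaults to 90 (percent).
# Assumption: device names and mount points contain no spaces.
full_mounts() {
  awk -v t="${1:-90}" '
    NR > 1 {                       # skip the header row
      use = $5; sub(/%/, "", use)  # strip the percent sign from the Capacity column
      if (use + 0 >= t + 0) print $6, $5
    }'
}

# Read-only usage on a live server:
#   df -P | full_mounts 90
```

Reading from stdin rather than calling df inside the function keeps the filter testable against saved output.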
Rule out inode exhaustion
If df -h shows free space but writes fail, the filesystem has most likely run out of inodes, usually because of millions of tiny files (sessions, cache, mail queue).
df -i
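df -i shows that inodes are exhausted but not where the files live. One way to locate them is to count files per directory. A minimal sketch, assuming file names without embedded newlines; the function name inode_hogs is hypothetical.

```shell
# List the directories that contain the most regular files.
# $1 = directory to scan (default /), $2 = how many rows to show (default 10).
# -xdev keeps find on one filesystem, matching the mount that df -i flagged.
# Only regular files are counted; directories and symlinks also use inodes.
inode_hogs() {
  find "${1:-/}" -xdev -type f 2>/dev/null \
    | sed 's|/[^/]*$||' \
    | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Read-only usage:
#   inode_hogs /var 10
```

The sed step drops the file name from each path, so the counts are per containing directory rather than per subtree.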
Find the biggest directories
Walk the filesystem top-down to locate what is actually eating space.
du -xh --max-depth=2 / 2>/dev/null | sort -rh | head -20
Check journal logs and rotate them
systemd journals routinely grow to gigabytes. Vacuuming by age is safe.
journalctl --disk-usage
journalctl --vacuum-time=7d
Clear the package manager cache
On Debian/Ubuntu the apt cache is fully reconstructible and safe to drop.
apt-get clean
Check Docker usage and prune unused images
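Old images, build cache, and stopped containers are a common multi-gigabyte sink. docker image prune -a removes every image that no container uses, not only dangling ones, so review the docker system df output before confirming the prompt.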
docker system df
docker image prune -a
Truncate oversized logs instead of deleting them
Deleting a log file that a running process holds open does not free space — the process keeps the file handle. Truncate in place, or restart the service after deleting.
truncate -s 0 /var/log/your-app/huge.log
# find deleted-but-still-open files:
lsof +L1 | head
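To judge whether deleted-but-open files explain the missing space, their sizes can be totalled. A minimal sketch, assuming the default lsof +L1 column layout (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME); the function name deleted_open_mib is hypothetical.

```shell
# Sum the SIZE/OFF column (7th field) of `lsof +L1` output read from stdin
# and print the total in MiB with one decimal place.
# Rows whose 7th field is not a plain byte count (for example an offset
# such as 0t1234) are skipped. A file open in several processes appears
# once per row, so treat the result as an upper bound.
deleted_open_mib() {
  awk 'NR > 1 && $7 ~ /^[0-9]+$/ { total += $7 }
       END { printf "%.1f\n", total / 1048576 }'
}

# Read-only usage:
#   lsof +L1 2>/dev/null | deleted_open_mib
```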
Do not delete application data, databases, or user uploads to free space under pressure; move them or grow the disk instead. And remember: space held by a deleted file that a running process still has open is not released until that process closes the file or is restarted.
How mttrly handles it
mttrly ships a diagnostic recipe named disk-full that runs the same triage from Telegram or from an AI IDE (Claude Code, Cursor, Codex) connected over MCP. You ask what is eating the disk; the agent on the server gathers the evidence and the AI summarizes cleanup targets.
You ask, mttrly runs the disk-full recipe
From chat or via the mttrly_run_diagnostic MCP tool. The recipe is a defined sequence in the product, not an arbitrary shell session.
Step 1 — disk_check playbook (read-only)
Disk usage report: filesystems, usage percentages, the directories consuming space.
Step 2 — container_status playbook (read-only)
Docker context: which containers and images exist, as disk-consumption candidates.
AI analysis of the results
The recipe output is summarized into likely cleanup targets — log directories, apt cache, unused images — with sizes, so you decide what goes.
Example workflow
A realistic session from Claude Code connected to mttrly over MCP (values illustrative):
You -> Claude Code:
"Prod is throwing 'No space left on device'. Can you check what's eating the disk?"
Claude -> mttrly:
calls mttrly_run_diagnostic { recipe: "disk-full" }
mttrly returns:
- disk_check: / at 97% — /var/log 11 GB, docker images 8.4 GB, apt cache 1.2 GB
- container_status: 3 running, 2 exited containers
Claude -> You:
"Journal logs and unused Docker images are the biggest reclaimable targets.
I can request the disk_cleanup playbook — it needs your approval before
anything is deleted. Approve?"
You: approve from Telegram (or the dashboard)
mttrly: runs disk_cleanup, reports space freed, records the action in the audit log
Remediation, gated by approval
safe_cleanup playbook — clears a fixed whitelist of low-risk targets only: apt cache, journal logs older than 7 days, /tmp files older than 7 days. Its scope is hardcoded in the agent; it cannot touch application data.
disk_cleanup playbook — the deeper cleanup (temp files, old logs, apt cache). It is an approve-required playbook: invoking it via mttrly_run_playbook creates a pending action, and nothing is deleted until you explicitly approve from Telegram, the dashboard, or your IDE.
Neither playbook deletes databases, user uploads, or application files. If the space is consumed by your own data, mttrly shows you where it is — the decision and the cleanup path stay with you.
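Before approving a cleanup, the same low-risk targets can be sized by hand. A read-only sketch, assuming the Debian/Ubuntu default apt cache path /var/cache/apt/archives; this is not mttrly's implementation, and the function names are hypothetical.

```shell
# List regular files under a directory that were last modified more than
# N full days ago. $1 = directory (default /tmp), $2 = days (default 7).
# Nothing is deleted.
stale_tmp_files() {
  find "${1:-/tmp}" -xdev -type f -mtime "+${2:-7}" 2>/dev/null
}

# Print the current size of each low-risk target. Read-only.
# Note: journalctl --disk-usage reports the whole journal, not only
# entries older than 7 days, so it overstates what a vacuum would free.
reclaim_estimate() {
  printf 'apt cache: %s\n' "$(du -sh /var/cache/apt/archives 2>/dev/null | cut -f1)"
  printf 'journal: %s\n' "$(journalctl --disk-usage 2>/dev/null)"
  printf 'stale /tmp files: %s\n' "$(stale_tmp_files /tmp 7 | wc -l)"
}

# Usage:
#   reclaim_estimate
```

Comparing these numbers with the sizes in the diagnostic summary is a quick sanity check on what a cleanup could realistically reclaim.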
Diagnostics are read-only. Fixes need your approval.
- mttrly investigates on its own: diagnostic recipes and read-only playbooks inspect the server without changing anything.
- Any state-changing fix creates a pending action. Nothing executes until a human (you, not the AI) approves it from Telegram, the dashboard, or the IDE confirmation flow.
- Every diagnostic, pending action, approval, and result is recorded in the audit log.
- mttrly does not auto-fix production. The agent diagnoses and suggests a remediation path; the decision stays with you.
Detection tools tell you something is wrong. mttrly is the agent on your server that diagnoses the incident and prepares the fix — through scoped MCP tools, diagnostic recipes, and remediation playbooks, with human approval in front of every state change. It complements monitoring like Grafana, Datadog, or UptimeRobot; it does not replace it.
A real case from onboarding
Shortly after signing up, a new user connected their VPS and the agent flagged a nearly full disk during its first checks. mttrly proposed a cleanup, the user approved it from chat, and the disk pressure was resolved within minutes — no SSH session involved.
When mttrly is not enough
Some disk problems are outside what an on-server agent should touch. Honest boundaries:
- mttrly cannot grow the disk: partition resizing, LVM extension, or attaching a new volume happens in your provider console.
- Cleanup playbooks work on safe, predefined targets. If the space is taken by your application data or a database, mttrly reports it but will not delete it; that call is yours.
- Inode exhaustion caused by millions of application-generated files usually needs a manual, application-aware cleanup.
- If the agent is offline (or the server is unreachable), there are no live diagnostics. Use your provider console or SSH to recover first.
Frequently asked questions
Does mttrly clean up the disk automatically?
No. The disk-full diagnostic recipe is read-only — it reports usage and suggests targets. The safe_cleanup playbook only ever touches a hardcoded whitelist (apt cache, journal logs older than 7 days, stale /tmp files). The deeper disk_cleanup playbook requires explicit human approval before it runs, and every action is recorded in the audit log.
Will mttrly delete my application data or database to free space?
No. Cleanup playbooks operate on predefined safe targets only. If your own data is consuming the disk, mttrly shows you where the space went and leaves the decision to you.
Can Claude Code or Cursor run this diagnosis?
Yes. mttrly exposes 40 MCP tools, including mttrly_run_diagnostic for recipes like disk-full and mttrly_run_playbook for remediation. The AI assistant can investigate freely with read-only tools, but any state-changing playbook creates a pending action that a human approves.
Does mttrly need SSH access to do this?
The mttrly agent runs on the server and connects outbound, so day-to-day diagnostics and cleanup do not go through raw SSH. Installing the agent initially requires server access, and some situations (offline agent, disk resize, boot issues) still need SSH or your provider console.
Catch disk pressure before it takes the site down
Connect a server in a few minutes. Start with free monitoring and read-only diagnostics; add approval-gated cleanup when you trust the workflow.