srmdn.

Before You Add Redis

Tue, 31 Mar 2026 00:00:00 GMT

Your app feels slow. The pages take a second too long, and someone mentioned Redis. Before you wire up a new service, check two things: whether nginx is compressing your responses, and whether your SQLite WAL is growing unchecked. Either one will make an app feel slow, and both take minutes to fix.

This article is written for apps running SQLite and nginx: a common setup for small to mid-size web apps deployed on a single server. The gzip and Redis sections apply regardless of your database. The WAL section is SQLite-specific; if you're on Postgres or MySQL, skip it.

Redis is a real solution. But it solves a specific problem: repeated expensive computation. If that's not your problem, adding Redis adds complexity without fixing anything.

Where the Slowness Lives

Performance problems in a web app come from three distinct places:

Browser <—— network ——> nginx <—— proxy ——> app <—— query ——> database
               ↑                                      ↑           ↑
             gzip                                   Redis        WAL

Gzip lives between nginx and the browser. WAL lives inside SQLite. Redis lives between your app and the database. They don't overlap.

A 30KB HTML page sent without compression is a network problem. Adding Redis doesn't help it. A SQLite database with an unchecked WAL file is a read latency problem. Gzip doesn't help it. A query that takes 200ms and runs 50 times per second is a computation problem. That's where Redis fits.

Gzip: The Free Win

Nginx can compress responses before sending them. Enable it in your server block and a 30KB HTML page becomes 7KB. Over a slow mobile connection, that's the difference between a page that loads and one that spins.

The config:

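gzip on;
gzip_proxied any;
gzip_min_length 1024;
# text/html is always compressed and does not need to be listed
gzip_types text/plain text/css application/javascript application/json image/svg+xml;

gzip_proxied any is the directive people miss when nginx sits behind a CDN or load balancer. With the default (gzip_proxied off), nginx skips compression for any request that arrives with a Via header, which is how it detects that another proxy is in front of it. Responses from your own app behind proxy_pass are compressed with gzip on alone; set gzip_proxied any so that clients reaching you through an intermediary get compressed responses too.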

Two things gzip won't help: responses smaller than 1KB (the compression overhead isn't worth it) and binary formats like JPEG, PNG, and video (they're already compressed, so gzip burns CPU and can even make them slightly larger).

WAL: The One-Line SQLite Fix

This section applies to SQLite only. Postgres and MySQL handle concurrency differently and don't have a WAL checkpoint problem in the same sense.

SQLite has two main journaling modes. The default (DELETE) locks the entire database file on every write. Any reader waiting for that lock blocks until the write finishes. For a web app handling concurrent requests, this creates visible latency under load.

WAL mode (Write-Ahead Log) separates reads from writes. Instead of modifying the database file directly, SQLite appends changes to a .wal file. Reads see a consistent snapshot of the main database without waiting for writes to finish.

Enable it once at connection setup:

PRAGMA journal_mode = WAL;

The catch: by default SQLite flushes WAL changes back to the main database (a "checkpoint") only once the WAL reaches 1000 pages. Until that threshold is hit, every read has to consult the WAL as well as the main database to find the latest version of each page, and that lookup gets slower as the WAL grows. On a low-write app the WAL can sit at several megabytes for days before a checkpoint runs, and reads slow down with it.

Lowering the threshold keeps the WAL small:

PRAGMA wal_autocheckpoint = 100;

Set this at connection init, alongside journal_mode. SQLite then attempts a checkpoint every 100 pages, which keeps the WAL at roughly 400KB with the default 4KB page size. There's no meaningful downside unless you're on a write-heavy workload where frequent checkpoints create contention, which is uncommon for typical web apps.

Redis: When You Actually Need It

Redis is an in-memory key-value store. You run a database query once, store the result in Redis with an expiry, and serve subsequent requests from memory instead of running the query again.

That's the core use case. If you're not running the same expensive query repeatedly, Redis adds a service to deploy, monitor, and keep synchronized with your database without improving response times.

The cases where it genuinely helps: aggregation or ranking queries that run on every page load, session data that needs to be shared across multiple app instances, or any computation that takes hundreds of milliseconds and produces a result that stays valid for minutes.

The cases where it doesn't: simple lookups by primary key (already fast), apps where queries run a few dozen times per minute, or datasets small enough to fit in application memory.

Cache invalidation is what nobody mentions in the tutorials. Cached data goes stale. You need to decide when to expire it, whether to delete it on writes or wait for the TTL, and what happens when a cache miss occurs under load. None of this is complicated, but it's all code you write and bugs you debug. The cost is real.

Common Gotchas

| Layer | Mistake | Effect |
| --- | --- | --- |
| Gzip | Missing gzip_proxied any behind a CDN or load balancer | Requests arriving with a Via header aren't compressed |
| Gzip | Compressing images and video | CPU spent for no gain; response can get larger |
| WAL | Default checkpoint threshold of 1000 | WAL grows for days; reads slow down |
| WAL | Multiple concurrent writers | WAL allows one writer at a time; extra writers queue up |
| Redis | Caching without an expiry | Stale data served indefinitely |
| Redis | Not invalidating on writes | Cache and database diverge silently |

Is This Right for You?

Add gzip if you're serving HTML, CSS, or JSON over nginx and haven't confirmed Content-Encoding: gzip in your response headers. It's a config change and a reload.

Set wal_autocheckpoint = 100 if you're using SQLite in WAL mode on a web app with concurrent requests. One pragma at connection init, no schema changes, no migration.

Add Redis if a specific query is measurably slow, runs on every request, and the result stays valid long enough to be worth caching. Profile the query first. If it completes in under 20ms and runs a few hundred times per day, you don't have a database problem.

Most apps I've seen that feel slow are sending full HTML payloads uncompressed over the wire. The database is fine. Fix the network layer before adding infrastructure.

A Network Blip Is Not Just a Blip

Tue, 31 Mar 2026 00:00:00 GMT

A network blip is brief, self-healing, and invisible after the fact. It is also the kind of thing that silently wipes out a backup job while your site keeps running normally.

What is a network blip

A network blip is a temporary interruption in a network connection. Not a full outage. The server stays up, DNS resolves, ping responds. But for a few seconds (sometimes longer), packets get dropped or delayed enough that an active TCP connection times out and closes.

Common causes:

A router or switch along the path reboots or flushes its connection table
A cloud provider does brief maintenance on a network link
BGP route changes between two hosting providers mid-transfer
ISP congestion causing packet loss above the TCP retransmission threshold

The defining characteristic: by the time you notice and go to investigate, everything works fine again. There is no broken host to point at, no down service to restart. The failure window is already closed.

Why long-running operations take the hit

A web request that hits a blip just fails and retries in milliseconds. The user barely notices, the next request goes through, life continues. The entire exposure window is under a second.

A long-running transfer is different. An SCP upload that hits a blip mid-transfer loses the entire transfer. No retry, no resume. The connection is gone, the destination file is incomplete or missing, and the script that launched it either crashes or silently moves on.

This is the asymmetry: short operations tolerate blips, long operations do not. Database backups, large file transfers, remote sync jobs — anything that holds a TCP connection open for more than a few seconds is exposed. The longer the transfer, the higher the probability that a momentary blip lands inside it.

Case study: the failed backup

My backup timer fired at 3am. The script packaged SQLite databases, config files, and content into a .tar.gz, then uploaded to a remote VPS over SCP through a jump host. Every local step completed cleanly. Then the upload dropped.

The setup is a standard bash backup script scheduled via systemd timer. If you want a reference implementation, here is the one I use.

What the logs said

journalctl -u your-backup.service --since today

Output:

Mar 31 03:01:05 backup[12301]: [03:01:05] Creating archive...
Mar 31 03:01:05 backup[12302]: [03:01:05]   archive: backup_20260331.tar.gz (1.1M)
Mar 31 03:01:05 backup[12303]: [03:01:05] Uploading to remote...
Mar 31 03:08:18 backup[12304]: scp: Connection closed
Mar 31 03:08:21 systemd[1]: backup.service: Main process exited, code=exited, status=255/EXCEPTION
Mar 31 03:08:21 systemd[1]: backup.service: Failed with result 'exit-code'.
Mar 31 03:08:21 systemd[1]: Failed to start backup.service

Two things stand out. First, the gap: Uploading to remote... logged at 03:01, then silence until scp: Connection closed at 03:08. SCP hung for seven minutes before the connection dropped. Second, the log file itself has no "uploaded" confirmation line:

[03:01:05] Uploading to remote...
[03:08:18] ← scp: Connection closed (from journalctl, not the log file)

The backup script used set -euo pipefail. When scp exited non-zero, the script bailed immediately without writing another log entry. No "upload failed" message, because nothing in the failure path wrote to the log before exiting. The gap in the log file is the signal.

What caused it

The SCP route went through a jump host. The jump host stayed up (ping responded, port open), but the SSH session dropped mid-transfer. A brief interruption on the relay, a momentary blip on the path between relay and backup destination. The connection was healthy before and after.

To rule out a key issue or downed destination, test each hop separately:

# Test the relay directly
ssh -p <port> user@relay-host "echo OK"

# Test the full chain
ssh user@backup-destination "echo OK"

# Manual SCP test
scp /tmp/testfile user@backup-destination:/tmp/

All three succeeded immediately. The relay was healthy. The failure was a moment-in-time blip: gone before the investigation started.

Why it is easy to miss

The systemd unit did report Failed with result 'exit-code'. So systemctl status your-backup.service would show a failed state. But you only check that when something is obviously broken. A 3am backup failure does not break anything visible. The site stays up, requests keep coming in, nothing pages you.

The only reason I caught it was an alert email. Without that, I would have gone days believing the remote copy existed.

The real consequence

If a disk fails or a deploy goes wrong the next morning and you reach for the latest backup, you are restoring from data that is older than you think. In a low-traffic personal project that might be acceptable. In anything with active writes, that gap is data loss.

What to do after a failure

Check whether the local archive is still there. A backup script that packages locally before uploading will still have the archive even when the upload fails:

ls -lh ~/backups/

If the file is there, the data is not lost. Upload it manually:

scp ~/backups/backup_20260331.tar.gz user@backup-destination:/path/to/backups/

Then run the full backup script again to get a fresh timestamped copy and let it handle remote rotation normally.

What to add to prevent silent failures

Trap on exit and log the outcome. With set -euo pipefail, the failure path exits without writing to the log by default. Add a trap:

on_exit() {
    local code=$1
    if [[ $code -ne 0 ]]; then
        log "Backup FAILED (exit code: $code)"
        send_failure_alert "$code"
    fi
}
trap 'on_exit $?' EXIT

Send a failure notification. The systemd unit failure state is not enough because it requires you to look for it. Email via curl --ssl-reqd to an SMTP relay, a webhook, anything. The goal is a push notification, not a pull check.

Verify the remote after upload. After scp returns, SSH into the destination and confirm the file exists and is non-empty:

ssh backup-destination "test -s /backups/backup_${TIMESTAMP}.tar.gz && ls -lh /backups/backup_${TIMESTAMP}.tar.gz"

A zero-byte file after a dropped connection is a real failure mode. SCP can create the destination file before the transfer completes.

The check that matters

After any incident, read the log file directly alongside journalctl. The log file shows what the script wrote. journalctl shows what systemd saw, including stderr. Together they give you the full picture. A gap between "Uploading..." and the next entry is the signature of a blip-killed transfer.

Network blips are outside your control. What is inside your control: whether your script logs the outcome, whether a failure sends you a notification, and whether you verify the remote copy after every upload. The blip cannot be prevented. The silent failure can.

Claude Code's Usage Bug and the Fight Behind It

Thu, 26 Mar 2026 00:00:00 GMT

My Claude Code session limit hit 100% in under two hours last week. I was doing normal work. Nothing unusual in my workflow, no massive context dumps, no runaway loops. I checked Reddit and found dozens of people reporting the same thing: limits draining in minutes, sessions dying mid-task, plans costing $100–$200/month behaving like free tiers.

The first instinct is to blame a bug. And there is a bug. GitHub issues are piling up on Anthropic's own repo (#38335, #9424), and Anthropic responded by doubling usage limits through March 27 as a temporary fix. The frustration lands on top of an already mixed productivity picture: a 2025 METR study found that experienced open-source developers took 19% longer to complete tasks with AI coding tools than without them. Losing a session mid-task does not help. But the timing of when this latest wave of issues started deserves more attention than it's getting.

March 23, 2026. The same week Anthropic's standoff with the Pentagon became public.

What Anthropic Refused

This is not a company that quietly declined a government contract. Anthropic published a formal statement explaining exactly what the Department of War demanded and what they refused to allow.

Two things. Mass domestic surveillance of Americans using their AI models. And fully autonomous weapons systems without human oversight in the decision chain.

On the weapons point: the distinction matters. Partially autonomous weapons, where a human confirms the final decision to engage, already exist and Anthropic has no objection to those. Fully autonomous means the AI selects and strikes a target with no human in the loop. Think a drone that identifies, decides, and fires without a soldier ever pressing a button. Anthropic's argument is that current AI models hallucinate and make errors at rates that are simply not acceptable for that kind of irreversible action.

The government's position: accept "any lawful use" and remove the safety restrictions. Anthropic's position: no.

The Pentagon's response was to threaten designating Anthropic a "supply chain risk." That label has previously been reserved for foreign adversaries like Huawei and ZTE. Applying it to a U.S. company is unprecedented. A California judge noted the government may be "attempting to cripple Anthropic."

The government's counterargument is worth stating clearly. The Pentagon's position is not "we want rogue AI." Their argument is that AI deployment decisions in national security contexts should be governed by existing law and military oversight, not unilaterally by a private company's internal safety team. From that angle, Anthropic is being asked to trust the legal and institutional framework the U.S. already has, not to override it. Whether you find that convincing depends on how much you trust those institutions right now.

The OpenAI Contrast

OpenAI joined Stargate, a U.S. government AI initiative worth $500B with direct Pentagon involvement. They accepted the terms. They are not facing this pressure.

I'm not saying one decision is obviously correct. But the contrast explains a lot about why Claude Code feels like it's running on strained infrastructure right now while other tools operate without incident.

The Migration Wave

Claude.ai now lets you import your ChatGPT memory directly from settings. Anthropic built that feature with purpose, and the timing tells you something about how they see the market.

A number of developers left OpenAI's products after the Stargate announcement. Some didn't want their daily tools tied to autonomous weapons contracting. Others just followed the momentum. Claude Code's user numbers climbed, which put more load on infrastructure already under political pressure.

Whether any of the usage drain traces back to deliberate attacks on Anthropic's servers, I can't confirm from the outside. It's speculation, and I want to be clear about that. But the context (a public standoff with a powerful government institution, combined with a user surge) makes infrastructure pressure of all kinds more plausible than it would have been six months ago.

Does Anthropic Know?

Yes. And the evidence is in their own actions.

The GitHub issues flagging the usage drain are filed directly on Anthropic's repo. Their engineering team sees every one. The decision to double limits through March 27 was not a coincidence. It was a direct response to the volume of complaints hitting Reddit, X, and their own issue tracker within days. Companies at this scale have people whose job is to monitor exactly that.

The harder question is bandwidth. A team simultaneously managing an existential legal fight with the Pentagon, an infrastructure surge from new users, and a billing bug that's hard to reproduce cleanly is a team with limited capacity even when fully informed. Knowing about a problem and having the space to fix it properly are two different things. The doubled limits are the fastest lever they could pull without a proper fix in place.

The Stakes Are Existential

Anthropic is not yet profitable. Google has invested $2B and Amazon $4B. A "supply chain risk" designation would cut off government contracts and create pressure on those investors. Both Google Cloud and AWS have their own federal contracts that a high-profile association with a designated "supply chain risk" could complicate. This is not a PR dispute. The company's ability to keep operating is on the line.

The irony: Anthropic was founded by people who left OpenAI specifically over AI safety disagreements. They are now being threatened by their own government for maintaining those same safety standards.

If you're on Claude's paid plans ($100 or $200 a month) and your limits drain in two hours, you deserve an explanation. "We're doubling limits temporarily" is not one. It's a patch.

What Happens Next

The case is being heard in a California federal court. No ruling date has been confirmed publicly, but the judge's early comments suggest the court is taking the "supply chain risk" designation seriously as an overreach.

Two outcomes matter beyond Anthropic itself. If the Pentagon wins, every AI company operating in the U.S. faces the same demand: remove your safety restrictions or lose government access. The pressure would cascade quickly given how much of the AI industry depends on federal contracts and cloud revenue. If Anthropic wins, it creates legal precedent that private companies can hold the line on specific use cases even under government pressure, and it opens space for actual legislation on autonomous weapons and AI surveillance, something 69% of Americans say they want according to polling cited by Al Jazeera.

Follow the case through court filings on CourtListener and coverage from The Register if you want to track it directly.

My Take

I'm going to keep using Claude Code. The usage issues are frustrating, but the company's position on autonomous weapons and domestic surveillance is defensible. Plenty of AI tools exist with no restrictions on either use case. I'd rather wait out a broken session limit than use one of them.

I understand the Pentagon's argument too. Institutional oversight exists for a reason, and private companies unilaterally deciding what the military can and can't do with a product sets a complicated precedent in the other direction. This is a genuinely hard problem, not a clean villain story.

But when a government labels one of its own country's companies with a designation previously reserved for adversary nations, that's worth paying attention to regardless of where you land on the underlying policy question.

The bug you noticed in your session usage is real. The context around it is bigger than the bug.

nginx 1.28.3: Six CVEs, One Upgrade

Thu, 26 Mar 2026 00:00:00 GMT

nginx has been running on your server for months, probably years. You haven't touched its config. It just works. Two days ago it got six security fixes in one release, and every one of them is worth at least a glance.

The release

nginx 1.28.x is the stable branch. When it ships security fixes, they're deliberate backports from mainline, not experimental changes. Version 1.28.3 came out March 24, 2026. If you're on Ubuntu and haven't upgraded yet, upgrading the nginx package with apt is all it takes.

Six CVEs at a glance

| CVE | Severity | CVSS 4.0 | What's vulnerable | Impact |
| --- | --- | --- | --- | --- |
| CVE-2026-27654 | High | 8.8 | alias + WebDAV COPY/MOVE | Path escape outside document root |
| CVE-2026-27651 | High | 8.7 | CRAM-MD5/APOP mail auth with retry | Worker process segfault |
| CVE-2026-27784 | High | 8.5 | MP4 module, 32-bit platforms | Worker process crash |
| CVE-2026-32647 | High | 8.5 | MP4 module, all platforms | Worker process crash |
| CVE-2026-28755 | Medium | 5.3 | OCSP in stream module | Revoked client cert accepted |
| CVE-2026-28753 | Medium | 6.3 | PTR DNS records in auth_http/SMTP proxy | Data injection into backend requests |

Breaking each one down

CVE-2026-27654 lives in how nginx handles the alias directive when WebDAV methods (COPY, MOVE) are enabled. An attacker can craft a request that shifts the source or destination path outside the document root. The alias directive has a long history of path-handling edge cases in nginx. No config change is needed to get the fix, just the upgrade. But if you have WebDAV enabled on a public server, audit what methods you're actually allowing.

CVE-2026-27651 affects nginx's mail proxy module when CRAM-MD5 or APOP authentication is used with retry enabled. A segmentation fault in the worker process means the request dies and the worker restarts. That's availability impact, not data exposure. If you're not running nginx as a mail proxy, this doesn't touch you.

CVE-2026-27784 and CVE-2026-32647 both live in the MP4 module. The first affects 32-bit platforms specifically; the second affects all platforms. Both result in worker process crashes when processing a specially crafted MP4 file. They only matter if ngx_http_mp4_module is compiled in and the mp4 directive is used in your config. The module isn't part of a default source build, though some distribution packages include it. Check with nginx -V 2>&1 | grep mp4 to confirm.

CVE-2026-28755 is the one that surprised me. In the stream module, an OCSP check could reject a client certificate and the TLS handshake would succeed anyway. The entire point of OCSP is to revoke certificates that should no longer be trusted. A bypass here means a revoked cert gets through silently. Most deployments don't use stream with mutual TLS, so the practical impact is limited. But the failure mode is worth knowing: the check ran, it said no, and nginx said yes anyway.

CVE-2026-28753 is the DNS injection one. When nginx does reverse DNS lookups for auth_http or the XCLIENT command in SMTP proxy flows, it uses the PTR record response as input. An attacker who controls the DNS server answering those PTR queries can inject data into the headers sent to your backend. Exploiting this requires controlling upstream DNS, which raises the bar considerably. Still, it belongs to a class of bugs that keeps appearing across different software: DNS responses are attacker-controlled data, and treating them as trusted input is the root cause every time.

QUIC improvements

This release also ships two non-CVE changes to QUIC handling worth noting.

nginx now limits the size and rate of QUIC stateless reset packets. Without this, a misbehaving or malicious peer could trigger an unbounded volume of stateless resets. The second fix addresses a bug where a QUIC packet received by the wrong worker process caused the connection to terminate instead of being handed off correctly.

Neither of these is a security vulnerability with a CVE, but both affect connection stability for any deployment using QUIC.

Updating

apt update
apt install --only-upgrade nginx
nginx -v
systemctl status nginx

The package post-install script handles the service restart. No config changes, no downtime beyond the few seconds nginx takes to reload. Worker processes drain and restart cleanly.

After the upgrade, nginx -v should show nginx/1.28.3. If the status shows the service running with a timestamp matching your upgrade, you're done.

ProtectHome=yes in Systemd Breaks Subprocesses Too

Wed, 25 Mar 2026 00:00:00 GMT

Your service runs fine. You click a button that triggers a subprocess: npm run build, a shell script, anything. It crashes immediately with EACCES: permission denied on some path under /home. You didn't touch /home. Your service doesn't use /home. Nothing makes sense.

ProtectHome=yes is why.

What ProtectHome Actually Does

When you add ProtectHome=yes to a systemd unit, systemd makes /home, /root, and /run/user completely inaccessible to that service. Not just hidden. Inaccessible. Any read or write attempt returns permission denied.

The restriction applies to every subprocess the service spawns, not just the service itself. The sandbox is inherited. Your Go binary running as deploy can't access /home/deploy. Neither can the npm process your Go binary spawns.

Why It's Hard to Catch

The service itself rarely touches home directories. You run it, it works, you move on. The problem only surfaces when your service runs a tool that assumes it can write to ~/.config/ or ~/.cache/.

Build tools are the usual culprit. Many of them write telemetry or cache data to the home directory on first run. Astro writes to ~/.config/astro. Some npm tools write to ~/.npm. These writes happen before the actual task starts, so the error appears immediately and looks like a misconfiguration, not a sandbox issue.

The Fix

You have two options.

Option 1: Disable the home-dir write on the subprocess side.

Most tools that write to home directories do it for telemetry or caching, and they provide an environment variable to disable it:

cmd := exec.Command("npm", "run", "build")
cmd.Env = append(os.Environ(), "ASTRO_TELEMETRY_DISABLED=1")

os.Environ() carries the parent's full environment through. The extra variable disables the telemetry write. No other behavior changes.

Option 2: Remove ProtectHome.

Don't. ProtectHome=yes prevents a compromised service from reading your SSH keys, dotfiles, and anything else stored under home directories. Removing it to fix a telemetry write is the wrong trade-off.

Finding the Right Disable Flag

The pattern is consistent across build tools:

| Tool | Environment variable |
| --- | --- |
| Astro | ASTRO_TELEMETRY_DISABLED=1 |
| Next.js | NEXT_TELEMETRY_DISABLED=1 |
| Nuxt | NUXT_TELEMETRY_DISABLED=1 |
| Gatsby | GATSBY_TELEMETRY_DISABLED=1 |
| Angular CLI | NG_CLI_ANALYTICS=false |
| .NET CLI | DOTNET_CLI_TELEMETRY_OPTOUT=1 |

If the tool doesn't have a telemetry flag, check whether it respects XDG_CONFIG_HOME or XDG_CACHE_HOME. You can redirect those to a path your service can actually write to:

cmd.Env = append(os.Environ(),
    "XDG_CONFIG_HOME=/var/www/myapp/.config",
    "XDG_CACHE_HOME=/var/www/myapp/.cache",
)

How to Confirm This Is Your Problem

Check your service unit:

systemctl cat your-service-name

Look for ProtectHome=yes or ProtectHome=read-only. Then check what the subprocess is trying to access in the error output. If the path is under /home, /root, or /run/user, this is your problem.

Run the subprocess manually as the service user outside of systemd to verify:

sudo -u deploy npm run build

If it works manually but fails under systemd, the sandbox is the issue.
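If your service already captures subprocess output, the same check can be automated so the log says what went wrong. The helper below is a heuristic sketch, not a systemd API: looksLikeProtectHome is a made-up name, and it simply looks for a permission error that mentions a path under one of the three protected directories.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Matches the usual wording of a permission error.
	deniedRe = regexp.MustCompile(`(?i)EACCES|permission denied`)
	// Matches an absolute path that starts with a directory ProtectHome hides.
	protectedRe = regexp.MustCompile(`(?:^|[\s'"=:(])(?:/home|/root|/run/user)/`)
)

// looksLikeProtectHome reports whether any line of the subprocess output
// contains a permission error on a path under /home, /root, or /run/user.
func looksLikeProtectHome(output string) bool {
	for _, line := range strings.Split(output, "\n") {
		if deniedRe.MatchString(line) && protectedRe.MatchString(line) {
			return true
		}
	}
	return false
}

func main() {
	out := "Error: EACCES: permission denied, mkdir '/home/deploy/.config/astro'"
	if looksLikeProtectHome(out) {
		fmt.Println("hint: check ProtectHome= in the service unit")
	}
}
```

A match is a hint, not proof; ordinary file permissions under /home produce the same error text.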

Is This Right for You?

Keep ProtectHome=yes. Fix the subprocess by passing the right environment variable.

If your subprocess legitimately needs home directory access, consider whether that data belongs there at all. Writing user-specific state or reading credentials from ~/.config works fine outside a sandbox, but it's a dependency you shouldn't need. A service running under a dedicated system user with /var/www/... as its working directory rarely needs home directory access.

Tighten the sandbox. Reduce what the subprocess expects to find there.

Self-Hosted Referral Links Without the /s/

Sun, 22 Mar 2026 00:00:00 GMT

Raw referral links are self-defeating. When someone sees a URL with ref= or a /s/ prefix, they know two things: it's a shortener, and you earn something if they click. Some people skip these on principle. Others open them but strip the referral parameter before buying. Either way, you lose the commission.

The fix is to hide the destination. Give people yourdomain.com/shopee with no indication of where it goes.

I looked at Slash first

Slash is a self-hosted link shortener with a few thousand GitHub stars, active development, Docker deployment, analytics, and multi-user workspace support. For team bookmarks and internal shortcuts, it works well.

The problem: Slash uses a /s/ prefix. Your link becomes yourdomain.com/s/shopee. For internal shortcuts shared with a team, that format is fine. For referral links meant to look natural, the /s/ announces what you're doing.

Slash targets teams sharing internal shortcuts. The /s/ prefix keeps the namespace clean and predictable for that use case. Referral link hiding is a different problem.

What I built instead

plink is a single Go binary. SQLite database, templates embedded directly in the binary via Go's embed package. No Docker, no npm, no build step. Deploy it by copying a binary and an env file to your server. Clean slugs by default: yourdomain.com/shopee, destination invisible from the URL.

The admin panel sits behind a configurable path you set in your env file. The path doesn't appear in the source code. Visitors browsing your public link list have no way to find the admin URL from the page source.

Click analytics are built in: total counts, a 30-day chart, and referrer breakdown. That last one tells you where your traffic comes from, which is more useful than the total number alone.

The code is on GitHub: github.com/srmdn/plink

Slash vs plink

| | Slash | plink |
| --- | --- | --- |
| URL format | domain.com/s/link | domain.com/link |
| Deployment | Docker | Single binary |
| Users | Multi-user, teams | Single user |
| Frontend | React + TypeScript | Vanilla HTML, embedded |
| Database | SQLite or PostgreSQL | SQLite |
| Browser extension | Yes | No |
| License | AGPL-3.0 | MIT |

Is This Right for You?

Use plink if you run referral links on your own domain and want the destination hidden from the URL. The single-binary deployment is a practical advantage on a VPS you're already managing.

Use Slash if you want team collaboration, a browser extension, or you'd rather run a maintained Docker image. The community is larger and the feature set is broader.

The clean slug vs /s/ distinction sounds minor. For referral links, it changes whether visitors click or not.

A Private Client Portal for Freelancers

Thu, 19 Mar 2026 00:00:00 GMT

If you're doing client work, someone already solved this problem for you. Dubsado, HoneyBook, Notion, even a well-organized Gmail folder. These exist, they work, and they're cheaper than the time it takes to build something from scratch.

Why did I build my own client portal anyway?

The short answer: I wanted my client data on my server, not someone else's.

The problem with existing tools

Most tools in this space are either too general-purpose or too expensive. Notion gives you infinite flexibility but zero client access control. Share a Notion page with a client and they can see everything in that workspace if you're not careful. Project management tools like Trello or Linear are built for internal teams, not external clients. And the professional CRM options are full suites with pricing to match, more than you need if you're running a small freelance operation.

What I actually needed was straightforward: a place where I can invite a client, assign them to their project, share documents, and see a clean record of what happened and when.

What I built

The system runs at sys.srmdn.com. Each client gets their own space and only sees their own projects, with no visibility into anyone else's work.

Clients get an invite link by email when I add them. They set a password, land on their dashboard, and see their projects and any documents I've shared.

The document editor is Markdown-based. I write notes, deliverables, or reports inside the system, and when something is client-facing, I send it directly to their email with one click. That last part replaced a workflow I used to hate: draft in one app, copy to email, reformat, send, and then immediately lose track of whether I actually sent it.

There's also an audit log. Every significant action gets recorded with a timestamp. The value of this only became clear after using it: when a client says "I never received that," you can pull up exactly what happened.

How it's built

Backend is Go, using the Fiber framework. Frontend is React with Vite, served as a static SPA. Database is SQLite. Everything runs on the same VPS that serves this blog: one more nginx vhost, one more systemd service.

No Docker, no Kubernetes. Deployment is building a binary and restarting a service. The whole thing runs comfortably under 50MB of RAM.

I chose SQLite deliberately. This is a single-user system with a small number of clients. SQLite's concurrency limits aren't a concern at this scale, and I can back up the entire database by copying one file. For a system like this, SQLite is the right call.

What surprised me

Building it was faster than I expected. Getting it secure took much longer.

After the initial build, I ran a self-audit and found 17 issues. Most were minor (missing security headers, too-long JWT expiry, overly verbose error messages on auth failures), but a few would have been real problems in production. None were catastrophic, but it was a useful reminder that "it works" and "it's safe" are different checkboxes.

The invite flow was also more complex than it looks. Expired links, duplicate accepts, re-invites for existing accounts: each one has to be handled explicitly. The happy path is five lines of code. The edge cases are fifty.
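To make those edge cases concrete, here is a small sketch of invite acceptance with each case handled explicitly. The `Invite` type, its field names, and the 48-hour re-invite window are hypothetical; the real schema is not shown in this post.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	errExpired  = errors.New("invite link has expired")
	errAccepted = errors.New("invite was already accepted")
)

// Invite is a hypothetical record; the field names are assumptions.
type Invite struct {
	Email      string
	ExpiresAt  time.Time
	AcceptedAt *time.Time // nil until the client sets a password
}

// accept marks the invite as used, rejecting duplicate and expired accepts.
func (inv *Invite) accept(now time.Time) error {
	if inv.AcceptedAt != nil {
		return errAccepted
	}
	if !now.Before(inv.ExpiresAt) {
		return errExpired
	}
	inv.AcceptedAt = &now
	return nil
}

// reinvite extends the expiry, but refuses if the account already exists.
func (inv *Invite) reinvite(now time.Time, ttl time.Duration) error {
	if inv.AcceptedAt != nil {
		return errAccepted
	}
	inv.ExpiresAt = now.Add(ttl)
	return nil
}

// demo walks one invite through the edge cases and returns each outcome.
func demo() []error {
	now := time.Date(2026, 3, 19, 12, 0, 0, 0, time.UTC)
	inv := &Invite{Email: "client@example.com", ExpiresAt: now.Add(-time.Hour)}
	return []error{
		inv.accept(now),                 // expired link
		inv.reinvite(now, 48*time.Hour), // re-invite resets the expiry
		inv.accept(now),                 // happy path
		inv.accept(now),                 // duplicate accept
	}
}

func main() {
	for i, err := range demo() {
		fmt.Println(i, err)
	}
}
```

The happy path is the two lines at the end of `accept`; everything else is the edge cases.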

The other surprise was the editor. I started with a popular rich text editor and eventually replaced it with Milkdown, a Markdown-first WYSIWYG. The reason: the original editor stored content as HTML blobs, which means documents only render correctly inside that same editor. Markdown is plain text, readable anywhere without a special tool, and it doesn't become unreadable as software changes.

Most freelancers should use an existing tool. Building your own is only worth it if control matters more than time cost, and you're clear-eyed about what that trade looks like over two or three years of maintenance.

For me, it was worth it. I know exactly what the system does, I own the data, and it fits my workflow precisely. But I've also spent more hours on it than any SaaS subscription would have cost me. That's not a regret, just a fact worth naming.

The system is open to other freelancers at sys.srmdn.com. You get the same setup I use: project workspaces, a Markdown doc editor, and invite-by-email for clients. No seat pricing or company tiers.

AppArmor Had a Privilege Escalation Bug. Since 2017.

Sat, 14 Mar 2026 00:00:00 GMT

AppArmor is supposed to be one of the deeper layers of Linux security. It sits inside the kernel, enforces access control policies, and restricts what any given process can do even after it is already running. The pitch is: even if something breaks through, AppArmor contains the damage.

On March 12, 2026, Qualys published nine vulnerabilities in AppArmor itself. They named them CrackArmor. The flaws had been sitting there since 2017.

What AppArmor Actually Does

Before getting into the bugs, it is worth being clear on what AppArmor is and why it matters.

AppArmor is a Linux Security Module that enforces mandatory access control. Unlike filesystem permissions, which are set by the file owner, AppArmor policies are defined by the administrator and enforced by the kernel regardless of what the process wants to do. A web server process confined by an AppArmor profile cannot read /etc/shadow even if it is running as root, cannot open a network socket it was not explicitly allowed, and cannot exec arbitrary binaries.

On Ubuntu, AppArmor is enabled by default. You did not have to opt in. It is running on your server right now, with profiles active for a number of system services.

The idea is that AppArmor is a last line. Even if an attacker exploits your app, they land inside the AppArmor box and cannot get further.

The CrackArmor Flaws

Qualys found nine vulnerabilities in the AppArmor kernel code, all requiring only an unprivileged local user account. No root. No special group membership. Just a shell.

The impacts break into three categories:

Local privilege escalation to root. The most serious outcome. By chaining AppArmor bugs with interactions through standard system tools like sudo and postfix, an unprivileged user could reach root on the machine. This is the kind of bug that turns a limited foothold into full control.

Denial of service via stack exhaustion. AppArmor handles nested policy namespaces recursively. An attacker could craft a deeply nested policy structure to blow the stack and crash the kernel. No special privilege required — anyone with a local account could take down the machine.

KASLR bypass via out-of-bounds reads. KASLR hides where kernel code and data live in memory, making exploitation harder. An out-of-bounds read in AppArmor's pattern matching engine could leak kernel addresses and make other attacks more reliable.

The specifics: missing bounds checks in the DFA verifier, a double-free in namespace cleanup, race conditions in policy data lifecycle, and an unprivileged user being able to trigger privileged policy management operations. Nine separate issues, all in the AppArmor policy loading and parsing code.

None of these are exotic. Out-of-bounds reads and double-frees are the kind of bugs that turn up in security audits of C code that handles untrusted input. AppArmor parses policy files from userspace. That is the attack surface, and it had not been audited thoroughly for nine years.

The Scale

AppArmor ships enabled by default on Ubuntu. Qualys estimated over 12.6 million enterprise Linux instances actively running it at the time of disclosure.

That is not 12.6 million servers any attacker on the internet can reach. The attack requires a local user account. But local access is not as rare as it sounds. A compromised web app that achieves code execution, a misconfigured multi-tenant system, a service account a former employee still has access to — all of these count as local access.

The exploitability depends on context. On a single-user VPS where you are the only person with a shell, the practical risk is lower. On a shared system, it is much more serious. But the point of CrackArmor is that the very thing meant to contain a breach after local access was achieved was itself the path to escalate from that access.

The Fix

The kernel fix shipped on March 6, six days before Qualys published the details publicly. If you installed that update and rebooted before March 12, you were patched before the vulnerability was public knowledge.

On Ubuntu 24.04, that means kernel version 6.8.0-106-generic or later.

uname -r

If it shows something older:

apt update && apt upgrade
reboot

The reboot is not optional. A kernel update does not take effect until you boot into the new kernel. Running apt upgrade and skipping the reboot leaves you on the old kernel regardless of what apt reports.
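If you would rather script the post-reboot check than eyeball `uname -r`, a rough sketch follows. It assumes the 6.8.0-106 threshold quoted above for Ubuntu 24.04 and only compares numeric fields, so the result is meaningful within that kernel series only. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// numericParts extracts the leading numeric fields of a release string:
// "6.8.0-106-generic" becomes [6 8 0 106]. It stops at the first
// non-numeric field such as "generic".
func numericParts(release string) []int {
	fields := strings.FieldsFunc(release, func(r rune) bool {
		return r == '.' || r == '-'
	})
	var parts []int
	for _, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			break
		}
		parts = append(parts, n)
	}
	return parts
}

// atLeast reports whether release is at or above minimum, field by field.
func atLeast(release, minimum string) bool {
	have, want := numericParts(release), numericParts(minimum)
	for i := range want {
		if i >= len(have) {
			return false
		}
		if have[i] != want[i] {
			return have[i] > want[i]
		}
	}
	return true
}

func main() {
	// This file holds the same value that uname -r prints.
	raw, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release:", err)
		return
	}
	release := strings.TrimSpace(string(raw))
	fmt.Println(release, "at or above 6.8.0-106:", atLeast(release, "6.8.0-106"))
}
```

Because it reads the running kernel rather than the installed packages, it reports the old version until you actually reboot.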

What This Changes About the Maintenance Routine

Most people treat kernel updates as optional. The kernel rarely breaks anything, updates are infrequent, and a reboot means downtime. So it slides. Weeks, sometimes months.

CrackArmor is a good example of why that is the wrong call for security updates specifically.

The kernel is not just the thing that boots. It is the security boundary between processes, between users, between the OS and the hardware. Vulnerabilities in it cannot be mitigated with a config change or a WAF rule. The only fix is the patched kernel, and the only way to run it is to reboot.

A practical approach: check for kernel updates weekly, apply them, schedule the reboot. On a low-traffic personal site, a 60-second reboot at off-peak hours is not a meaningful event. Treating it as one leads to running a known-vulnerable kernel for months.

Is This Right for You?

If you run a single-tenant VPS where you are the only one with shell access, apply the kernel update and you are done. The practical risk from a local privilege escalation on a server only you can log into is real but bounded.

If you run anything with multiple users, shared hosting, or services that execute code on behalf of untrusted input, this is higher priority. Local privilege escalation in that context means any foothold becomes full root.

Either way, the fix is the same: update the kernel, reboot, verify you are on the patched version. The only bad outcome is knowing about it and not applying it.

References

CrackArmor: Critical AppArmor Flaws Enable Local Privilege Escalation to Root — Qualys, March 12, 2026
AppArmor vulnerability fixes available — Ubuntu Blog
Nine CrackArmor Flaws in Linux AppArmor Enable Root Escalation, Bypass Container Isolation — The Hacker News
Ubuntu's AppArmor Hit By Several Security Issues — Phoronix

Attacked Every 26 Seconds. Why I'm Not Worried.

Mon, 09 Mar 2026 00:00:00 GMT

When I checked my server logs last week, I found over 23,000 failed SSH login attempts in seven days. That works out to roughly one attempt every 26 seconds, around the clock.

My first reaction was panic. My second was: this is completely normal.

Any server with a public IP gets this. It is not a targeted attack. It is automated bots sweeping the entire internet, trying default credentials on every IP they find. They are looking for the one server where someone left the root password as admin123, or where SSH still accepts passwords at all. They do not know whose server this is. They do not care.

What stops them is not magic. It is several layers of boring configuration.

Layer 1: SSH Hardening

SSH is the most common attack vector on a public VPS. The bots know this. So the first job is making your SSH as uninteresting as possible.

Disable password authentication. This is the single most important change. With PasswordAuthentication no, a bot can guess the right username and it still does not matter. They cannot get in without your private key. Password brute-force stops being a threat entirely.

# /etc/ssh/sshd_config
PasswordAuthentication no
PubkeyAuthentication yes
PermitRootLogin prohibit-password

Move off port 22. Port 22 is the first port every SSH scanner checks. Moving to a non-standard port will not stop a determined attacker, but it eliminates most background noise and reduces log spam significantly.

Tighten the other knobs. A few more settings worth setting explicitly:

# disconnect after 3 failed attempts per connection
MaxAuthTries 3
# 30 seconds to authenticate, not the default 2 minutes
LoginGraceTime 30
# no reason to have this on a headless server
X11Forwarding no
# start randomly dropping 30% of new connections at 10 pending, reject all at 60
MaxStartups 10:30:60
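The three numbers in MaxStartups are start:rate:full. Per the sshd_config manual, sshd refuses new unauthenticated connections with probability rate% once start are pending, and that probability rises linearly to 100% at full. The sketch below models that ramp; `dropPercent` is an illustrative helper, not OpenSSH code.

```go
package main

import "fmt"

// dropPercent returns the chance (0-100) that a new unauthenticated
// connection is refused under MaxStartups start:rate:full.
func dropPercent(pending, start, rate, full int) int {
	if pending < start {
		return 0
	}
	if pending >= full {
		return 100
	}
	// Linear ramp from rate at start up to 100 at full.
	return rate + (100-rate)*(pending-start)/(full-start)
}

func main() {
	for _, n := range []int{5, 10, 35, 60} {
		fmt.Printf("%d pending: %d%% refused\n", n, dropPercent(n, 10, 30, 60))
	}
}
```

With 10:30:60, nothing is refused below 10 pending connections, 30% at 10, 65% at 35, and everything at 60.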

Add fail2ban. Even with key-only auth, bots can waste server resources by hammering connections. fail2ban watches your SSH logs and bans IPs after repeated failures. A 24-hour ban after 3 failed attempts is a reasonable starting point, matching MaxAuthTries so the ban triggers the moment they exhaust their attempts.

# /etc/fail2ban/jail.local
[sshd]
enabled  = true
port     = your-ssh-port
maxretry = 3
bantime  = 86400

Whitelist users explicitly. List only the accounts that actually need SSH access using AllowUsers. Any account not on that list cannot log in over SSH, even with a valid key. This matters if a compromised service account somehow gets a key added.

AllowUsers appuser

If you want to audit your current state rather than configure from scratch, a few scripts from sysadmin-scripts are useful here. ssh-audit.sh checks your sshd configuration for common weaknesses and gives a CLEAN, WARNING, or CRITICAL verdict with the exact command to fix each finding. user-audit.sh scans for UID 0 duplicates, accounts with empty passwords, unexpected sudo access, and SSH authorized keys across all home directories. fail2ban-report.sh gives you a per-jail summary, top offending IPs, and recent ban events. Useful for a quick picture of what is actually hitting your server.

Layer 2: Network, Lock Down What Is Reachable

A typical self-hosted web app runs multiple processes: a backend API on one port, maybe a frontend server on another. None of these should be directly reachable from the internet. Only nginx should face the outside world.

Bind to localhost. When your app starts, configure it to listen on 127.0.0.1, not 0.0.0.0. The difference is significant. 0.0.0.0 listens on all interfaces including your public IP. 127.0.0.1 only listens on the loopback interface, unreachable from outside the machine.

// Good: only reachable locally
http.ListenAndServe("127.0.0.1:8080", handler)

// Exposes the port on every interface, including the public IP
http.ListenAndServe(":8080", handler)

Most frameworks read the bind address from an environment variable. Set HOST=127.0.0.1 alongside your PORT and make sure your app actually reads it. It is easy to set HOST in an env file and then have the code ignore it entirely.
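One way to make sure HOST is actually honored is to build the listen address in a single place and log it at startup. A minimal sketch, assuming the variables are named HOST and PORT and that loopback is the safe default; `buildAddr` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// buildAddr joins host and port, falling back to loopback and 8080 when a
// value is empty, so a missing HOST never results in a public bind.
func buildAddr(host, port string) string {
	if host == "" {
		host = "127.0.0.1"
	}
	if port == "" {
		port = "8080"
	}
	return net.JoinHostPort(host, port)
}

func main() {
	addr := buildAddr(os.Getenv("HOST"), os.Getenv("PORT"))
	// Logging the resolved address makes an ignored HOST visible in the journal.
	fmt.Println("listening on", addr)
	// http.ListenAndServe(addr, handler) would go here.
}
```

If the log line ever shows 0.0.0.0 or an empty host, the env file and the code disagree.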

Back it up with iptables. Even if the app binds correctly, an explicit firewall rule adds a second line of defence:

# Allow localhost, drop everything else
iptables -I INPUT -p tcp --dport 8080 ! -s 127.0.0.1 -j DROP

# IPv6: blanket drop
ip6tables -I INPUT -p tcp --dport 8080 -j DROP

# Persist across reboots
netfilter-persistent save

One thing to watch: use ! -s 127.0.0.1 -j DROP rather than a plain -j DROP. If you need to debug via an SSH tunnel, the tunnel traffic comes from 127.0.0.1. A blanket DROP silently breaks it.

open-ports-audit.sh from the same sysadmin-scripts collection lists every listening port with its process name and owner, compares it against a whitelist you define, and flags anything unexpected. Worth running after any deployment or infrastructure change.

Layer 3: Process Isolation via systemd

Running your app as root is a bad idea. If the process gets compromised, the attacker gets root. Run it as an unprivileged user instead, one with write access to the app directory and nowhere else.

[Service]
User=appuser
Group=appuser

systemd has built-in sandboxing directives that add meaningful isolation at no extra cost:

# process cannot gain privileges via SUID binaries
NoNewPrivileges=yes
# isolated /tmp, cannot see other processes' temp files
PrivateTmp=yes
# home directories (/home, /root, /run/user) are invisible to this process
ProtectHome=yes
# filesystem is read-only except for explicitly allowed paths
ProtectSystem=strict
# the one directory the app actually needs
ReadWritePaths=/var/www/myapp

These are defense-in-depth. If your app gets exploited, the attacker ends up stuck inside a box. They cannot read sensitive system directories, cannot modify system files, and cannot escalate privileges. The damage radius shrinks dramatically.

One gotcha: ProtectHome=yes breaks any runtime that writes to home directories for caching or telemetry. Check your runtime's docs for an env var to redirect or disable it rather than removing the protection entirely.

Layer 4: HTTP Security Headers

Once traffic reaches nginx, a few response headers tell browsers how to handle your content. Put these in a shared snippet included by every vhost:

add_header Strict-Transport-Security "max-age=63072000; includeSubDomains" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Permissions-Policy "camera=(), microphone=(), geolocation=()" always;
server_tokens off;

HSTS is the most important one. Once a browser sees it, it will refuse to connect to your domain over plain HTTP for the duration of max-age. Two years is the standard recommendation.

server_tokens off hides your nginx version from response headers. There is no reason to advertise which version you are running to anyone scanning for known vulnerabilities.

Content Security Policy needs to be per-vhost, not global. A CSP defines which origins a page is allowed to load scripts, styles, fonts, and make API calls to. Different apps have different requirements. A shared CSP either ends up so permissive it is useless, or it silently breaks something. Define it individually for each vhost based on what that app actually loads.

Layer 5: Consider a WAF on a Separate VPS

The four layers above protect the server itself. A Web Application Firewall (WAF) works one level higher, inspecting HTTP traffic before it even reaches nginx, and blocking common attack patterns like SQL injection, XSS, and malicious bots.

SafeLine is a self-hostable WAF worth looking at. The recommended setup is to run it on a separate VPS under the same cloud provider, connected via a private network to your main server. The reason for a separate VPS is practical: WAF software can be resource-intensive depending on traffic volume, it has its own port requirements that can conflict with existing services, and keeping it isolated means you can scale it independently without touching your app server.

The traffic flow looks like this:

Internet -> WAF VPS (inspects & filters) -> App VPS (your nginx + apps)

Your app server never receives traffic directly from the internet. Everything passes through the WAF first.

Layer 6: TLS and Certificate Auto-Renewal

Every site should be HTTPS only. Not just the login page. Everything. Plain HTTP leaks session cookies, exposes content to network-level tampering, and modern browsers are actively warning users away from it.

Let's Encrypt makes this free. Certbot handles issuance and renewal:

# Issue a cert for your domain
certbot certonly --nginx -d yourdomain.com

# Test auto-renewal
certbot renew --dry-run

The renewal part matters as much as the issuance. Let's Encrypt certificates expire after 90 days. Set up a systemd timer or cron job to run certbot renew twice a day. That way you are never caught with an expired cert.

Redirect HTTP to HTTPS at the nginx level so there is no way to accidentally serve content over plain HTTP:

server {
    listen 80;
    server_name yourdomain.com;
    return 301 https://$host$request_uri;
}

HSTS from Layer 4 completes the picture. Once the browser has seen the HSTS header over HTTPS, it will refuse to even attempt an HTTP connection to that domain in the future.

Layer 7: Sit Behind a CDN

A CDN does more than cache static files. When your server is behind Cloudflare or a similar provider, your real server IP is hidden from the public internet. Attackers scanning for your origin have a harder time finding where to actually send traffic.

More importantly, volumetric DDoS attacks hit the CDN edge, not your server. A flood of traffic that would knock over a small VPS gets absorbed across hundreds of edge nodes. This is one of the most cost-effective security layers available, and the free tier of most CDN providers covers everything a personal site or small project needs.

A few things to verify when using a CDN:

Configure your origin server to only accept connections from the CDN's IP ranges, not the open internet. Otherwise the protection is cosmetic. An attacker who discovers your real IP can bypass the CDN entirely and hit your server directly.

Make sure your SSL configuration is set to "Full (strict)" mode if you use Cloudflare. The "Flexible" mode means traffic between Cloudflare and your server travels unencrypted, which defeats the point of having a certificate.
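Restricting the origin to CDN ranges is usually done in the firewall, but the same check can be sketched at the application level. The example below uses placeholder ranges from the RFC 5737 documentation blocks, not real CDN ranges; substitute the list your provider publishes. `fromCDN` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/netip"
)

// cdnRanges holds PLACEHOLDER prefixes (RFC 5737 documentation blocks).
// Replace them with the ranges published by your CDN provider.
var cdnRanges = []netip.Prefix{
	netip.MustParsePrefix("203.0.113.0/24"),
	netip.MustParsePrefix("198.51.100.0/24"),
}

// fromCDN reports whether a remote address in "ip:port" form falls inside
// one of the allowed ranges. Unparseable input is treated as not allowed.
func fromCDN(remoteAddr string) bool {
	ap, err := netip.ParseAddrPort(remoteAddr)
	if err != nil {
		return false
	}
	ip := ap.Addr().Unmap() // normalize IPv4-mapped IPv6 addresses
	for _, p := range cdnRanges {
		if p.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(fromCDN("203.0.113.7:443"))
	fmt.Println(fromCDN("192.0.2.1:5555"))
}
```

In an HTTP handler the input would be `r.RemoteAddr`, with anything outside the ranges rejected before the request is processed.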

Layer 8: Automatic Security Updates

A server that is never updated is a server that will eventually be compromised. Known CVEs get published, exploit code follows, and bots start scanning for vulnerable versions within days.

On Ubuntu and Debian-based systems, unattended-upgrades handles this:

apt install unattended-upgrades
dpkg-reconfigure -plow unattended-upgrades

The default configuration applies security updates only, not every available update. That is the right call. You want patches for known vulnerabilities applied automatically. You do not want an unattended dist-upgrade breaking something on a production server at 3am.

Verify it is actually running:

systemctl status unattended-upgrades

Check the logs periodically at /var/log/unattended-upgrades/ to confirm packages are being updated. It is easy to install and forget, then discover months later that it was silently failing.

Layer 9: Backups Are a Security Layer

Most people think of backups as an ops concern. They are also a security concern. Ransomware encrypts your data and demands payment. A disgruntled ex-contributor deletes your database. A breach happens and you need to restore to a known-clean state. In all of these cases, backups are the only thing that lets you recover without starting over.

A backup strategy needs three things to be useful:

Offsite storage. A backup on the same server it is protecting is not a backup. If the server is compromised or destroyed, you lose both. Store backups on a separate machine, ideally under a different provider.

Automation. A backup you have to remember to run is a backup that will not exist when you need it. Use a systemd timer or cron job to run backups daily. Log the output somewhere you can check.

Tested restoration. A backup you have never restored from is a backup you cannot trust. Periodically restore a backup to a test environment and verify the data is intact and the app runs. Do this before you need it, not during an incident.

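# Example: back up a SQLite database daily via systemd timer
# /etc/systemd/system/myapp-backup.service
# Uses the sqlite3 CLI's .backup command, which takes a consistent snapshot
# even while the app is writing; a plain cp of a live database may not.
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'sqlite3 /var/www/myapp/data/app.db ".backup /backups/app-$(date +%%Y%%m%%d).db"'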

Two retained copies is a minimum. If you can afford more, keep more.
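Retention is the part that tends to get skipped. A small sketch of the rotation logic, assuming backup names embed a YYYYMMDD date as in the example above, so that string order equals date order. `toDelete` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// toDelete returns the backups to remove so that only the newest keep
// files remain. Names are assumed to embed a YYYYMMDD date, which makes
// lexical order match chronological order.
func toDelete(names []string, keep int) []string {
	sorted := append([]string(nil), names...) // copy; leave the input alone
	sort.Strings(sorted)
	if keep < 0 {
		keep = 0
	}
	if len(sorted) <= keep {
		return nil
	}
	return sorted[:len(sorted)-keep]
}

func main() {
	backups := []string{"app-20260309.db", "app-20260307.db", "app-20260308.db"}
	fmt.Println(toDelete(backups, 2))
}
```

Run the deletion only after the newest backup has been written and verified, so a failed run never leaves you with fewer copies than before.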

If you want a ready-made starting point, backup.sh from sysadmin-scripts handles SQLite hot backups, directory archives, and individual files like .env. It packages everything into a timestamped .tar.gz, uploads to a remote server via SCP, and rotates old copies both locally and remotely. Configure the paths at the top of the file, add a cron entry, and it runs itself.

What This Does Not Cover

These nine layers cover the common ground for a self-managed server. They are not exhaustive.

Application-level vulnerabilities like SQL injection, XSS, and broken access control live in your code. No firewall rule protects you from a login endpoint that concatenates user input directly into a query. Use prepared statements, validate input on the server, and hash passwords with bcrypt or argon2.

Dependency vulnerabilities need active monitoring. Go has govulncheck, npm has npm audit. Run them before deploying, not after something breaks. Tools like Dependabot can automate this in CI.

Secrets hygiene is on you. Environment files should be chmod 600, never committed to git, and not world-readable. A world-readable secrets file on a server is a plaintext credential dump waiting to be found.

Mandatory access control goes deeper than systemd sandboxing. AppArmor (enabled by default on Ubuntu) and SELinux define system-wide policies restricting what files and syscalls any process can access. Worth learning if you run services that handle sensitive data.

Kernel hardening via sysctl parameters tightens the network stack itself, things like SYN flood protection and source address verification. Reasonable defaults exist but they are not always on out of the box.

Two-factor authentication for SSH adds a TOTP layer on top of key-based auth. If a private key is ever stolen, 2FA is the last line. Look into libpam-google-authenticator or similar.

Log monitoring and alerting means being notified when something unusual happens, not just reacting after the fact. fail2ban reacts to patterns but does not alert you. A spike in 403s, repeated probing of sensitive paths, or a login at 3am from an unknown country are all worth knowing about in real time.
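The secrets hygiene point above is easy to check mechanically. A minimal sketch that flags an env file readable by group or others; the `.env` path and the `tooOpen` helper are assumptions for illustration.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
)

// tooOpen reports whether any group or other permission bit is set,
// meaning the mode is looser than 0600.
func tooOpen(mode fs.FileMode) bool {
	return mode.Perm()&0o077 != 0
}

func main() {
	path := ".env" // hypothetical path; point this at your env file
	info, err := os.Stat(path)
	if err != nil {
		fmt.Println("stat failed:", err)
		return
	}
	if tooOpen(info.Mode()) {
		fmt.Printf("%s has mode %04o; run chmod 600 on it\n", path, info.Mode().Perm())
		os.Exit(1)
	}
	fmt.Println("permissions ok")
}
```

The non-zero exit code makes it usable as a pre-deploy check in a script.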

Going Deeper: WordPress

If you run WordPress, the attack surface is wider. WordPress powers a large share of the web, which makes it a high-value target. Automated scanners specifically probe for outdated plugins, exposed wp-admin, and known vulnerabilities in popular themes.

The same principles apply: SSH hardening, firewall, unprivileged process user, security headers. But the WordPress-specific layer on top of that is a different topic. This course covers it in depth, including WAF configuration: WordPress Security.

Is This Right for You?

This setup makes sense if you are self-hosting on a VPS and want to understand what you are actually running. It requires no paid tools and no external services. Just sshd, iptables, systemd, and nginx doing their jobs.

It is not a complete answer for an app handling sensitive user data at scale. For that you would want a proper secrets manager, structured audit logging, network-level intrusion detection, and a security review before launch.

For a personal site or small project on a VPS you own: this is the setup I run. 23,000 failed login attempts later, nothing has gotten through. The bots are still out there. They just keep hitting a wall.

Systemd Unit Files for Web Apps

Sun, 08 Mar 2026 00:00:00 GMT

The systemd documentation is thorough. It covers every directive, every option, every edge case. What it doesn't show you is which 10% of that actually matters when you're running a Go API and a Node.js frontend on a VPS.

This post covers that 10%: the unit file options you'll use, the gotchas that will cost you an afternoon, and why some directives that look optional will quietly break your app if you skip them.

The Setup

If you're not familiar with the overall deployment model, Deploying to a VPS Without Docker or CI/CD covers the full picture: two environments running side by side, nginx as the front door, and a git pull to deploy. That post treats systemd as a supporting character. This one puts it center stage.

The assumption here: you have a non-root deploy user that runs your services, and your apps live somewhere under /var/www/.

What systemd Is Actually Doing

systemd is your process manager. When you run systemctl start myapp, systemd reads the unit file, sets up the environment, starts the process, and watches it. That's the whole job.

The restart loop is what makes it worth using over just running your binary directly. If your app crashes at 3am, systemd restarts it. If the VPS reboots, systemd starts it. You don't have to be there.

A Unit File, Line by Line

[Unit]
Description=My App Backend (Staging)
After=network.target

[Service]
Type=simple
User=deploy
WorkingDirectory=/var/www/myapp-staging/myapp/backend
EnvironmentFile=/var/www/myapp-staging/myapp/backend/.env.staging
ExecStart=/var/www/myapp-staging/myapp/backend/myapp-backend
Restart=always
RestartSec=3

NoNewPrivileges=yes
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=strict
ReadWritePaths=/var/www/myapp-staging/myapp

[Install]
WantedBy=multi-user.target

The non-obvious ones:

After=network.target tells systemd to start your service after the network is up. Without it, your app might try to bind a port before the network stack is ready. It's an ordering hint, not a hard dependency, but you always want it.

Type=simple tells systemd your process doesn't fork. The process you start in ExecStart is the service. This is correct for almost every Go and Node.js app. Type=forking is for old-style daemons that fork into the background on startup. You probably don't have one of those. Type=notify is for apps that actively signal systemd when they're ready; most apps don't do this.

WorkingDirectory sets the current directory before starting your app. Your app resolves relative file paths from here. Leave this out and a path like ./data/app.db gets resolved relative to /, where it obviously doesn't exist.

EnvironmentFile loads environment variables from a file. One KEY=VALUE per line, same as a .env file. systemd reads this as root before dropping privileges to the deploy user, so you can own it as root with chmod 600 and the service still gets the variables. The running process itself can't read the file directly.

Restart=always restarts the service on any exit: crash, OOM kill, or a clean exit with code 0. on-failure would skip the clean exit and restart only on failures such as a non-zero exit code, a fatal signal, or a timeout. For a long-running server that should never exit cleanly on its own, always is the right choice. A deliberate systemctl stop still stops it.

RestartSec=3 waits 3 seconds before restarting. Without this, a crashing app will restart in a tight loop and fill your journal with noise before you can investigate. Three seconds is enough breathing room.
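The WorkingDirectory behavior is easy to see from the app's side. The sketch below shows where a relative path lands for a given working directory; `resolve` is an illustrative helper, and the paths mirror the examples in this post.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolve returns the location a path refers to when the process runs
// with workDir as its current directory. Absolute paths are unaffected.
func resolve(workDir, p string) string {
	if filepath.IsAbs(p) {
		return filepath.Clean(p)
	}
	return filepath.Join(workDir, p)
}

func main() {
	wd, err := os.Getwd()
	if err != nil {
		fmt.Println("cannot determine working directory:", err)
		return
	}
	// Logging both at startup shows in the journal which file the app will open.
	fmt.Println("working directory:", wd)
	fmt.Println("database path:", resolve(wd, "./data/app.db"))
}
```

With no WorkingDirectory set, the first line prints / and the second /data/app.db, which explains the missing-file error.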

The Hardening Directives

The lower half of the [Service] section (NoNewPrivileges, PrivateTmp, ProtectHome, ProtectSystem, ReadWritePaths) restricts what the service process can do and what it can see on the filesystem. These are kernel features (mount namespaces and a process flag), not virtualization, so the runtime overhead is negligible.

NoNewPrivileges=yes prevents the process from gaining elevated privileges through setuid binaries. Turn this on for every web app.

PrivateTmp=yes gives the service its own isolated /tmp instead of the shared system /tmp. Another process can't snoop on your app's temp files, and your temp files don't accumulate in the system /tmp on crash.

ProtectHome=yes makes /home, /root, and /run/user invisible to the service. Your app binary cannot reach user home directories.

ProtectSystem=strict makes the entire filesystem read-only for the service, except for /dev, /proc, and /sys. Used together with ReadWritePaths, this gives your service exactly the write access it needs and nothing else.

ReadWritePaths carves out an exception to ProtectSystem=strict. List every directory your app needs to write to. If you have a data directory and a separate log directory, add both.

The ProtectHome Gotcha

ProtectHome=yes will silently break any subprocess your app spawns if that subprocess tries to write to a home directory.

The common case: your backend triggers a build step as a subprocess. The build tool tries to write to ~/.config or ~/.cache. With ProtectHome=yes, that path is invisible to the process. The subprocess fails with EACCES or a confusing missing-directory error that doesn't obviously point to systemd.

The fix is not to remove ProtectHome=yes. The fix is to redirect those cache paths to somewhere your service can write:

Environment=HOME=/var/www/myapp-staging
Environment=npm_config_cache=/var/www/myapp-staging/.npm-cache
Environment=XDG_CONFIG_HOME=/var/www/myapp-staging/.config

The better fix is to not run build tools from inside a running service at all. Build steps belong in your deploy script. The service should start a pre-built artifact, not build one.

The Deploy Cycle: Stop, Build, Start

You can't overwrite a running executable on Linux. If you try to go build while the binary is running, you get text file busy. The sequence is:

systemctl stop myapp-backend-staging
/usr/local/go/bin/go build -o myapp-backend ./cmd/server/
systemctl start myapp-backend-staging

The downtime is under a second. For a personal project, that's fine. If you need zero-downtime deploys, you'd pre-build the binary to a temp path and swap it atomically. That's a different problem.

For the Node.js frontend, you don't stop the service before building. The build writes to dist/, not to the running process. Build first, then restart to pick up the new files:

sudo -u deploy npm run build
systemctl restart myapp-astro-staging

Reading Logs

systemd captures stdout and stderr from your service automatically. Write to stdout in your app and it shows up in the journal.

# Follow in real time
journalctl -u myapp-backend-staging -f

# Last 100 lines, no pager
journalctl -u myapp-backend-staging -n 100 --no-pager

# Since last boot
journalctl -u myapp-backend-staging -b

# With full timestamps
journalctl -u myapp-backend-staging --output=short-iso

If your app is failing to start, journalctl -u myapp -n 50 --no-pager immediately after systemctl start will show you why. Don't reach for systemctl status first. The status output truncates the error message.

Common Gotchas

| Problem | Cause | Fix |
|---|---|---|
| Subprocess fails with EACCES | ProtectHome=yes blocks ~/.config access | Redirect cache dirs via Environment=, or don't spawn build tools from the service |
| App can't write to its data directory | ProtectSystem=strict without a matching ReadWritePaths | Add the directory to ReadWritePaths |
| App can't find a relative file path | WorkingDirectory not set or wrong | Set WorkingDirectory to the directory your app expects |
| EnvironmentFile not loaded | File doesn't exist at that path | Check the path and that the file exists before starting the service |
| text file busy on deploy | Binary still running when you try to overwrite it | systemctl stop before rebuilding |
| Service restarts right after the app exits on its own | Restart=always restarts even on a clean exit (a deliberate systemctl stop is not affected) | Use Restart=on-failure if clean exits should stay stopped, or adjust StartLimitBurst to cap restart loops |

Is This Right for You?

This approach works well if:

You're running a small number of long-running processes on a VPS
You want automatic restarts and structured logs without adding a process manager tool
You can tolerate a second of downtime during Go binary deploys

It's worth knowing the limits:

Zero-downtime deploys require pre-building to a temp path and swapping atomically. The stop-build-start pattern has a gap. For a personal project that's acceptable; for something that needs continuous availability it isn't.

Many services means many unit files. systemd's tooling is all CLI. If you have more than a dozen services, a tool like Coolify might be worth the tradeoff.

Complex startup ordering: if your app needs a database to be healthy before it starts, After=network.target isn't enough. systemd has a full dependency system for this, but it's more involved than what's covered here.

For the common case of two or three processes on a single VPS, this is all you need. The unit files above run exactly as written in production. No surprises.

Cloudflare Turnstile Console Errors Are Not Your Fault

Fri, 06 Mar 2026 17:00:00 GMT

You add Cloudflare Turnstile to your site. Login works. Form submissions go through. Everything functions correctly. Then you open devtools and see a wall of red errors and yellow warnings coming from challenges.cloudflare.com.

Your first instinct is that you misconfigured something. You didn't.

Every single one of these errors comes from inside Cloudflare's own code, running inside iframes that Cloudflare creates. None of them are yours to fix.

The three errors you're seeing

1. The sandboxed iframe error

Blocked script execution in 'about:blank' because the document's frame
is sandboxed and the 'allow-scripts' permission is not set.

Note that 'script-src' was not explicitly set, so 'default-src' is
used as a fallback.

This one looks the most alarming. It mentions script blocking, which sounds like a CSP misconfiguration on your end.

It isn't. Turnstile creates an intermediate about:blank iframe with the sandbox attribute set — intentionally, as a security isolation mechanism. Then Turnstile's own code tries to run scripts inside that sandboxed frame. The browser blocks it, logs the error, and Turnstile handles the fallback internally. Your CSP has no control over the sandbox attribute that Cloudflare's JavaScript sets on its own iframes.

2. The 401 Unauthorized in the network tab

GET https://challenges.cloudflare.com/cdn-cgi/challenge-platform/h/g/pat/...
Status Code: 401 Unauthorized

Turnstile attempts a Private Access Token (PAT) challenge — a protocol where the browser asks Apple or Google's attestation servers to vouch for it. Most browsers either don't support PAT or don't have a valid token at that moment. The 401 just means "no token available." Turnstile registers that, falls back to its standard challenge flow, and continues normally.

3. The preload warning

The resource https://challenges.cloudflare.com/cdn-cgi/challenge-platform/h/g/cmg/1
was preloaded using link preload but not used within a few seconds from
the window's load event.

Turnstile speculatively preloads some resources it might need. In many cases it ends up not needing them within the browser's expected timeframe. The browser warns you, Turnstile doesn't care.

How Turnstile actually works internally

Turnstile doesn't run directly in your page's context. It creates a chain of iframes:

Your page
  └── Turnstile outer iframe (challenges.cloudflare.com)
        └── about:blank sandboxed iframe  ← errors originate here
              └── Turnstile challenge logic

The sandboxing is intentional — it isolates the challenge from your page and prevents your JavaScript from inspecting or tampering with it. The errors are a side effect of that isolation leaking into your browser's console.

How to verify it's not you

Open devtools on any other site using Turnstile. You'll see the exact same errors, regardless of how that site configured its CSP. The errors aren't tied to your configuration — they're tied to Turnstile's internal implementation.

If your own CSP was wrong, you'd see different symptoms: the Turnstile widget wouldn't render at all, or form submissions would fail silently.

When you should actually worry

The only signal that matters is whether Turnstile is working. If users can submit your forms and the widget renders, Turnstile is doing its job. The console noise is irrelevant.

You have a real problem if:

The Turnstile widget renders but form submissions always fail validation
The widget doesn't render at all (usually a missing or wrong site key)
Your backend reports all tokens as invalid (site key / secret key mismatch)

Console errors from challenges.cloudflare.com are not on that list.

Is this right for you?

If you're building a public form and want bot protection without rolling your own CAPTCHA, Turnstile is a solid choice. The integration is straightforward and the UX is far less annoying than reCAPTCHA.

Accept that the console will always have Cloudflare's noise in it. It doesn't reflect on your code quality or your CSP configuration. Some third-party tools are just loud.

What 2 GB of Logs on a Fresh VPS Actually Means

Tue, 03 Mar 2026 00:00:00 GMT

A few weeks after moving to a self-managed VPS, I noticed the system journal had grown to over 2 GB. The server had only been running about a month. Nothing about the apps was unusual: traffic was normal, no crashes, no deployments that week.

So I started digging.

Finding the source

journalctl --disk-usage confirmed the size. To find what was writing so aggressively, I pulled the last seven days of logs and counted entries per service:

journalctl --no-pager -q --since "7 days ago" -o json \
  | python3 -c "
import sys, json, collections
counts = collections.Counter()
for line in sys.stdin:
    try:
        d = json.loads(line)
        svc = d.get('_SYSTEMD_UNIT') or d.get('SYSLOG_IDENTIFIER') or 'unknown'
        counts[svc] += 1
    except json.JSONDecodeError: pass  # skip lines that are not valid JSON
for svc, n in counts.most_common(10):
    print(f'{n:>8}  {svc}')
"

The result:

   73978  ssh.service
    9308  cron.service
    4752  app-backend.service
    1740  app-production.service
     606  kernel

SSH had nearly 74,000 log entries in seven days. Everything else combined didn't come close.

What's actually in there

journalctl -u ssh.service --no-pager -n 10

Mar 03 04:07:02 vps sshd[63622]: Invalid user xiedr from 45.148.10.118 port 43446
Mar 03 04:07:02 vps sshd[63622]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=45.148.10.118
Mar 03 04:07:05 vps sshd[63622]: Failed password for invalid user xiedr from 45.148.10.118 port 43446 ssh2
Mar 03 04:09:15 vps sshd[63704]: Failed password for root from 181.23.107.93 port 38173 ssh2
Mar 03 04:11:41 vps sshd[63950]: Invalid user a from 134.122.46.171 port 53312
Mar 03 04:11:42 vps sshd[63951]: Invalid user a from 134.122.46.171 port 53318
Mar 03 04:11:42 vps sshd[63952]: Invalid user a from 134.122.46.171 port 53322

SSH brute-force attempts. About 10,000 log entries a day, every day, since the server went live.

My first reaction was concern. But before doing anything, I wanted to understand what I was actually looking at.

Why this actually matters

10,000 entries a day sounds containable until you do the math. At that rate you're looking at a couple of gigabytes a month, and it keeps accumulating for as long as the server is up. journald's built-in default only caps the journal at 10% of the filesystem, up to 4 GB, which is a lot of disk to hand over to noise on a small VPS. I've had this happen before on a different server: a log file that started as noise quietly grew to 60 GB over a few months. There was no warning. The disk just filled up.

When a Linux disk hits 100%, it doesn't degrade gracefully. nginx stops writing access logs and starts returning errors. Databases that need to write to disk, whether that's a WAL file, a lock, or a temp file, start failing. Applications throw write errors that look like bugs until you realize the real cause. Depending on what's running, recovery can mean emergency cleanup under pressure while services are down.

The second problem is signal loss. Your journal is also where real security events show up: failed sudo attempts, service crashes, actual intrusion attempts with valid usernames. When it's buried under 70,000 SSH noise entries a week, you lose the ability to notice anything real. The logs become a liability instead of a tool.

None of this is the fault of the bots. They're doing what bots do. The failure mode is leaving the journal unconfigured and assuming it self-manages.

Bots, not people

There are a few ways to tell automated scanning from a targeted attack.

The usernames give it away. Pull the top targets:

journalctl -u ssh.service --no-pager --since "7 days ago" -q \
  | grep -oP '(Invalid user|Failed password for) \K\S+' \
  | sort | uniq -c | sort -rn | head -20

9114  invalid
7534  root
 160  admin
  98  hik
  55  oracle
  53  test
  51  ubuntu
  42  git
  40  postgres
  36  dell
  28  deploy
  27  ansible
  21  tomcat

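This is a wordlist. The top entry, invalid, is not a username: it's an artifact of the pattern matching the word after "Failed password for" in lines like "Failed password for invalid user xiedr". The rest are defaults. hik is the default user on Hikvision cameras. dell is on some Dell iDRAC systems. oracle, postgres, tomcat are server software default accounts. Nobody targeting my server specifically would try hik or dell. They're running the same list against every IP they can reach.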
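The invalid entry at the top of that count comes from lines of the form "Failed password for invalid user …", where the word after "for" is not the username. A small sketch of an extraction that skips the "invalid user" prefix, written in Go with a capture group since Go's regexp package has no \K. The function name is an assumption for illustration; the sample lines are copied from the journal output earlier in this post.

```go
package main

import (
	"fmt"
	"regexp"
)

// The optional " invalid user" group is consumed before the capture, so the
// captured word is always the attempted username.
var userRe = regexp.MustCompile(`(?:Invalid user|Failed password for(?: invalid user)?) (\S+)`)

// attemptedUser returns the username an sshd log line refers to, if any.
func attemptedUser(line string) (string, bool) {
	m := userRe.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	lines := []string{
		"Invalid user xiedr from 45.148.10.118 port 43446",
		"Failed password for invalid user xiedr from 45.148.10.118 port 43446 ssh2",
		"Failed password for root from 181.23.107.93 port 38173 ssh2",
	}
	for _, l := range lines {
		if u, ok := attemptedUser(l); ok {
			fmt.Println(u) // prints xiedr, xiedr, root
		}
	}
}
```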

The timing is mechanical. When I looked at one of the most persistent IPs:

journalctl -u ssh.service --no-pager --since "7 days ago" -q \
  | grep "80.94.92.65"

Feb 24 04:29:44 vps sshd[31690]: Invalid user equipment from 80.94.92.65 port 59946
Feb 24 04:43:04 vps sshd[31933]: Failed password for sshd from 80.94.92.65 port 41176 ssh2
Feb 24 04:57:25 vps sshd[32182]: Invalid user zhangdong from 80.94.92.65 port 59176
Feb 24 05:10:45 vps sshd[32441]: Invalid user thum from 80.94.92.65 port 36168
Feb 24 05:24:59 vps sshd[32619]: Invalid user huan from 80.94.92.65 port 37594
Feb 24 05:38:35 vps sshd[32743]: Invalid user shengziqi from 80.94.92.65 port 51948

Every 13 to 14 minutes, like clockwork. That looks like deliberate throttling to stay under rate-limit windows. Not a human typing.

Coordinated subnets. The top attackers included 80.94.92.65, .69, .70, .64: four IPs from the same /24, hitting simultaneously. That's a botnet or a rented VPS farm running a distributed scan across the entire IPv4 space.

Your server is not the target. It's just an address that exists.

Why anyone bothers

IPv4 has about 4 billion addresses. Tools like Masscan can sweep the entire space in under an hour. Running these scans costs almost nothing, and the economics work even at a very low hit rate.

A server with default credentials gets compromised in seconds and immediately put to work: spam relays, crypto mining, DDoS botnet nodes, proxy services. Operators sell access to these networks or run them directly. The bots don't care what your server does or who you are. They're looking for the small percentage of newly spun-up machines where someone left root/password as the login, or where a cloud provider silently re-enabled password auth on provisioning.

This is why the noise starts within minutes of a server going live. Scanners don't need to discover you: they sweep the whole address space continuously, so a new IP gets hit on the next pass. By the time you finish setting up nginx, you're already being scanned.

Am I already compromised?

This is the right question to ask before anything else. Check successful logins:

journalctl -u ssh.service --no-pager --since "30 days ago" -q \
  | grep "Accepted"

What you want to see is every successful login using the same key fingerprint, from IPs you recognise. What I saw was exactly that: my ED25519 key, from my ISP's dynamic IP range and a cloud provider I use. Nothing else.

If you see an Accepted publickey entry from an IP you don't recognise, that's worth investigating immediately. SSH brute-force noise by itself is not evidence of a breach. It's background radiation on any public IP.

The part that was actually worrying

While investigating the SSH config, I found this in /etc/ssh/sshd_config.d/:

50-cloud-init.conf       →  PasswordAuthentication yes
60-cloudimg-settings.conf  →  PasswordAuthentication no

Cloud-init, the provisioning tool most VPS providers use, had dropped a config file setting password authentication to yes. A second file set it to no, and the effective setting was correct. But don't assume the higher-numbered file wins: sshd keeps the first value it reads for each keyword, and drop-ins load in lexical order, so the only reliable answer is what sshd -T reports. Either way, 50-cloud-init.conf was a loaded gun. If the file that says no ever got removed or renamed by a package update, password auth would silently re-enable. The brute-force bots hammering the server every few minutes would immediately start getting password prompts instead of rejections.

This is the actual risk. Not the scanning, which is just noise, but the possibility that a routine system update quietly removes one file and undoes your security config without any indication that anything changed.

The fix: edit 50-cloud-init.conf and change the value to no. Don't delete it because cloud-init may recreate it. Just make it say the right thing so both files agree.

If you're on a VPS with cloud-init, check yours:

grep -r "PasswordAuthentication" /etc/ssh/sshd_config.d/

Then confirm the effective config:

sshd -T | grep passwordauthentication

That last command is what matters: it shows what sshd actually resolved after processing all the drop-in files, not what any single file says.

If you want to audit your SSH config more systematically, I wrote ssh-audit.sh which checks for common weaknesses, flags misconfigurations, and runs a reputation check on your server's public IP.

SSH hardening worth adding

With password auth confirmed off, two more settings help.

MaxStartups 10:30:60: the default is 10:30:100, which means sshd starts randomly refusing new unauthenticated connections once 10 are pending (30% at first, rising linearly) and only refuses all of them at 100. Bots can hold open dozens of connections doing nothing, each one tying up an sshd process. Setting this to 10:30:60 keeps the same starting point but lowers the hard cap, so everything above 60 pending connections is dropped.
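To make the start:rate:full triple concrete, here is a small sketch of the linear ramp that sshd_config(5) describes: refuse with probability rate% once start connections are pending, rising linearly to 100% at full. The function is an illustration of that description, not sshd's own code, and its name is an assumption.

```go
package main

import "fmt"

// dropPercent returns the chance (0-100) that a new unauthenticated
// connection is refused, given how many are already pending and a
// MaxStartups setting of start:rate:full.
func dropPercent(pending, start, rate, full int) int {
	if pending < start {
		return 0
	}
	if pending >= full {
		return 100
	}
	// Linear ramp from rate% at start up to 100% at full (integer math).
	return rate + (100-rate)*(pending-start)/(full-start)
}

func main() {
	for _, pending := range []int{5, 10, 35, 60} {
		fmt.Printf("pending=%d  10:30:100 -> %d%%  10:30:60 -> %d%%\n",
			pending,
			dropPercent(pending, 10, 30, 100),
			dropPercent(pending, 10, 30, 60))
	}
}
```

With 35 connections pending, 10:30:100 works out to 49% and 10:30:60 to 65%. At 60 pending, the tighter setting refuses everything.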

ClientAliveInterval 300 with ClientAliveCountMax 2: sends a keepalive every 5 minutes and disconnects if the client doesn't respond after two attempts. This cleans up ghost sessions from dropped connections.

Put these in a new drop-in file to keep the change auditable:

# /etc/ssh/sshd_config.d/70-hardening.conf
MaxStartups 10:30:60
ClientAliveInterval 300
ClientAliveCountMax 2

Validate and reload:

sshd -t && systemctl reload ssh

Fixing the log bloat

fail2ban was already running and banning aggressively: 3 failures in 10 minutes gets an IP banned for 24 hours. After a month it had banned 1,541 IPs. That's working correctly. If you want a cleaner view of what it's actually doing, fail2ban-report.sh shows per-jail stats, top offending IPs, and recent bans in one output.

The logs themselves were still growing because the journal had no explicit size limit, only journald's default ceiling (10% of the filesystem, capped at 4 GB). Two changes fix this permanently.

First, vacuum what's already there:

journalctl --vacuum-size=500M
journalctl --vacuum-time=30d

Then set a permanent cap in /etc/systemd/journald.conf so it never grows back:

[Journal]
SystemMaxUse=500M
MaxRetentionSec=30day

Restart journald to apply:

systemctl restart systemd-journald
journalctl --disk-usage

On my server that brought 2 GB down to 499 MB in a few seconds. With the cap set, the journal self-manages: oldest entries are dropped automatically when it hits the limit.

500M is generous for most personal servers. If you need long retention for debugging or compliance, raise it. Just set an explicit limit you chose instead of relying on the default.

Is this right for you?

If you run a public-facing server on any major cloud or VPS provider, you are getting scanned. There's no configuration that stops it. It's just the internet. The right response is to make sure your actual defenses are solid, not to try to make the noise stop.

Password auth off, key-only login, fail2ban active, journal capped: these are the floor, not a complete hardening guide. If your threat model goes beyond random bots, look at AllowUsers, IP allowlisting, and port knocking on top of this. For a personal server or small project, these steps get you to a state where the noise is contained and you can actually notice if something real happens.

The cloud-init config issue is worth checking regardless of anything else. It's easy to miss, it won't show up in any obvious error, and the consequence of missing it is that the wall you think is solid has a door in it.

You Don't Need a Message Queue

Sun, 01 Mar 2026 00:00:00 GMT

Every backend tutorial that mentions background jobs ends the same way: "and then you add a message queue." Redis. RabbitMQ. SQS. Choose one, wire it up, and now you have another service to deploy, monitor, and debug.

I built a newsletter system without any of that. The emails go out. Failed deliveries retry automatically. It's been running in production quietly, without me thinking about it. The entire thing is a database table and a background worker.

What a Message Queue Actually Does

Strip away the marketing and a message queue does three things: stores a task somewhere durable, delivers it to a worker, and handles retries when the worker fails. That's the whole job.

The complexity in RabbitMQ and Kafka comes from doing this across many services, at high throughput, with multiple consumers competing for work. That's a real problem for distributed systems processing millions of events per minute.

Most backends are not that. Most backends need to send a welcome email, process an uploaded file, or retry a failed webhook. Your database can do all three. You already have one.

The Pattern

Instead of pushing a task to a queue, write a row to a database table. A background worker periodically reads that table, processes what's pending, and updates the row.

User action
    ↓
Write row to jobs table  →  HTTP response (immediate)
    ↓
Background worker wakes up on a timer
    ↓
Reads pending rows → processes → updates status

No broker. No separate worker process. No infrastructure to configure.

How It Looks in Practice

When a newsletter goes out, each recipient gets a row in a jobs table with a status of pending. A worker started at server boot processes those rows and retries any that fail.

Here's the entire worker setup in Go:

func StartWorker(interval time.Duration) {
    ticker := time.NewTicker(interval)
    go func() {
        for range ticker.C {
            if err := processJobs(); err != nil {
                log.Printf("worker error: %v", err)
            }
        }
    }()
}

A goroutine, a ticker, one function call. It starts when the server starts and runs forever.

processJobs reads the jobs table, finds rows with status = failed and attempts < 3, checks if enough time has passed since the last attempt, and retries them. The backoff is a plain switch:

func retryDelay(attempts int) time.Duration {
    switch attempts {
    case 0:
        return 1 * time.Minute
    case 1:
        return 5 * time.Minute
    case 2:
        return 15 * time.Minute
    default:
        return 0
    }
}

Escalating backoff: 1, 5, then 15 minutes. Max 3 retries. The logic fits in one screen. No dependency to install.

The same pattern works for one-off tasks. When a user triggers something slow, the handler fires a goroutine and returns immediately:

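go func() {
    result, err := doSlowThing()
    if err != nil {
        log.Printf("slow task failed: %v", err)
        return
    }
    // saveResult is a placeholder for your own code: update status in DB when done
    saveResult(result)
}()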

w.WriteHeader(http.StatusAccepted)

The HTTP response is instant. The slow work happens in the background. The client polls a status endpoint. No queue needed.

What You Get for Free

Using the database as the job store gives you things that message queues charge extra for.

Visibility is the obvious one. Want to see what's queued? Run a SELECT. No separate management UI, no extra CLI tool. The jobs are where the rest of your data is, queryable the same way.

Transactional writes are the less obvious but more important one. You can insert the job row in the same transaction as the record that triggered it. If the transaction rolls back, the job disappears too. With an external queue there's always a window where the DB commit succeeded but the enqueue failed, or the other way around. Both are bugs that only show up under load.

Durability comes for free too. Your database is already backed up. Your jobs are backed up with it. Any pending work, retry count, error message — it all comes back if the server dies.

Debugging gets simpler. When something fails, you check the row. The error is right there in the table. No hunting through queue consumer logs across services.

When This Breaks Down

High throughput is the first limit. If you're processing thousands of jobs per second, polling a table becomes a bottleneck. Queues use push delivery and are built for this. SQLite in particular has write concurrency limits that will hit you before Postgres does.

Multiple consumers is the second. This pattern assumes one worker pulling from the table. If you need to scale horizontally, multiple instances will race on the same rows. Postgres has SELECT ... FOR UPDATE SKIP LOCKED for this, but now you're managing that complexity yourself.

Cross-service is the third. If the producer and consumer are different services, a shared database is tight coupling. A message queue is the right abstraction there — that's what it was built for.

Near-instant pickup is the last one. Polling every N seconds means jobs wait up to N seconds to start. If you need sub-second job pickup, look at Postgres LISTEN/NOTIFY or just use a proper queue.

Is This Right for You?

This pattern works if you have one backend process, your job volume is in the hundreds to low thousands per day, and you want fewer moving parts to deploy and debug.

It doesn't work if you're building something that needs to scale horizontally, process jobs in real time, or communicate across service boundaries.

In my case, the worker wakes up on a regular interval, finds nothing to do most of the time, and goes back to sleep. The one time a batch of emails failed mid-send, it caught them on the next cycle without me doing anything.

That's the bar I was aiming for. Something that works quietly, without infrastructure I have to manage.

Ever wondered why Linux has two commands for the same task?

Tue, 06 Jan 2026 00:00:00 GMT

Creating a user seems simple, but choosing the wrong tool can leave you with a “broken” account (no home directory, no shell!).

The Core Difference

Think of it this way:

useradd: The raw, low-level tool. It’s a binary that does exactly what it’s told, nothing more.
adduser: The smart, high-level wrapper (Perl script). It uses useradd in the background but adds “common sense” automation.

The “useradd” Way (The Hard Way)

If you run sudo useradd john:

No home directory created (the default on Debian/Ubuntu).
No password set (account is locked).
Default shell is often /bin/sh (very basic).

You have to manually add flags like -m for home or -s for shell. It’s built for scripts, not humans.

The “adduser” Way (The Easy Way)

If you run sudo adduser john:

Automatically creates /home/john.
Copies skeleton files (.bashrc, etc.).
Prompts you for a password immediately.
Asks for user details (Full name, room number).

It’s interactive and “just works.”

Practice

If you’re managing an Ubuntu server, creating the user is just step 1. You’ll likely want them to have admin rights:

sudo adduser username
sudo usermod -aG sudo username

Now your new user can perform administrative tasks!

When to use which?

Use adduser if:

You are a beginner.
You are working on Debian/Ubuntu-based systems.
You want a ready-to-use account in 10 seconds.

Use useradd if:

You are writing bash scripts.
You are on a minimal distro (Arch, Alpine) where adduser might not be installed.

Pro-Tip: The “Skel” Directory

Both commands use /etc/skel when they create a home directory (useradd only does that with -m). Anything you put in this folder will automatically appear in a new user’s home directory. Perfect for pre-configuring .vimrc or alias settings for your team!

Wait, why do some tutorials say they are the same?

In Debian/Ubuntu, they are different: adduser is a friendly script, useradd is the raw tool.
In RHEL/CentOS/Fedora, adduser is often just a symbolic link to useradd.

Know your distro before you type!

To summarize for my Ubuntu server friends:

Use adduser for a fast, interactive, and “complete” setup.
Use useradd only if you’re writing automated scripts.

Mastering these small nuances is what makes a great sysadmin!

Using Claude Code on a Self-Managed VPS: My Workflow

Sat, 28 Feb 2026 00:00:00 GMT

Most people run AI coding assistants on their laptop, pointed at a local project. That works fine when your app runs locally. But when your project lives on a VPS (a deploy user running services, nginx routing between staging and production, systemd managing processes), the AI has no idea what it's working with. It suggests Docker. It tries to run go without the full path. It creates files as root and wonders why the service crashes.

I run Claude Code directly on the server. Here's the setup that makes it actually useful.

Why Not Run It Locally?

The obvious alternative is to run Claude locally, write code, push to git, pull on the server, and rebuild. That works, and for frontend-heavy projects it's probably the right call.

But backend work (API changes, database migrations, systemd service tweaks, nginx config updates) means constantly switching context between your laptop and the server. Claude suggests a fix, you paste it, push it, pull it, rebuild, check the logs, paste the error back. It's friction. When Claude is running on the server itself, it can read the actual log output, check running services, and build the binary right there. The feedback loop is tighter.

There's also a context problem. Your laptop doesn't know that Go lives at /usr/local/go/bin/go instead of just go. It doesn't know that services run as a deploy user, not root. It doesn't know which ports are in use or how nginx is configured. Without that context, every session starts with Claude making wrong assumptions that you have to correct.

The fix is CLAUDE.md.

CLAUDE.md: The File That Changes Everything

CLAUDE.md is a file you put in the root of your project. Claude Code reads it automatically at the start of every session, before you type anything. It's not documentation for humans. It's instructions for Claude.

Mine looks like this (simplified):

## Stack
- Backend: Go, port 8081 (staging) / 8082 (production)
- Frontend: Astro SSR, port 4321 (staging) / 4322 (production)
- Database: SQLite at backend/data/cms.db

## This Server
- Go binary: /usr/local/go/bin/go — NOT in PATH, always use full path
- Services run as the deploy user. Claude Code runs as root.
- Files created as root must be chowned to deploy where services write to them.
- ProtectHome=yes is set in systemd — subprocesses cannot access /home

## Environments
- Staging: /var/www/myproject-staging/ → staging.myproject.com
- Production: /var/www/myproject-production/ → myproject.com
- Always work on staging first. Never edit production directly.

## Deploy Pattern
1. Edit on staging
2. Build: /usr/local/go/bin/go build -o bin/server ./cmd/server/
3. Restart: systemctl restart myproject-backend-staging
4. Test on staging domain
5. Promote: merge staging → main → rebuild production

That's it. Claude now knows the exact binary path, the user model, the port layout, and the deployment pattern. It stops suggesting go build and starts suggesting /usr/local/go/bin/go build. It stops creating files owned by root in directories the service writes to. It knows to test on staging before touching production.

The first session with a good CLAUDE.md feels noticeably different from one without it. You stop spending the first ten minutes correcting wrong assumptions.

Memory Across Sessions

CLAUDE.md captures stable facts: the stack, the ports, the conventions. But Claude also learns things during a session that aren't in CLAUDE.md: a bug it fixed and why, a pattern that's specific to this codebase, a decision you made and the reasoning behind it.

By default, that knowledge is gone when the session ends.

Claude Code has a memory system: a MEMORY.md file it reads at the start of every session and updates as it learns. Out of the box, this file lives in a hidden directory on the server. If the server dies, it's gone.

My fix: store the memory files in a separate git repository (I use one for VPS infrastructure and shared scripts) and symlink Claude's memory directory to a folder inside it. The memory is now version-controlled and backed up daily alongside the databases and env files.

# Move memory into your ops repo
mv ~/.claude/projects/-var-www-myproject-staging/memory \
   /var/www/ops-repo/claude-memory

# Symlink back so Claude still finds it
ln -s /var/www/ops-repo/claude-memory \
      ~/.claude/projects/-var-www-myproject-staging/memory

The project name in that path (-var-www-myproject-staging) is just the working directory with slashes replaced by dashes. Claude Code creates it automatically based on where you run it.
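As a quick illustration of that naming rule, here is the mapping as described in this post. It is a sketch of the slash-to-dash substitution only; how Claude Code treats other characters in a path is not covered here, and the function name is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// projectDirName applies the rule described above: every slash in the
// working directory becomes a dash.
func projectDirName(workdir string) string {
	return strings.ReplaceAll(workdir, "/", "-")
}

func main() {
	fmt.Println(projectDirName("/var/www/myproject-staging"))
	// prints "-var-www-myproject-staging"
}
```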

When I start a session now, Claude already knows things like: the deploy user issue that causes silent 500 errors when root creates files in directories the service writes to, the SQLite migration pattern we use, which blog posts are published and what they're about. It picks up where the last session left off.

After any session where something notable was figured out, I commit the memory files:

cd /var/www/ops-repo
git add claude-memory/
git commit -m "chore: update Claude memory"
git push

The Actual Workflow

A normal development session looks like this:

# SSH into the server
ssh root@myserver.com

# Start Claude in the project directory
cd /var/www/myproject-staging
claude

Claude reads CLAUDE.md and MEMORY.md. No re-explaining the project.

I describe what I want to build or fix. Claude reads the relevant files, proposes a change, and writes it. Then:

# Rebuild backend (if Go files changed)
systemctl stop myproject-backend-staging
/usr/local/go/bin/go build -o bin/server ./cmd/server/
systemctl start myproject-backend-staging

# Or rebuild frontend (if Astro files changed)
sudo -u deploy npm run build
systemctl restart myproject-astro-staging

I open the staging domain in the browser, check that it works, and either iterate or commit.

The staging-to-production promotion is explicit and manual, same as without AI:

# Sync staging → production
cd /var/www/myproject-production
git fetch origin
git merge origin/staging
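# Stop first, same as on staging: the build overwrites the binary the service is running
systemctl stop myproject-backend-production
/usr/local/go/bin/go build -o bin/server ./cmd/server/
systemctl start myproject-backend-production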

Claude doesn't touch production. I do that step myself, deliberately.

The Gotcha That Will Get You

The most common issue when running Claude on a VPS as root: Claude creates a file, the service crashes, logs say permission denied, and it's not obvious why.

The cause is the root/deploy user split. Claude runs as root. Your services run as a deploy user. When Claude creates or edits a file, that file is owned by root. The deploy service can't write to it.

This matters for:

Directories the service writes to (database files, uploaded content, cache)
Files the service reads at runtime that it might also need to write

The fix is consistent:

chown -R deploy:deploy /path/to/dir

And to prevent it from recurring every time root touches those directories, set a default ACL:

setfacl -d -m u:deploy:rwX /path/to/dir

Now any file created in that directory, by root, by Claude, by anyone, automatically gets deploy write access.

I document this in CLAUDE.md so Claude knows about it. When it creates a file in a service-writable directory, it adds the chown step. Most of the time. When it forgets, the error is quick to diagnose.
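When the chown step does get forgotten, a quick ownership scan narrows it down. This is a minimal sketch, assuming the service user is called deploy as in the rest of this post; the function name not_owned_by is hypothetical.

```shell
# Hypothetical helper: list everything under a directory that is NOT owned
# by the given user, so stray root-owned files stand out.
not_owned_by() {
    owner=$1
    dir=$2
    find "$dir" ! -user "$owner"
}

# Usage (path is a placeholder):
#   not_owned_by deploy /path/to/dir
# No output means every file and subdirectory belongs to deploy.
```

Anything it prints is a candidate for the chown fix above.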

What I Tell Claude Before It Writes Any Code

For a new feature, I don't say "build me a comment system". I say:

I want to add a comment system. Before writing any code:
1. Propose the database schema
2. Propose the API endpoints
3. List any questions or assumptions

Do NOT write any code yet.

Reviewing a plan before code exists is much faster than reviewing code that made wrong assumptions. Once I'm happy with the plan, I say "looks good, proceed with the database migration first."

One feature at a time. Review between each one. This produces better code and catches wrong directions early.

Is This Right for You?

This setup makes sense if:

Your project runs on a VPS you control directly
Most of your work is backend: API changes, database schema, server configuration
You're working solo or with a small team where one person manages the server
You want a tight feedback loop without pushing and pulling for every test

It probably doesn't make sense if:

Your project is frontend-heavy and runs fine locally
You have multiple people making server changes simultaneously
You're not comfortable with an AI assistant that has root access to your server

On the root access point: Claude Code asks for confirmation before destructive operations, and you see every command before it runs. It is still root access, and the risk is real. On a personal project I consider that risk low, since Claude Code is conservative by default. On a production server handling real users, I'd think more carefully before running it there directly.

For my personal site, the workflow is working well. A CLAUDE.md that actually reflects the server setup, memory that persists across sessions, and the discipline to always test on staging first. That combination makes the AI genuinely useful instead of a context-reset every session.

The "Magic Numbers" of Software: SemVer Explained

Mon, 02 Feb 2026 00:00:00 GMT

Here is the breakdown of what those three numbers actually mean. 👇

The Breakdown: X . Y . Z

Think of a version number like a scale of "How much will this change my life?"

MAJOR (X): The "Breaking" Change.
MINOR (Y): The "New Feature" Change.
PATCH (Z): The "Oops, Fixed It" Change.

PATCH (0.0.1)

The "Under the Hood" fix.

What it is: Bug fixes that don’t change how the software works.
The Vibe: Everything stays the same, it just works better now.
Action: Safe to update immediately.

MINOR (0.1.0)

The "Bonus Content" update.

What it is: New features added, but the old stuff still works exactly the same way (Backward Compatible).
The Vibe: "Oh cool, a new dark mode button!"
Action: Update when you want the new toys.

MAJOR (1.0.0) ⚠️

The "Clean Slate" update.

What it is: Incompatible changes to the public API. Old code might break if you try to use it with this version.
The Vibe: "We moved the furniture and changed the locks."
Action: Read the manual before hitting 'Update.'
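To make the three levels concrete, here is a rough shell sketch that compares two version numbers and reports which part changed. The function name semver_bump is invented for this example, and it assumes plain X.Y.Z versions with no pre-release or build suffix.

```shell
# Hypothetical helper: report which part changed between two X.Y.Z versions.
# Assumes plain numeric versions (no "-beta.1" or "+build" suffixes).
semver_bump() {
    old=$1
    new=$2

    # Split "X.Y.Z" into its three parts using parameter expansion
    old_major=${old%%.*}; old_rest=${old#*.}
    new_major=${new%%.*}; new_rest=${new#*.}
    old_minor=${old_rest%%.*}; old_patch=${old_rest#*.}
    new_minor=${new_rest%%.*}; new_patch=${new_rest#*.}

    # The leftmost number that differs decides the kind of update
    if [ "$new_major" -ne "$old_major" ]; then
        echo major
    elif [ "$new_minor" -ne "$old_minor" ]; then
        echo minor
    elif [ "$new_patch" -ne "$old_patch" ]; then
        echo patch
    else
        echo none
    fi
}

semver_bump 1.4.2 1.4.3   # prints: patch
semver_bump 1.4.2 1.5.0   # prints: minor
semver_bump 1.4.2 2.0.0   # prints: major
```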

Why does this matter?

Without SemVer, updating software is like Russian Roulette. With it, developers know exactly what to expect before they click "install."

Consistency = Trust. 🤝

Why Your og:image Doesn't Show in Social Shares

Thu, 26 Feb 2026 00:00:00 GMT

When I shared one of my posts on socials, the link preview was blank. No image, just the title and a gray box. I'd set a featured image in my CMS dashboard — it was clearly there — but social crawlers were ignoring it completely.

Turns out there were two separate bugs. They're easy to miss because the site looks perfectly fine in a browser.

How Social Share Previews Work

When you paste a URL into Twitter/X, iMessage, LinkedIn, or Slack, the platform's crawler fetches that URL and reads the Open Graph meta tags in the <head>:

<meta property="og:image" content="https://example.com/image.webp" />
<meta property="twitter:image" content="https://example.com/image.webp" />

The crawler then fetches the image at that URL and renders the preview card. Two things can silently break this:

The URL isn't a real HTTP URL — it's a data: URI (base64-encoded image embedded directly in the HTML)
The URL is technically a URL, but it points to localhost — unreachable from the outside

Both give you the same result: no image in the preview. The crawler quietly fails and moves on.

How to Diagnose

Before guessing, check what your page is actually serving. View source on the live page (Ctrl+U) and search for og:image:

<!-- Bug 1: base64 data URI — crawlers can't fetch this -->
<meta property="og:image" content="data:image/webp;base64,UklGRvpD..." />

<!-- Bug 2: localhost URL — unreachable from the internet -->
<meta property="og:image" content="http://localhost:4321/_astro/hero.C4SheoqF.webp" />

<!-- Correct -->
<meta property="og:image" content="https://yoursite.com/_astro/hero.C4SheoqF.webp" />

If you're seeing either of the first two, read on.
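If you'd rather check from a terminal, the three cases above can be told apart with a few lines of shell. This is a sketch, not a full validator: classify_og_image is a made-up name, and the localhost match is a simple prefix check.

```shell
# Hypothetical helper: classify the content value of an og:image tag.
classify_og_image() {
    case "$1" in
        data:*)
            echo "data-uri" ;;      # Bug 1: embedded base64, nothing to fetch
        http://localhost*|https://localhost*|http://127.0.0.1*|https://127.0.0.1*)
            echo "localhost" ;;     # Bug 2: unreachable from the internet
        http://*|https://*)
            echo "ok" ;;            # absolute URL on a public host
        *)
            echo "relative" ;;      # not an absolute URL at all
    esac
}

# To pull the tag out of a live page first (URL is a placeholder):
#   curl -s https://yoursite.com/blog/my-post | grep -o '<meta[^>]*og:image[^>]*>'

classify_og_image "http://localhost:4321/_astro/hero.C4SheoqF.webp"
# prints: localhost
```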

Bug 1: The Base64 Image

This shows up when your CMS stores the hero image as a base64 data URI directly in the markdown frontmatter:

---
title: My Post
heroImage: data:image/webp;base64,UklGRvpDAABXRUJQVlA4...
---

This works fine in the browser — the image renders — but when Astro processes it into an og:image tag, the full base64 string ends up as the content attribute. Social crawlers treat og:image as a URL to fetch. They won't decode an embedded binary blob.

The Fix: Save Images as Real Files

Extract the base64 data URI and write it to a real file on disk. In a Go backend, SavePost() is the right place to intercept it:

func SavePost(dir string, post models.Post) error {
    // ... setup ...

    // If heroImage is a base64 data URI, save it as a file instead
    heroImage := post.HeroImage
    if strings.HasPrefix(post.HeroImage, "data:") {
        if path, err := saveHeroImageToDisk(postDir, post.HeroImage); err != nil {
            fmt.Printf("Warning: could not save hero image for %s: %v\n", post.Slug, err)
        } else {
            heroImage = path
        }
    }

    fm := frontmatterData{
        // ...
        HeroImage: heroImage, // now "./hero.webp" instead of "data:..."
    }
}

The saveHeroImageToDisk function parses the MIME type from the data URI, decodes the base64, and writes hero.webp (or .jpg, .png, etc.) into the post directory. The frontmatter ends up with:

heroImage: ./hero.webp

Astro's content collection schema with image() picks up that relative path at build time, optimizes it, and outputs a proper /_astro/hero.{hash}.webp URL. That's a real, crawlable HTTPS URL.
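The three steps that saveHeroImageToDisk performs (parse the MIME type, decode the base64, write the file) are easy to try out in a terminal. The sketch below is a shell illustration of those steps, not the Go implementation from my backend. The function name save_data_uri and the list of supported MIME types are assumptions, and it relies on base64 -d as found in GNU coreutils.

```shell
# Hypothetical helper: save a base64 data URI as hero.<ext> in a directory
# and print the relative path that would go into the frontmatter.
save_data_uri() {
    uri=$1
    dir=$2

    header=${uri%%,*}      # e.g. "data:image/webp;base64"
    payload=${uri#*,}      # the base64 text after the first comma
    mime=${header#data:}
    mime=${mime%%;*}       # e.g. "image/webp"

    # Map the MIME type to a file extension
    case "$mime" in
        image/webp) ext=webp ;;
        image/jpeg) ext=jpg ;;
        image/png)  ext=png ;;
        *) echo "unsupported MIME type: $mime" >&2; return 1 ;;
    esac

    # Decode the payload and write it next to the post
    printf '%s' "$payload" | base64 -d > "$dir/hero.$ext" || return 1
    echo "./hero.$ext"
}

# Usage (paths are placeholders):
#   save_data_uri "$(cat hero-data-uri.txt)" /path/to/post-dir
```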

One more thing: your admin editor was probably uploading the image as base64 and expecting base64 back from the API. Now that the backend stores a file path, the GET /api/admin/posts/{slug} endpoint needs to read the file and return it as a data URI for the editor to display:

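func (s *Server) GetPostAdminHandler(w http.ResponseWriter, r *http.Request) {
    // ...
    post, err := fs.GetPost(s.ContentDir, slug)
    if err != nil {
        // Assumption: any lookup error is treated as "not found" here
        http.Error(w, "post not found", http.StatusNotFound)
        return
    }

    // Convert file path back to base64 for the editor.
    // If the read fails, keep the stored path instead of blanking the field.
    if post.HeroImage != "" {
        if dataURI, err := fs.ReadHeroImageAsDataURI(s.ContentDir, slug, post.HeroImage); err == nil {
            post.HeroImage = dataURI
        }
    }

    respondJSON(w, http.StatusOK, post)
}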

The storage format (file on disk) is now separate from the API contract (base64 for the editor). Public readers get an optimized /_astro/ URL; the dashboard editor still sees the image it uploaded.

Bug 2: The Localhost URL

This one is specific to Astro in SSR mode (output: 'server'). In SSR, Astro.url returns the internal server URL — the one Node.js sees, not the public domain:

Astro.url → http://localhost:4321/blog/my-post

If you build your og:image URL from Astro.url, you're embedding localhost in every social share tag on every page:

---
// ❌ Wrong — Astro.url is localhost in SSR
const socialImageURL = new URL(ogImage, Astro.url).href
// result: http://localhost:4321/_astro/hero.C4SheoqF.webp
---

The Fix: Use Astro.site

Astro gives you Astro.site, which is the canonical public URL you configured in astro.config.mjs. Build your canonical URL from that instead:

---
// ✅ Correct — canonicalURL built from Astro.site
const canonicalURL = new URL(Astro.url.pathname, Astro.site)
const socialImageURL = new URL(ogImage, canonicalURL).href
// result: https://yoursite.com/_astro/hero.C4SheoqF.webp
---

The same issue applies to any other URL you construct in BaseHead.astro — canonical links, og:url, twitter:url. All of them should come from canonicalURL, not Astro.url directly:

---
const canonicalURL = new URL(Astro.url.pathname, Astro.site)
const socialImageURL = new URL(ogImage ?? config.socialCard, canonicalURL).href
---

<link rel='canonical' href={canonicalURL} />
<meta content={canonicalURL} property='og:url' />
<meta content={socialImageURL} property='og:image' />
<meta content={socialImageURL} property='twitter:image' />

Verifying the Fix

After rebuilding, view source on the live page and look for og:image:

<meta content="https://yoursite.com/_astro/hero.C4SheoqF.webp" property="og:image" />

If it's a real https:// URL pointing to your domain, you're done. You can also run it through social debugger tools — Twitter Card Validator, LinkedIn Post Inspector, or OpenGraph.xyz — though these cache aggressively and may show stale results for a while. Most have a "Scrape Again" button that forces a fresh fetch.

Gotchas

| Problem | Cause | Fix |
|---|---|---|
| Image shows in browser, not in social preview | base64 data URI in og:image | Save image as real file, store relative path |
| og:image URL contains localhost | Astro.url used in SSR mode | Use new URL(Astro.url.pathname, Astro.site) |
| Social debugger shows old/wrong image | Platform cache | Wait ~30 min, use "Scrape Again" |
| Editor shows broken image after backend change | Admin endpoint returning file path, not data URI | Convert back to base64 in admin handler |

Both Bugs at Once

Worth noting: both bugs can exist simultaneously, and one can hide the other. A data: URI is already an absolute URL, so new URL(ogImage, Astro.url) returns it unchanged, and the localhost problem only becomes visible once the image is saved as a real file. After fixing either bug, view source again and confirm the tag now holds a public https:// URL.

In my case I had both at the same time. The page looked completely fine in the browser on production. The only symptom was a blank card when sharing a link — easy to ignore if you're not actively testing it.

Deploying to a VPS Without Docker or CI/CD

Sun, 22 Feb 2026 00:00:00 GMT

When I moved this site from a managed static host to a bare VPS, the first thing people asked was: "Why not Docker?" or "Why not just use Coolify?" Fair questions. Let me answer them before getting into the actual workflow.

Why Not X?

Why not Docker?

Docker shines when you have complex multi-service apps, a team that needs consistent environments across many machines, or you're shipping to Kubernetes. For a personal site running a Go API and a Node.js Astro frontend, it's overkill.

Docker adds:

Build time overhead (image layers, registry pushes)
Runtime overhead (container daemon, networking abstraction)
Operational complexity (container orchestration, volume management)
A new failure domain to debug when something goes wrong

When your app is two processes and a static site, just run the processes. systemd is your process manager. It starts services on boot, restarts on crash, and gives you journalctl for logs. That's everything you need.

Why not a managed platform (Railway, Render, Fly.io)?

These platforms are genuinely good, and I'd recommend them for most projects. But I wanted to understand what's happening underneath — how nginx sits in front of your app, how systemd manages processes, how firewall rules protect internal ports from the internet. Managed platforms abstract all of that away. Also, at the hobby tier, they get expensive once you add a database and background workers.

Why not Coolify or Dokku?

Coolify and Dokku are excellent self-hosted tools that give you a Heroku-like experience on your own VPS. But they run Docker under the hood, so you're still adding that complexity. And they introduce an abstraction layer you have to learn and trust. For a site I'll maintain solo indefinitely, I'd rather know exactly what's running than have a tool I don't fully understand managing it.

Why not CI/CD?

I use GitLab for source control, but I don't use GitLab CI/CD for deployment. A few reasons:

Free compute minutes are limited. GitLab's free tier gives you 400 CI/CD compute minutes per month. For a personal project where you might deploy dozens of times during active development, that evaporates fast. Paying for compute just to run git pull && go build on a server you already pay for doesn't make sense.

A self-hosted GitLab Runner is the right long-term solution — the runner runs on your own VPS, uses your own compute, and has no minute limits. I plan to set that up eventually. But the manual workflow I'm using now works fine, and I actually prefer having an explicit promotion gate.

Manual deploys are fine for solo projects. CI/CD adds real value when multiple people are merging code and you need automated testing before every deploy. For a personal site with one contributor, a deliberate "I'm pushing this to production now" moment has its own value. You know exactly what you're deploying and when.

The Mental Model

[Local machine]  →  git push  →  [GitLab]  →  git pull  →  [VPS]
   write code          transport              runs code

Your local machine is where you write. GitLab is the transport layer. The VPS is where things run. You never SCP files directly — git is always in the middle.

The VPS runs two environments side by side:

/var/www/
├── myproject-staging/      ← test changes here first
│   └── myproject/          ← the git clone
│       ├── backend/        ← Go API (localhost-only port)
│       └── frontend/       ← Astro SSR (localhost-only port)
└── myproject-production/   ← promote when staging looks good
    └── myproject/
        ├── backend/
        └── frontend/

nginx routes by domain name, so both environments run simultaneously without interfering with each other.

Prerequisites

On the VPS

A non-root deploy user that runs your services
nginx
Go (installed system-wide — often not in PATH, use the full binary path)
Node.js + npm

GitLab deploy key

The VPS needs to git pull from your private repo. Create a key specifically for this:

# On the VPS as root
ssh-keygen -t ed25519 -f /root/.ssh/id_ed25519_gitlab -N "" -C "vps-deploy"
cat /root/.ssh/id_ed25519_gitlab.pub  # copy this

Add the public key to GitLab: Project → Settings → Repository → Deploy Keys → Add key (read-only is fine).

Then add to /root/.ssh/config on the VPS:

Host gitlab.com
    IdentityFile /root/.ssh/id_ed25519_gitlab
    IdentitiesOnly yes

Test it:

ssh -T git@gitlab.com
# Welcome to GitLab, @yourusername!

First-Time Server Setup

Clone and set ownership

mkdir -p /var/www/myproject-staging
git clone git@gitlab.com:yourusername/myproject.git /var/www/myproject-staging/myproject
chown -R deploy:deploy /var/www/myproject-staging/

If you later get fatal: detected dubious ownership when running git as root:

git config --global --add safe.directory /var/www/myproject-staging/myproject

Environment files

.env files are never committed to git. Create them manually on the server:

# Backend — owned by root (systemd reads it before dropping to deploy user)
nano /var/www/myproject-staging/myproject/backend/.env.staging
chmod 600 /var/www/myproject-staging/myproject/backend/.env.staging
chown root:root /var/www/myproject-staging/myproject/backend/.env.staging

# Frontend — owned by deploy (npm build runs as deploy)
nano /var/www/myproject-staging/myproject/frontend/.env.staging
chmod 600 /var/www/myproject-staging/myproject/frontend/.env.staging
chown deploy:deploy /var/www/myproject-staging/myproject/frontend/.env.staging

Content directory ownership

If your app writes files to disk (for example, blog posts as markdown files), that directory must be owned by deploy — the user the service runs as:

chown -R deploy:deploy /var/www/myproject-staging/myproject/frontend/src/content/

Missing this step causes confusing 500 errors when the API tries to create files. The error in the logs is permission denied on mkdir, not something that obviously points to ownership.

This same issue appears whenever you touch files as root. If you SSH in as root and edit a file, copy a file, or use any tool that runs as root (an AI coding assistant, a script, anything) — the resulting file is root:root. The deploy service silently fails to write it. The fix is always chown deploy:deploy <file>.

Building the App

Go backend

Build directly on the VPS. This avoids cross-compilation issues if your VPS is Linux x86_64 and your dev machine is Apple Silicon:

cd /var/www/myproject-staging/myproject/backend

# Stop the service first — you can't overwrite a running binary ("text file busy")
systemctl stop myproject-backend-staging

/usr/local/go/bin/go build -o myproject-backend ./cmd/server/main.go
chown deploy:deploy myproject-backend

systemctl start myproject-backend-staging

Node.js / Astro frontend

Always run npm as the deploy user, not root. Running as root creates files owned by root inside node_modules/ and dist/, which the service (running as deploy) can't later write or overwrite:

cd /var/www/myproject-staging/myproject/frontend
sudo -u deploy npm install
sudo -u deploy npm run build

systemd Services

systemd is your process manager. Two service files, one per process.

Backend service

/etc/systemd/system/myproject-backend-staging.service:

[Unit]
Description=Myproject Go Backend (Staging)
After=network.target

[Service]
Type=simple
User=deploy
WorkingDirectory=/var/www/myproject-staging/myproject/backend
EnvironmentFile=/var/www/myproject-staging/myproject/backend/.env.staging
ExecStart=/var/www/myproject-staging/myproject/backend/myproject-backend
Restart=always
RestartSec=3

# Hardening
NoNewPrivileges=yes
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=strict
ReadWritePaths=/var/www/myproject-staging/myproject

[Install]
WantedBy=multi-user.target

Frontend service (SSR only)

Only needed if your frontend uses server-side rendering. For a static site, nginx serves the dist/ folder directly — no service needed at all.

/etc/systemd/system/myproject-astro-staging.service:

[Unit]
Description=Myproject Astro SSR (Staging)
After=network.target

[Service]
Type=simple
User=deploy
WorkingDirectory=/var/www/myproject-staging/myproject/frontend
ExecStart=/usr/bin/node dist/server/entry.mjs
Restart=always
RestartSec=3
Environment=HOST=127.0.0.1
Environment=PORT=3000
Environment=NODE_ENV=production

# Hardening
NoNewPrivileges=yes
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=strict
ReadWritePaths=/var/www/myproject-staging/myproject/frontend

[Install]
WantedBy=multi-user.target

Enable and start:

systemctl daemon-reload
systemctl enable myproject-backend-staging myproject-astro-staging
systemctl start  myproject-backend-staging myproject-astro-staging

# Verify
systemctl status myproject-backend-staging
journalctl -u myproject-backend-staging -f

nginx Reverse Proxy

nginx listens on port 80/443 and proxies traffic to your internal app ports. The apps bind to 127.0.0.1 — they're never directly reachable from the internet.

server {
    listen 80;
    server_name staging.myproject.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;
    }
}

Repeat for your API subdomain, pointing to the backend's port.

nginx -t && systemctl reload nginx

Firewall: Lock Down App Ports

Your app ports should only be reachable from localhost (via nginx). Block everything else:

# IPv4 — allow localhost, drop everything else
iptables -I INPUT -p tcp --dport 3000 ! -s 127.0.0.1 -j DROP
iptables -I INPUT -p tcp --dport 8080 ! -s 127.0.0.1 -j DROP

# IPv6 — blanket drop
ip6tables -I INPUT -p tcp --dport 3000 -j DROP
ip6tables -I INPUT -p tcp --dport 8080 -j DROP

# Persist across reboots
netfilter-persistent save

One subtlety: use ! -s 127.0.0.1 -j DROP rather than a plain -j DROP. The localhost exemption keeps loopback traffic working, which covers nginx's own proxied requests and SSH tunnels (ssh -L 8080:127.0.0.1:8080) for local debugging. A blanket DROP silently breaks them.
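To double-check that the apps really are bound to localhost, you can filter the output of ss -ltn. A minimal sketch: non_loopback_listeners is a hypothetical name, and it assumes the usual ss column layout where the fourth field is the local address and port.

```shell
# Hypothetical helper: read `ss -ltn` output on stdin and print any listener
# on the given port that is NOT bound to a loopback address.
non_loopback_listeners() {
    port=$1
    awk -v port="$port" 'NR > 1 {
        n = split($4, parts, ":")          # the port is the last ":" field
        if (parts[n] == port && $4 !~ /^127\./ && $4 !~ /^\[::1\]/)
            print $4
    }'
}

# Usage on the VPS:
#   ss -ltn | non_loopback_listeners 3000
#   ss -ltn | non_loopback_listeners 8080
# No output means nothing is listening on that port outside loopback.
```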

The Daily Deploy Loop

# 1. Push from your local machine
git add . && git commit -m "fix: something" && git push

# 2. On the VPS — pull latest
cd /var/www/myproject-staging/myproject && git pull

# 3a. If backend changed
cd /var/www/myproject-staging/myproject/backend
systemctl stop myproject-backend-staging
/usr/local/go/bin/go build -o myproject-backend ./cmd/server/main.go
chown deploy:deploy myproject-backend
systemctl start myproject-backend-staging

# 3b. If frontend changed
cd /var/www/myproject-staging/myproject/frontend
sudo -u deploy npm run build
systemctl restart myproject-astro-staging

# 4. Open browser → check staging → done

If only the frontend changed, you don't touch the backend. If only the backend changed, you don't rebuild the frontend. Do both if both changed.
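If you'd rather not eyeball the diff to decide which half to rebuild, a few lines of shell can do it. This is a sketch under the assumption that backend/ and frontend/ sit at the repository root, as in the layout above; the function name changed_components is made up.

```shell
# Hypothetical helper: read changed file paths on stdin (one per line)
# and print which components need a rebuild.
changed_components() {
    backend=no
    frontend=no
    while IFS= read -r path; do
        case "$path" in
            backend/*)  backend=yes ;;
            frontend/*) frontend=yes ;;
        esac
    done
    if [ "$backend" = yes ]; then echo backend; fi
    if [ "$frontend" = yes ]; then echo frontend; fi
}

# Usage, right after a `git pull` that brought in new commits:
#   git diff --name-only ORIG_HEAD HEAD | changed_components
```

No output means nothing under either directory changed, so there is nothing to rebuild.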

Staging → Production: The Manual Promotion Gate

Once staging looks good, promote to production:

# Pull the same code into the production directory
cd /var/www/myproject-production/myproject && git pull

# Rebuild frontend
cd frontend && sudo -u deploy npm run build
systemctl restart myproject-astro-production

# Rebuild backend if it changed
cd ../backend
systemctl stop myproject-backend-production
/usr/local/go/bin/go build -o myproject-backend ./cmd/server/main.go
chown deploy:deploy myproject-backend
systemctl start myproject-backend-production

This is the manual promotion gate — you explicitly decide when production gets updated. For a solo project, this is a feature, not a limitation. You're never wondering why production is broken because something got auto-deployed while you were away.

Common Gotchas

| Problem | Cause | Fix |
|---|---|---|
| permission denied when app writes files | Service runs as deploy, directory owned by root | chown -R deploy:deploy <dir> |
| API returns 500 after you manually created/edited a file as root | Root-owned files are unwritable by the deploy service | chown deploy:deploy <file> |
| text file busy on Go rebuild | Can't overwrite a running binary | systemctl stop first, then build |
| npm run build fails with EACCES | Ran npm as root | Always sudo -u deploy npm ... |
| detected dubious ownership | git clone ran as root | git config --global --add safe.directory <path> |
| nginx 502 | App not listening on expected port | Check journalctl -u <service>, verify PORT env var |
| Changes don't appear after deploy | Forgot to rebuild or restart | Rebuild frontend, restart service |
| SSH tunnel broken after adding DROP rule | Blanket -j DROP blocks localhost | Use ! -s 127.0.0.1 -j DROP |

Is This Right for You?

This workflow makes sense if:

You're running a personal project or small site on a VPS you already pay for
You want to understand deployment fundamentals rather than abstract them away
You have one or two contributors — the manual step doesn't scale to a team
You don't want to burn CI/CD compute minutes on simple deploys

It probably doesn't make sense if:

Multiple people are merging code and you need automated testing on every commit
You need zero-downtime blue/green deploys (use a proper pipeline)
You're managing many services and want a unified dashboard (Coolify is genuinely great for that)
You need horizontal scaling across multiple VPS nodes

For this site, the boring approach is working fine. Two systemd services, one nginx config, and a git pull to deploy. No containers, no orchestration, no surprise bills.

AI CLI Panic Wasn't Spying. It Was Permissions

Sun, 08 Feb 2026 00:00:00 GMT

The Myth vs. The Real Risk

There’s a story that keeps circulating in developer circles: “AI CLIs can see your entire machine.”

It’s the kind of claim that sticks because it feels plausible. After all, these tools can run commands, read files, and automate workflows.

The image people form is a black box roaming their filesystem, peeking into secrets, and reporting back.

But the truth is more grounded, and more useful. The real problem wasn’t secret surveillance. It was permissions.

The early panic around AI CLIs mostly came from people giving tools too much access, sometimes without realizing it, and then accidentally exposing sensitive data during normal usage.

This article explains the actual story behind the fear, then gives a clean, practical, battle‑tested playbook for using AI CLIs safely.

You’ll also get a checklist you can apply to any AI agent, regardless of vendor or tool.

The Origin Story: What Actually Happened

Phase 1: Hype and experimentation

When AI CLIs emerged, people rushed to test them. They ran them in their home directories, fed them logs, or asked them to “scan the repo.” The novelty was intoxicating: “Watch this tool refactor my codebase in seconds!”

Phase 2: Accidental oversharing

Soon after, a few stories appeared: someone pasted tokens into a chat, another ran a command that dumped a .env file, and a third granted the AI direct access to a directory containing SSH keys.

None of this required malicious behavior. It was just normal developer habits combined with powerful new tools. But the outcome was real: secrets ended up in logs or prompts.

Phase 3: The myth spreads

Those incidents quickly morphed into a simplified narrative: “AI CLIs can see your whole machine.” It’s emotionally compelling, but inaccurate. The AI doesn’t magically scan your system. It only sees what you explicitly share or what it is given permission to read or execute.

The real takeaway

The risk isn’t the AI. The risk is access, and how easy it is to accidentally widen access without noticing. That’s why the most important theme in safe AI CLI usage is AI CLI permissions: what the tool can read, execute, and exfiltrate.

What an AI CLI Actually Sees

An AI CLI is just an interface to:

What you ask it to read (files, output, logs)
What you ask it to run (commands, scripts, tests)
What you show it (copied text, pasted configs)

It isn’t omniscient. It doesn’t crawl your machine unless you allow it to. However, it can easily access more than you intended if you run it in the wrong directory or feed it the wrong command output.

The Core Concept: AI CLI Permissions

Think of an AI CLI like a new teammate who wants to help. By default, it should only see what you decide to show it. The more rights you give it, the more damage it could do, usually accidentally.

This is why AI CLI permissions are the right mental model. It’s not about whether the AI is “trusted” or “safe.” It’s about what it can access, and whether that access is proportionate to the task.

The Practical Safeguards (The Real Best Practices)

Below are the safeguards that experienced teams now use. These are pragmatic, not theoretical. If you apply these, you can use AI CLIs with confidence.

1. Use the Principle of Least Access

Rule: Never run AI CLIs at a directory scope that is larger than necessary.
Why it works: This prevents accidental reads of unrelated files.

Good:

- ~/projects/my-app/

Bad:

- ~/
- /Users/yourname/

This single habit eliminates most accidental exposures.

2. Use a Dedicated Workspace for AI Tasks

Rule: Keep “AI‑assisted work” in a repo‑specific folder.
Why it works: If an AI agent scans or modifies files, it only touches what it should.

If your machine contains secrets or personal data, this separation reduces risk drastically.

3. Don’t Paste Secrets (Ever)

Rule: Never paste API keys, tokens, or private keys into any AI prompt.
Why it works: Even if the tool is trustworthy, you reduce the chance of accidental logging or exposure.

Use placeholders like:

OPENAI_API_KEY=REDACTED

4. Avoid Reading .env by Default

Rule: Keep .env files out of AI prompts unless absolutely necessary.
Why it works: These files typically contain the very secrets that should never leave your machine.

If a task requires environment variables, paste only the variable names (not values).
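One way to get the names without the values is a single sed call. A minimal sketch, assuming a conventional .env file with one NAME=value per line (optionally prefixed with export); the function name env_var_names is invented here.

```shell
# Hypothetical helper: print only the variable names from a .env-style file.
# Comments and blank lines are skipped; values never reach the output.
env_var_names() {
    sed -n -E 's/^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=.*/\2/p' "$1"
}

# Usage (path is a placeholder):
#   env_var_names .env
```

The output is safe to paste into a prompt because it contains names only.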

5. Use Scoped Tokens

Rule: Use least‑privilege tokens.
Why it works: If a token leaks, its damage is limited.

Example: A GitHub token limited to read‑only repository access is safer than one that can write, delete, or create.

6. Treat “Command Output” as Sensitive

Rule: Always skim output before pasting it into AI.
Why it works: Logs often contain secrets, file paths, or debug traces.

Even harmless-looking commands like env or printenv can leak credentials.
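A small filter can take the edge off before you skim. This is a rough sketch, not a guarantee: redact_secrets is a hypothetical name, and it only catches NAME=value lines whose name contains KEY, TOKEN, SECRET, or PASSWORD. A secret stored under any other name (a database URL, for example) passes straight through, so you still need to read the output.

```shell
# Hypothetical helper: mask the value of NAME=value lines whose name
# looks secret-bearing. Everything else passes through unchanged.
redact_secrets() {
    sed -E 's/^([A-Za-z0-9_]*(KEY|TOKEN|SECRET|PASSWORD)[A-Za-z0-9_]*)=.*/\1=REDACTED/'
}

# Usage:
#   printenv | redact_secrets
```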

7. Separate “Automation” from “Reasoning”

Rule: Use AI for planning and code review, but keep it away from secret‑bearing automation.
Why it works: It reduces the risk of exposing credentials while still benefiting from AI assistance.

8. Use a VM or Isolated Dev Environment (Optional but Powerful)

Rule: If you handle sensitive data, use a dedicated VM or container for AI‑assisted work.
Why it works: Even if a command is run, the blast radius is limited.

This is why some teams use isolated dev machines or VPN‑protected environments. It’s not because the AI “sees everything,” but because they want extra boundaries.

9. Rotate Credentials After Mistakes

Rule: If you ever accidentally paste a token, rotate it immediately.
Why it works: Reduces the time window of exposure.

This is the safest habit you can build, even if it’s a little inconvenient.

10. Ask for Command Explanations

Rule: If the AI suggests a command, ask it what it does before running.
Why it works: AI is helpful, but it can make mistakes. You should understand the command’s impact.

The Real Risk Model (Simplified)

When people panic about AI CLIs, they’re usually imagining a malicious tool. In reality, the risk is almost always accidental:

You run the tool in the wrong directory
You paste a config file without realizing it contains secrets
You run a command that dumps too much context

That’s why AI CLI permissions are the single most important concept. It’s not about whether the AI is safe. It’s about whether you gave it too much access.

A Simple Checklist You Can Keep

If you only remember one thing, remember this list:

Work in a dedicated repo folder
Don’t paste secrets
Avoid .env files
Use scoped tokens
Review output before sharing
Rotate keys if you slip

That’s the 80/20. Everything else is optional.

Why This Works: The Principle of Bounded Access

Most issues disappear if you bound the AI’s access. That’s the real solution. The tool doesn’t need full access to be useful. It only needs the files relevant to your task.

This is the same principle used in security engineering: the fewer privileges a system has, the fewer ways it can fail.

The Myth Finally Dies

The “AI sees everything” myth is a shortcut explanation. It feels true because the tools are powerful. But it’s not the right mental model.

The correct model is:

AI is a tool
Tools need permissions
Permissions should be minimal

Once you internalize that, you can enjoy the productivity benefits of AI CLIs without the fear.

Final Takeaway

The story behind AI CLI fear isn’t about spying. It’s about misunderstanding access. When you use these tools with intention (proper scope, no secrets, least privilege), they are safe, powerful, and genuinely worth it.

If you treat AI CLIs like a superuser, they’ll behave like one. If you treat them like a scoped assistant, they’ll be safe and useful.

That’s the real lesson. That’s the end of the story.