Optimizing logging and the access log in Nginx when the server is overloaded on CPU or RAM

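Why logging can cause overload

For every request, Nginx will:

Write one line to access.log
Possibly write another line to error.log
Issue a write to the log file for that line (unless buffering is enabled), which the kernel later flushes to disk
Format the log string by substituting its variables

When traffic is heavy (10k–100k requests per second):

CPU usage rises from formatting log lines
Disk I/O rises sharply
RAM usage rises while buffered log data waits to be flushed
Even an SSD can become the bottleneck

The result:

Load average climbs
Nginx slows down
The server may even hang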

1️⃣ Optimizing the access log in Nginx

Turn off the access log for static files

If you serve many static requests such as CSS, JS, and images:

location ~* \.(jpg|jpeg|png|gif|ico|css|js|svg|woff|woff2)$ {
    access_log off;
    log_not_found off;
}

On an asset-heavy site this can cut log writes by roughly 50–80%.

Turn off the access log entirely if needed

If the system is an internal API, or you do not need request tracing:

access_log off;

This can be set in the http, server, or location context.
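
As a sketch of how the scope works, the fragment below keeps logging for a whole server but switches it off for one noisy endpoint. The /healthz path, the server name, and the backend address 127.0.0.1:8080 are assumptions for illustration, not part of the original setup.

```nginx
server {
    listen 80;
    server_name example.internal;   # hypothetical name

    # Server-wide access log stays on.
    access_log /var/log/nginx/access.log;

    # Health checks from a load balancer can be frequent; skip logging them.
    location = /healthz {
        access_log off;
        return 200 "ok\n";
    }

    location / {
        proxy_pass http://127.0.0.1:8080;   # hypothetical backend
    }
}
```

An access_log directive at an inner level replaces the ones inherited from the outer level, so only /healthz goes unlogged here.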

Reduce the detail of the log format

The main format found in many stock nginx.conf files is fairly long:

log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" "$http_x_forwarded_for"';

You can shorten it:

log_format minimal '$remote_addr "$request" $status $body_bytes_sent';
access_log /var/log/nginx/access.log minimal;

This reduces both the CPU spent formatting each line and the number of bytes written per request, which adds up at high request rates.

Use buffered logging

Instead of writing each line to disk individually:

access_log /var/log/nginx/access.log minimal buffer=256k flush=5s;

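What the parameters mean:

buffer=256k → log lines are collected in a 256 KB buffer and written out when the next line no longer fits
flush=5s → buffered data older than 5 seconds is written out even if the buffer is not full

This reduces the number of write operations considerably under high traffic.

⚠ Note: if the server crashes, whatever is still sitting in the buffers is lost. That can be up to the full buffer size (256 KB here) in each worker process, or up to 5 seconds of traffic.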
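
One way to limit what a crash can cost is to buffer only the bulk traffic and write error responses immediately. The fragment below is a sketch under assumptions: the file name errors.log and the variable name $is_error are made up for this example, and the minimal format is repeated so the block is self-contained.

```nginx
http {
    log_format minimal '$remote_addr "$request" $status $body_bytes_sent';

    # 1 for 4xx/5xx responses, 0 for everything else.
    map $status $is_error {
        ~^[45]  1;
        default 0;
    }

    # Bulk log: buffered, written when the buffer fills or after 5 seconds.
    access_log /var/log/nginx/access.log minimal buffer=256k flush=5s;

    # Error responses only: no buffer, so each line is written as it is logged.
    access_log /var/log/nginx/errors.log minimal if=$is_error;
}
```

Error responses are written twice in this layout (once to each file), which trades a little extra I/O for lines that are more likely to survive a crash.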

Hand logs off to syslog

access_log syslog:server=127.0.0.1:514 minimal;

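Advantages:

Less local disk I/O in the Nginx workers
Logs are processed by rsyslog or another log collector

This fits well with ELK / Loki / Splunk. Keep in mind that Nginx sends these messages over UDP, so lines can be lost if the receiver falls behind.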
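
A slightly fuller sketch of the syslog target is shown below. The facility and tag values are illustrative choices, and it assumes a syslog daemon is listening on 127.0.0.1:514 over UDP (or on the /dev/log socket for the commented-out form).

```nginx
http {
    log_format minimal '$remote_addr "$request" $status $body_bytes_sent';

    # UDP to a syslog daemon; facility and tag help the daemon route these lines.
    access_log syslog:server=127.0.0.1:514,facility=local5,tag=nginx_access,severity=info minimal;

    # Alternative: the local syslog socket instead of UDP.
    # access_log syslog:server=unix:/dev/log,tag=nginx_access minimal;
}
```

On the receiving side, a rule matching the local5 facility or the nginx_access tag can then forward the lines to the log pipeline of your choice.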

2️⃣ Optimizing the error log

Use only the warn or error level

Nginx's built-in default level is error, but many packaged configurations set:

error_log /var/log/nginx/error.log notice;

It is better to reduce the verbosity to:

error_log /var/log/nginx/error.log warn;

Or, for heavily loaded production:

error_log /var/log/nginx/error.log error;

This keeps low-severity messages out of the log. The debug level in particular should not stay enabled under production load.
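
Because error_log can also be set per server, a noisy virtual host can be held to a stricter level than the rest. A minimal sketch, assuming a hypothetical host name and log file path:

```nginx
# Main context: applies wherever nothing more specific is set.
error_log /var/log/nginx/error.log warn;

events {}

http {
    server {
        listen 80;
        server_name noisy.example.com;   # hypothetical host

        # Only error and above for this host, in its own file.
        error_log /var/log/nginx/noisy.error.log error;

        location / {
            return 204;
        }
    }
}
```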

3️⃣ Reducing CPU under high traffic

Log conditionally

Log only when status >= 400:

map $status $loggable {
    ~^[23]  0;
    default 1;
}

access_log /var/log/nginx/access.log minimal if=$loggable;

On a healthy system, where most responses are 2xx or 3xx, this can cut log volume by roughly 80–95%.
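
If dropping every successful request is too aggressive, a middle ground is to keep all errors plus a small sample of the rest. The sketch below assumes Nginx 1.11.0 or later (for $request_id) and uses a made-up variable name $sampled and an arbitrary 1% rate.

```nginx
http {
    log_format minimal '$remote_addr "$request" $status $body_bytes_sent';

    # $sampled is 1 for about 1% of requests, 0 for the rest.
    split_clients $request_id $sampled {
        1%  1;
        *   0;
    }

    # 2xx/3xx: log only when sampled. Everything else: always log.
    map $status $loggable {
        ~^[23]  $sampled;
        default 1;
    }

    access_log /var/log/nginx/access.log minimal if=$loggable;
}
```

A request is skipped when the if= value evaluates to 0 or an empty string, so sampled-out successes leave no line while 4xx and 5xx responses are always recorded.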

4️⃣ Handling overall CPU / RAM overload in Nginx

Logging is only part of the picture. When the server is overloaded, the whole system needs tuning.

Tune worker processes

worker_processes auto;
worker_rlimit_nofile 200000;

Tune worker connections

events {
    worker_connections 65535;
    multi_accept on;
    use epoll;
}

Enable sensible keepalive settings

keepalive_timeout 15;
keepalive_requests 1000;

Rate-limit requests to protect against floods

limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

server {
    location / {
        limit_req zone=one burst=20 nodelay;
    }
}
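
Rate limiting interacts with logging: each rejected request is reported in the error log, at the error level by default, so a flood can itself inflate the log. The sketch below lowers that to warn and returns 429 instead of the default 503. The zone name and rates are carried over from the example above; the other values are illustrative choices.

```nginx
http {
    limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

    # Report rejections at warn, so an error_log set to "error" omits them.
    limit_req_log_level warn;

    # Answer rejected requests with 429 Too Many Requests (default is 503).
    limit_req_status 429;

    server {
        listen 80;

        location / {
            limit_req zone=one burst=20 nodelay;
        }
    }
}
```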

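5️⃣ What to do when the server is genuinely overloaded

If CPU is at 100% and load keeps climbing:

Turn the access log off right away: sed -i.bak -E 's/^([[:space:]]*)access_log[[:space:]][^;]*;/\1access_log off;/' /etc/nginx/nginx.conf
Commenting the directive out is not enough: with no access_log directive in effect, Nginx falls back to its built-in default (logs/access.log in the combined format). Files pulled in through include need the same change.
Validate and reload: nginx -t && nginx -s reload
Check I/O: iotop
Check for oversized logs: du -sh /var/log/nginx/*
Rotate the logs immediately: logrotate -f /etc/logrotate.d/nginx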

6️⃣ Monitoring under high production load

You should monitor:

CPU usage
RAM usage
I/O wait
Load average
Network throughput

Popular tools:

Prometheus
Grafana
Zabbix
Netdata

For a quick look from the shell:

htop
top
vmstat 1

7️⃣ Standard architecture for high traffic

High-traffic production setups typically look like this:

Nginx acts as a reverse proxy
Logs are shipped to a dedicated log server
The ELK stack handles storage and search:
- Elasticsearch
- Logstash
- Kibana

Or use:

Grafana Loki

8️⃣ Quick optimization checklist for production

If you are currently overloaded, apply these in order:

Turn off logging for static files
Trim the log format
Enable buffering
Log only status >= 400
Reduce the error_log level
Set up logrotate properly
Monitor I/O wait

Sample tuned configuration for high-load production

worker_processes auto;
worker_rlimit_nofile 200000;

events {
    worker_connections 65535;
    multi_accept on;
    use epoll;
}

http {
    log_format minimal '$remote_addr "$request" $status $body_bytes_sent';

    map $status $loggable {
        ~^[23]  0;
        default 1;
    }

    access_log /var/log/nginx/access.log minimal buffer=512k flush=5s if=$loggable;
    error_log /var/log/nginx/error.log warn;

    keepalive_timeout 15;
    keepalive_requests 1000;
}

Key takeaways

In a high-traffic environment, misconfigured logging can consume a substantial share of CPU (roughly 20–40% as a rule-of-thumb estimate) and choke disk I/O.

Tuning the access log is usually the fastest way to rescue an overloaded server.

Đại Nhân Nguyên