Reducing memory use
The GitLab Rails application code suffers from memory leaks. For web requests, this problem is made manageable using a supervision thread that automatically restarts workers if they exceed a given resident set size (RSS) threshold for a certain amount of time. GitLab applies the same approach to the Sidekiq processes it uses to process background jobs.
GitLab monitors the available RSS limit by default only for Linux package or Docker installations. The reason for this is that GitLab relies on runit to restart Sidekiq after a memory-induced shutdown, and self-compiled and Helm chart installations don't use runit or an equivalent tool.
With the default settings, Sidekiq restarts no more often than once every 15 minutes, with the restart causing about one minute of delay for incoming background jobs.
Some background jobs rely on long-running external processes. To ensure these
are cleanly terminated when Sidekiq is restarted, each Sidekiq process should be
run as a process group leader (for example, using chpst -P). If using a Linux package installation or the
bin/background_jobs script with runit installed, this is handled for you.
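As an illustration of what running as a process group leader buys you, here is a minimal Python sketch for POSIX systems. It is not how GitLab or runit start Sidekiq. The helper names `start_as_group_leader` and `terminate_group` are hypothetical, and `start_new_session=True` is used here only as a stand-in for the effect of `chpst -P`.

```python
import os
import signal
import subprocess
import sys


def start_as_group_leader(argv):
    """Start a child in a new session, so it leads its own process group.

    Hypothetical helper for illustration only. With start_new_session=True
    the child calls setsid() before exec, so its PGID equals its PID.
    """
    return subprocess.Popen(argv, start_new_session=True)


def terminate_group(proc, timeout=10.0):
    """Send SIGTERM to the whole process group, then wait for the leader.

    Signalling the group (not just the leader) also reaches any external
    processes the leader spawned, provided they stayed in the same group.
    """
    os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    return proc.wait(timeout=timeout)


if __name__ == "__main__":
    # Stand-in for a long-running worker process.
    child = start_as_group_leader(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print("leader of its own group:", os.getpgid(child.pid) == child.pid)
    print("exit status:", terminate_group(child))
```

Because the signal goes to the group rather than a single PID, helper processes started by a job do not outlive the restart.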
Configuring the limits
Sidekiq memory limits are controlled using environment variables:
- SIDEKIQ_MEMORY_KILLER_MAX_RSS (KB): defines the Sidekiq process soft limit for allowed RSS. If the Sidekiq process RSS (expressed in kilobytes) exceeds SIDEKIQ_MEMORY_KILLER_MAX_RSS for longer than SIDEKIQ_MEMORY_KILLER_GRACE_TIME, a graceful restart is triggered. If SIDEKIQ_MEMORY_KILLER_MAX_RSS is not set, or is set to 0, the soft limit is not monitored. SIDEKIQ_MEMORY_KILLER_MAX_RSS defaults to 2000000.
- SIDEKIQ_MEMORY_KILLER_GRACE_TIME: defines the grace period, in seconds, for which the Sidekiq process is allowed to run above the RSS soft limit. If the Sidekiq process drops below the soft limit within SIDEKIQ_MEMORY_KILLER_GRACE_TIME, the restart is aborted. Defaults to 900 seconds (15 minutes).
- SIDEKIQ_MEMORY_KILLER_HARD_LIMIT_RSS (KB): defines the Sidekiq process hard limit for allowed RSS. If the Sidekiq process RSS (expressed in kilobytes) exceeds SIDEKIQ_MEMORY_KILLER_HARD_LIMIT_RSS, an immediate graceful restart of Sidekiq is triggered. If this value is not set, or is set to 0, the hard limit is not monitored.
- SIDEKIQ_MEMORY_KILLER_CHECK_INTERVAL: defines how often to check the process RSS. Defaults to 3 seconds.
- SIDEKIQ_MEMORY_KILLER_SHUTDOWN_WAIT: defines the maximum time allowed for all Sidekiq jobs to finish. No new jobs are accepted during that time. Defaults to 30 seconds. If the process restart is not performed by Sidekiq, the Sidekiq process is forcefully terminated after the Sidekiq shutdown timeout (defaults to 25 seconds) plus 2 seconds. If jobs do not finish during that time, all currently running jobs are interrupted with a SIGTERM signal sent to the Sidekiq process.
- GITLAB_MEMORY_WATCHDOG_ENABLED: enabled by default. Set GITLAB_MEMORY_WATCHDOG_ENABLED to false to stop the Watchdog from running.
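The interaction between the soft limit, the grace time, and the hard limit can be sketched as a small simulation. This is a reading of the rules described above, not GitLab's actual Watchdog code. The names `Limits` and `first_restart` are hypothetical, samples are assumed to arrive once per check interval, and treating "exactly the grace time" as a trigger is an assumption about boundary handling.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Limits:
    """Hypothetical container mirroring the documented variables."""
    max_rss_kb: int = 2_000_000   # SIDEKIQ_MEMORY_KILLER_MAX_RSS (0 = not monitored)
    grace_time_s: int = 900       # SIDEKIQ_MEMORY_KILLER_GRACE_TIME
    hard_limit_rss_kb: int = 0    # SIDEKIQ_MEMORY_KILLER_HARD_LIMIT_RSS (0 = not monitored)
    check_interval_s: int = 3     # SIDEKIQ_MEMORY_KILLER_CHECK_INTERVAL


def first_restart(
    samples_kb: Iterable[int], limits: Limits = Limits()
) -> Optional[Tuple[int, str]]:
    """Return (seconds_elapsed, reason) for the first restart, or None.

    samples_kb holds one RSS reading (in KB) per check interval.
    """
    over_since = None  # time at which RSS first rose above the soft limit
    for i, rss_kb in enumerate(samples_kb):
        now = i * limits.check_interval_s
        # Hard limit: restart immediately, no grace period.
        if limits.hard_limit_rss_kb > 0 and rss_kb > limits.hard_limit_rss_kb:
            return now, "hard_limit"
        # Soft limit: restart only if RSS stays above it for the grace time.
        if limits.max_rss_kb > 0 and rss_kb > limits.max_rss_kb:
            if over_since is None:
                over_since = now
            if now - over_since >= limits.grace_time_s:  # boundary is an assumption
                return now, "soft_limit"
        else:
            # Dropping back under the soft limit aborts the pending restart.
            over_since = None
    return None
```

For example, with `Limits(max_rss_kb=100, grace_time_s=6, hard_limit_rss_kb=200)`, a process that briefly spikes to 150 KB and then drops back is never restarted, while a single reading of 250 KB triggers a restart at once.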
Monitor worker restarts
GitLab emits log events if workers are restarted due to high memory usage.
The following is an example of one of these log events in /var/log/gitlab/gitlab-rails/sidekiq_client.log:
{
  "severity": "WARN",
  "time": "2023-02-04T09:45:16.173Z",
  "correlation_id": null,
  "pid": 2725,
  "worker_id": "sidekiq_1",
  "memwd_handler_class": "Gitlab::Memory::Watchdog::SidekiqHandler",
  "memwd_sleep_time_s": 3,
  "memwd_rss_bytes": 1079683247,
  "memwd_max_rss_bytes": 629145600,
  "memwd_max_strikes": 5,
  "memwd_cur_strikes": 6,
  "message": "rss memory limit exceeded",
  "running_jobs": [
    {
      "jid": "83efb701c59547ee42ff7068",
      "worker_class": "Ci::DeleteObjectsWorker"
    },
    {
      "jid": "c3a74503dc2637f8f9445dd3",
      "worker_class": "Ci::ArchiveTraceWorker"
    }
  ]
}

Where:
- memwd_rss_bytes is the actual amount of memory consumed.
- memwd_max_rss_bytes is the RSS limit set through per_worker_max_memory_mb.
- running_jobs lists the jobs that were running at the time when the process exceeded the RSS limit and started a graceful restart.
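When triaging these events, it can help to reduce each one to a few numbers. The following is a minimal Python sketch, assuming each log line is a single valid JSON object with the fields shown in the example above. The function name `summarize_event` and the shape of its result are hypothetical and not part of GitLab.

```python
import json

MIB = 2**20  # bytes per mebibyte


def summarize_event(line: str) -> dict:
    """Summarize one memory-watchdog log line (assumed to be valid JSON)."""
    event = json.loads(line)
    rss = event["memwd_rss_bytes"]
    limit = event["memwd_max_rss_bytes"]
    return {
        "worker_id": event["worker_id"],
        "rss_mib": round(rss / MIB, 1),
        "limit_mib": round(limit / MIB, 1),
        "over_limit_by_mib": round((rss - limit) / MIB, 1),
        # Jobs that were running when the limit was exceeded.
        "worker_classes": [
            job["worker_class"] for job in event.get("running_jobs", [])
        ],
    }
```

Applied to the example event, this reports a limit of 600.0 MiB, an RSS of roughly 1029.7 MiB, and the two CI worker classes that were running at the time.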