I'm facing an issue with the setup I use for occasional maintenance on a number of customer servers via remote SSH.
The setup is as follows:
1 control server
X customer servers (an arbitrary number), each set up so that a 'service' account connects to my control server via SSH.
I've set up the clients to connect to the control server, which has a fixed IP, automatically after boot, using autossh and the service account. This is the /etc/ssh/ssh_config on the customer machines:
# This is the ssh client system-wide configuration file. See
# ssh_config(5) for more information. This file provides defaults for
# users, and the values can be changed in per-user configuration files
# or on the command line.
# Configuration data is parsed as follows:
# 1. command line options
# 2. user-specific file
# 3. system-wide file
# Any configuration value is only changed the first time it is set.
# Thus, host-specific definitions should be at the beginning of the
# configuration file, and defaults at the end.
# Site-wide defaults for some commonly used options. For a comprehensive
# list of available options, their meanings and defaults, please see the
# ssh_config(5) man page.
Host *
# ForwardAgent no
# ForwardX11 no
# ForwardX11Trusted yes
# PasswordAuthentication yes
# HostbasedAuthentication no
# GSSAPIAuthentication no
# GSSAPIDelegateCredentials no
# GSSAPIKeyExchange no
# GSSAPITrustDNS no
# BatchMode no
# CheckHostIP yes
# AddressFamily any
# ConnectTimeout 0
# StrictHostKeyChecking ask
# IdentityFile ~/.ssh/id_rsa
# IdentityFile ~/.ssh/id_dsa
# IdentityFile ~/.ssh/id_ecdsa
# IdentityFile ~/.ssh/id_ed25519
# Port 22
# Protocol 2
# Ciphers aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc
# MACs hmac-md5,hmac-sha1,umac-64@openssh.com
# EscapeChar ~
# Tunnel no
# TunnelDevice any:any
# PermitLocalCommand no
# VisualHostKey no
# ProxyCommand ssh -q -W %h:%p gateway.example.com
# RekeyLimit 1G 1h
SendEnv LANG LC_*
HashKnownHosts yes
GSSAPIAuthentication yes
ServerAliveInterval 300
On the control server I am using the following sshd_config:
# $OpenBSD: sshd_config,v 1.103 2018/04/09 20:41:22 tj Exp $
# This is the sshd server system-wide configuration file. See
# sshd_config(5) for more information.
# This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin
# The strategy used for options in the default sshd_config shipped with
# OpenSSH is to specify options with their default value where
# possible, but leave them commented. Uncommented options override the
# default value.
Port --hidden--
#AddressFamily any
#ListenAddress 0.0.0.0
#ListenAddress ::
#HostKey /etc/ssh/ssh_host_rsa_key
#HostKey /etc/ssh/ssh_host_ecdsa_key
#HostKey /etc/ssh/ssh_host_ed25519_key
# Ciphers and keying
#RekeyLimit default none
# Logging
#SyslogFacility AUTH
#LogLevel INFO
# Authentication:
#LoginGraceTime 2m
#StrictModes yes
#MaxAuthTries 6
#MaxSessions 10
#PubkeyAuthentication yes
# Expect .ssh/authorized_keys2 to be disregarded by default in future.
#AuthorizedKeysFile .ssh/authorized_keys .ssh/authorized_keys2
#AuthorizedPrincipalsFile none
#AuthorizedKeysCommand none
#AuthorizedKeysCommandUser nobody
# For this to work you will also need host keys in /etc/ssh/ssh_known_hosts
#HostbasedAuthentication no
# Change to yes if you don't trust ~/.ssh/known_hosts for
# HostbasedAuthentication
#IgnoreUserKnownHosts no
# Don't read the user's ~/.rhosts and ~/.shosts files
#IgnoreRhosts yes
# To disable tunneled clear text passwords, change to no here!
#PermitEmptyPasswords no
# Change to yes to enable challenge-response passwords (beware issues with
# some PAM modules and threads)
ChallengeResponseAuthentication no
# Kerberos options
#KerberosAuthentication no
#KerberosOrLocalPasswd yes
#KerberosTicketCleanup yes
#KerberosGetAFSToken no
# GSSAPI options
#GSSAPIAuthentication no
#GSSAPICleanupCredentials yes
#GSSAPIStrictAcceptorCheck yes
#GSSAPIKeyExchange no
# Set this to 'yes' to enable PAM authentication, account processing,
# and session processing. If this is enabled, PAM authentication will
# be allowed through the ChallengeResponseAuthentication and
# PasswordAuthentication. Depending on your PAM configuration,
# PAM authentication via ChallengeResponseAuthentication may bypass
# the setting of "PermitRootLogin without-password".
# If you just want the PAM account and session checks to run without
# PAM authentication, then enable this but set PasswordAuthentication
# and ChallengeResponseAuthentication to 'no'.
UsePAM yes
#AllowAgentForwarding yes
AllowTcpForwarding yes
GatewayPorts yes
X11Forwarding yes
#X11DisplayOffset 10
#X11UseLocalhost yes
#PermitTTY yes
PrintMotd no
#PrintLastLog yes
#TCPKeepAlive yes
#PermitUserEnvironment no
#Compression delayed
ClientAliveInterval 30
ClientAliveCountMax 99999
#UseDNS no
#PidFile /var/run/sshd.pid
#MaxStartups 10:30:100
#PermitTunnel no
#ChrootDirectory none
#VersionAddendum none
# no default banner path
#Banner none
# Allow client to pass locale environment variables
AcceptEnv LANG LC_*
# override default of no subsystems
Subsystem sftp /usr/lib/openssh/sftp-server
# Example of overriding settings on a per-user basis
#Match User anoncvs
# X11Forwarding no
# AllowTcpForwarding no
# PermitTTY no
# ForceCommand cvs server
PasswordAuthentication no
PermitRootLogin yes
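For reference, this is how I read the keepalive settings in the two files above. It is only a rough sketch in Python: it assumes that ServerAliveCountMax is at its OpenSSH default of 3 (I haven't set it) and that each side gives up after roughly interval × count seconds without a reply from the peer.

```python
# Keepalive values taken from the two config files above.
SERVER_ALIVE_INTERVAL = 300      # client side: ServerAliveInterval
SERVER_ALIVE_COUNT_MAX = 3       # client side: not set, OpenSSH default assumed
CLIENT_ALIVE_INTERVAL = 30       # server side: ClientAliveInterval
CLIENT_ALIVE_COUNT_MAX = 99999   # server side: ClientAliveCountMax


def give_up_after(interval_s, count_max):
    """Approximate seconds of silence before a side drops the connection."""
    return interval_s * count_max


client_timeout_s = give_up_after(SERVER_ALIVE_INTERVAL, SERVER_ALIVE_COUNT_MAX)
server_timeout_s = give_up_after(CLIENT_ALIVE_INTERVAL, CLIENT_ALIVE_COUNT_MAX)

print(f"client gives up after ~{client_timeout_s / 60:.0f} minutes")
print(f"server gives up after ~{server_timeout_s / 86400:.1f} days")
```

So, if I understand the options correctly, the client side would tolerate about 15 minutes of silence and the server side about 34.7 days.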
Basically, I would expect both sides to keep the connections open, since the keepalive settings on both ends are generous. However, connections keep dropping at random. I've checked /var/log/syslog, and it looks as if sshd drops one of the active connections whenever a new connection comes in, so I'm fairly sure I'm hitting some connection limit here:
Nov 26 18:38:38 v2202102140578142103 systemd[1]: session-115234.scope: Succeeded.
Nov 26 18:38:38 v2202102140578142103 systemd[1]: Started Session 115376 of user service.
Nov 26 18:38:47 v2202102140578142103 systemd[1]: session-115235.scope: Succeeded.
Nov 26 18:38:47 v2202102140578142103 systemd[1]: Started Session 115377 of user service.
Nov 26 18:38:52 v2202102140578142103 systemd[1]: session-115236.scope: Succeeded.
Nov 26 18:38:53 v2202102140578142103 systemd[1]: Started Session 115378 of user service.
Nov 26 18:39:08 v2202102140578142103 systemd[1]: session-115237.scope: Succeeded.
Nov 26 18:39:08 v2202102140578142103 systemd[1]: Started Session 115379 of user service.
Nov 26 18:39:08 v2202102140578142103 systemd[1]: session-115238.scope: Succeeded.
Nov 26 18:39:08 v2202102140578142103 systemd[1]: Started Session 115380 of user service.
Nov 26 18:39:09 v2202102140578142103 systemd[1]: session-115239.scope: Succeeded.
Nov 26 18:39:09 v2202102140578142103 systemd[1]: Started Session 115381 of user service.
Nov 26 18:39:14 v2202102140578142103 systemd[1]: session-115240.scope: Succeeded.
Nov 26 18:39:15 v2202102140578142103 systemd[1]: Started Session 115382 of user service.
Nov 26 18:39:31 v2202102140578142103 systemd[1]: session-115241.scope: Succeeded.
Nov 26 18:39:31 v2202102140578142103 systemd[1]: session-115242.scope: Succeeded.
Nov 26 18:39:31 v2202102140578142103 systemd[1]: Started Session 115383 of user service.
Nov 26 18:39:31 v2202102140578142103 systemd[1]: Started Session 115384 of user service.
Nov 26 18:39:32 v2202102140578142103 systemd[1]: session-115243.scope: Succeeded.
Nov 26 18:39:33 v2202102140578142103 systemd[1]: Started Session 115385 of user service.
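To make the pattern in the excerpt easier to see, here is a small Python sketch that pairs each ended session with the session started right after it. It assumes the systemd message format shown above; the sample is just the first six lines of the excerpt, and the function name `session_ids` is my own.

```python
import re

# First six lines of the syslog excerpt above.
SAMPLE = """\
Nov 26 18:38:38 v2202102140578142103 systemd[1]: session-115234.scope: Succeeded.
Nov 26 18:38:38 v2202102140578142103 systemd[1]: Started Session 115376 of user service.
Nov 26 18:38:47 v2202102140578142103 systemd[1]: session-115235.scope: Succeeded.
Nov 26 18:38:47 v2202102140578142103 systemd[1]: Started Session 115377 of user service.
Nov 26 18:38:52 v2202102140578142103 systemd[1]: session-115236.scope: Succeeded.
Nov 26 18:38:53 v2202102140578142103 systemd[1]: Started Session 115378 of user service.
"""

ENDED = re.compile(r"session-(\d+)\.scope: Succeeded\.")
STARTED = re.compile(r"Started Session (\d+) of user")


def session_ids(log_text):
    """Return (ended_ids, started_ids) in the order they appear in the log."""
    ended, started = [], []
    for line in log_text.splitlines():
        match = ENDED.search(line)
        if match:
            ended.append(int(match.group(1)))
            continue
        match = STARTED.search(line)
        if match:
            started.append(int(match.group(1)))
    return ended, started


ended, started = session_ids(SAMPLE)
# Difference between each newly started session ID and the one that just ended.
gaps = [new - old for old, new in zip(ended, started)]
print(gaps)  # [142, 142, 142]
```

For these lines the gap between the ended and the newly started session ID is 142 every time, and the same holds for the rest of the excerpt. That might mean sessions are ending in the same order they were created, but that is only my reading of the log.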
This is probably something simple to fix, but I'm not a Linux networking expert, and I wasn't able to find anything useful through my own research. Hopefully someone can point me to the limit I have to change to stop this behaviour.
Thanks in advance!