We are having a weird issue with a network of Cisco SG350 switches that we cannot figure out. We think it may be related to STP, but we have verified all the usual problem points (i.e., the proper ports are showing as Root, SmartPort is disabled, etc.).
Here is the network diagram:
Cisco SG-250 Network Diagram
As you can see, we have 5 Cisco SG350 switches, all in a parallel daisy chain, except switch 243. All of them are connected on TRUNK ports, and all have 3 VLANs configured. The problem we are seeing is that intermittently throughout the day, traffic will drop on the 242 and/or 243 switches for about 30-60 seconds.
When we investigate the logs, we are able to verify that a) the switches have not rebooted, b) the connection was lost, and c) certain ports appear to have been in STP blocking for a period of time (usually 30-60 seconds).
For example, earlier today (Oct 21 around 23:30 GMT), sw242 went offline for about 30 seconds. The logs on sw242 only show gi17 going up/down, which we believe to be unrelated, as gi17 is a CCTV camera. The logs on sw243 show nothing substantial (even though devices on this switch went down while the upstream sw241 switch did not), and the upstream sw241 logs show ge24 STP Blocking and gi25 STP Blocking (these are the ports for the sw242/243 switches that went down).
It appears that, for SOME reason, sw241 is periodically putting ge24 and ge25 (the ports to the two downstream switches) into STP blocking, but I cannot figure out why.
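One thing we noticed is that the 30-60 second outage length is in the same range as a classic spanning tree reconvergence. Here is a rough sketch of that arithmetic, assuming the classic IEEE 802.1D default timers (max age 20 s, forward delay 15 s). These are assumed values, not ones read from our switches, and the function name is just for illustration. If the switches are actually running RSTP, convergence would normally be much faster than this.

```python
# Classic IEEE 802.1D default timers, in seconds.
# ASSUMPTION: these are the protocol defaults, not values read from the switches.
MAX_AGE = 20
FORWARD_DELAY = 15


def reconvergence_window(max_age=MAX_AGE, forward_delay=FORWARD_DELAY):
    """Return (low, high) seconds a port may stay non-forwarding."""
    # Direct change: the port passes through listening and learning,
    # each lasting one forward delay, before it forwards again.
    direct = 2 * forward_delay
    # Indirect change: the stored BPDU has to age out first,
    # then the port goes through listening and learning.
    indirect = max_age + 2 * forward_delay
    return direct, indirect


if __name__ == "__main__":
    low, high = reconvergence_window()
    print(f"Expected outage window: {low}-{high} seconds")
```

With the assumed defaults this gives a window of 30 to 50 seconds, which is close to what we are seeing.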
I have posted a copy of the TSR/CONFIG for each switch, and I can provide logs if necessary. We have been troubleshooting this problem for several weeks and cannot seem to pinpoint it. Today, we upgraded the firmware on all the switches AND rebuilt the configuration for 244, 243, and 242 from the ground up. We did not rebuild 251 or 241, as they do not seem to be causing the problem (that we can tell), and as the business was open, it was not feasible to take their entire network down.
Download Tech Support Logs
Any assistance is greatly appreciated!!