
OSPF drops

Hey Guys,

I have a network problem which I am hoping you can shed some light on.

I work for an ISP and I have a strange issue with one of the routers that we manage. From this router we run OSPF to our upstream PE and run LDP as well. The issue we are seeing is that the OSPF adjacency keeps going from Loading to Full on the CPE:

GigabitEthernet0/0.100 from LOADING to FULL, Loading Done

On the upstream PE we see the following:

GigabitEthernet0/1.100 from FULL to DOWN, Neighbor Down: Dead timer expired

The CPU on the router does spike occasionally, but not at the time of the drop. I am thinking that it may be due to a sudden burst of traffic saturating the link and causing OSPF to drop. That said, the LDP neighbor hasn't dropped, and it forms over the same physical link as OSPF does.

I have configured an EEM script on the router to try and give me a bit more info, but nothing has been generated yet. I'm just wondering if it's possible for me to apply QoS to the OSPF packets generated by the router. Also, if you have any troubleshooting tips they would be much appreciated.


level 1

Check the link between the two routers for errors.

level 1

Have you tried setting your OSPF neighbors to NBMA and explicitly defining them? We use static neighbors in our WiFi distribution networks where we drop packets all the time. No more flapping. I think it uses more reliable communication. You should explore NBMA OSPF.

level 1
I drink and I route things · 2 points · 3 months ago · edited 3 months ago

Do you have a router ID set? If a router ID is not set, the IP of the highest-numbered interface will be used, as mentioned by /u/Danisunoriginal, even if that interface isn't added to OSPF. When that interface flaps, OSPF will go down because the router ID is no longer valid, even if the OSPF-connected interface is still up and working as expected.

level 2
2 points · 3 months ago

If the router-id isn't explicitly set, the router chooses one automatically.

Depends on the implementation, but most vendors just go with "the IP of the highest numbered interface"

level 3
I drink and I route things · 3 points · 3 months ago

Until that interface drops, which will absolutely cause the OSPF session to die. Just ran into this recently with another customer: no router ID set, and the IP of the highest-numbered interface was used. Even though OSPF was set on a different interface, when the RID interface flapped all of OSPF broke.

level 3

The router should automatically select the highest loopback as the RID, though. So as long as you have at least one loopback configured that shouldn't be an issue, right?

level 4
CCNP | ISP Operations · 1 point · 3 months ago

Correct. But not everybody does this.
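
For reference, the selection order discussed in this sub-thread can be sketched in a few lines of Python. This is only a minimal sketch, assuming the commonly documented Cisco IOS order (explicit router-id, then the highest loopback address, then the highest address on any other up interface). Other vendors may differ, and the `Interface` type, the `select_router_id` function, and the addresses are hypothetical names and values for illustration.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional


@dataclass
class Interface:
    # Hypothetical model of a router interface, for illustration only.
    name: str
    address: IPv4Address
    is_loopback: bool = False
    is_up: bool = True


def select_router_id(
    interfaces: list[Interface],
    configured: Optional[IPv4Address] = None,
) -> Optional[IPv4Address]:
    """Pick a router ID using the assumed Cisco-style preference order."""
    # 1. An explicitly configured router ID always wins.
    if configured is not None:
        return configured
    up = [i for i in interfaces if i.is_up]
    # 2. Otherwise prefer the highest IP address on an up loopback.
    loopbacks = [i.address for i in up if i.is_loopback]
    if loopbacks:
        return max(loopbacks)
    # 3. Otherwise fall back to the highest IP address on any up interface.
    others = [i.address for i in up]
    return max(others) if others else None


interfaces = [
    Interface("GigabitEthernet0/0.100", IPv4Address("192.0.2.1")),
    Interface("GigabitEthernet0/1", IPv4Address("198.51.100.1")),
    Interface("Loopback0", IPv4Address("10.255.0.1"), is_loopback=True),
]
print(select_router_id(interfaces))
```

With the loopback present, the sketch picks the loopback address even though a physical interface has a numerically higher IP. Without it, the choice falls to whichever physical interface has the highest address, which is the situation described above.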

level 2
3 points · 3 months ago

The fix for this, of course, is to set your router-id to whatever the loopback interface is.

That way OSPF is always "nailed up"

Generally speaking, you should be using the high parts of your address space for things like connected interfaces and loopback addresses. Avoid using RFC1918 on anything other than firewalls and the design decision of "router-id == highest numbered interface" will mostly cause the router to guess correctly. But yeah, it's probably better to explicitly set router-id.

level 3
I drink and I route things · 2 points · 3 months ago

+1 to this and the reason why I asked this question. It's always a good practice to set a RID statically (like a loopback address). Also makes debugging the OSPF database far less painful :)

level 1
1 point · 3 months ago

What are the dead and hello timers set to?

We generally have ours set pretty aggressive. 4 seconds on the dead timer and 1 second on the hello timer.

This works pretty well until you're establishing adjacencies with older/different kinds of equipment. For example, between an MLXe and a 7606, we had to drop the timers back to their default values to get the flapping to stop.

level 2
Network Engineer · 8 points · 3 months ago

"4 seconds on the dead timer and 1 second on the hello timer."

Let me introduce you to our lord and savior, BFD


level 2
Network Admin · 1 point · 3 months ago

That's tighter than I'd have recommended, but I was going to remind /u/benanater that the dead timer needs to be 4x that of the hello timer. If it isn't, you're going to see issues like those described here.

level 3
Original Poster · 1 point · 3 months ago

At the moment we have dead 5, hello 2.

level 4
CCFNG · 5 points · 3 months ago

If you have BFD available on your platform, you should use that instead of tweaking a control-plane protocol for fast failure detection.
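
To put rough numbers on this suggestion: BFD detection time is approximately the negotiated interval multiplied by the detect multiplier. The Python sketch below compares that against the OSPF dead timer mentioned in this thread. The 50 ms interval and multiplier of 3 are assumed example values, not settings taken from the poster's routers, and `bfd_detection_time_ms` is a hypothetical helper name.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Approximate BFD detection time: negotiated interval x detect multiplier."""
    if interval_ms <= 0 or multiplier <= 0:
        raise ValueError("interval_ms and multiplier must be positive")
    return interval_ms * multiplier


# Assumed example values; real settings depend on platform support.
bfd_ms = bfd_detection_time_ms(interval_ms=50, multiplier=3)
ospf_dead_ms = 5 * 1000  # the 5-second dead timer mentioned in the thread

print(f"BFD detects failure in about {bfd_ms} ms")
print(f"OSPF dead timer detects failure in about {ospf_dead_ms} ms")
```

With BFD handling failure detection, the OSPF hello and dead timers can stay at their defaults.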

level 4
Network Admin · 3 points · 3 months ago

Change dead to 8 if you have hello at 2.

Edit: also, 2 and 8 are what I use, rather than 1 and 4. That's a lab environment though; our production primary IGP is EIGRP.

level 4
1 point · 3 months ago

Yeah this is exactly what your problem is. The dead timer needs to be 4 times your hello timer.
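
A quick way to reason about the timer values in this sub-thread is to count how many consecutive hellos can be lost before the dead timer fires. The Python sketch below does that under simplifying assumptions: hellos are sent exactly on schedule with no jitter, and the function names are hypothetical. Note that in RFC 2328 the hard requirement is that the hello and dead intervals match on both neighbors. A dead interval of 4x the hello interval is the common default rather than a protocol rule.

```python
def timers_match(local: tuple[int, int], remote: tuple[int, int]) -> bool:
    """Return True when the (hello, dead) intervals agree on both neighbors."""
    return local == remote


def tolerated_hello_losses(hello: int, dead: int) -> int:
    """Consecutive lost hellos an adjacency survives, ignoring jitter.

    After the last received hello at t=0, hellos are expected at
    hello, 2*hello, and so on. The dead timer fires at t=dead, so
    losing n hellos in a row is survivable only if (n + 1) * hello < dead.
    """
    if hello <= 0 or dead <= hello:
        raise ValueError("expected 0 < hello < dead")
    return (dead - 1) // hello - 1


for hello, dead in [(2, 5), (2, 8), (1, 4), (10, 40)]:
    losses = tolerated_hello_losses(hello, dead)
    print(f"hello={hello}s dead={dead}s -> survives {losses} consecutive lost hello(s)")
```

Under these assumptions, hello 2 / dead 5 survives only one lost hello in a row, while hello 2 / dead 8 survives two. A short traffic burst that drops two back-to-back hellos would therefore be enough to expire the dead timer with the poster's current settings.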
