Stuck on Loading map page when joining Ginnungagap and Norse Harold


Yekaterina

Recommended Posts

Okay, I made a small source patch to adjust the MTU to 1392 bytes instead of 1400 bytes. It affects only the MTU of outgoing traffic. I don't see a way for the server to control the MTU used by clients for packets sent to the server, unless it's somewhere in the ENet logic for constraining the MTU. Ideally, the MTU would be determined automatically instead of hardcoded in the source code. The ICMP traffic necessary for the OS to do that might be blocked locally or remotely, although that is not necessarily what ENet is relying on. Note that "ping -M do -s 1392 play0ad.com" through the VPN link succeeds.

Edit: An idea for allowing the server to control the MTU used by clients: apparently MAX_CLIENTS peers are allocated immediately by the call to enet_host_create() in CNetServerWorker::SetupConnection(). We can iterate through all of the elements of the m_Host->peers array and adjust the MTU before the peer structures are used for actual connections.
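
For illustration, that idea might look roughly like this (a sketch against ENet's public ENetHost/ENetPeer fields; VPN_MTU is just an illustrative name for the hardcoded 1392, and the actual patch is attached below):

```cpp
// Sketch only: clamp the MTU of every pre-allocated peer slot right
// after the host is created, before any client connects.
#include <enet/enet.h>

static const enet_uint32 VPN_MTU = 1392; // illustrative constant

ENetHost* CreateHostWithLowerMtu(const ENetAddress* addr, size_t maxClients)
{
    ENetHost* host = enet_host_create(addr, maxClients, /*channelLimit*/ 1,
                                      /*incomingBandwidth*/ 0, /*outgoingBandwidth*/ 0);
    if (!host)
        return nullptr;

    // ENet allocates all peer slots up front, so adjust each slot
    // before it is used for a real connection.
    for (size_t i = 0; i < host->peerCount; ++i)
        host->peers[i].mtu = VPN_MTU;

    return host;
}
```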

Anyway, I used the improved MTU when hosting several games through VPN today. longsentenceasname, who was unable to stay connected in the past, was able to stay connected and play a complete team game hosted by me. I assume that the modification will also allow Helicity, Cousin, and others to stay connected now. And no, clients don't need to apply the patch. Only hosts using VPNs, such as @Ginnungagap, might need to apply this patch.

For anyone choosing to apply the patch, you need to have a build environment set up first. The build environment instructions mention the SVN version of the 0ad source code. Instead, use the source code for the stable release of alpha 26, since that's what this patch is intended for. That way it's usable with the current player base.

See the attached file below for the patch.

Adjust-MTU-for-VPN-link.patch

Edited by Norse_Harold

An ENet packet might be much larger than the MTU; fragmentation is then done based on the MTU of both peers (server/client).
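
To put rough numbers on that, a back-of-the-envelope sketch (the per-fragment protocol overhead here is an assumed round number, not ENet's exact constant):

```cpp
// Back-of-the-envelope fragmentation arithmetic: a payload larger
// than the usable fragment size is split into ceiling(n/size) pieces.
#include <cstdint>
#include <cstdio>

int main()
{
    const uint32_t mtu = 1392;            // negotiated ENet MTU
    const uint32_t protocolOverhead = 48; // assumed per-fragment ENet overhead
    const uint32_t fragmentLength = mtu - protocolOverhead;
    const uint32_t packetLength = 16000;  // e.g. a large gamesetup message

    const uint32_t fragmentCount =
        (packetLength + fragmentLength - 1) / fragmentLength; // ceiling division
    std::printf("%u fragments of up to %u payload bytes each\n",
                fragmentCount, fragmentLength);
    return 0;
}
```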

11 hours ago, Norse_Harold said:

We can iterate through all of the elements of the m_Host->peers array and adjust the MTU before the peer structures are used for actual connections.

That would be futile as it won't update the MTU on the side of the peer.

11 hours ago, Norse_Harold said:

longsentenceasname, who was unable to stay connected in the past, was able to stay connected and play a complete team game hosted by me

Given the information in this thread so far, the issue is not on your end. But you can of course fix the issue on the other end, if it's MTU-related, by lowering your ENet MTU, at the cost of possibly making all other connections less efficient at the same time.

 

Attached is a cleaner patch, without sanity checks, that allows setting the MTU in the config or via the command line, e.g. pyrogenesis -conf=network.mtu:1392
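
Conceptually the patch boils down to something like the following (a sketch, not the attached diff; ReadConfigInt is an invented stand-in for the engine's config lookup, and the clamp is added here purely for illustration, since the patch itself skips sanity checks):

```cpp
// Sketch, not the attached patch: resolve the MTU from a config key
// with a fallback, clamped to ENet's protocol limits.
#include <algorithm>
#include <enet/enet.h>

static int ReadConfigInt(const char* /*key*/, int defaultValue)
{
    return defaultValue; // stub; the real engine would read "network.mtu"
}

enet_uint32 ResolveMtu()
{
    int mtu = ReadConfigInt("network.mtu", 1400); // e.g. -conf=network.mtu:1392
    // Illustrative sanity clamp (the attached patch omits such checks).
    mtu = std::clamp(mtu, (int)ENET_PROTOCOL_MINIMUM_MTU,
                          (int)ENET_PROTOCOL_MAXIMUM_MTU);
    return (enet_uint32)mtu;
}
```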

Anyway, I'm not yet convinced that this is the actual issue. It might be, but I'm still waiting for the output of the original ping command from @Helicity to confirm.

enet-mtu.patch


On 06/03/2023 at 11:14 AM, hyperion said:

still waiting for the output of the original ping command from @Helicity to confirm.

Okay, Helicity did the ping command. Here are the results.

White redaction: Helicity's user and host name

Red redaction: Harold's VPN IP address

Blue redaction: Helicity's IP address

[Screenshot: ping-m1392haroldsvpnip.png — Helicity's ping output with a 1392-byte packet size]

Then I asked Helicity to test with a packet size of 1400 bytes:

[Screenshot: ping-m1393haroldsvpnip.png — Helicity's ping output with the larger packet size]

 

We also tested a packet size of 1393 bytes, and it resulted in a "fragmentation needed" error. So, it seems that the maximum is 1392 bytes.

By the way, normally my firewall blocks all ping requests and replies. At first, Helicity got no replies regardless of the packet size chosen, as expected. I added a firewall rule to allow ping requests and replies only for Helicity's IP address, which produced the above output when Helicity ran the tests.

Edited by Norse_Harold

Just now, Norse_Harold said:

So, it seems that the maximum is 1392 bytes.

Which means the MTU is 1420 between the two of you (1392 bytes of ICMP payload + 8 bytes of ICMP header + 20 bytes of IPv4 header). If there are no drops anymore now that your host is patched to an MTU of 1392, then we have a bug in ENet.

I will check ENet over the weekend to confirm. Possibly a release blocker, as it would mean VPNs definitely are broken.


I haven't tested a game with Helicity yet, but as I said, with the patch applied to my copy of 0ad, longsentenceasname was able to stay connected for an entire game, despite large packets of at most 1392 bytes being sent to him during the game. Therefore, there was no packet loss of the large outgoing packets. There is still fragmentation of incoming large packets (1400 bytes), and it would be nice to have the ability to prevent that as a host, but those packets aren't being dropped.

I'm leaning toward it being a bug in ENet or 0ad. I'm surprised that so far I have found no feature in ENet for automatically adjusting the MTU of the link, but maybe that means that the user app is responsible for adjusting the MTU.

I think that a proper solution would involve looking at the OS's indication of the MTU of the link that's in use, and harmonizing the MTU used by ENet (or 0ad) to that value at first. Then, upon connection, both peers use the minimum MTU of each side of the connection. It may be difficult for the library or app to determine which link is in use, because when a VPN is in use there is more than one default gateway, and multiple routing table rules are in place. Therefore, maybe a system could be added that would allow the user to hint which interface is in use or at least the MTU to use.
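
A minimal Linux-only sketch of the first step, asking the OS for a given interface's MTU via the SIOCGIFMTU ioctl. Choosing which interface to ask about is exactly the hard part described above, so here the interface name is a user-supplied hint:

```cpp
// Linux-only sketch: query the MTU of a named interface with SIOCGIFMTU.
// Which interface the game traffic actually uses is not solved here.
#include <cstdio>
#include <cstring>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

int GetInterfaceMtu(const char* ifname)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    ifreq ifr{};
    std::strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

    int mtu = -1;
    if (ioctl(fd, SIOCGIFMTU, &ifr) == 0)
        mtu = ifr.ifr_mtu;

    close(fd);
    return mtu;
}

int main()
{
    // "wg0" is a typical WireGuard interface name (MTU usually 1420).
    std::printf("mtu = %d\n", GetInterfaceMtu("wg0"));
    return 0;
}
```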

Edited by Norse_Harold

There is a thing called path MTU discovery, but it's tricky and/or unreliable.
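
For reference, one concrete (Linux-only) form of it for UDP sockets; the unreliable part is that if ICMP "fragmentation needed" messages are filtered anywhere along the path, the kernel never learns the real value:

```cpp
// Linux-only sketch of UDP path MTU discovery: set the DF bit via
// IP_MTU_DISCOVER, then ask the kernel what path MTU it has learned
// via IP_MTU. Only meaningful on a connect()ed socket, and only after
// traffic (or an ICMP error) has taught the kernel the value.
#include <netinet/in.h>
#include <sys/socket.h>

int QueryPathMtu(int connectedUdpSocket)
{
    int pmtudisc = IP_PMTUDISC_DO; // set DF; let the kernel track the PMTU
    if (setsockopt(connectedUdpSocket, IPPROTO_IP, IP_MTU_DISCOVER,
                   &pmtudisc, sizeof(pmtudisc)) != 0)
        return -1;

    int mtu = -1;
    socklen_t len = sizeof(mtu);
    if (getsockopt(connectedUdpSocket, IPPROTO_IP, IP_MTU, &mtu, &len) != 0)
        return -1;
    return mtu;
}
```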

The reason I'm somewhat skeptical is that such a bug should have been found long ago, one might think.

18 minutes ago, Norse_Harold said:

There is still fragmentation of incoming large packets (1400 bytes)

This sounds a bit like luck that the order was preserved.


3 minutes ago, hyperion said:
38 minutes ago, Norse_Harold said:

There is still fragmentation of incoming large packets (1400 bytes)

This sounds a bit like luck that the order was preserved.

Linux is able to re-order received out-of-order fragmented packets, within reason.

See the documentation for sysctl variables net.ipv4.ipfrag_time and net.ipv4.ipfrag_max_dist in (Linux kernel source code tarball)/Documentation/networking/ip-sysctl.rst
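
A quick Linux-only sketch for inspecting those values programmatically, by reading their /proc/sys counterparts:

```cpp
// Linux-only sketch: read a sysctl value by translating its dotted
// name into the corresponding /proc/sys path.
#include <fstream>
#include <iostream>
#include <string>

static std::string ReadSysctl(std::string dottedName)
{
    for (char& c : dottedName)
        if (c == '.')
            c = '/';
    std::ifstream in("/proc/sys/" + dottedName);
    std::string value;
    std::getline(in, value);
    return value; // empty string if the file could not be read
}

int main()
{
    std::cout << "ipfrag_time = " << ReadSysctl("net.ipv4.ipfrag_time") << "\n";
    std::cout << "ipfrag_max_dist = " << ReadSysctl("net.ipv4.ipfrag_max_dist") << "\n";
    return 0;
}
```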

7 minutes ago, hyperion said:

The reason I'm somewhat skeptical is that such a bug should have been found long ago, one might think.

I would have thought so too. What other tests would you recommend in order to be sure one way or the other?


Today we ran some tests with 0ad while I was connected via VPN and Helicity was connected via school wifi. We first ran a controlled experiment without the MTU adjustment patch applied. As expected, the maximum outgoing packet size was 1400 data bytes, sent in fragments of 1400 bytes and 28 bytes on the wire, and the problem occurred: Helicity was disconnected after I adjusted the number of player slots to 5 or more.

Then I hosted a game with the patch applied that adjusts the MTU to 1392. Helicity connected, and I varied settings in the gamesetup that would normally have caused him to get disconnected. He remained connected.

We used cheat codes to create armies of 200 to 400 units and have them fight. It worked well, and Helicity remained connected until we ended the game voluntarily at about 6 minutes. During the test, the maximum outgoing packet size generated by 0ad was 1392 data bytes (1420 bytes on the wire); there were 133 such packets, none of which were fragmented.

 

Edited by Norse_Harold

@Norse_Harold

Great analysis. I remember some players complaining that they were not able to join my hosted games. I noticed the trouble happened while I was using a VPN with the protocol set to WireGuard; after I changed to OpenVPN, I haven't heard any complaints. I'm assuming this confirms the issue is related to the response at the GitHub link posted by @hyperion

 

I can help you with testing if someone describes to me what to do. Please have patience; I'm smart, but not as much as you guys.



Today, HerkEule is on the list. However, the situation was quite strange: initially I joined him successfully and instantly without lag, but after one minute, I began to lose connection. After I tried to rejoin, the loading map page appeared.

Presumably, HerkEule didn't change his network settings, so the MTU value wouldn't have changed; at least, neither he nor I was aware of a change in MTU.

In addition, all problematic hosts except Norse Harold are German. In fact, MarcAurel is the only German host who doesn't throw errors, although he does have stability issues.

Furthermore, I was able to join all of these hosts using my father's computer on a home network. When I use the school network, all sorts of problems occur.

I hope these new observations can provide some clues.

Do you think the MTU patch can be implemented into the default game in the next update? @Norse_Harold

Edited by Helicity

1 hour ago, Helicity said:

Do you think the MTU patch can be implemented into the default game in the next update? @Norse_Harold

I would advocate for a fix being implemented into the default game in the next update, but I'm not the decider of that. @Stan` what do you prefer?

 

1. Adjust the default MTU of 0ad to 1392.

2. Add a feature for users to override the MTU in the configuration and/or command line. (Note that without options 1 or 3, user education would still be necessary in order to actually solve the problem.)

3. Ask ENet to adjust the default MTU to 1392.

4. Options 1 and 3. Guard option 1 with a check of the version of ENet or a check of the value of ENET_HOST_DEFAULT_MTU. Undo option 1 when options 3 or 8 are implemented and used widely by the player base.

5. Add logic to 0ad to adjust the MTU of each connection between two peers to the minimum of those two peers' MTUs (see the sketch after this list).

6. Add logic to ENet to adjust the MTU of each connection between two peers to the minimum of those two peers' MTUs.

7. Add logic to 0ad to sense a correct MTU.

8. Add logic to ENet to sense a correct MTU.

 

Notice that most or all options can be combined. They are listed roughly in ascending order of time required.
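
As an illustration of option 5, a minimal sketch against ENet's public fields. This would only constrain our own outgoing traffic, as discussed earlier; the remote side needs the mirror-image logic of option 6:

```cpp
// Sketch of option 5: when a peer connects, clamp the connection's
// MTU to the minimum of the negotiated value and our own host MTU.
// This only constrains our outgoing packets.
#include <algorithm>
#include <enet/enet.h>

void OnClientConnected(ENetHost* host, ENetEvent& event)
{
    // event.peer->mtu holds the value taken from the connect handshake.
    event.peer->mtu = std::min(event.peer->mtu, host->mtu);
}
```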

Edited by Norse_Harold

@Norse_Harold,

I had time to really dig into it. There are one or two bugs in ENet, depending on how you want to look at it. First, ENet uses "MTU" in a misleading way. Second, during MTU negotiation on connect, the server doesn't check for a lower MTU and just uses the one passed by the client.
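
In pseudocode terms, the second bug is a missing min():

```cpp
// Conceptual illustration only, not ENet's actual code: on connect the
// server adopts the client's proposed MTU as-is instead of taking the
// minimum of both sides' values.
#include <algorithm>
#include <cstdint>

uint32_t NegotiateMtu(uint32_t serverMtu, uint32_t clientProposedMtu)
{
    // Buggy behaviour described above:
    //   return clientProposedMtu;
    // Expected behaviour:
    return std::min(serverMtu, clientProposedMtu);
}
```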

https://github.com/lsalzman/enet/issues/132

https://github.com/lsalzman/enet/pull/222

https://code.wildfiregames.com/D4967


  • 1 month later...

In addition, if I try to join HerkEule, I get stuck on the loading map page. However, when HerkEule joins me, he doesn't see this error. When we have a large battle, HerkEule starts to lose connection. Whenever he rejoined, he could play totally normally for about 5 seconds, then he started to lose connection again. Could this be caused by the same MTU problem, or is it something else?


2 hours ago, Helicity said:

Is it possible to implement the fix patch into the next release by default? It can count as a bug fix (so it can be added during feature freeze) and is quite urgent. Many players experience this problem.

Yes, it's already committed for the next release (for alpha 27), because it's a bugfix. See here for the proof. Notice that it says "Closed by commit rP27599: Use a lower default MTU for ENet hosts, and make it configurable. (authored by Itms)".

