[25.04] Server Performance: Status Update

  • Dear players,

    Last weekend, after we experienced extreme performance issues during the Invasion Day fights, I promised we would keep you informed about the progress being made on server performance in large fights.

    The short version is:
    • We’ve spent the week looking into the problem, running various networking tests and tweaking settings
    • We’ve decided to upgrade selected hardware that may be causing issues and, more generally, to invest in additional redundancy
    • We’re hoping to roll out this additional hardware during tomorrow's (Friday April 26th) maintenance
    While we do hope that the additional hardware will bring some relief for the weekend, we’re still convinced we haven’t found ‘the thing’ that is causing these slowdowns. We currently still believe that one of the many improvements we made in preparation for the Free To Play launch is causing large fights to perform worse than they should, and we will continue to investigate.

    For those of you who are wondering why we’re not putting too much faith in additional server hardware, I’ve written up an in-depth explanation of the issues we’re facing below!

    We will continue to update you on this topic until we’re convinced large fight performance is as good as it can be.

    Sincerely,

    Robin ‘Eltharyon’ Henkys
    Game Director


    What is the problem?

    In short, the problem is that players involved in large fights experience severe ‘lag’. The term ‘lag’ describes all sorts of symptoms caused by communication issues between a game client and its servers. Typically it means that, while the rendering is unaffected (FPS are still high), the game becomes unresponsive to your commands, movement of your own and/or other characters becomes erratic, and the impact of game actions (such as damage taken) often appears in sudden bursts on your game client, as information about them arrives much later than intended.

    What causes lag?

    There are many causes for lag. When lag occurs, it means that messages being sent back and forth between your computer and our servers are not being transported and/or processed as quickly and reliably as they should be.

    This can be caused by a disruption in your local network (someone using all the available bandwidth to download/stream content, for example), an interruption at your ISP, a problem somewhere in transit, or an issue within our data center.

    If many players are experiencing lag at the same time, it is more likely that the problem lies within our data center. When this occurs, there are generally two possible explanations: either a game server is unable to process all the messages it needs to receive and send out, or some piece of networking hardware (any of the routers, switches, firewalls etc. involved) is unable to handle the bandwidth needed to send messages back and forth.

    Why does lag occur in big fights?

    Big fights are a particularly challenging case in terms of game engineering. This is due to fundamental mathematics and the nature of networking.

    In order for a player action to become visible on your client, the game server needs to send your computer a message about this action. In fact, every action you take, be it walking, casting a spell or chopping down a tree, causes a message to be sent to the server, and the server then distributes this information to all nearby players who should be able to see this action.

    This means that the more players there are around to see an action you take, the more messages each action generates. Or, mathematically speaking: if n is the number of players nearby, a the average number of actions each player takes, and m the number of messages that need to be sent, then each of the n*a actions goes to the n-1 other players, so m = a*n*(n-1), which grows roughly as a*n².

    It's the squared bit that is causing difficulties in large fights and that puts an engineering limit on how many players can ever participate in a fight, as every additional player becomes significantly harder to process the more players there already are.

    To put this in perspective: If there are 100 players fighting, and each of them sends one action to everyone else, the server has to handle 100*99 = 9900 messages.
    If these players are running around in groups of two instead, all the server has to handle is 100*1 = 100 messages. In other words: in terms of stress on the server, 9900 players could be duelling instead of 100 players having a fight together.

    It gets even worse when we get to 200 players. 200 players send messages to 199 players each, causing 200*199 = 39800 messages to be sent. This would be enough for 39800 players in pairs! It only gets worse from here, especially because this simplified scenario does not take into account additional multipliers that occur, such as AoE attacks hitting large numbers of players (and generating large numbers of messages). In reality, the numbers we are dealing with are much larger, but the same principle applies.
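    The arithmetic above can be checked with a short Python sketch. It assumes, as the simplified scenario does, exactly one action per player that is relayed to every other player in the fight; AoE multipliers are not modelled, and the function names are illustrative only.

```python
def messages_one_fight(n: int, actions_per_player: int = 1) -> int:
    """All n players in one fight: every action is relayed to the other n - 1."""
    return actions_per_player * n * (n - 1)


def messages_in_pairs(n: int, actions_per_player: int = 1) -> int:
    """The same n players duelling in pairs: every action reaches one opponent."""
    return actions_per_player * n


def marginal_cost(n: int) -> int:
    """Extra messages caused by one more player joining an n-player fight (= 2n)."""
    return messages_one_fight(n + 1) - messages_one_fight(n)


for n in (100, 200, 300):
    print(f"{n} players: {messages_one_fight(n)} messages in one fight, "
          f"{messages_in_pairs(n)} in pairs, next player adds {marginal_cost(n)}")
```

    Because the pair load grows linearly, the 9900 messages of a 100-player fight equal the load of 9900 duellers, while the 101st fighter alone adds 200 messages, and that marginal cost keeps rising with every player already present.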

    Due to the mathematics of this situation, not only are infinitely large fights impossible, but even fights with more than a few hundred players are extremely challenging, and increasing the maximum fighter count requires bigger and bigger optimizations to add fewer and fewer players to the mix.

    Despite all this, we’ve seen the Albion network code deal with well over 300 players in a fight reasonably well, which is why we’re particularly frustrated that right now problems seem to occur at far lower numbers than that.

    Why don’t you just add more servers?

    As you can see from the above explanation, the problems do not primarily stem from the total number of players in the game. Albion’s game world is split across a large number of individual servers, and all of them show good health with the current number of players in the game world. Since most players are well distributed (and overload mode keeps hotspots like cities from becoming a problem), the messages generated by the large overall player count are easily handled by the different servers.
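    Why the same number of players is easy when spread out but hard in one zone can be shown with a small Python sketch. It assumes one action per player relayed to everyone else in the same zone and one machine per zone; the zone populations are made up for illustration.

```python
def zone_messages(players_in_zone: int, actions_per_player: int = 1) -> int:
    # Every action is relayed to every other player in the same zone.
    return actions_per_player * players_in_zone * (players_in_zone - 1)


def busiest_zone(zone_populations: list[int]) -> int:
    # Each zone is handled by one machine, so the worst single zone is the limit.
    return max(zone_messages(n) for n in zone_populations)


spread_out = [30] * 10   # hypothetical: 300 players spread over 10 zones
one_fight = [300]        # the same 300 players in a single zone

print("spread out:", busiest_zone(spread_out), "messages in the busiest zone,",
      sum(zone_messages(n) for n in spread_out), "in total")
print("one fight:", busiest_zone(one_fight), "messages in the busiest zone")
```

    With these numbers, the busiest machine handles 870 messages when the players are spread out, versus 89700 when all 300 meet in one zone, even though the total player count is identical.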

    The challenge with a battle taking place in a single zone is that it cannot be handled by several computers. Synchronizing all the necessary information across several machines would add so much overhead that it would actually slow down the entire process. Instead, a single machine has to be able to handle the entire fight. Our individual machines are already at the upper limit of what is available on the market, so we cannot improve things through further upgrades, especially given that a significant improvement would be needed to make a difference.

    This is why we’re focussing our search and improvements on other places which could be causing problems: network infrastructure and networking code.

    If infinitely large fights are not possible, what is your plan then?

    Right now we’re focussed on getting back to at least the same level of performance we’ve had before. Once we have managed to achieve that, we will continue to search for optimizations and improvements to push the number of players even higher.

    At the same time, we have to face the fact that there will always be more players wanting to fight than we can handle in a single fight. For this reason, we’re currently working on different concepts for handling overloaded fights. One solution we’re discussing involves temporarily putting weaker/uninvolved players into a stasis mode during large engagements, giving them the option to wait until player numbers have gone down or to be transferred to a nearby region.
    Implementing such a solution will be a priority once we’ve dealt with the immediate issues.
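    The stasis idea could be sketched roughly as follows. This is a hypothetical illustration only: the post does not say how "weaker/uninvolved" would be measured, so the `recent_actions` score, the `capacity` value and all names below are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    recent_actions: int  # hypothetical "involvement" metric, e.g. hits dealt/taken recently


def split_for_stasis(players: list[Player], capacity: int) -> tuple[list[Player], list[Player]]:
    """Keep the `capacity` most involved players active; the rest go into stasis.

    Ties keep their original order because sorted() is stable.
    """
    ranked = sorted(players, key=lambda p: p.recent_actions, reverse=True)
    return ranked[:capacity], ranked[capacity:]


players = [Player("a", 40), Player("b", 0), Player("c", 25), Player("d", 3), Player("e", 31)]
active, stasis = split_for_stasis(players, capacity=3)
print([p.name for p in active], [p.name for p in stasis])
```

    Here players a, e and c stay in the fight, while d and b would be offered the choice to wait or transfer. Choosing a fair involvement metric is the hard design question that this sketch deliberately leaves open.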


  • Great to hear! Today we saw a MASSIVE performance improvement during ZvZs throughout the day: we had zero lag issues and no rubber banding in a zone-capped fight during the Warcamp in Well of the World.

    Castles went smoothly as well, with just a little lag on engagements, but far, far better performance than we've seen over the last week or so.

    I'm glad you've written out and explained the issues here. It gives us, the players, a better understanding of what's going on, and hopefully tomorrow's upgrades will bring further improvements.


  • T-ox wrote:

    Holoin wrote:

    T-ox wrote:

    Well, if you turn off all name tags, it's much better. Why is that?
    I think the devs mentioned in another post that they may have a memory leak related to nametags.
    Of course there's a leak with name tags in fights; there always has been. But why is that?
    Regardless of any performance issues with nameplates, they always add to the overall client workload, since they are additional UI widgets that need to be constantly updated, especially in big fights where you see dozens if not hundreds of nameplates.
  • Yeah, but no, the technology to deal with this situation has been live for years. I feel like I'm back in 2004. Nearly every MMORPG had some issues with big fights, but they all managed to deal with it. And that was a decade ago...

    At the moment, CCP manages to have thousands of players fighting each other in EVE Online. And we aren't talking about the same scale at all between EVE and Albion.
  • Nimitz wrote:

    Yeah, but no, the technology to deal with this situation has been live for years. I feel like I'm back in 2004. Nearly every MMORPG had some issues with big fights, but they all managed to deal with it. And that was a decade ago...

    At the moment, CCP manages to have thousands of players fighting each other in EVE Online. And we aren't talking about the same scale at all between EVE and Albion.
    EVE Online doesn't handle big fights; it slows them down. SBI has been trying to maintain the same feel across all fights.
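    In EVE Online this slowdown is known as Time Dilation (TiDi): when the server falls behind, game time runs slower instead of messages piling up. The Python sketch below is a generic illustration of that idea under assumed numbers (a 50 ms tick budget and a 10% floor), not CCP's implementation.

```python
def dilation_factor(tick_budget_ms: float, tick_cost_ms: float, floor: float = 0.1) -> float:
    """Fraction of real-time speed the server can sustain.

    If processing one tick takes longer than its budget, game time is slowed
    so every tick gets more wall-clock time, but never below `floor`
    (an assumed value).
    """
    if tick_cost_ms <= tick_budget_ms:
        return 1.0
    return max(floor, tick_budget_ms / tick_cost_ms)


def wall_clock_seconds(game_seconds: float, factor: float) -> float:
    # At half speed, a 10 s cooldown takes 20 s of real time.
    return game_seconds / factor


for cost in (40.0, 100.0, 1000.0):
    f = dilation_factor(tick_budget_ms=50.0, tick_cost_ms=cost)
    print(f"tick cost {cost:.0f} ms -> speed {f:.2f}, "
          f"10 s cooldown lasts {wall_clock_seconds(10, f):.0f} s")
```

    The trade-off is the one described above: everything stays consistent, but the whole fight feels slower, which is a different "feel" from fights at normal speed.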

    SBI has always been transparent about issues. Unfortunately, lag, whatever its cause, looks the same to the player base, so huge numbers of players assume it's the same problem despite massive changes in both the underlying game and the player base, which leads some to think SBI doesn't care. It's obvious they do: Albion is their baby, they want it to be the best, and they're invested in this game for the long haul. It must be super frustrating listening to the howls of frustration from the community, especially when that frustration is misguided and armchair experts offer up solutions that are way off the mark.

    Keep up the good work SBI.
  • Eltharyon wrote:

    [Quoting the opening letter of the status update above]

    Allow me to comment on a few points in your post, since I also work in this field and I cannot agree with everything you are saying.

    Why does lag occur in big fights?

    It doesn't only happen in big fights; it happens everywhere, especially in melee fights. This is not due to mathematics but to latency. Let me try to explain for people who are not familiar with networking. Look up a tool called tracert (trace route): from your location, it shows how many intermediate devices (routers, or "hops") your traffic passes through before it reaches the destination IP, along with the latency to each hop. This by itself creates the "lag" for players in EU, SEA and SA. For some, 150 ms might be playable; for others it won't be.

    In networking terms, to help with the number of packets sent (the number of messages you are talking about), consider investing in a load balancer / clustering; I'm sure you could hire someone or a company specialized in this field to help you with this. You could also use QoS (Quality of Service) to prioritize specific types of packets.
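    Real QoS is configured on network equipment (for example by marking packets so that routers treat them differently). The Python sketch below shows only an application-level analogue of the same idea, assuming hypothetical message categories and a per-tick send budget: when not everything fits into one tick, the most important messages go out first.

```python
import heapq
import itertools

# Hypothetical message categories; lower number = more important.
PRIORITY = {"damage": 0, "movement": 1, "cosmetic": 2}


class OutboundQueue:
    """Send the most important messages first when only `budget` fit in one tick."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO within the same priority

    def push(self, kind: str, payload: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), payload))

    def drain(self, budget: int) -> list[str]:
        """Pop at most `budget` messages; anything left waits for the next tick."""
        sent: list[str] = []
        while self._heap and len(sent) < budget:
            _, _, payload = heapq.heappop(self._heap)
            sent.append(payload)
        return sent


q = OutboundQueue()
q.push("cosmetic", "emote")
q.push("movement", "move A")
q.push("damage", "hit B")
q.push("movement", "move C")
print(q.drain(budget=2))   # this tick: the damage event and the oldest movement
print(q.drain(budget=10))  # next tick: whatever was deferred
```

    Prioritizing only changes which messages arrive late; it does not reduce the total number of messages a large fight generates.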


    Why don’t you just add more servers?
    Most of the people asking for servers are not asking because of ZvZ, but because of PvP itself: the hangs, the warping, pretty much everything. We (players, myself included, though not everyone) have requested a server located in the EU because of the latency. Even in the 5 vs 5 arenas, I get hit by melee players who appear to be 5 cm away. I can't kite at all with this latency, and I even tried a Frost build, where a slow should be applied (sorry, I'm writing this a bit out of frustration).

    If it were a hardware problem, one solution would be to implement a cluster infrastructure, as I said earlier. I can give you an example of fights with hundreds of players in castle raids (Lineage 2; please delete the game name if mentioning it is against the rules), where you do not see this kind of behaviour.


    If infinitely large fights are not possible, what is your plan then?

    Consider hiring an outside company to provide you with networking solutions.

    Honestly, Albion's team should hold a meeting about the infrastructure as a whole. If you want to grow with European players, you need a European server. People will eventually get tired of how unresponsive the game is. Between April 22nd and 25th alone, you already lost 2k players according to Steam. If it keeps going like this, you will lose a lot more.
  • Okay so how about this idea:

    Once a PvP zone reaches a predefined player limit that is confirmed to run optimally, a gatekeeper NPC could be spawned that blocks the path for anyone else trying to enter that zone.

    It cannot be attacked and takes no damage, so it won't generate any additional server messages.

    This would put a temporary cap on the number of players in a PvP area that is generating server messages above a certain threshold.

    Once the PvP zone has settled and its population has dropped below a certain limit, the gatekeeper NPC can despawn and allow access to that PvP area again.
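    The gatekeeper rule can be sketched as a small state machine. This is a hypothetical illustration: the thresholds are made up, and using a lower reopen threshold than the closing one (hysteresis) is an assumed design choice that keeps the NPC from spawning and despawning repeatedly when the population hovers around a single limit.

```python
class ZoneGate:
    """Hypothetical gatekeeper NPC for one PvP zone."""

    def __init__(self, close_at: int, reopen_below: int) -> None:
        if reopen_below > close_at:
            raise ValueError("reopen_below must not exceed close_at")
        self.close_at = close_at          # spawn the gatekeeper at this population
        self.reopen_below = reopen_below  # despawn once the population drops below this
        self.gatekeeper_present = False

    def update(self, population: int) -> bool:
        """Call whenever the zone population changes; returns True while entry is blocked."""
        if not self.gatekeeper_present and population >= self.close_at:
            self.gatekeeper_present = True
        elif self.gatekeeper_present and population < self.reopen_below:
            self.gatekeeper_present = False
        return self.gatekeeper_present


gate = ZoneGate(close_at=300, reopen_below=250)
print([gate.update(p) for p in (280, 300, 290, 260, 249, 270)])
```

    With these thresholds the gate closes at 300 players, stays closed while the count drifts down through 290 and 260, and only reopens once it falls below 250.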

    One other suggestion:

    In another MMO I've played, they had a 'battle mode' which basically normalised all players' gear visually so that everyone used the same player models / armors (ultimately aiming to reduce the number of armor variations visible, just for the large-scale PvP areas). Battle mode was selectable as a client-side toggle.


  • Korn wrote:

    Storydor wrote:

    I think the only thing that would make me feel good would be if you just said:

    "Sorry, we screwed up big time. We've secretly been remaking this game in another engine that handles networking and scaling better. In a month we will relaunch the game in Unreal/whatever and things will most likely be better"
    Unfortunately, it's not a hardware issue. If the performance could be increased by adding more hardware, we would instantly do that.
    So now it could be a hardware fault? You guys have said for years it wasn't hardware.
  • Serbal wrote:

    Korn wrote:

    Storydor wrote:

    I think the only thing that would make me feel good would be if you just said:

    "Sorry, we screwed up big time. We've secretly been remaking this game in another engine that handles networking and scaling better. In a month we will relaunch the game in Unreal/whatever and things will most likely be better"
    Unfortunately, it's not a hardware issue. If the performance could be increased by adding more hardware, we would instantly do that.
    So now it could be a hardware fault? You guys have said for years it wasn't hardware.
    I highlighted the relevant part above. In the discussion that your quote is from, there was the question of whether we could fix performance issues by essentially buying more servers.

    I am not a tech guy myself. Before becoming involved in game development, I always wrongly assumed that you could improve an online game's performance just by throwing more and more hardware at it, and that if a game lagged or had performance issues, it was because the developers did not spend enough money on servers or did not get enough of them.

    As the detailed statement above explains, the performance topic is much more complicated and indeed cannot be solved by simply adding more hardware. This means that there is no easy fix; rather, it's an ongoing optimization challenge.
  • Korn wrote:

    Serbal wrote:

    Korn wrote:

    Storydor wrote:

    I think the only thing that would make me feel good would be if you just said:

    "Sorry, we screwed up big time. We've secretly been remaking this game in another engine that handles networking and scaling better. In a month we will relaunch the game in Unreal/whatever and things will most likely be better"
    Unfortunately, it's not a hardware issue. If the performance could be increased by adding more hardware, we would instantly do that.
    So now it could be a hardware fault? You guys have said for years it wasn't hardware.
    I highlighted the relevant part above. In the discussion that your quote is from, there was the question of whether we could fix performance issues by essentially buying more servers.
    I am not a tech guy myself. Before becoming involved in game development, I always wrongly assumed that you could improve an online game's performance just by throwing more and more hardware at it, and that if a game lagged or had performance issues, it was because the developers did not spend enough money on servers or did not get enough of them.

    As the detailed statement above explains, the performance topic is much more complicated and indeed cannot be solved by simply adding more hardware. This means that there is no easy fix; rather, it's an ongoing optimization challenge.
    Actually, it does involve more hardware, just not servers: network equipment for load balancing. But that's the responsibility of the data center hosting the servers, unless the servers are located at Albion's HQ. And that's only if the problem really is related to networking.
  • Eltharyon wrote:

    [Quoting the full status update above]
    It sounds like you are using unicast-style logic. Does your network stack/game engine/game logic support the concept of multicast? Essentially, with this method, players "subscribe" to the game messaging occurring in a zone as they zone in. Suppose a player performs an action such as an AoE, which has a target location and an impacted area. The game engine would decide who is impacted and how, then package that into a single multicast message that all clients in the zone would receive; that message would include a list of client IDs, and only the clients in that list would process it.
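    The proposal can be sketched in Python as one event that is encoded once and delivered to every client subscribed to the zone, with only the listed clients applying it. The zone name, the JSON encoding and the class names are assumptions for illustration, and the plain loop stands in for a real multicast send. Note that IP multicast is generally not routed across the public internet, so a real implementation would likely need a different delivery mechanism.

```python
import json


def encode_aoe(zone: str, damage: int, affected_ids: set[int]) -> bytes:
    # The server decides who is hit and encodes ONE message for the whole zone.
    return json.dumps({"zone": zone, "damage": damage,
                       "targets": sorted(affected_ids)}).encode()


class Client:
    def __init__(self, client_id: int) -> None:
        self.client_id = client_id
        self.hp = 100

    def on_packet(self, packet: bytes) -> None:
        event = json.loads(packet)
        # Every subscriber receives the same bytes; only listed clients apply the effect.
        if self.client_id in event["targets"]:
            self.hp -= event["damage"]


# Clients "subscribe" when they zone in (hypothetical zone name).
zone_subscribers = {"zone-1": [Client(i) for i in range(1, 6)]}

packet = encode_aoe("zone-1", damage=30, affected_ids={2, 4})
for client in zone_subscribers["zone-1"]:  # stand-in for one multicast send
    client.on_packet(packet)

print([c.hp for c in zone_subscribers["zone-1"]])
```

    The saving is on the server side (one encoding instead of one per recipient). Each client still receives the whole message, including the target list, so the downstream traffic per client does not shrink.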
  • With all due respect, the technical explanation is just verbiage. I don't need my doctor to tell me how red blood cells work; I go to the doctor for a cure.

    Long before F2P, the game was already getting annoying. I strongly agree that performance used to be better with a higher number of players, but I want to insist: the game started failing at least 3 or 4 weeks before F2P, the players noticed, and SBI did nothing.

    Only when the problem was amplified by F2P did you supposedly start doing something, and, well, still nothing... Bugs are even appearing that did not exist last week [dead people still moving around naked].

    We are all aware that these problems are not resolved overnight or within a week; I just want to emphasize the negligence and complacency with which the issue was handled. The bad publicity and negative reviews at launch are already impossible to overcome.