Jump to content

Parallel GPU-based A* algorithm in pathfinding


Recommended Posts

I recently got interested in helping with 0 A.D. development, especially optimization.

I was thinking how could we accelerate pathfinding and in consequence reduce lag.

It already seems quite optimized, A* with JPS optimization.

One of things that are still possible to do is to apply parallelism to pathfinding.

Toady I've found this: https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/download/9620/9366

By using method shown in the article we could not only run many pathfinding requests in parallel, but parallelize a single pathfinding request in a GPU-friendly way.

We could use CUDA or OpenCL to implement it.

Authors of the article have achieved 30x(EDIT: actually it's only x6 speedup. See my third post.) speedup of A* algorithm, which is quite considerable, however I am not sure how much would it speedup 0 A.D. pathfinding.

I am not familiar with 0 A.D. code enough to start working on it, or even assess how much could it actually speedup 0 A.D. pathfinging.

I have found few posts from 2012 about running multiple pathfinding queries in parallel(not parallelising single request), saying that this wouldn't be really useful for players without strong GPUs or multicore processors.

https://wildfiregames.com/forum/index.php?/topic/16018-supreme-commander-2-pathfinding/&tab=comments#comment-239673

Quote

We want the gameplay to work well for every player; and to support multiplayer and replays (and to preserve sanity when debugging problematic behaviour) we really need the pathfinder to give identical results for every player; so performance is constrained by the slowest player we want to bother supporting (which nowadays is probably still (just about) single-core CPU and middling Intel GPU, i.e. not much opportunity for parallel computing).

Well, today we have 2018(2019 soon), and even things as basic as desktop environments have multi core CPUs in their requirements, and most 3D RTS games have dedicated GPUs in their requirements.

Sure it would be much more useful for player with NVIDIA GTX 1080 than for player with intel i3 integrated GPU, however I think that speedup for most of players could still be considerable.

Maybe it will turn out too hard to implement, not as efficient as I assume here, not good for every player, etc...

But maybe it will turn out that it really speeds up pathfinding , at least for players with dedicated GPUs.

 

Edited by Kuba386
  • Like 4
Link to comment
Share on other sites

3 minutes ago, stanislas69 said:

Thanks for your interest in contributing. What would you need to get started ?

Oh, probably mostly some time. Code is quite well-organized and I'm quickly getting into it.

I'm probably too young to be considered a real programmer(15 years old), but I'm learning fast and already have some C++ experience (programming since 11 years old).

Probably many of you have programming experience much greater than 4 years but it's better than nothing.

My main area of interest is computer science (training hard to get some nice title in Polish Olympiad in Informatics), and that's why I'm mostly interested in things like pathfinder optimizations.

My amount of free time isn't very big, but maybe I could still help the project a bit. I will also have good chance to learn something.

  • Like 3
Link to comment
Share on other sites

1 minute ago, Kuba386 said:

I'm probably too young to be considered a real programmer(15 years old), but I'm learning fast and already have some C++ experience (programming since 11 years old).

Don't worry too much about that. Everyone has a different learning rate. We don't value people by their experience here, but by what they bring to the community. If you make the game faster I think a lot of people will enjoy it :)

2 minutes ago, Kuba386 said:

My amount of free time isn't very big, but maybe I could still help the project a bit. I will also have good chance to learn something.

We all have lives, (Some less than others) so any contribution however small is welcome.

Link to comment
Share on other sites

46 minutes ago, Kuba386 said:

By using method shown in the article we could not only run many pathfinding requests in parallel, but parallelize a single pathfinding request in a GPU-friendly way.

There're constraints when you try to move calculations to GPU. I.e. we have depended data (two units can't exist in the same position). The article you provided:

Quote

This step is straightforward, and the rationality is simply based on the observation that the computation of heuristic functions for each expanded state is mutually independent.

I had only a fast look and I'm not sure, but it seems that they calculate all states independently. Correct me, if I'm wrong.

15 minutes ago, Kuba386 said:

training hard to get some nice title in Polish Olympiad in Informatics

I'd recommend to train on codeforces.com, if you're not there yet :)

Link to comment
Share on other sites

6 hours ago, vladislavbelov said:

There're constraints when you try to move calculations to GPU

You mean time constraints? Yeah, copying data to GPU memory takes some time. I'm not yet sure how much would it be for this case. We could use shared memory, but only pascal(10xx) and higher nvidia cards support it

 

6 hours ago, vladislavbelov said:

I.e. we have depended data (two units can't exist in the same position)

Not sure what you mean here...

The time taken for copying data to GPU memory and copying result back is a real problem. Using shared memory seems to be only good solution, however it is only supported on nvidia cards with pascal or newer architecture. Not sure about AMD.

EDIT:

Wait... Hmmm... Maybe time needed to copy data from RAM to GPU memory won't be that much....

END_EDIT

Let's say it's possible to implement with CUDA shared memory. Then it would be a ~6x speedup (based on benchmarks in the article), however available only for owners of nvidia cards with  at least pascal architecture.

This is already not looking as good as it is in my first post here...

6 hours ago, vladislavbelov said:

I had only a fast look and I'm not sure, but it seems that they calculate all states independently. Correct me, if I'm wrong.

They calculate expanded states independently, only states that have the same 'parent'. Not all of states. Main purpose of this is to parallelize heuristic function, however parallelization of heuristic function is not actually needed for pathfinding purposes.

GA* algorithm parallelizes open list and heuristic function calls. In some usages of A* heuristic functions are a bottleneck, and then GA* algorithm provides great speedup (~x30).

However in pathfinding usages of A* heuristic functions are very simple and doesn't really need to be parallelized. That's why GA* algorithm only provides ~x6 speedup for pathfinding purposes.

 

6 hours ago, vladislavbelov said:

I'd recommend to train on codeforces.com, if you're not there yet :)

I'm already there, but recently has been very inactive there(like I forgot :)). Thank you for reminding me, I will take part in next round :)

 

Maybe running pathfinder in separate thread(s) would be a nice idea? I'm not sure, but I think I could do it. (This would however take me a while)

Edited by Kuba386
Link to comment
Share on other sites

55 minutes ago, Kuba386 said:

Maybe running pathfinder in separate thread(s) would be a nice idea? I'm not sure, but I think I could do it. (This would however take me a while)

You can try, patches are welcome, or you can start to look at current pathfinder patches :)

 

Link to comment
Share on other sites

Running pathfinder in separate thread and applying these GPU optimizations are one of last things we can do without completely redesigning pathfinder.

It already has lots od optimizations, looks that there is really nothing else to do.

If multithreading and GPU-based parallelization won't give us results, we would have to seriously think about redesigning pathfinder.

 

These are my toughts of pathfinding in 0 A.D. after first day spent on reading code and searching the internet for pathfinding algorithms.

Correct me if I understand something wrong.

Link to comment
Share on other sites

Yeah certainly it will. I said that this 'boost' may be not enough. Then only option would be redesigning pathfinding.

15 minutes ago, stanislas69 said:

The game is mainly using one thread.

This is mainly what's wrong with it.

 

I will try to multithread pathfinder, but many better programmers already failed to do so. Therefore I'm not sure how much I can do.

There are many other things to multithread. Like NetClient or map generation. I can try multithread them too, but I'm really not sure how much can I do here.

 

 

EDIT:

If I do it all I will become a multithreading specialist :)

 

EDIT 2:

I just realized that what I said here may bet quite not nice. I really understand and respect these huge amount of work people have already put into this game. Only pointing out things that are still left undone.

 

Edited by Kuba386
  • Like 2
Link to comment
Share on other sites

GPUs have much much better floating point performance and precision. And can run thousands of threads in parallel, in opposite to CPU. CPU is however much faster on single thread. That's why using GPU only makes sense when you can have high degree od parallelization.

8 minutes ago, Angen said:

And would it not be needed to use specific graphic card family, like for example just NVIDIA?

We could also Have OpenCL version(for AMD(and Intel?)).

OpenCL and CUDA are generally the same thing however CUDA only works for NVIDIA, and OpenCL is słowem.

Link to comment
Share on other sites

@Angen depends on what you do. But as much as I'd like our engine to compute stuff on the GPU (Youdbe surprise of what 0AD can run on.) for the sake of compatibility it's a very bad idea. If we keep both systems the code will become really messy. Maybe another use for the strategy design pattern.

@Kuba386 Don't be afraid to try it. At best you will learn a lot and make something really precious and at worst you'll still learn a lot.

No that many programmer tried to thread it. It was designed to be threaded that was what programmers worked on.The original one was way worse.

You're only as good as you let yourself be. If you don't try you'll never know.

Netclient will likely soon be threaded.

 

 

 

Link to comment
Share on other sites

13 hours ago, Kuba386 said:

There are many other things to multithread. Like NetClient or map generation. I can try multithread them too, but I'm really not sure how much can I do here.

NetClient is already multithreaded. Patches just not commited. (Nor uploaded by the author)

Regarding map generation, as you said, it currently runs on a single albeit a seperate thread. Although, I am skeptical of how much of a boost it would give to justify the additional work.

Unlike most things, it would be pretty hard to parallelize tasks in a mapscript. Every part of the map script depends on the result of an earlier part. So, you cant just do multiple things simultaneously. Most lag comes from the *huge* number of retries and point-in-tileclass tests anyway. Which maybe something that a thread could do better. Delegating some of the point testing to another thread. Apart from that, I am not really sure.

I am more onboard with optimizing the unoptimized JS code. Especially, the tileclasses and constraints system.

Edited by Guest
Link to comment
Share on other sites

8 hours ago, Kuba386 said:

GPUs have much much better floating point performance and precision.

Unfortunately not all cards has standard floating point precision, e.g. sqrt on different cards may be different and even different from IEEE standard for it. But I don't think floating point is really needed for pathfinding. If needed - there is a problem, because we should have the same result for all users.

11 hours ago, Kuba386 said:

We could use shared memory, but only pascal(10xx) and higher nvidia cards support it

There is DMA, it depends on drivers, but some middle cards support it.

11 hours ago, Kuba386 said:

Not sure what you mean here...

11 hours ago, Kuba386 said:

They calculate expanded states independently, only states that have the same 'parent'. Not all of states. Main purpose of this is to parallelize heuristic function, however parallelization of heuristic function is not actually needed for pathfinding purposes.

Example, we have the next simple field (0 - passable cell, 1 - wall):

0 1 0
0 0 0
0 1 0

Let's we have two units in (x=0, y=0) and (x=0, y=2), and they need to be in (x=2, y=1). How the parallelised algorithm will solve it? What would be an answer?

  • Like 1
  • Thanks 1
Link to comment
Share on other sites

4 hours ago, vladislavbelov said:

How the parallelised algorithm would solve it?

If we only parallelize processing single request we could do it the same way we do it for unparallelized algorithm.

I don't have 100% understanding of current pathfinding code yet, put parallelization of single request seems not to produce any new problems.

However if we want to process multiple requests at once then it would be a little bit problematic.

Today when I'm back from school I AM going to write a prototype code for GA*  pathfinding.

Link to comment
Share on other sites

1 hour ago, Kuba386 said:

If we only parallelize processing single request we could do it the same way we do it for unparallelized algorithm.

Yes, but in case of N requests you need to do Application > Driver > GPU synchronisation N times, and update all data too (both on CPU and GPU to calculate next request).

1 hour ago, Kuba386 said:

However if we want to process multiple requests at once then it would be a little bit problematic.

That's what I'm talking about.

1 hour ago, Kuba386 said:

Today when I'm back from school I AM going to write a prototype code for GA*  pathfinding.

That's good. I wish you a nice coding regardless of result you'll get :)

  • Like 3
Link to comment
Share on other sites

I think reducing retries in pathfinder and threading the pathfinding calls should give us enough performance. Also iirc the bottleneck is not our global JPS pathfinder, but the local vector(not sure if it was really called like that) pathfinder. But @wraitii should be able to tell you more, as he already started improving the pathfinder some time ago (better shortrange pathfinder, less retries and threading). He uploaded some WIP on D13

  • Like 2
Link to comment
Share on other sites

I thought it was the other way around. Long range pathfinder being worse than the short range one. Although, I am not particularly sure.

In any case, I wouldnt say the long range one is of satisfactory quality just yet. I mean, the game almost freezes when the AI attacks. Which uses the longrange pathfinder for some of its attack-planning stuff.

Edited by Guest
Link to comment
Share on other sites

Just now, (-_-) said:

I thought it was the other way around. Long range pathfinder being worse than the short range one. Although, I am not particularly sure.

In any case, I wouldnt say the long range one is of satisfactory quality just yet. I mean, the game almost freezes when the AI attacks.

Nah the short one is the one that handles the obstructions constantly to go around buildings. The long range plans using bigger zones. When AI attacks I think two things happens. First the planning of the AI is done on the same thread as the rest, and thus slows everything, adds that the way the AIs handle units make them collide with each other a lot and that triggers the lag, because each unit is computing its own path taking into account all the other units around it.

Link to comment
Share on other sites

21 hours ago, stanislas69 said:

The game is mainly using one thread. Just using multiple core will give us a boost on most systems.

This is true. It's something I've been considering for Wyrmsun, too. But doing pathfinder multithreaded is very difficult to do without either 1. losing determinism or 2. losing the performance improvements multithreading would provide. Losing determinism is not as big a deal for single player games, but for multiplayer ones it is essential, or else different players will end up with different game states.

The UI could certainly be handled in a thread of its own, though (if it isn't already).

Edited by Andrettin
Link to comment
Share on other sites

Okay, at this point I see enough problems with GPU utilization on pathfinding.

Sorted from most important to less important:

- it would be hard to keep both GPU and CPU pathfinders deterministic and the same.

- not available for all players.

- will use GPU resources that are needed for rendering.

- probably not needed very much. Multithreading is more important here.

 

Therefore maybe utilizing GPU can wait. There are more important optimizations that have to be done first.

 

20 hours ago, Imarok said:

reducing retries in pathfinder

Retries?

On 12/14/2018 at 12:45 PM, Kuba386 said:

If we only parallelize processing single request we could do it the same way we do it for unparallelized algorithm. 

Are retries what we do in a problematic situation? (I haven't read all of the code yet, today I had no time to do it.) Well.... We should not only reduce number of retries but remove them. That is actually multiplying our time complexity by number of retries.(hmm... you probably are already perfectly aware of this sad fact(no, not probably, sure you are))

Well, seems that multithreading and reducing number of retries to 0 would solve our problems with pathfinding.

Not however sure if reducing retries to 0 is possibile or it has to be incredibly hard if we haven't done it yet.

Well there is still lot of work for me. Thats both good and bad :)

20 hours ago, Imarok said:

But @wraitii should be able to tell you more, as he already started improving the pathfinder some time ago (better shortrange pathfinder, less retries and threading). He uploaded some WIP oD13

Yeah, @Imarok working with already made pathes is certainly a good thing to do. And it would be nice to talk to people who already worked with pathfinder.

Tomorrow I'll probably have lot of time and I will complete reading the code and maybe start to write something.

 

EDIT:

This is a nice visualization of some pathfinding algortihms :)https://www.youtube.com/watch?v=rZHtHJlJa2w

 

Edited by Kuba386
  • Like 1
Link to comment
Share on other sites

On 12/14/2018 at 4:34 PM, Kuba386 said:

- it would be hard to keep both GPU and CPU pathfinders deterministic and the same.

Correct, this could easily break sync. GPUs and CPUs alike have various quirks when it comes to floating point. The standard itself is broken. We've dealt with that CPU side already, but moving it to the GPU you'd probably lack libraries for ensuring floating point consistency.

On 12/14/2018 at 4:34 PM, Kuba386 said:

- not available for all players.

- will use GPU resources that are needed for rendering.

Correct. If you want to mess with the GPU you'd be better off doing real graphics instead. :P

On 12/14/2018 at 4:34 PM, Kuba386 said:

- probably not needed very much. Multithreading is more important here.

The current pathfinder is actually inefficient in a lot of ways, not the least of which in the way it handles collisions (which really shouldn't be part of the pathfinder at all). Apparently it also uses flood-fill in a lot of situations which is horribly inefficient. @wraitii has been working on some sort of complex multi-stage patch to fix all of those issues, but along with a lot of other things it got stalled by the halted development. From what I saw of it the pathfinder is also ridiculously complex.

  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

 Share

×
×
  • Create New...