Jump to content

less precise sorting in range manager


Recommended Posts

Discussion about the overkill situation brought me to this vague idea: consider more units at an equal range, so that units spread out their damage a little. Ideally, a target precision value could be stored in the templates (higher for archers, lower for skirms for example) and then used to influence the 'EntityDistanceOrdering' on certain calls to the range manager. (Things like choosing the closest unit don't need to be super precise and maybe shouldn't be wrt overkill).

https://code.wildfiregames.com/source/0ad/browse/ps/trunk/source/simulation2/components/CCmpRangeManager.cpp#:~:text=class EntityDistanceOrdering,}%3B

Perhaps some replacement for .CompareLengths() could be used with an optional precision value (0.1,0.5,1,3,5,10).

Since stable_sort is used, maybe this would also be faster with more 'equivalent' values?

Link to comment
Share on other sites

Posted (edited)
		int CompareLengthSquared(u64 cmpSquared) const
	{
		u64 d2 = SQUARE_U64_FIXED(X) + SQUARE_U64_FIXED(Y); // d2 <= 2^63 (no overflow)

		if (d2 < cmpSquared)
			return -1;

		if (d2 > cmpSquared)
			return +1;

		return 0;
	}

https://code.wildfiregames.com/source/0ad/browse/ps/trunk/source/maths/FixedVector2D.h#:~:text=int CompareLengthSquared(,}

ok so, yes the precision for lengths is 1 meter, but the differences are found for x^2+y^2, so for a 70 meter range archer, 1 meter out of up to 4900 is very little.

So basically, the precision when sorting lengths gets higher the farther away the unit is: a single integer represents a range differences of 0.014 meters near 70 meters range. The relationship looks like this:

image.thumb.png.11e67fe9a860908f247b23046c139f2a.png

The only thing is the relationship needs to be a parabola, so you can't just do (x^3/Px^2) because this would incorrectly rank units distances.

I suppose the thing to do would be to use an integer square root estimation algorithm, and then integer division by the precision value. Then, the profiling would be needed to see if the potential improvements to sorting would be worth it. Even if its pretty similar, I think the impacts on the overkill situation would be nice.

https://stackoverflow.com/questions/34187171/fast-integer-square-root-approximation

Edited by real_tabasco_sauce
Link to comment
Share on other sites

Posted (edited)

I was curious so I did some testing in this c++ reference: https://en.cppreference.com/w/cpp/algorithm/stable_sort

and it seems like stable_sort() is faster when there are redundant values, but when they are made quite different on purpose, sort() is faster.

Spoiler

#include <algorithm>
#include <array>
#include <iostream>
#include <string>
#include <vector>
#include <time.h>

struct Employee
{
    int age;
    std::string name; // Does not participate in comparisons
};
 
bool operator<(const Employee& lhs, const Employee& rhs)
{
    return lhs.age < rhs.age;
}
 
#if __cpp_lib_constexpr_algorithms >= 202306L
consteval auto get_sorted()
{
    auto v = std::array{3, 1, 4, 1, 5, 9};
    std::stable_sort(v.begin(), v.end());
    return v;
}
static_assert(std::ranges::is_sorted(get_sorted()));
#endif
 
int main()
{
    clock_t tStart = clock();
    std::vector<Employee> v{{18, "Zaphod"}, {113, "Ford"}, {128, "Ford"}, {28, "Zaphod"}, {100, "Zaphod"}, {88, "Zaphod"}, {108, "Zaphod"}, {32, "Arthur"}, {42, "Arthur"}, {32, "Arthur"}, {92, "Arthur"}, {16, "Ford"}, {62, "Arthur"}, {74, "Arthur"}, {108, "Ford"}};
 
    std::sort(v.begin(), v.end());
 
    for (const Employee& e : v)
        std::cout << e.age << ", " << e.name << '\n';
    printf("Time taken: %.9fs\n", (double)(clock() - tStart)/CLOCKS_PER_SEC);
}

Edited by real_tabasco_sauce
Link to comment
Share on other sites

Posted (edited)

Ok, I like the looks of this approximation:

(from https://stackoverflow.com/questions/4930307/fastest-way-to-get-the-integer-part-of-sqrtn/63457507#63457507 and https://stackoverflow.com/questions/34187171/fast-integer-square-root-approximation)

#include <algorithm>
#include <array>
#include <iostream>
#include <string>
#include <vector>
#include <math.h>
#include <time.h>

unsigned char bit_width(unsigned long long x) {
    return x == 0 ? 1 : 64 - __builtin_clzll(x);
}

unsigned sqrt_lo(const unsigned n) noexcept
{
    unsigned log2floor = bit_width(n) - 1;
    return (unsigned) (n != 0) << (log2floor >> 1);
}

unsigned sqrt_newton(const unsigned n, unsigned int P)
{
    unsigned a = sqrt_lo(n);
    unsigned b = n;
    
    // compute unsigned difference
    while (std::max(a, b) - std::min(a, b) > P*P) { //P*P reduces iterations of while loop when using higher P.
        //printf("iteration\n");
        b = n / a;
        a = (a + b) / 2;
    }

    // a is now either floor(sqrt(n)) or ceil(sqrt(n))
    // we decrement in the latter case
    // this is overflow-safe as long as we start with a lower bound guess
    return a - (a * a > n); //could remove - (a * a > n); if using low precision.
}

int main()
{
    unsigned Nums[29] = {27,28,29,30,31,32,33,34,35,36,37,38,39,40,44,45,46,47,48,58,59,60,61,62,98,99,100,101,102};
    
    int P = 5;
    clock_t tStart = clock();
    printf("Range value\tEstimate\tExact\n");
    for (int i=0;i<29;i++){
        unsigned test = Nums[i]*Nums[i];
        printf("%i\t",Nums[i]);
        printf("%i\t",sqrt_newton(test,P)/P);
        printf("%i\n",(int)floor(sqrt(test))/P);
    }
    //printf("Estimate: %i\t",sqrt_newton(test)/P);
    //printf("Exact: %i\n",(int)floor(sqrt(test))/P);
    printf("Time: %.9fs\n", (double)(clock() - tStart)/CLOCKS_PER_SEC);
}

results:

P = 1

Spoiler

 

Range value	Estimate	Exact
27	27	27
28	28	28
29	29	29
30	30	30
31	31	31
32	32	32
33	33	33
34	34	34
35	35	35
36	36	36
37	37	37
38	38	38
39	39	39
40	40	40
44	44	44
45	45	45
46	46	46
47	47	47
48	48	48
58	58	58
59	59	59
60	60	60
61	61	61
62	62	62
98	98	98
99	99	99
100	100	100
101	101	101
102	102	102
Time: 0.000056000s
0.000003000s without printing

P = 3:

Spoiler
Range value	Estimate	Exact
27	9	9
28	9	9
29	9	9
30	10	10
31	10	10
32	10	10
33	11	11
34	11	11
35	11	11
36	12	12
37	12	12
38	12	12
39	13	13
40	13	13
44	14	14
45	15	15
46	15	15
47	15	15
48	16	16
58	19	19
59	19	19
60	20	20
61	20	20
62	20	20
98	32	32
99	33	33
100	33	33
101	33	33
102	34	34
Time: 0.000062000s
0.000002000s without printing

P = 5

Spoiler
Range value	Estimate	Exact
27	5	5
28	6	5
29	6	5
30	7	6
31	7	6
32	6	6
33	6	6
34	6	6
35	7	7
36	7	7
37	7	7
38	7	7
39	7	7
40	8	8
44	9	8
45	9	9
46	9	9
47	9	9
48	10	9
58	11	11
59	11	11
60	12	12
61	12	12
62	12	12
98	19	19
99	19	19
100	20	20
101	20	20
102	20	20
Time: 0.000055000s
0.000002000s without printing

P = 10:

Spoiler
Range value	Estimate	Exact
27	2	2
28	3	2
29	3	2
30	3	3
31	3	3
32	3	3
33	3	3
34	3	3
35	3	3
36	3	3
37	3	3
38	3	3
39	3	3
40	4	4
44	4	4
45	4	4
46	4	4
47	4	4
48	5	4
58	6	5
59	6	5
60	7	6
61	7	6
62	7	6
98	10	9
99	10	9
100	10	10
101	11	10
102	11	10
Time: 0.000060000s
0.000002000s without printing

The only issue I found is that with 5 >= P >= 9, there are a couple points where the estimate is higher than estimates of larger ranges. (see P = 5)

I could just go ahead and use this and see what happens. (id have to figure out the windows build process first).

Edited by real_tabasco_sauce
  • Like 2
Link to comment
Share on other sites

8 minutes ago, real_tabasco_sauce said:

Also, my windows build with VS was very very slow without any changes at all. Is that the case for others?

How slow? On a good machine it's about 2-4 mins.

30 minutes ago, real_tabasco_sauce said:

https://code.wildfiregames.com/D5282

it built, but didn't do anything.

Actually it failed to build :P 

What did you expect

  • Like 1
Link to comment
Share on other sites

2 hours ago, Stan&#x60; said:
2 hours ago, real_tabasco_sauce said:

Also, my windows build with VS was very very slow without any changes at all. Is that the case for others?

How slow? On a good machine it's about 2-4 mins.

I meant the game once it is built. For me, the build process is quite fast, under a minute.

in game, we are talking 1 to 4 fps as soon as I start moving the camera.

2 hours ago, Stan&#x60; said:

Actually it failed to build :P 

on anything but windows, yeah that's (hopefully) because I don't yet have code to use the GCC builtin.

Link to comment
Share on other sites

13 hours ago, real_tabasco_sauce said:

 

in game, we are talking 1 to 4 fps as soon as I start moving the camera.

That's because you built in debug and not in release mode. :)

13 hours ago, real_tabasco_sauce said:

on anything but windows, yeah that's (hopefully) because I don't yet have code to use the GCC builtin.

You can't assume gcc sadly. There is also clang. Also maybe the intrisic you use isn't available on ARM.

Link to comment
Share on other sites

3 hours ago, Stan&#x60; said:

That's because you built in debug and not in release mode. :)

ah, makes sense.

3 hours ago, Stan&#x60; said:
16 hours ago, real_tabasco_sauce said:

on anything but windows, yeah that's (hopefully) because I don't yet have code to use the GCC builtin.

You can't assume gcc sadly. There is also clang. Also maybe the intrisic you use isn't available on ARM.

Oh, so there's no way to check the compiler? I guess that makes sense lol. That definitely throws a wrench in it.

Link to comment
Share on other sites

11 minutes ago, real_tabasco_sauce said:

Oh, so there's no way to check the compiler? I guess that makes sense lol. That definitely throws a wrench in it.

Sorry wasn't clear. Point is you *can* check the compiler, but you have to account for a few ones :)

And you *also* have to take the cpu architecture Risc v, power pc, i386, x64, elbrus2k, arm, aarch64 etc...

  • Like 1
Link to comment
Share on other sites

  • 2 weeks later...
Posted (edited)

@Stan`

will all the builds have access to this in SDL2?

https://code.wildfiregames.com/source/0ad/browse/ps/trunk/libraries/win32/sdl2/include/SDL/SDL_bits.h#:~:text=SDL_MostSignificantBitIndex32(Uint32,return msbIndex%3B

it already does what I need but using u32 instead of u64.

my build is working fine, but my changes make absolutely no effect on anything.

Edited by real_tabasco_sauce
Link to comment
Share on other sites

I don't know you have to check the SDL2 versions to know what's the minimum version that supports it.

3 hours ago, real_tabasco_sauce said:

my build is working fine, but my changes make absolutely no effect on anything.

Well you can try printing stuff with LOGWARNING to see if that function is actually being used.

If it is you might use the profiler inside the game (there are some docs on trac) to compare between versions.

If it's not you have a bug :)

And eventually if it doesn't yield any result maybe it's because it's not a bottleneck :)

  • Thanks 1
Link to comment
Share on other sites

Posted (edited)
3 hours ago, Stan` said:

 

And eventually if it doesn't yield any result maybe it's because it's not a bottleneck :)

Well, the primary goal isn't performance: it is to evaluate closeness with less precision to avoid overkill. However, if it is faster than the current sqrt that would be cool.

Ok thanks so much for the help! The function is being called and everything, but either the square root doesn't happen at all, or the fixed datatype is remaining fixed.

I suppose it could also have to do with u64 vs u32

Edited by real_tabasco_sauce
Link to comment
Share on other sites

Ok so the square root function works, its just the numbers involved are still huge. So that's why there is no obvious impact on the game.

I thought meters were the fundamental unit of distance, like for entity_pos_t instances, so how does the square of the x component of a vector get this large? 13035623883809.

maps are maybe between 1000 and 10000 meters across, but the square of these numbers doesn't come close to the above number, which was collected from some archer fighting.

Link to comment
Share on other sites

It already often occurs that when pressing formations, some units target totally random entities (including buildings, and not even the one that are closest). I can't figure out why that is but it's a annoying bug because half your units will run to go capture a house instead of attacking units (in their really attack range).
I fear that adding arbitrary functions as you suggest won't make unit behavior more intuitive. What's needed to mitigate overkills ect might just be a feature. With more experience into the game now I believe a good feature would be the ability to spread damage accross a selection of enemy units with box selection.
Description:
1: Select units to order (your archers for example)
2: Use the feature's hotkey or button
3: Box select over enemy unitsscreenshot0005.thumb.png.cab946c38081f114e5dc3b346a6fc5d3.png
=> Your selected archers will target units in the box. Example, if you have 60 archers and hovered 30 enemy units with the box selection then 2 archers will attack each enemy.

This feature would, in my opinion, give a counter-action possible to unit dances, simplify sniping and in general make micro probably more interesting.

  • Like 2
Link to comment
Share on other sites

4 hours ago, real_tabasco_sauce said:

Ok so the square root function works, its just the numbers involved are still huge. So that's why there is no obvious impact on the game.

I thought meters were the fundamental unit of distance, like for entity_pos_t instances, so how does the square of the x component of a vector get this large? 13035623883809.

maps are maybe between 1000 and 10000 meters across, but the square of these numbers doesn't come close to the above number, which was collected from some archer fighting.

The game uses integers for its calculations, because floating point are note reliable on different machines, so we need to have the exact same values on each machines else the game will go out of sync pretty quickly. So I suppose what you are seeing are decimal values represented as int.

1 hour ago, Atrik said:

Box select over enemy units

https://code.wildfiregames.com/D1971

You can find most gameplay interesting patches by querying for @Freagarach on Phabricator https://code.wildfiregames.com/search/query/zNwhd_3YKp.n/#R

  • Like 1
  • Thanks 1
Link to comment
Share on other sites

Posted (edited)
11 hours ago, Stan` said:
16 hours ago, real_tabasco_sauce said:

Ok so the square root function works, its just the numbers involved are still huge. So that's why there is no obvious impact on the game.

I thought meters were the fundamental unit of distance, like for entity_pos_t instances, so how does the square of the x component of a vector get this large? 13035623883809.

maps are maybe between 1000 and 10000 meters across, but the square of these numbers doesn't come close to the above number, which was collected from some archer fighting.

The game uses integers for its calculations, because floating point are note reliable on different machines, so we need to have the exact same values on each machines else the game will go out of sync pretty quickly. So I suppose what you are seeing are decimal values represented as int.

Ok, I can do it if I can just figure out how the integer representation of float is done. Then I would just need to add the necessary arguments to pass the needed precision value up to the square root. I suppose EntityDistanceOrdering would need to accept P as a parameter, and then queries that don't need so much precision could use the alternative sqrt.

if isqrtEst is faster than the current isqrt, i guess it would be fine to just use isqrtEst with a P of 1. With p=1 the result is always floor(sqrt(x)).

Edited by real_tabasco_sauce
Link to comment
Share on other sites

Ok so it works now, the skiritai are not sorted.

Square rooting and dividing by 10^5*P pretty much delivers what I showed in the tables above.

The video is with P=10 to exaggerate the effects. I wonder now if a tiebreaker would be necessary for more common cases like P=1,2, or 3. Something like a random shuffle if the low-precision distances are "the same".

  • Like 1
Link to comment
Share on other sites

Posted (edited)

Just doing a couple rounds of combat demo huge, I found roughly 13-14 minimum fps go down to 11-12fps.

The video was at P=10. I added a pseudorandom tie breaker so the fighting looks like this now (P=3):

I think 3 is ideal for archers. I bet 2 would be good for slingers/han crossbows and 1 for skirms, melee units. I suppose the next thing to do would be to figure out how the unit templates can determine P. I guess it involves modifying the UnitAI schema and adding P as an argument to the query setup calls in unitAI.

Also, if someone wants to apply the patch and test, that would be awesome.

Edited by real_tabasco_sauce
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

 Share

×
×
  • Create New...