xela wrote:
jann wrote:
Another example is when you find an otherwise weaker side ahead, because of higher extent of tree reuse (thus effectively more but weaker search).
This one I'm still finding hard to imagine. Tree reuse happens when the opponent plays a move that you've already explored, so you can reuse that part of the tree. Tree reuse is maximised when the opponent plays the most explored move, which is often the move that you assess as best.
The point here is that the lower a policy's branching factor, the narrower the tree it builds in memory, and the greater the potential for reuse. So if the weaker net only looks at 2 moves everywhere (vs, say, 3 for the stronger one), it may be weaker and have blind spots, but it will benefit more from tree reuse.
As mentioned earlier, this may account for, say, a 1.5x search speed advantage. If you don't know about this specifically and only have the result of a test match at 1000 playouts, you are less likely to correctly predict the result of a 10000-playout match (your actual use case). Basically you are in the same situation as if you had run a time-based test on unknown hardware: there is an unclear speed-related factor that affects your results in unknown ways and to an unknown extent, and not necessarily the same way during the test as during later usage (speed vs strength works very differently at low search than at high search).
The wider your test is, the more factors you allow to affect its result, and the more tests you need to perform to get the same knowledge/confidence (because first you need to estimate each individual factor from the results). Again, this is for the case where your use case / condition is significantly different and you cannot test on it directly (otherwise you don't need to know the individual factors and are fine with a single test there, since you can be sure all factors will work the same way during the test as during later usage).
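To put rough numbers on the tree-reuse argument above, here is a toy model in Python. It is only a sketch under strong assumptions that are mine, not taken from any engine: visits are spread evenly over the moves a net considers at each node, the reused subtree sits two plies down (your move plus the reply), and every turn adds the same number of new playouts. The function names are made up for the illustration.

```python
def retained_fraction(branching: int) -> float:
    """Share of the old tree kept after two plies (own move + reply).

    Assumption: visits are spread evenly over `branching` moves per node.
    """
    return 1.0 / (branching * branching)


def effective_playouts(new_playouts: int, retained: float) -> float:
    """Steady-state tree size T that solves T = new_playouts + retained * T."""
    return new_playouts / (1.0 - retained)


# Compare a net that looks at 2 moves everywhere with one that looks at 3.
for k in (2, 3):
    r = retained_fraction(k)
    print(k, round(r, 3), round(effective_playouts(1000, r)))
```

Under these assumptions the 2-move net ends up searching with about 1333 playouts per move and the 3-move net with about 1125, from the same nominal 1000. Real search concentrates visits on the top move rather than spreading them evenly, so the actual bonus can be larger or smaller; the sketch only shows the direction of the effect, not its size.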
Limeztone wrote:
jann wrote:
...whichever side wins at 1000 visits will likely also win at 10000 visits.
How do you reach that conclusion?
Look above, where we talked about the linked scalability graph. Same winner at 1000 visits as at 10000 visits = curves don't cross the 1.0 line. Unknown bonus from tree reuse = different winner at 1000 vs 2000 visits than at 10000 vs 20000 visits = some curves cross the 2.0 line, etc.
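The "crossing a line" check above can be written down as a small Python sketch. I am assuming here that a curve gives, at each visit count for side A, the visit multiplier side B needs to play level with A; the numbers are invented for illustration and are not read off the linked graph, and `crosses` is a hypothetical helper.

```python
def crosses(curve, level):
    """True if the curve has points strictly on both sides of `level`."""
    return min(curve) < level < max(curve)


# Made-up values, illustration only: visit multiplier B needs to match A
# when A searches 1k, 3k and 10k visits.
multiplier_needed = [1.4, 1.7, 2.3]

# Never crosses 1.0: A wins the equal-visits match at every tested level.
print(crosses(multiplier_needed, 1.0))

# Crosses 2.0: B with double visits wins at 1000 vs 2000
# but loses at 10000 vs 20000.
print(crosses(multiplier_needed, 2.0))
```

With these invented values the first check is false and the second is true, which is the situation described: the equal-visits result carries over from 1000 to 10000, while the 2x-handicap result does not.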