Life In 19x19 :: Exploring LZ's search algorithm: worked examples

Life In 19x19 http://prod.lifein19x19.com/

Exploring LZ's search algorithm: worked examples http://prod.lifein19x19.com/viewtopic.php?f=18&t=17181	Page 1 of 3

Author:	xela [ Tue Jan 07, 2020 6:33 pm ]
Post subject:	Exploring LZ's search algorithm: worked examples
For a while I've been playing with the LZ source code to learn a bit more about how it works. I'm now at a point where I can output a CSV file with a trace of exactly which variations it's explored, in what order, and "why" (in terms of the various numbers that get calculated along the way). Over the next few weeks I'd like to examine some positions where we've been saying "why did LZ do that???" and see how much we can explain. To start with, I'm looking at the position in chapter 1 of James Davies's book Tesuji. This should be relatively uncontroversial, and then I'll look for some different examples. I know there were some curly situations discussed in these forums a few months back, but I don't have them at my fingertips right now. Suggestions are welcome! Click Here To Show Diagram Code `[go]$$Bc Black to play and capture some white stones $$-------------+ $$ . . . . . . \| $$ . O O . . . \| $$ . . X O O . \| $$ . . X X O . \| $$ . . X . . . \| $$ . . X O . . \| $$ . . X O . . \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X X . \| $$ . . . . . . \|[/go]` Davies uses this for an extended discussion of how to read. For this position, I can break down his discussion into four phases. Details behind the cut for those who want to have a go as solving it first. First, he puts down "the obvious move" and the "obvious" (to his target audience?) reply: Click Here To Show Diagram Code `[go]$$Bc The obvious way to start $$-------------+ $$ . . . . . . \| $$ . O O . . . \| $$ . . X O O . \| $$ . . X X O . \| $$ . . X 2 1 . \| $$ . . X O . . \| $$ . . X O 3 . \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X X . \| $$ . . . . . . \|[/go]` Success! Black has captured some stones. But then Davies cautions us to look a bit deeper. What about other white replies? Click Here To Show Diagram Code `[go]$$Bc White's options $$-------------+ $$ . . . . . . \| $$ . O O . . . \| $$ . . X O O . \| $$ . . X X O . \| $$ . . X . 1 b \| $$ . . X O a c \| $$ . . X O . . \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X X . \| $$ . . . . . . \|[/go]` For white a or b, we see some variations where white can connect up if black is careless, and then we get black's correct answer. For white c, there's the following variation: Click Here To Show Diagram Code `[go]$$Bc Black's failure $$-------------+ $$ . . . . . . \| $$ . O O . . . \| $$ . . X O O . \| $$ . . X X O . \| $$ . . X 5 1 3 \| $$ . . X O 4 2 \| $$ . . X O . 6 \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X X . \| $$ . . . . . . \|[/go]` Black can split white into two groups, but each piece lives independently. At this point Davies breaks off to look at some other choices for , and wonders whether black should tenuki instead of playing out this position. Then at the end of the chapter, he reassesses the variation above and finds that at will actually capture some white stones. So this is the solution. When I first read this book, I'd only recently given up on chess, so my memory of Kotov's Think Like a Grandmaster was still fresh. Kotov was a strong advocate of structured, disciplined thinking. He writes about how to construct a tree of variations sytematically and carefully, one of his key principles being that you assess each position once and once only. No going round in circles, no indecision, no wasted effort. He would have been appalled at Davies's approach of "Explore this for a while ... no, I'm not satified, let's try something else ... actually, I want to reevaluate the first one...". Since then, Kotov's approach has been criticised as too rigid, and "the Davies method" seems like an effective approach if you've got limited time and if your assessments aren't always correct on the first go. When I next get a spare hour, I'll post some traces of LZ looking at the same position. My feeling so far is that MCTS is surprisingly human-like in how it bounces between alternatives and reevaluates things. But this is very much work in progress, meaning that I don't actually know the answer until I write it up (and perhaps not even then). More in the next day or two, fingers crossed.

Author:	xela [ Tue Jan 07, 2020 8:42 pm ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
Quick update: for anyone who'd like to explore in Lizzie while you're waiting for me to get organised, I'll be using this full-board position. Click Here To Show Diagram Code [go]$$Bc Black can capture the O17 stones $$ --------------------------------------- $$ \| . O . . . . . . . . . . . . . . . . . \| $$ \| . X O . . O . . X O O O . . . O O O . \| $$ \| . O X O O . . . X X X X O O . X O . O \| $$ \| . . X X O . O . . , . O X X X X X O . \| $$ \| . . . X X O . . X O . X O . . . X . . \| $$ \| . . . X O . O . X X X . X . . . X . . \| $$ \| . . . X O . . . O O O . . . . . O . . \| $$ \| . X X X O . . O X X O . O X . . O . . \| $$ \| . O O X O . X . X O O O X X . . . . . \| $$ \| . . . O X X . . X O X X O . . , . . . \| $$ \| . . . O X . X X O O O X X X . X O O . \| $$ \| . . . X . X O X O . X O O X . . X O . \| $$ \| . . X X X O O O O O O X O O X . X O . \| $$ \| . . O . . . . O X X X X . O X . X X O \| $$ \| . O O X X X . X O . . . . O . . . O . \| $$ \| . X X O O O X X O X . . . . . X . . . \| $$ \| . O O . X O O X . . . . . X . X O . . \| $$ \| . . . O O X O X . . . . . . . O O . . \| $$ \| . . . . . . O X . . . . . . . . . . . \| $$ ---------------------------------------[/go] Attachment: ex1-davies_tesuji.sgf [2.82 KiB] Downloaded 660 times It was surprisingly hard to make a realistic full-board position where finding the tesuji and the correct follow-up is the difference between winning and losing! LZ is quick to say "actually, this other part of the board looks more interesting". Thanks to Go Seigen and Yamabe Toshiro for providing a context that almost matches what I want to look at, and apologies to same for mangling their creation. This position is GoGoD game 1953-04-08e at move 164 with a few stones added. I suggest trying a few different LZ networks. The recent ones can solve the problem largely on "intuition", with minimal reading, so they're less interesting to explore in this respect. I'll start off with network number 157.

Author:	Bill Spight [ Tue Jan 07, 2020 9:54 pm ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
xela wrote: For a while I've been playing with the LZ source code to learn a bit more about how it works. I'm now at a point where I can output a CSV file with a trace of exactly which variations it's explored, in what order, and "why" (in terms of the various numbers that get calculated along the way). Over the next few weeks I'd like to examine some positions where we've been saying "why did LZ do that???" and see how much we can explain. Bravo! More power to you. Quote: When I first read this book, I'd only recently given up on chess, so my memory of Kotov's Think Like a Grandmaster was still fresh. Kotov was a strong advocate of structured, disciplined thinking. He writes about how to construct a tree of variations sytematically and carefully, one of his key principles being that you assess each position once and once only. No going round in circles, no indecision, no wasted effort. He would have been appalled at Davies's approach of "Explore this for a while ... no, I'm not satified, let's try something else ... actually, I want to reevaluate the first one...". Since then, Kotov's approach has been criticised as too rigid, and "the Davies method" seems like an effective approach if you've got limited time and if your assessments aren't always correct on the first go. I also like Kotov's approach. It fits with Terence Reese's advice for contract bridge: Don't dither. I was somewhat surprised to find out later that research has disproven Kotov's once only dictum. (OC, you can revisit nodes in the search tree without dithering. ) But there are a couple of reasons to question the once only rule, anyway. For one thing, iterative deepening, even though it is not the best search strategy, has been shown to be surprisingly effective. And it violates the once only rule about as much as you can! (Without dithering. ) Second, going back to The Chess Mind, we know that chess grand masters often see the best move instantaneously, with no conscious search at all. Kotov's method is a way of guiding conscious, linear processing. But, truth to say, the brain's unconscious, parallel search is more powerful. The only bridge book I gave my sister, a social player, was one by the great bridge author, Victor Mollo, in which, for amateurs, he advocates seeing, not the methodical calculation of variations. As readers here may be aware, I advocate seeing for go. I believe that Kotov's method is great for training, but in actual play I think you have to allow your unconscious to work, as well. Today's top bots are showing us plays that human have largely overlooked for centuries. Is it possible to train ourselves to see more of these plays? Yes. Play over games and guess where the bot played or wants to play. Then see if you are right. OC, humans are still better than the bots at local depth first search, so there is no reason not to train those skills, as well. FWIW, here is what I saw in Davies' problem. Click Here To Show Diagram Code `[go]$$Bc Black to play and capture some white stones $$-------------+ $$ . . . . . . \| $$ . O O a b . \| $$ . . X O O . \| $$ . . X X O . \| $$ . . X 2 1 . \| $$ . . X O . . \| $$ . . X O 3 . \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X X . \| $$ . . . . . . \|[/go]` is indeed obvious. OC, I knew that does not work, but I saw that variation, anyway. Because the task is to capture some stones, I took a quick look at the corner stones, and didn't see anything there. I suppose that unconsciously I noticed that if Black cuts at a, White can reply at b, but nothing came to my consciousness. Click Here To Show Diagram Code `[go]$$Bc Variation $$-------------+ $$ . . . . . . \| $$ . O O . . . \| $$ . . X O O . \| $$ . . X X O . \| $$ . . X a 1 c \| $$ . . X O 2 3 \| $$ . . X O . b \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X X . \| $$ . . . . . . \|[/go]` Next, I saw this variation as far as . I knew that White could not play at a, and that Black could fill at c in response to b. I did not check that White's cut off stones were dead, because is the only play. Click Here To Show Diagram Code `[go]$$Bc Variation 2 $$-------------+ $$ . . . . . . \| $$ . O O . . . \| $$ . . X O O . \| $$ . . X X O . \| $$ . . X 4 1 3 \| $$ . . X O 6 2 \| $$ . . X O 5 . \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X X . \| $$ . . . . . . \|[/go]` Next, I looked at . Again, was obvious, to keep White cut in two. OC, I saw in response to . I could tell that was futile, and did not consciously see the capture with . I was aware that damezumari was a theme here, which prevented White from playing at 4, but I did not articulate it. Click Here To Show Diagram Code `[go]$$Bc Variation 3 $$-------------+ $$ . . . . . . \| $$ . O O . . . \| $$ . . X O O . \| $$ . . X X O . \| $$ . . X . 1 3 \| $$ . . X O 4 2 \| $$ . . X O a 5 \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X X . \| $$ . . . . . . \|[/go]` Next, I looked at this variation. It was obvious that was correct. It did not register consciously that prevented a one point eye at a, Reflecting on this later, I wondered if I would have seen if I had not already seen variation 2. Click Here To Show Diagram Code `[go]$$Bc Variation 4 $$-------------+ $$ . . . . . . \| $$ . O O . . . \| $$ . . X O O . \| $$ . . X X O 5 \| $$ . . X 6 1 2 \| $$ . . X O . 3 \| $$ . . X O 7 4 \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X O . \| $$ . . . X X . \| $$ . . . . . . \|[/go]` I did not see , but I believe that beginners should consider it, if only to find this snapback. And, even though not consciously considering in a game would not bother me, I cannot claim to have solved the problem. Aficionados of Sonoda may note that the four Black stones, - , all have a diagonal relationship. ---- OT, but I don't think I can refrain from a comment on current events. The world is on fire, both literally and figuratively. The Buddha saw that centuries ago: All things, O Bhikkus, are on fire.

Author:	jlt [ Wed Jan 08, 2020 3:30 am ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
Bill Spight wrote: The only bridge book I gave my sister, a social player, was one by the great bridge author, Victor Mollo, in which, for amateurs, he advocates seeing, not the methodical calculation of variations. As readers here may be aware, I advocate seeing for go. I believe that Kotov's method is great for training, but in actual play I think you have to allow your unconscious to work, as well. If I understand correctly, during actual play you let sequences pop up to your mind without trying to read "all" variations. But in order to get an efficient neural net (=brain), you need to train it by studying go problems, pro games, etc. During that training process, you think that the methodical calculation of variations is "great", but do you consider it as necessary?

Author:	Bill Spight [ Wed Jan 08, 2020 5:32 am ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
jlt wrote: Bill Spight wrote: The only bridge book I gave my sister, a social player, was one by the great bridge author, Victor Mollo, in which, for amateurs, he advocates seeing, not the methodical calculation of variations. As readers here may be aware, I advocate seeing for go. I believe that Kotov's method is great for training, but in actual play I think you have to allow your unconscious to work, as well. If I understand correctly, during actual play you let sequences pop up to your mind without trying to read "all" variations. But in order to get an efficient neural net (=brain), you need to train it by studying go problems, pro games, etc. During that training process, you think that the methodical calculation of variations is "great", but do you consider it as necessary? It is important not to dither. It is important to supplement the human tendency to depth first search and shore up its weaknesses. Kotov's method does that. As for the best way to train the human brain, quien sabe? I don't think that we know the best way to train neural networks. I have touched on these questions over the years. See the SL page about the fudge factor, for instance. https://senseis.xmp.net/?GoProblemsTheFudgeFactor I look forward to seeing what xela is finding out about LZ's searches.

Author:

xela [ Wed Jan 08, 2020 6:46 am ]

Post subject:

Re: Exploring LZ's search algorithm: worked examples

Bill Spight wrote:

I look forward to seeing what xela is finding out about LZ's searches.

OK, I can give you the first instalment of raw data now, but I'll have to keep you in suspense another day or two for my commentary.

Here's a line-by-line trace of 100 playouts (LZ network number 157, using 1 thread with random number seed 42) with calculations (showing rejected second choice move at each node, as well as the chosen move):

Attachment:

ex1-davies_tesuji-trace157_100po.csv [107.91 KiB]
Downloaded 690 times

SGF file showing all variations with the evaluation of each playout (first choice moves only):

Attachments:

ex1-davies_tesuji-trace157_100po.sgf [34.97 KiB]
Downloaded 1170 times

Author:	Bill Spight [ Wed Jan 08, 2020 8:08 am ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
Hmmmm. It looks like maybe you need to put a Black stone at H-18 or maybe G-18.

Author:	RobertJasiek [ Wed Jan 08, 2020 8:35 am ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
jlt wrote: during actual play you let sequences pop up to your mind without trying to read "all" variations. Such might be ok for strategic planning but for tactical reading it is a total failure. For tactical reading, in order not to read all variations, apply correct methods (such as Regular Reading) and valid, applicable simplifications (such as ignoring dominated moves). Quote: During that training process, you think that the methodical calculation of variations is "great", but do you consider it as necessary? It is mandatory and essential. Omitting it blocks tactical improvement.

Author:	RobertJasiek [ Wed Jan 08, 2020 8:44 am ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
xela wrote: Kotov's Think Like a Grandmaster was still fresh. Kotov was a strong advocate of structured, disciplined thinking. He writes about how to construct a tree of variations sytematically and carefully What are his major principles for choosing among several next (half-)moves? How does he walk through trees?

Author:	Bill Spight [ Wed Jan 08, 2020 9:55 am ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
RobertJasiek wrote: xela wrote: Kotov's Think Like a Grandmaster was still fresh. Kotov was a strong advocate of structured, disciplined thinking. He writes about how to construct a tree of variations sytematically and carefully What are his major principles for choosing among several next (half-)moves? How does he walk through trees? Kotov's first chapter is on the calculation of variations. It is 74 pages long. His basic approach appears to be breadth first. The first thing to do is to identify candidate plays. And when, in analysis, he spends 30 min. on a position, he may well have done that at each ply. In an actual game, with an average of only a few minutes per move, I doubt if he explored hundreds of variations to a shallow depth, however.

Author:	Kirby [ Wed Jan 08, 2020 11:20 am ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
breadth first seems harder to do for humans. I like the idea of identifying some candidate plays, especially at the root of the tree. but after that, it's confusing. let's say you have candidate moves 'A', 'B', and 'C'. then you want to do breadth first search on that. you take 'A0' and think of candidate responses to that move: 'A_0', 'A_1', and 'A_2' - and you visualize them. and you don't know if any of them work. then you stop visualizing them, and move onto the children of 'B'. What are the responses to 'B'? Let's say they're 'B_0', 'B_1', and 'B_2'. And you still don't know whether any of them are ideal. you keep going this way until you get to a leaf node. By that time, you may have explored almost the entire tree. And when you get to a leaf node, you can finally say, "Oh - leaf at 'A_0_0_0_0_0' doesn't work. let's rule that out." contrast this with depth first search. with depth first search, you can look at 'A'. Then immediately see what a response 'A_0' may yield, and then go to 'A_0_0', ... until you get to a leaf node in lg(N) time. and at that point, you can already start pruning options. thinking in terms of breadth is important, because you don't want to miss moves. but it's pretty darn hard to keep the entire game tree in your head before you can start eliminating options from search.

Author:	RobertJasiek [ Wed Jan 08, 2020 11:38 am ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
Bill Spight wrote: His basic approach appears to be breadth first. The first thing to do is to identify candidate plays. Once more: how does he choose among next candidates? In my Regular Reading method, one successful optimal candidate (e.g. "achieves independent life") is enough; no need to find all. When you just refer to breadth first, this says nothing about reading fewer than all next candidates. He must give some advice on that, mustn't he?

Author:	Bill Spight [ Wed Jan 08, 2020 11:46 am ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
Kirby wrote: breadth first seems harder to do for humans. I like the idea of identifying some candidate plays, especially at the root of the tree. but after that, it's confusing. I like the idea, from SOAR, of generating subgoals. For instance, in this problem you can quickly form the subgoal of preventing White from playing R-15. That makes the first line hane obvious for . When jumps to the first line, you have a new subgoal of preventing White from connecting underneath. It turns out that the descent to the third line achieves both subgoals. Then, when connects to the jump, you have a new subgoal of preventing a one point eye. And there is a play that achieves both that goal and prevents R-15. Bingo! Back in the 90s I was unable to program a go search program in SOAR, however.

Author:	Bill Spight [ Wed Jan 08, 2020 11:50 am ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
RobertJasiek wrote: Bill Spight wrote: His basic approach appears to be breadth first. The first thing to do is to identify candidate plays. Once more: how does he choose among next candidates? In my Regular Reading method, one successful optimal candidate (e.g. "achieves independent life") is enough; no need to find all. When you just refer to breadth first, this says nothing about reading fewer than all next candidates. He must give some advice on that, mustn't he? Of course he does. I refer you to the book.

Author:	Kirby [ Wed Jan 08, 2020 12:35 pm ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
Bill Spight wrote: Kirby wrote: breadth first seems harder to do for humans. I like the idea of identifying some candidate plays, especially at the root of the tree. but after that, it's confusing. I like the idea, from SOAR, of generating subgoals. For instance, in this problem you can quickly form the subgoal of preventing White from playing R-15. That makes the first line hane obvious for . When jumps to the first line, you have a new subgoal of preventing White from connecting underneath. It turns out that the descent to the third line achieves both subgoals. Then, when connects to the jump, you have a new subgoal of preventing a one point eye. And there is a play that achieves both that goal and prevents R-15. Bingo! Back in the 90s I was unable to program a go search program in SOAR, however. you can create subgoals to guide your search, but doesn't it still mean you have potentially more to keep in your head with bfs? with depth first search, i really like that you can prune entire branches of the tree (i.e. this is no longer worth exploring, because i've come to a terminal state and identified that it's bad) - you don't have to revisit nodes in the tree once you've eliminated them. but with breadth first, you aren't "done" with a node yet after you've first seen it, because you might come back to it after searching later in the tree.. you could do that with subgoals, too: i no longer want to explore this branch at all, because after i searched down the tree a bit, i found out that it cannot meet my subgoal. if there's one optimal move in the game tree, both approaches, in worst case, find the result after looking through the entire tree. but at least with bfs, you don't have to keep the moves you've already explored in your head.

Author:	Bill Spight [ Wed Jan 08, 2020 1:29 pm ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
Kirby wrote: if there's one optimal move in the game tree, both approaches, in worst case, find the result after looking through the entire tree. but at least with bfs, you don't have to keep the moves you've already explored in your head. It is pretty clear that humans prefer depth first search because we have a short stack. With breadth first search we have to keep track of unvisited nodes in short term memory. It also appears that go professionals have sculpted their brains at an early age to increase their short term memory to include a workspace for go.

Author:

xela [ Thu Jan 09, 2020 12:59 am ]

Post subject:

Re: Exploring LZ's search algorithm: worked examples

Thanks for all the interesting comments! I'll reply to some of them later, but first, back to LZ's way of reading, and the tracing file (CSV attachment) from post number 6 above.

I'll walk through the first few playouts in a lot of detail, to recap how Monte Carlo tree search (MCTS) works. (And also to expose the gaps in my own knowledge, so that the real experts can teach me something.) Then I'll back out and look at the "big picture".

Remember that each playout goes through one cycle of explore-expand-score-update (different descriptions of MCTS use different names for the phases).

Explore: using the tree of variations that you've built from previous playouts, walk down the tree one node at a time, choosing the "most promising" node at each step, until you reach a leaf.
Expand: list all the legal moves from the position you've reached, and add them to the tree as new childe nodes.
Score: evaluate the new children and decide which one looks like the "best" move. (In old-school MCTS, you'd do this with random playouts, adding moves until you reach the end of the game and can see who's won. Since AlphaGo Zero, we use neural net evaluations instead.)
Update: go back up the tree the same way you came down, updating the value at each node in the light of the scores you just calculated. Also keep a count of how many times each node has been visited: this is part of how you calculate "most promising" when exploring.

Here's how I've represented it in the CSV file. I've put the first few lines behind the cut because it might be too wide for some screens:

Line 1: initialise. The root node (starting position) gets one playout, and a value. Without reading at all, the network thinks black's winrate is 44%.

Next line: first playout, first node. In the file I've shown how LZ chooses between P18 and P2, its top two choices. (In fact, it does the same calculations for all legal moves, but I didn't want to make the file too big.) There's quite a few numbers here. Reading from left to right:

value is winrate plus puct
fpu_eval is a number based on "first play urgency". I'll say a little more about that below. It's the same number for all nodes, but as the playouts progress, the fpu number will go down. (At the very beginning of the search, the first fpu_eval is exactly equal to the value of the root node. In general it's the network value of the parent node minus an adjustment depending on which other nodes have been visited.)
winrate is equal to the fpu_eval the first time LZ looks at a node, and later on it gets updated based on the network evaluation.
puct is the "exploration factor". More detail later
policy is the network policy value. This is the only number that doesn't change on repeated visits. It feeds into the puct value.
And then the same set of numbers for the second choice move

So here, P18 has a value of 0.44 (winrate inherited from the parent node because it's the first visit, and no puct adjustment). And P2 (and all the other moves) also has the same value of 0.44. So we use the policy value as a tie-break: P18 wins with 45% policy, while P2 has 29%. This finishes the first iteration of explore-expand-score.

Next line: first playout, update for P18. The visit count is now 1. The value gets updated, and some mysterious calculations produce 0.30. (I'm a bit hazy on the details here. But as you look at a bunch of playouts, you'll get the general idea). This value is from the perspective of white to play, so 0.30 means that, at first glance, black is ahead after playing P18. There's also a "lower confidence bound" (LCB). On small numbers of playouts, these numbers are pretty meaningless, but later the LCB will be just a little bit smaller than the value.

Next line: first playout, going back up the tree, update for root node. Visit count for the root is 2 (it got one visit for initialising, then a second visit for the playout). Magic calculations give us an updated value of 0.57.

Next line: second playout, first node. In the initial position, LZ will assess the scores of all legal moves and decide which is most promising. We've already done this for playout number 1, but as the visit counts change, some of the numbers change. The fpu_eval has gone down from 0.44 (first playout) to 0.27. For P18, it's already been visited, so we ignore the fpu_eval and instead use the number from the first playout. Remember it got updated to 0.30 from white's perspective: that's 0.70 (or 0.6989 rounded up) from black's perspective. And because the parent node has been visited, there's now an exploration bonus puct=0.08. Winrate + puct equals value of 0.78 for P18.

Same line, second choice move: P2 is still univisited, so its winrate is the fpu_eval of 0.27. But there's also a puct bonus of 0.11: P2 gets a higher bonus than P18, because it's had fewer visits. Total value for P2 is 0.38, lower than P18, so P18 is still first choice.

Digression: if you scroll down the CSV file, looking at explore depth 1 for each playout, you'll see the puct number for P2 getting higher and higher the longer it's ignored, until at playout number 14, P2 overtakes P18 as the first choice.

Next line: you'll see similar calculations for replies to P18. Because none of the replies have been visited yet, they all get the fpu_eval number, and we pick the highest policy move.

Next three lines: back up the tree, update the visit counts and values, calculate the LCBs (still not realistic, three visits isn't enough).

And you see how the tree grows, and starts to branch after 14 playouts.

Technical details of fpu and puct calculations behind the cut, for those who are interested.

I'll start with puct because I think I understand that part better. UCT is upper confidence bounds for trees. Google will show you a lot of research papers and blog posts with all the theory. Then PUCT is polynomial UCT, a modification to the basic UCT formula.

The basic UCT formula is u = constant times psa times square root (sum of visit counts for current node and all siblings) divided by (1 + visit count for current node).

What does psa stand for? I spent a while wondering why the LZ source code is full of public service announcements. But no, psa is P(s,a), the probability in game state s of action a.

So if you keep visiting the same node, then the bottom line of the fraction increases faster than the top, so the u value goes down: the exploration bonus for a node gets smaller the more you've already explored that node. But if you keep ignoring a node and visiting its siblings instead, the top line of the fraction increases and the bottom doesn't: the u value goes up (and there's no upper limit), so eventually, given enough visits, you will explore every node.

PUCT adds some extra terms to the top line of the fraction, but it's the same basic idea.

First play urgency (FPU) is a bit more mysterious. It's mentioned on a few web pages, but I've not seen it in the research literature, so I'm not sure if it was a "secret sauce" part of AlphaGo that didn't make it into the Nature paper, or if it was invented as part of LZ, or if it comes from somewhere else.

The formula is simply fpu_eval = network eval (i.e. winrate from neural net) - constant times sqrt (sum of policies for all children that have been visited). This is calculated for the parent node and then applied to all child nodes. So if no children have been visited yet, the second term is zero, and fpu of child = network eval of parent. But as more different nodes get visited, the fpu_eval goes down. The second part of the formula is referred to as "fpu reduction", because fpu means that you reduce the initial winrate estimate.

In the CSV file, if you look at the fpu_eval at depth 1, you'll see how it starts at 0.44 (equal to initial value of root node), goes down to 0.27 on the first playout (as soon as P18 has been visited), stays at 0.27 for the first 14 playouts (because P18 is being revisited every time), and goes down to 0.22 on playout 15 (once P2 gets its first visit).

As far as I can tell, the point of FPU reduction is to make the search tree narrower: using the fpu_eval instead of the raw winrate means that the second choice move has less chance of being selected. If a move has been ignored for a long time, and your winrate is looking OK, then you probably don't need to waste time exploring that move. You'll only get to it if the value of your first choice node ends up being lower than the fpu_eval number. In a future post I plan to show an example of where that has a dramatic impact.

Author: xela [ Thu Jan 09, 2020 2:05 am ]

Post subject: Re: Exploring LZ's search algorithm: worked examples

Now for the "big picture" view: what's the story of playouts 1-100? Here you might want to refer to the SGF file from the previous post. Let's see how LZ bounces around between three main variations, reassessing each of them.

Click Here To Show Diagram Code: [go]$$Bc Variation 1: playouts 1-5 and 51 $$ --------------------------------------- $$ | . O . . . . . . . . . . 4 3 5 . . . . | $$ | . X O . . O . b X O O O . 2 1 O O O . | $$ | . O X O O . . . X X X X O O a X O . O | $$ | . . X X O . O . . , . O X X X X X O . | $$ | . . . X X O . . X O . X O . . . X . . | $$ | . . . X O . O . X X X . X . . . X . . |[/go]

The first five playouts explore this position: here is LZ's first instinct. :w2:

at a doesn't get examined at all: the network can see "at a glance" that it's hopeless and gives it a low policy value. After five playouts, LZ has worked out that this :w2:

isn't working, so it turns to :w2:

instead. Later, after finding out that don't work either, it returns to this variation (playout 51) to see if white b next will do anything interesting. (It doesn't.) Overall, the position in this diagram gets seven playouts. Notice that LZ never tries :w6:

at a, or any other local move. I think it can "see" that the white stones are cut off and unable to live independently, and doesn't need to read out "elementary" sequences.

Click Here To Show Diagram Code: [go]$$Bc Variation 2: playouts 6-13 and many more $$ --------------------------------------- $$ | . O . . . . . . . . . . . 2 c . . . . | $$ | . X O . . O . . X O O O . a 1 O O O . | $$ | . O X O O . . . X X X X O O b X O . O | $$ | . . X X O . O . . , . O X X X X X O . | $$ | . . . X X O . . X O . X O . . . X . . | $$ | . . . X O . O . X X X . X . . . X . . |[/go]

This position gets most of LZ's attention, 79 visits in total. It looks at six choices for :b3:

: in order of preference,

Tenuki at P2: policy value 24%, playout numbers 7, 27, 28, 33, 34, 35
a: 23%, 20 visits between playout 8 and 40
Tenuki at B16: 11%, one visit only at playout 22
b: 8%, playout numbers 29, 31, 32
Tenuki at N4: 6%, 1 visit only at playout 39
c: 6%, 47 visits between playout 41 and 100

So first it tries tenuki, gets a winrate of 41% for black P2 (the value in the file is 0.5884, but remember that's for white to play), and decides that it can do better. Then for :b3:

at a we see this variation:

Click Here To Show Diagram Code: [go]$$Bc Variation 2a, black P2 next, winrate 44% $$ --------------------------------------- $$ | . O . . . . . . . . . . . 2 6 . . . . | $$ | . X O . . O . . X O O O 4 3 1 O O O . | $$ | . O X O O . . . X X X X O O 5 X O . O | $$ | . . X X O . O . . , . O X X X X X O . | $$ | . . . X X O . . X O . X O . . . X . . | $$ | . . . X O . O . X X X . X . . . X . . |[/go]

Playing the most instinctive moves, LZ can reduce white's territory in sente and then turn to P2, and this improves the winrate slightly, but still isn't satisfactory. It takes a bit of trial and error before LZ gets to the right moves:

Click Here To Show Diagram Code: [go]$$Bc Variation 2b $$ --------------------------------------- $$ | . O . . . . . . d b . . 5 2 3 . . . . | $$ | . X O . . O . . X O O O c 4 1 O O O . | $$ | . O X O O . . . X X X X O O a X O . O | $$ | . . X X O . O . . , . O X X X X X O . | $$ | . . . X X O . . X O . X O . . . X . . | $$ | . . . X O . O . X X X . X . . . X . . |[/go]

This position gets 34 visits, starting from playout number 52. It's obvious from the very first visit that this is bad for white (winrate of 5.6%), but it takes a number of visits for this information to filter back up the tree: LZ has to overcome the prejudice of both :b3:

and

being low-policy moves.

Before trying this :b5:

, LZ looks briefly at a (playouts 43-49) before realising that white can live. As alternatives to :w4:

, it quickly checks out (two playouts each) white a and white b. In response to :b5:

, LZ reads out :w6:

at b, c or d. It turns out that white can actually live, but at the cost of black damaging the area to the left, so this is still a winning proposition for black.

Click Here To Show Diagram Code: [go]$$Bc Variation 3, late entry, needs further investigation $$ --------------------------------------- $$ | . O . . . . . . . . . . . b 2 . . . . | $$ | . X O . . O . . X O O O . a 1 O O O . | $$ | . O X O O . . . X X X X O O . X O . O | $$ | . . X X O . O . . , . O X X X X X O . | $$ | . . . X X O . . X O . X O . . . X . . | $$ | . . . X O . O . X X X . X . . . X . . |[/go]

LZ gives this a quick glance at playouts 79, 80 and 87-90, looking at black's replies at a and b. It seems to prefer black, but the results aren't conclusive. I expect we'd see more of this variation on a longer run.

Author:	Bill Spight [ Thu Jan 09, 2020 8:36 am ]
Post subject:	Re: Exploring LZ's search algorithm: worked examples
Hmmm. I wonder about this position. Click Here To Show Diagram Code `[go]$$Bc Black kills $$ --------------------------------------- $$ \| . O . . . . . . . . . . . . . . . . . \| $$ \| . X O . . O W B X O O O . . . O O O . \| $$ \| . O X O O . . . X X X X O O . X O . O \| $$ \| . . X X O . O . . , . O X X X X X O . \| $$ \| . . . X X O . . X O . X O . . . X . . \| $$ \| . . . X O . O . X X X . X . . . X . . \|[/go]` In this position Black can kill. I know that the margin of victory does not matter, but surely the possibility of a clean kill will affect the winrate estimate, no? Does that then affect the search?

Author:

xela [ Thu Jan 09, 2020 4:10 pm ]

Post subject:

Re: Exploring LZ's search algorithm: worked examples

Thanks Bill for the suggestion! I'll get to that soon, but first I want to show a couple of other scenarios.

Analysing the same position using network number 258 instead of 157, LZ explores fewer variations and goes into more depth. It's quicker to find the right answer, and spends less time exploring moves that don't work.

Attachment:

ex1-davies_tesuji-trace258_100po.csv [140.47 KiB]
Downloaded 643 times

Referring to variation numbers in my previous post:

It gets to variation 2 first, and only decides to check variation 1 on playout number 99. Variation 3 doesn't appear at all.
It still goes down the wrong path of variation 2a, but now discards it by playout number 7 (compared with playout number 40 for the previous attempt)
The position of variation 2b gets 69 visits (compared with 34 visits on the previous attempt). With the stronger network, LZ spends literally twice as much time evaluating the "correct answer" diagram rather than exploring alternatives.

Attachments:

ex1-davies_tesuji-trace258_100po.sgf [43.24 KiB]
Downloaded 1014 times

Page 1 of 3	All times are UTC - 8 hours [ DST ]
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group http://www.phpbb.com/