Go back

More articles with no series

No series

Chess Engines

An introductory guide to chess engines: board representation, move generation, evaluation, search algorithms, and performance optimizations.

World Champion chess player Garry Kasparov was defeated by IBM’s Deep Blue chess-playing computer in a six-game match in May, 1997. The match received enormous media coverage, much of which emphasized the notion that Kasparov was defending humanity’s honor.

Introduction to Chess Engines

This article is an exposition and explanation on how chess engines work, particularly this project.

A [[chess engine]] is a computer program that analyzes chess positions and determines the best move through systematic evaluation. Modern chess engines combine efficient board representation, rapid move generation, sophisticated evaluation functions, and deep search algorithms to achieve superhuman playing strength.

Position-Based Evaluation Philosophy

Traditional chess engines, including this one, operate on a fundamental principle: they evaluate individual board positions as static snapshots, determining whether a position is “good” or “bad” through quantifiable metrics (material count, piece placement, king safety, etc.). The engine doesn’t inherently understand chess strategy or plans-it evaluates millions of positions and selects moves leading to the best-scored positions.

This engine uses hand-crafted evaluation based on:

  • Material counting (queen = 9 pawns, rook = 5 pawns, etc.)
  • Piece-square tables (e.g. knights in center are better than on rim)
  • Static position features

In contrast, modern neural network engines like Stockfish NNUE (Efficiently Updatable Neural Network) and Leela Chess Zero use learned evaluation functions that can implicitly recognize patterns, piece coordination, and positional motifs that emerge from training. These neural approaches evaluate positions by recognizing high-dimensional patterns “learned” from millions of games, rather than explicitly coded heuristics. Note that NNUE in Stockfish replaces only the evaluation function while retaining classical search (alpha-beta with all its pruning and extensions), whereas Leela uses a pure neural approach with Monte Carlo Tree Search.2526

Engine Architecture Spectrum

Engine Type Evaluation Method Training Approach Example
Classical Hand-crafted heuristics Human chess knowledge coded This engine, early Stockfish
Hybrid (NNUE) Neural network + search Pre-trained on positions with classical engine evaluations Stockfish 12+
Pure Neural Deep neural network Self-play from random positions (tabula rasa) AlphaZero, Leela Chess Zero

Stockfish (modern) combines the best of both worlds, it uses the efficient search techniques described in this document (alpha-beta, transposition tables, move ordering) and replaces the hand-crafted leaf evaluation with NNUE trained on billions of positions evaluated by classical Stockfish. Importantly, NNUE only handles position evaluation-the classical search algorithm, pruning heuristics, and move ordering remain largely hand-crafted. This “bootstrap” approach leverages human chess insight encoded in both the classical evaluation and search techniques to accelerate training.

AlphaZero and Leela Chess Zero take a more radical approach, they start from zero knowledge (only the rules of chess) and learn entirely through self-play reinforcement learning. AlphaZero’s neural network learns to evaluate positions and suggest moves simultaneously, discovering chess principles independently without human input. This achieved superhuman play in just hours of training, with a more complex and natural playstyle, though requiring massive computational resources.326

$ \theta $ Mathematical View of Chess

From a mathematical perspective, chess can be formalized as an exploration of state space transformations:

Move as a Function

A move $ m $ is a function that transforms one board state to another:

$$ m: S \to S' \quad \text{where } S, S' \in \mathcal{G} $$

The set of all legal positions $ \mathcal{G} $ forms the state space (estimated at between $ 10^{43} $ and $ 10^{47} $ reachable positions).

State Space as Vector Space

This is a conceptual abstraction useful for understanding neural network representations. Classical engines (including this one) operate on discrete structures (bitboards, integers, arrays) rather than continuous vector spaces.

Each position can be conceptually represented as a vector in a high-dimensional space:

$$ \mathbf{s} = (p_0, p_1, \ldots, p_{63}, c, e, h) \in \mathbb{Z}^n $$

Where:

  • $ p_i \in \{0, 1, 2, \ldots, 11\} $ represents piece type on square $ i $ (0-11 for the 12 piece types, or use -1/12 for empty)
  • $ c \in \{0, 1, 2, \ldots, 15\} $ encodes castling rights
  • $ e \in \{-1, 0, 1, \ldots, 63\} $ is the en passant square
  • $ h \in \{0, 1, \ldots, 50\} $ is the halfmove clock

Neural network engines (NNUE, Leela) use similar but more sophisticated encodings as input to their networks. Classical engines usually hand-design features such as material, mobility, king safety, pawn structure, and piece-square bonuses, then combine them explicitly in code. NNUE-style engines still start from a discrete board state, but transform it into sparse input features that a neural network can update efficiently after each move; AlphaZero-style systems instead feed richer spatial encodings into a deeper policy-and-value network.42526

Move Generation as Set Mapping

Legal move generation is a set-valued function:

$$ \mathcal{M}: S \to 2^{\text{Moves}} \quad \text{where } |\mathcal{M}(s)| \approx 35 \text{ (average)} $$

The search space is better modeled as a directed graph with positions as nodes and moves as edges. In practice, transpositions and repetitions mean it is not strictly acyclic unless history-sensitive draw handling is included:20212223

$$ \mathcal{G} = (V, E) \quad \text{where } V = \mathcal{S}, \quad E = \{(s, s') : \exists m \in \mathcal{M}(s), m(s) = s'\} $$

Evaluation as Projection

The evaluation function projects the high-dimensional state space onto the real line:

$$ \text{eval}: \mathcal{S} \to \mathbb{R} \quad \text{(classical engines)} $$

Neural network evaluations can be viewed as learned non-linear projections:

$$ \text{eval}_{\text{NN}}(\mathbf{s}) = f_\theta(\mathbf{s}) \quad \text{where } \theta \text{ are learned parameters} $$

More concretely, $ f_\theta $ is a learned nonlinear function approximator. It maps an encoded position to a scalar score, a win-draw-loss estimate, or both a value and move priors depending on the engine architecture. That is a more accurate way to think about neural evaluation than a binary separator between “good” and “bad” positions, because engine evaluation is continuous, contextual, and strongly dependent on side to move, tactical threats, and long-range coordination.426

Core Components

Component Purpose Complexity
Board Representation Store position state $ O(1) $ access
Move Generation Generate legal moves $ O(n) $ per position
Evaluation Function Score positions $ O(1) $
Search Algorithm Find best move $ O(b^d) $ exponential
Transposition Table Cache positions $ O(1) $ lookup

The engine operates in a think-evaluate-move cycle:

Position → Generate Moves → Evaluate → Search → Best Move
     ↑                                              ↓
     └──────────────────────────────────────────────┘

This loop assumes the opponent will always choose the strongest available reply within the search horizon. That worst-case assumption is exactly what makes minimax-style search robust, but it also means the engine is not explicitly optimizing for human fallibility, time pressure, or psychologically awkward positions unless extra heuristics or opponent models are added.111

Bitboard Representation

What are Bitboards?

A bitboard is a 64-bit unsigned integer where each bit represents a square on the chessboard. This representation enables highly efficient bitwise operations for move generation and board queries.7

typedef uint64_t Bitboard;

Board Mapping

The chessboard is mapped to bits in little-endian rank-file mapping:

a8 b8 c8 d8 e8 f8 g8 h8    8 | 56 57 58 59 60 61 62 63
a7 b7 c7 d7 e7 f7 g7 h7    7 | 48 49 50 51 52 53 54 55
a6 b6 c6 d6 e6 f6 g6 h6    6 | 40 41 42 43 44 45 46 47
a5 b5 c5 d5 e5 f5 g5 h5    5 | 32 33 34 35 36 37 38 39
a4 b4 c4 d4 e4 f4 g4 h4    4 | 24 25 26 27 28 29 30 31
a3 b3 c3 d3 e3 f3 g3 h3    3 | 16 17 18 19 20 21 22 23
a2 b2 c2 d2 e2 f2 g2 h2    2 |  8  9 10 11 12 13 14 15
a1 b1 c1 d1 e1 f1 g1 h1    1 |  0  1  2  3  4  5  6  7
                               -------------------------
                                a  b  c  d  e  f  g  h

Board Structure

The engine uses piece-centric bitboards where each piece type has its own bitboard:

typedef struct {
    // White pieces
    Bitboard pawn_w, knight_w, bishop_w, rook_w, queen_w, king_w;
    
    // Black pieces
    Bitboard pawn_b, knight_b, bishop_b, rook_b, queen_b, king_b;
    
    // Derived occupancy masks
    Bitboard occupancy;       // All pieces
    Bitboard occupancyWhite;  // White pieces
    Bitboard occupancyBlack;  // Black pieces
    
    // Game state
    int turn;            // WHITE=1 or BLACK=0
    int castling;        // Bitfield: KQkq
    int epSquare;        // En passant target square
    int halfmoves;       // 50-move rule counter
    int fullmoves;       // Move number
    
    // Cached king positions
    int whiteKingSq;
    int blackKingSq;
    
    // Zobrist hash
    Bitboard hash;
    
    // Attack bitboard
    Bitboard attacks;
} Board;

Basic Bitboard Operations

Bitboards are useful because the primitive operations of chess become primitive operations of the CPU. Placing pieces, removing them, finding occupancies, generating pushes, and iterating through piece lists. Instead of looping over 64 squares and branching repeatedly, the engine can often express the same logic with a handful of branch-light integer operations.

Two patterns show up constantly:

  • Masking: isolate a subset of squares with &
  • Iteration by least-significant bit: repeatedly extract one piece with ctzll, then clear it with bb &= bb - 1

LSB iteration is especially important because it turns a dense 64-square board scan into a loop over only the occupied squares.

Operation Formula Purpose
Set bit bb |= (1ULL << square) Place piece
Clear bit bb &= ~(1ULL << square) Remove piece
Toggle bit bb ^= (1ULL << square) Flip state
Test bit (bb >> square) & 1 Check occupancy
Count bits __builtin_popcountll(bb) Count pieces
LSB index __builtin_ctzll(bb) Find first piece

Bitwise Advantages

Space Efficiency: 12 bitboards (64 bytes) store the entire position, compared to 64-element arrays (256+ bytes).

Parallel Operations: A single instruction can operate on all 64 squares simultaneously:

// Check if any pawns are on the 7th rank
bool pawnsOnSeventh = (pawn_w & RANK_7) != 0;

// Get all white pieces in one operation
Bitboard whitePieces = pawn_w | knight_w | bishop_w | rook_w | queen_w | king_w;

Example: Initial Position

White pawns on rank 2:

Bitboard pawn_w = 0x000000000000FF00;

Binary representation:
0000000000000000  (rank 8)
0000000000000000  (rank 7)
0000000000000000  (rank 6)
0000000000000000  (rank 5)
0000000000000000  (rank 4)
0000000000000000  (rank 3)
1111111111111111  (rank 2) ← Pawns here
0000000000000000  (rank 1)

Move Generation

Move Structure

Each move is encoded in a compact structure:

typedef struct {
    int fromSquare;     // Origin (0-63)
    int toSquare;       // Destination (0-63)
    int promotion;      // Promoted piece type (-1 if none)
    int castle;         // Castling type (0 if not castling)
    int validation;     // LEGAL, ILLEGAL, or NOT_VALIDATED
    int pieceType;      // Moving piece type
    int score;          // Move ordering heuristic
    bool exhausted;     // Search tracking flag
} Move;

Move Generation Pipeline

  1. Generate Pseudo-Legal Moves
  2. Apply Move on Copy Board
  3. Check if King is Under Attack
  4. Filter Illegal Moves (leaving king in check)
  5. Return Legal Move List

Piece Movement Patterns

Non-Sliding Pieces

Pre-computed lookup tables for knights, kings, and pawn attacks:

// Knight moves from each square
Bitboard KNIGHT_MOVEMENT[64];

// King moves from each square
Bitboard KING_MOVEMENT[64];

// Pawn attacks (split by direction and color)
Bitboard PAWN_W_ATTACKS_EAST[64];
Bitboard PAWN_W_ATTACKS_WEST[64];

Knight Movement from square $ s $:

$$ \text{moves}(s) = \{s \pm 17, s \pm 15, s \pm 10, s \pm 6\} \cap \text{valid\_squares} $$

Where moves must respect file boundaries to prevent wrapping.

Pawn Movement Logic

Pawns have the most complex movement rules:

void pawnSingleAndDblPushes(Board board, Bitboard* single, Bitboard* dbl) {
    // Single push: shift pawns one rank forward
    *single = board.turn ? board.pawn_w << 8 : board.pawn_b >> 8;
    *single &= ~board.occupancy;  // Remove blocked squares
    
    // Double push: pawns on starting rank move two squares
    *dbl = board.turn ? 
        (board.pawn_w & PAWN_START_WHITE) << 16 : 
        (board.pawn_b & PAWN_START_BLACK) >> 16;
    
    // Can only double-push if both squares are empty
    *dbl &= ~board.occupancy;      // Destination must be empty
    // Shift back to intermediate square, intersect with single-push
    // to ensure path is clear, then shift forward again
    *dbl = board.turn ? *dbl >> 8 : *dbl << 8;
    *dbl &= *single;               // Square in between must be empty
    *dbl = board.turn ? *dbl << 8 : *dbl >> 8;  // Restore to destination
}

/* Implementation comment: The intermediate shifts ensure the square between
 * the starting position and destination is empty. While this could be
 * optimized with a direct mask, this approach clearly expresses the logic. */

Pawn promotions generate four moves per promoting pawn (queen, rook, bishop, knight):

if (isPromoting) {
    int promotions[] = {QUEEN_W, ROOK_W, BISHOP_W, KNIGHT_W};
    for (int i = 0; i < 4; i++) {
        Move move = {from, to, promotions[i], 0, turn ? PAWN_W : PAWN_B};
        // Validate and add move
    }
}

Special Moves

Castling

Castling requires checking multiple conditions:

bool canCastle = 
    (castling_rights & castle_flag) &&           // Legal right exists
    !(castle_path & occupancy) &&                // Path is clear
    !isSquareAttacked(board, kingSquare) &&      // Not in check
    !isSquareAttacked(board, transitSquare);     // Don't castle through check

En Passant

En passant captures are enabled when an opponent pawn double-pushes:

if (pawn_moved && abs(toSquare - fromSquare) == 16) {
    board.epSquare = (fromSquare + toSquare) / 2;  // Set target square
}

Attack Detection

To determine if a square is attacked:

bool isSquareAttacked(const Board board, int square) {
    Bitboard sqBb = SQUARE_BITBOARDS[square];
    Bitboard pawn = board.turn ? board.pawn_b : board.pawn_w;
    Bitboard king = board.turn ? board.king_b : board.king_w;
    Bitboard knight = board.turn ? board.knight_b : board.knight_w;
    Bitboard bishop = board.turn ? board.bishop_b : board.bishop_w;
    Bitboard rook = board.turn ? board.rook_b : board.rook_w;
    Bitboard queen = board.turn ? board.queen_b : board.queen_w;
    
    // Check queen attacks (both diagonal and orthogonal)
    while (queen) {
        int sq = __builtin_ctzll(queen);
        
        // Queens attack like bishops and rooks combined
        Bitboard attacks = getBishopAttacks(sq, board.occupancy);
        attacks |= getRookAttacks(sq, board.occupancy);
        
        if (attacks & sqBb) return true;
        
        queen &= queen - 1;  // Remove this queen, check next
    }
    
    // Check bishop attacks (diagonal)
    while (bishop) {
        int sq = __builtin_ctzll(bishop);
        Bitboard attacks = getBishopAttacks(sq, board.occupancy);
        if (attacks & sqBb) return true;
        bishop &= bishop - 1;
    }
    
    // Check rook attacks (orthogonal)
    while (rook) {
        int sq = __builtin_ctzll(rook);
        Bitboard attacks = getRookAttacks(sq, board.occupancy);
        if (attacks & sqBb) return true;
        rook &= rook - 1;
    }
    
    // Check knight attacks
    while (knight) {
        int sq = __builtin_ctzll(knight);
        if (KNIGHT_MOVEMENT[sq] & sqBb) return true;
        knight &= knight - 1;
    }
    
    // Check pawn attacks
    while (pawn) {
        int sq = __builtin_ctzll(pawn);
        
        if (board.turn) {
            // Check if black pawn attacks this square
            if (PAWN_B_ATTACKS_EAST[sq] & sqBb) return true;
            if (PAWN_B_ATTACKS_WEST[sq] & sqBb) return true;
        } else {
            // Check if white pawn attacks this square
            if (PAWN_W_ATTACKS_EAST[sq] & sqBb) return true;
            if (PAWN_W_ATTACKS_WEST[sq] & sqBb) return true;
        }
        pawn &= pawn - 1;
    }
    
    // Check king attacks
    while (king) {
        int sq = __builtin_ctzll(king);
        if (KING_MOVEMENT[sq] & sqBb) return true;
        king &= king - 1;
    }
    
    return false;
}

Magic Bitboards

Sliding pieces (bishops, rooks, queens) require ray-based attack generation that depends on board occupancy. A rook on d4 has different moves when the board is empty versus when pieces block its rays.

The natural approach is, for each square and occupancy pattern, compute attacks on-the-fly with ray tracing. But ray tracing is expensive. A position may need to check thousands of sliding piece attacks during search.

The solution is to pre-compute all possible attack patterns and use pseudo-perfect hashing to retrieve them in $ O(1) $ time.

Magic Bitboard Technique

Magic bitboards use a pseudo-perfect hash function (commonly called “magic numbers”) to map occupancy patterns to pre-computed attack sets. While collisions are theoretically possible, magic numbers are carefully selected offline to eliminate all collisions for the relevant occupancy patterns.

For a slider on square $ s $ with occupancy $ occ $:

$$ \text{attacks}[s][\text{index}] \quad \text{where} \quad \text{index} = \frac{(occ \land \text{mask}[s]) \times \text{magic}[s]}{2^{(64 - \text{bits}[s])}} $$

Where:

  • $ \text{mask}[s] $ = relevant occupancy squares (excluding board edges)
  • $ \text{magic}[s] $ = pre-computed magic number
  • $ \text{bits}[s] $ = number of relevant bits

Magic Number Example

For a rook on e4 (square 28) the occupancy mask is:

. . . . . . . .     0 0 0 0 0 0 0 0
. . . . 1 . . .     0 0 0 0 1 0 0 0
. . . . 1 . . .     0 0 0 0 1 0 0 0
. . . . 1 . . .     0 0 0 0 1 0 0 0
. 1 1 1 . 1 1 .     0 1 1 1 0 1 1 0  ← Rook on e4
. . . . 1 . . .     0 0 0 0 1 0 0 0
. . . . 1 . . .     0 0 0 0 1 0 0 0
. . . . . . . .     0 0 0 0 0 0 0 0

This mask has 10 relevant bits, resulting in $ 2^{10} = 1024 $ possible occupancy patterns.

Note that the diagram shows the relevant occupancy mask, not the rook’s attack squares. Edge squares are deliberately excluded from the mask.

Edge squares don’t affect which squares are blocked. For example, on the e-file, if a piece is on e7, the rook is blocked before e8 regardless of whether e8 is occupied. So e8 never needs to be in the mask

The rook’s magic number for e4:

ROOK_MAGICS[28] = 0xd14880480100080ULL;

Lookup Tables

The engine pre-computes all attack patterns during initialization:

// Attack lookup tables
Bitboard BISHOP_ATTACKS[64][512];   // 64 squares × max 512 patterns
Bitboard ROOK_ATTACKS[64][4096];    // 64 squares × max 4096 patterns
Piece Max Relevant Bits Max Patterns Table Size
Bishop 9 512 32 KB
Rook 12 4096 256 KB
Total 21 4608 ~288 KB

Initialization Process

void initBishopRookAttackTables() {
    for (int square = 0; square < 64; square++) {
        // Initialize rook attack tables
        Bitboard rookMask = rookMovement(square);      // Get relevant squares
        int rookRelevantBits = ROOK_RELEVANT_BITS[square];
        int rookVariations = 1 << rookRelevantBits;   // 2^bits patterns
        
        for (int i = 0; i < rookVariations; i++) {
            // Generate specific occupancy pattern
            Bitboard occupancy = occupancyMask(i, rookRelevantBits, rookMask);
            
            // Compute magic index
            Bitboard index = (occupancy * ROOK_MAGICS[square]) >> (64 - rookRelevantBits);
            
            // Store pre-computed attacks
            ROOK_ATTACKS[square][index] = rookAttacksOnTheFly(square, occupancy);
        }
        
        // Initialize bishop attack tables
        Bitboard bishopMask = bishopMovement(square);     // Get relevant diagonal squares
        int bishopRelevantBits = BISHOP_RELEVANT_BITS[square];
        int bishopVariations = 1 << bishopRelevantBits;  // 2^bits patterns
        
        for (int i = 0; i < bishopVariations; i++) {
            // Generate specific occupancy pattern for diagonals
            Bitboard occupancy = occupancyMask(i, bishopRelevantBits, bishopMask);
            
            // Compute magic index
            Bitboard index = (occupancy * BISHOP_MAGICS[square]) >> (64 - bishopRelevantBits);
            
            // Store pre-computed attacks
            BISHOP_ATTACKS[square][index] = bishopAttacksOnTheFly(square, occupancy);
        }
    }
}

Runtime Attack Generation

At runtime, attack generation is a simple lookup:

Bitboard getRookAttacks(int square, Bitboard occupancy) {
    occupancy &= ROOK_MOVEMENT[square];           // Apply mask
    occupancy *= ROOK_MAGICS[square];             // Multiply by magic
    occupancy >>= 64 - ROOK_RELEVANT_BITS[square]; // Shift to index
    return ROOK_ATTACKS[square][occupancy];       // Table lookup
}

The time complexity is $ O(1) $, in just a few CPU cycles.

Relevant Bits per Square

The number of relevant bits varies by square position:

int ROOK_RELEVANT_BITS[64] = {
    12, 11, 11, 11, 11, 11, 11, 12,    // Rank 1: corners need 12 bits
    11, 10, 10, 10, 10, 10, 10, 11,    // Rank 2-7: edges need 11, center 10
    11, 10, 10, 10, 10, 10, 10, 11,
    11, 10, 10, 10, 10, 10, 10, 11,
    11, 10, 10, 10, 10, 10, 10, 11,
    11, 10, 10, 10, 10, 10, 10, 11,
    11, 10, 10, 10, 10, 10, 10, 11,
    12, 11, 11, 11, 11, 11, 11, 12     // Rank 8
};

Corner squares need more bits because they have longer rays.

Why “Magic”?

The term “magic” comes from the fact that these multipliers act as pseudo-perfect hash functions. They map all relevant occupancy patterns to unique indices with no collisions for that specific set of patterns. While not mathematically perfect hashes (collisions could occur with other bit patterns), they’re carefully selected to be collision-free for chess move generation.

Finding magic numbers requires:

  1. Trying random 64-bit numbers
  2. Testing all possible occupancy combinations for a square
  3. Verifying no hash collisions occur among valid patterns
  4. Confirming all patterns map correctly to precomputed attacks

This is done once through trial and error, and the resulting magics are hard-coded into the engine.

Performance Impact

Approach Attack Generation Time Memory Usage
Ray tracing ~50-100 cycles Minimal
Magic bitboards ~5-10 cycles ~288 KB
Speedup ~10-20x faster Acceptable

Magic bitboards are the standard technique in all modern chess engines.8

Position Evaluation

Evaluation Function Goal

The evaluation function assigns a numerical score to a position from the current side’s perspective:

$$ \text{eval}(p) = \begin{cases} +\infty & \text{if } p \text{ is checkmate for player} \ -\infty & \text{if } p \text{ is checkmate for opponent} \ 0 & \text{if } p \text{ is drawn/equal} \

0 & \text{if } p \text{ favors player} \ < 0 & \text{if } p \text{ favors opponent} \end{cases} $$

Evaluation Components

The engine uses material + piece-square tables evaluation:

$$ \text{eval}(p) = \sum_{i=0}^{11} \left( \text{count}_i \times (\text{material}_i + \text{positional}_i) \right) $$

Material Values

Standard centipawn values (1 pawn = 100 centipawns)69:

int PIECE_HEAD_VALUES[] = {
    100,    // White pawn
    320,    // White knight
    330,    // White bishop
    500,    // White rook
    900,    // White queen
    2000,   // White king (UNUSED)
    -100,   // Black pawn
    -320,   // Black knight
    -330,   // Black bishop
    -500,   // Black rook
    -900,   // Black queen
    -2000   // Black king (UNUSED)
};

King values are included in the array for structural completeness (matching the 12-piece enumeration) but are never used in actual evaluation since kings cannot be captured. The game ends before king material would factor into scoring.

These values reflect:

  • Bishop pair bonus: Bishops slightly more valuable than knights
  • Queen value: Less than two rooks to encourage trading

Piece-Square Tables

Piece-square tables (PST) encourage pieces to occupy strong squares10:

int PAWN_W_POS[] = {
     0,   0,   0,   0,   0,   0,   0,   0,  // Rank 1 (pawns can't be here)
     5,  10,  10, -20, -20,  10,  10,   5,  // Rank 2 (starting position)
     5,  -5, -10,   0,   0, -10,  -5,   5,  // Rank 3
     0,   0,   0,  20,  20,   0,   0,   0,  // Rank 4 (central control)
     5,   5,  10,  25,  25,  10,   5,   5,  // Rank 5 (advanced pawns)
    10,  10,  20,  30,  30,  20,  10,  10,  // Rank 6 (near promotion)
    50,  50,  50,  50,  50,  50,  50,  50,  // Rank 7 (almost promoting)
     0,   0,   0,   0,   0,   0,   0,   0   // Rank 8 (promotion square)
};

Knight PST penalizes rim squares, rewards center:

int KNIGHT_W_POS[] = {
    -50, -40, -30, -30, -30, -30, -40, -50,  // Rank 1 (back rank bad)
    -40, -20,   0,   5,   5,   0, -20, -40,  // Rank 2
    -30,   5,  10,  15,  15,  10,   5, -30,  // Rank 3
    -30,   0,  15,  20,  20,  15,   0, -30,  // Rank 4 (ideal squares)
    -30,   5,  15,  20,  20,  15,   5, -30,  // Rank 5
    -30,   0,  10,  15,  15,  10,   0, -30,  // Rank 6
    -40, -20,   0,   0,   0,   0, -20, -40,  // Rank 7
    -50, -40, -30, -30, -30, -30, -40, -50   // Rank 8 (back rank bad)
};

We encode this positions into the engines based in our knowledge and ideas of chess.

Evaluation Pseudocode

int evaluate(Board board, int gameResult) {
    if (gameResult == DRAW) return 0;
    if (gameResult == WHITE_WIN) return MAX_EVAL;
    if (gameResult == BLACK_WIN) return MIN_EVAL;
    
    int score = 0;
    
    // Iterate through all piece types
    for (int pieceType = 0; pieceType < 12; pieceType++) {
        Bitboard pieces = *(&board.pawn_w + pieceType);
        
        while (pieces) {
            int square = __builtin_ctzll(pieces);  // Get LSB position
            
            // Add material value
            score += PIECE_HEAD_VALUES[pieceType];
            
            // Add positional value
            score += PIECE_SQUARE_TABLES[pieceType][square];
            
            pieces &= pieces - 1;  // Clear LSB
        }
    }
    
    return score;
}

Evaluation Symmetry

Black piece-square tables are vertically flipped versions of white tables:

// Black pawn values are white values mirrored and negated
int PAWN_B_POS[64] = mirror_and_negate(PAWN_W_POS);

Other Advanced Evaluation Techniques

More sophisticated engines include:

Feature Description
Mobility Count legal moves
King safety Pawn shield, attack zones
Pawn structure Doubled, isolated, passed pawns
Bishop pair Bonus for having both bishops
Rook on open file Rooks on files without pawns
Endgame tables Different PSTs for endgame

Search Algorithm

Minimax Foundation

Chess engines use minimax - the assumption that both players play optimally. This engine implements the negamax variant, which simplifies implementation by treating both players symmetrically:

$$ \text{negamax}(p, d) = \begin{cases} \text{eval}(p) & \text{if } d = 0 \\ \max_{m \in \text{moves}(p)} -\text{negamax}(\text{apply}(p, m), d-1) & \text{otherwise} \end{cases} $$

The negation (-negamax) switches perspective between players: each player maximizes their own score, which is the negative of their opponent’s score. This is mathematically equivalent to classical minimax (which alternates between max and min nodes) but requires writing only one function.11112

Alpha-Beta Pruning

Alpha-beta pruning eliminates branches that cannot influence the final decision:

$$ \alpha = \text{best score player can guarantee} \\ \beta = \text{worst score opponent can force} $$

Pruning condition: If $ \alpha \geq \beta $, stop searching - opponent won’t allow this position.

            Root (α=-∞, β=+∞)
           /     |      \
      Move A  Move B  Move C
       /  \      |       |
     ...  ...   ...     ...
     
At Move A: after searching left child, α = 50
           Right child returns -60
           No update since -60 < 50
           
At Move B: Returns 70
           Update α = 70 (new best)
           
At Move C: First child returns 40
           Since 40 < α=70, skip remaining children
           Opponent won't allow this worse position

Alpha-Beta Implementation

static int alphabeta(Board board, int depth, int alpha, int beta, SearchContext* ctx) {
    ctx->nodesSearched++;
    int origAlpha = alpha;
    
    // Transposition table probe (check if we've seen this position)
    TTEntry entry = getTTEntry(board.hash);
    if (board.hash == entry.zobrist && entry.depth >= depth) {
        if (entry.nodeType == EXACT) {
            return entry.eval;  // Exact previous evaluation
        } else if (entry.nodeType == LOWER) {
            alpha = max(alpha, entry.eval);  // At least this good
        } else if (entry.nodeType == UPPER) {
            beta = min(beta, entry.eval);  // At most this good
        }
        
        if (alpha >= beta) {
            return entry.eval;  // Cutoff
        }
    }
    
    // Generate all legal moves
    Move moves[256];
    int moveCount = legalMoves(&board, moves);
    
    // Terminal node check (checkmate/stalemate/draw)
    int result = gameResult(board, moves, moveCount);
    if (result) {
        int eval = evaluate(board, result);
        // Prefer faster mates
        if (result != DRAW)
            eval += (ctx->maxDepth - depth) * (board.turn ? 1 : -1);
        return eval * (board.turn ? 1 : -1);
    }
    
    // Leaf node evaluation
    if (depth == 0) {
        return evaluate(board, result) * (board.turn ? 1 : -1);
    }
    
    // Move ordering (see next section)
    score_moves(board, entry, moves, moveCount);
    
    int eval = MIN_EVAL;
    Move bestMove;
    int nextMove;
    
    // Iterate through moves in order
    while ((nextMove = select_move(moves, moveCount)) != -1) {
        if (moves[nextMove].validation == LEGAL) {
            Board child = board;
            pushMove(&child, moves[nextMove]);
            
            // Recursive call with negated window
            int childEval = -alphabeta(child, depth - 1, -beta, -alpha, ctx);
            
            if (childEval > eval) {
                eval = childEval;
                bestMove = moves[nextMove];
                
                if (depth == ctx->maxDepth) {
                    ctx->bestMove = bestMove;  // Update root best move
                }
            }
            
            alpha = max(alpha, childEval);
            
            // Beta cutoff
            if (alpha >= beta) {
                break;
            }
        }
    }
    
    // Store in transposition table
    addTTEntry(board, eval, bestMove, depth, beta, origAlpha);
    
    return eval;
}

Search Tree Complexity

Without pruning, the search tree has branching factor $ b \approx 35 $ (average legal moves):

$$ \text{Nodes}(d) = b^d $$

Depth Nodes (unpruned) Alpha-Beta (est.) Reduction
4 1,500,625 ~50,000 30x
6 1,838,265,625 ~1,000,000 1800x
8 ~$ 2.25 \times 10^{12} $ ~$ 50 \times 10^6 $ ~45000x

Alpha-beta can achieve $ O(b^{d/2}) $ with perfect move ordering.213

Iterative Deepening

Rather than searching directly to the target depth, the engine searches depth 1, then depth 2, then depth 3, and so on until reaching the requested depth:

int search(Board board, int depth, SearchContext* ctx) {
    memset(&ctx->bestMove, 0, sizeof(Move));
    ctx->nodesSearched = 0;

    int eval = 0;
    for (int d = 1; d <= depth; d++) {
        ctx->maxDepth = d;
        eval = alphabeta(board, d, MIN_EVAL, MAX_EVAL, ctx);
        if (ctx->onDepthComplete)
            ctx->onDepthComplete(d, eval, ctx->nodesSearched);
    }
    return eval;
}

This looks wasteful — why repeat work already done at shallower depths? The answer is that the overhead is negligible. Because the search tree grows geometrically, the last iteration dominates:

$$ \text{Total work} = b^1 + b^2 + \cdots + b^d = b^d \cdot \frac{b - 1}{b^d - 1} \approx \frac{b}{b-1} \approx 1.03 \times b^d $$

For $ b \approx 35 $, the overhead is about 3%. In practice it’s closer to 20% because earlier depths are cheap but not free.

The real benefit comes from move ordering. After completing depth $ d $, the best move is stored in the transposition table. At depth $ d+1 $, score_moves() retrieves it and tries it first. This means the PV move is always searched before any other move — dramatically increasing the chance of early alpha-beta cutoffs. In the worst case (random move ordering) alpha-beta is $ O(b^d) $; with the TT move tried first it approaches the $ O(b^{d/2}) $ ideal.

$$ \text{Nodes}_{ID} \approx 0.2 \cdot b^d \quad \text{vs} \quad \text{Nodes}_{\text{single pass}} \approx b^d \quad \text{(in practice)} $$

The iterative deepening loop also enables per-depth info lines to be emitted over UCI, so GUIs can display search progress during a long think.

Mate Distance Bonus

When finding checkmate, the engine prefers faster mates:

if (result != DRAW) {
    eval += (ctx->maxDepth - depth) * (board.turn ? 1 : -1);
}

This adds a bonus proportional to how quickly the mate occurs:

  • Mate in 1 ply: bonus = maxDepth - 1
  • Mate in 10 ply: bonus = maxDepth - 10

Ensures mate-in-2 is preferred over mate-in-5. In real engines this is usually implemented with dedicated mate scores such as MATE - ply, but the idea is the same: preserve the winning result while ordering faster mates above slower ones and slower losses above immediate ones.

Advanced engines extend search with quiescence search - searching only captures/checks at leaf nodes to avoid horizon effect:

if (depth == 0) {
    return quiescence_search(board, alpha, beta);
}

This prevents bad evaluations when a capture is about to dramatically change the position.14

Transposition Tables

The Re-Search Problem

Chess positions can be reached via different move orders:

e4 → e5 → Nf3 → Nc6      (Position A)
    vs
Nf3 → Nc6 → e4 → e5      (Same Position A)

Without caching, the engine would re-evaluate identical positions multiple times, wasting computation.15

Search can revisit the same position through reversible move sequences, so repetition handling matters. Transposition tables reduce duplicated work, but they do not by themselves solve repetition semantics. Engines need history-aware draw detection and the halfmove clock. The relevant rules are threefold repetition and the fifty-move rule for claimable draws, with fivefold repetition and the seventy-five-move rule treated as automatic draws under current FIDE rules.20212223

Zobrist Hash as Key

The engine uses Zobrist hashing to assign an almost-unique 64-bit signature to a position. That signature serves as the transposition table key.516

Table Entry Structure

typedef struct {
    Bitboard zobrist;   // Position hash (key)
    int eval;           // Stored evaluation
    Move move;          // Best move from this position
    int depth;          // Search depth when evaluated
    int nodeType;       // EXACT, LOWER, or UPPER bound
} TTEntry;

Node Types

Type Meaning Usage
EXACT Evaluation is exact Use directly if depth sufficient
LOWER Evaluation is lower bound ($ \alpha $ cutoff) Update $ \alpha $
UPPER Evaluation is upper bound ($ \beta $ cutoff) Update $ \beta $

When a node is stored:

void addTTEntry(Board board, int eval, Move move, int depth, int beta, int alpha) {
    TTEntry entry;
    
    if (eval <= alpha)
        entry.nodeType = UPPER;  // Opponent can do better
    else if (eval >= beta)
        entry.nodeType = LOWER;  // We can do better
    else
        entry.nodeType = EXACT;  // Evaluation within window
    
    entry.eval = eval;
    entry.move = move;
    entry.depth = depth;
    entry.zobrist = board.hash;
    
    TT_TABLE[entry.zobrist % TT_SIZE] = entry;
}

Table Lookup

TTEntry getTTEntry(Bitboard zobrist) {
    return TT_TABLE[zobrist % TT_SIZE];
}

Collision handling: Simple replacement scheme - new entries overwrite old ones. More sophisticated engines use:

  • Two-tier tables (always replace vs depth-preferred)
  • Aging (prefer recent positions)
  • Bucket systems (multiple entries per hash)
TTEntry entry = getTTEntry(board.hash);

if (board.hash == entry.zobrist && entry.depth >= depth) {
    if (entry.nodeType == EXACT) {
        return entry.eval;  // Direct return
    } else if (entry.nodeType == LOWER) {
        alpha = max(alpha, entry.eval);
    } else if (entry.nodeType == UPPER) {
        beta = min(beta, entry.eval);
    }
    
    if (alpha >= beta) {
        return entry.eval;  // Cutoff
    }
}

Depth check: Only use stored evaluation if it was computed at equal or greater depth.

Memory Usage

const int TT_SIZE = 100000;
TTEntry TT_TABLE[100000];

With roughly 40 bytes per entry, this table uses:

$$ 100{,}000 \times 40 = 4{,}000{,}000 \text{ bytes} \approx 4 \text{ MB} $$

Modern engines use gigabytes of transposition table memory.

Performance Impact

Metric Without TT With TT Improvement
Nodes searched 10,000,000 2,000,000 5x reduction
Search depth 6 ply 8+ ply +2-3 ply
Move time 10s 2s 5x faster

Transposition tables are essential for modern chess engines.

Zobrist Hashing

Incremental Hash Requirement

A naive position hash (e.g., hashing all piece positions) would require $ O(n) $ computation per position. With millions of positions evaluated, this is too slow.

Solution: Zobrist hashing provides $ O(1) $ incremental updates.

Zobrist Technique

Pre-compute random 64-bit numbers for each:

  • Piece type on each square: PIECES[12][64]
  • Castling rights: CASTLE[16] (4 castling bits → $ 2^4 = 16 $ states)
  • En passant square: ENPASSANT[64]
  • Side to move: WHITE_MOVE
Bitboard PIECES[12][64];    // [piece_type][square]
Bitboard CASTLE[16];        // [castling_rights]
Bitboard ENPASSANT[64];     // [ep_square]
Bitboard WHITE_MOVE;

Hash Computation

The position hash is the XOR of all applicable components. This works because each factor that changes legal play must contribute its own random key: piece-square occupancy, castling rights, en passant state, and side to move. If any of those differ, the resulting signature should differ as well, which is exactly what the transposition table needs. In careful implementations, the en passant component is only hashed when an en passant capture is actually legal; otherwise two positions that are equivalent for repetition purposes can receive different keys.516

$$ H(p) = \bigoplus_{i} \text{PIECES}[t_i][s_i] \oplus \text{CASTLE}[c] \oplus \text{ENPASSANT}[e] \oplus \text{WHITE\_MOVE}^{turn} $$

Where $ \bigoplus $ denotes XOR.

Bitboard hash(const Board board) {
    Bitboard hash = 0LL;
    
    // XOR in castling rights
    hash ^= CASTLE[board.castling];
    
    // XOR in all pieces
    for (int i = 0; i < 12; i++) {
        Bitboard bb = *(&board.pawn_w + i);
        while (bb) {
            int square = __builtin_ctzll(bb);
            hash ^= PIECES[i][square];
            bb &= bb - 1;
        }
    }
    
    // XOR in en passant square
    if (board.epSquare != -1) {
        hash ^= ENPASSANT[board.epSquare];
    }
    
    // XOR in side to move
    if (board.turn == WHITE) {
        hash ^= WHITE_MOVE;
    }
    
    return hash;
}

Incremental Updates

When making a move, update the hash incrementally:

// Remove piece from origin square
board.hash ^= PIECES[pieceType][fromSquare];

// Add piece to destination square
board.hash ^= PIECES[pieceType][toSquare];

// If capture occurred, remove captured piece
if (capturedPiece != EMPTY) {
    board.hash ^= PIECES[capturedPiece][toSquare];
}

// Update en passant
if (board.epSquare != -1) {
    board.hash ^= ENPASSANT[oldEpSquare];  // Remove old
}
if (newEpSquare != -1) {
    board.hash ^= ENPASSANT[newEpSquare];  // Add new
}

// Update castling rights
board.hash ^= CASTLE[oldCastling];
board.hash ^= CASTLE[newCastling];

// Toggle side to move
board.hash ^= WHITE_MOVE;

XOR properties make this work:

  • $ a \oplus a = 0 $ (XORing twice removes)
  • $ a \oplus 0 = a $ (identity)
  • XOR is commutative and associative

Initialization

Random numbers are generated at engine startup:

void initZobrist() {
    for (int i = 0; i < 12; i++) {
        for (int j = 0; j < 64; j++) {
            PIECES[i][j] = randomBitboard();
        }
    }
    for (int i = 0; i < 64; i++) {
        ENPASSANT[i] = randomBitboard();
    }
    for (int i = 0; i < 16; i++) {
        CASTLE[i] = randomBitboard();
    }
    WHITE_MOVE = randomBitboard();
}

Bitboard randomBitboard(void) {
    Bitboard r = 0;
    for (int i = 0; i < 64; i++) {
        Bitboard tmp = (Bitboard)rand() % 2;
        r |= tmp << i;
    }
    return r;
}

Hash Collisions

With 64-bit hashes, collision probability is astronomically low:

$$ P(\text{collision}) \approx \frac{n^2}{2 \times 2^{64}} \approx \frac{n^2}{3.69 \times 10^{19}} $$

For $ n = 10^9 $ positions (deep search):

$$ P(\text{collision}) \approx \frac{10^{18}}{3.69 \times 10^{19}} \approx 0.027\% $$

Even with billions of positions, collisions are extremely rare (less than 3 in 10,000 searches). Modern engines accept this negligible risk, as verification costs would exceed collision costs.

Why Zobrist Works

Zobrist hashing has two critical properties:

  1. Uniqueness: Random XOR produces uniformly distributed hashes
  2. Incrementality: Move updates are $ O(1) $ XOR operations

This makes it the gold standard for chess position hashing.

Move Ordering

Why Move Ordering Matters

Alpha-beta pruning effectiveness depends on move order. Searching the best move first maximizes cutoffs:

$$ \text{Nodes}_{\text{perfect order}} = O(b^{d/2}) \quad \text{vs} \quad \text{Nodes}_{\text{random}} = O(b^d) $$

Example: At depth 6 with $ b=35 $:

  • Random order: ~1.8 billion nodes
  • Perfect order: ~43,000 nodes
  • ~42,000x speedup

Move Ordering Heuristics

The engine uses multiple heuristics to score moves:

void score_moves(const Board board, const TTEntry entry, Move moves[], int cmoves) {
    Move pvMove = {0};
    if (board.hash == entry.zobrist && entry.nodeType < UPPER)
        pvMove = entry.move;

    for (int i = 0; i < cmoves; i++) {
        Move* move = &moves[i];

        // 1. Hash move (best from previous iteration) gets highest priority
        if (both_moves_are_equal(pvMove, *move)) {
            move->score = MAX_MOVE_SCORE;  // 1000
        } else {
            bool isCapture = enemyOccupancy & SQUARE_BITBOARDS[move->toSquare];
            bool isEnPassant = board.epSquare == move->toSquare;

            if (isEnPassant) {
                move->score = PAWN_EXCHANGE_VALUE;  // equivalent to pawn captures pawn
            } else if (isCapture) {
                // 2. MVV-LVA: pre-scaled victim table minus attacker table
                move->score = VICTIM_PIECES[capturedPiece] - ATTACKING_PIECES[move->pieceType];
            }
            // Non-captures score 0 and are searched last
        }
    }
}

MVV-LVA (Most Valuable Victim - Least Valuable Attacker)

Prioritize captures by:

  1. Victim value (capturing queen > capturing pawn)
  2. Attacker value (pawn captures queen > queen captures pawn)

$$ \text{MVV-LVA}(m) = \text{VICTIM\_PIECES}[\text{victim}] - \text{ATTACKING\_PIECES}[\text{attacker}] $$

Victim values are pre-scaled (pawn = 1600, queen = 14400) and attacker values use a separate scale (pawn = 100, queen = 500), so high-value captures always outrank low-value ones regardless of attacker:

  • Pawn captures queen: $ 14400 - 100 = 14300 $
  • Queen captures pawn: $ 1600 - 500 = 1100 $

This encourages trading low-value pieces for high-value pieces.17

Using lookup tables (VICTIM_PIECES[] and ATTACKING_PIECES[]) avoids computing piece values inline and keeps the scoring branch-free. The large victim scale ensures capture ordering is dominated by the victim’s value, not the attacker’s.

Move Selection

Moves are selected using partial sorting - only the best unmoved move is found each iteration:

int select_move(Move moves[], int cmoves) {
    int bestScore = 0;
    int indx = -1;
    
    for (int i = 0; i < cmoves; i++) {
        if (!moves[i].exhausted && moves[i].score >= bestScore) {
            bestScore = moves[i].score;
            indx = i;
        }
    }
    
    if (indx != -1) {
        moves[indx].exhausted = true;
    }
    
    return indx;
}

This is more efficient than full sorting when alpha-beta cutoffs occur early.

Hash Move Priority

The hash move (best move from transposition table) gets highest priority:

if (entry.zobrist == board.hash) {
    score += 10000;  // Guaranteed first
}

This creates a positive feedback loop:

  1. Best move from previous search gets tried first
  2. Causes early cutoffs, confirming it’s strong
  3. Gets stored back in transposition table
  4. Prioritized again in future searches

Impact on Search Efficiency

Move Ordering Nodes @ Depth 8 Time
No ordering 4.2 billion 3min
Captures only 850 million 40s
+ Hash move 120 million 6s
+ MVV-LVA 45 million 2.3s
Full heuristics ~20 million ~1s

Good move ordering reduces nodes by ~200x.13171819

$ \theta $ UCI Protocol

What is UCI?

Universal Chess Interface (UCI) is a standard protocol for communication between chess engines and GUIs. It uses text-based commands over stdin/stdout.24

Basic UCI Flow

GUI: uci
Engine: id name ChessEngine
Engine: id author Felipe Lecot
Engine: uciok

GUI: isready
Engine: readyok

GUI: ucinewgame

GUI: position startpos moves e2e4 e7e5 g1f3
Engine: (sets up position)

GUI: go depth 10
Engine: info depth 1 score cp 20 nodes 150
Engine: info depth 2 score cp 15 nodes 850
Engine: ...
Engine: info depth 10 score cp 45 nodes 4823199
Engine: bestmove g1f3 ponder d8f6

UCI Commands

Command Purpose Example
uci Initialize UCI mode uci
isready Synchronization isready
ucinewgame Start new game ucinewgame
position Set position position startpos moves e2e4
go Start search go depth 10 or go movetime 5000
stop Stop search stop
quit Exit engine quit

Position Command

position [startpos | fen <fenstring>] [moves <move1> <move2> ...]

Examples:

position startpos
position startpos moves e2e4 e7e5
position fen rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1

Go Command Options

go [searchmoves <moves>] [depth <d>] [nodes <n>] [movetime <ms>]
   [wtime <ms>] [btime <ms>] [winc <ms>] [binc <ms>]

Common search modes:

  • go depth 10 - Search to depth 10
  • go movetime 5000 - Think for 5 seconds
  • go wtime 300000 btime 300000 - 5-minute time controls
  • go infinite - Search until stop command

Info Output

Engine reports search progress:

info depth <d> seldepth <sd> score <score> nodes <n> 
     time <ms> nps <nodes/sec> pv <move1> <move2> ...

Example:

info depth 10 score cp 45 nodes 4823199 time 2341 nps 2060000 pv e2e4 e7e5 g1f3
  • cp 45 - Score of 45 centipawns (0.45 pawns)
  • mate 3 - Checkmate in 3 moves
  • nps - Nodes per second (performance metric)
  • pv - Principal variation (best line)

Engine Implementation

char *ENGINE_NAME = "Basic Chess Engine";
char *AUTHOR = "Felipe Lecot";
const int DEFAULT_DEPTH = 7;

int main(void) {
    initBitboards();
    initZobrist();
    initMoveGeneration();
    initEvaluation();

    char input[2000];
    Board board = {};

    while (fgets(input, sizeof(input), stdin)) {
        if (strncmp(input, "uci", 3) == 0)
            printEngineInfo();
        else if (strncmp(input, "isready", 7) == 0)
            printf("readyok\n");
        else if (strncmp(input, "ucinewgame", 10) == 0)
            parsePosition("position startpos", &board);
        else if (strncmp(input, "position", 8) == 0)
            parsePosition(input, &board);
        else if (strncmp(input, "go", 2) == 0)
            parseGo(input, board);
        else if (strncmp(input, "quit", 4) == 0)
            break;
    }
}

void printEngineInfo(void) {
    printf("id name %s\n", ENGINE_NAME);
    printf("id author %s\n", AUTHOR);
    printf("uciok\n");
    fflush(stdout);
}

static void onDepthComplete(int depth, int eval, int nodes) {
    double ms = (double)(clock() - go_start) * 1000 / CLOCKS_PER_SEC;
    printf("info depth %d score cp %d nodes %d time %.0f\n", depth, eval, nodes, ms);
}

void parseGo(char *command, Board board) {
    int depth = DEFAULT_DEPTH;
    char *depthStr = strstr(command, "depth");
    if (depthStr) {
        int parsed = atoi(depthStr + 5);
        if (parsed > 0) depth = parsed;
    }

    SearchContext ctx = { .onDepthComplete = onDepthComplete };
    int eval = search(board, depth, &ctx);

    char san[6] = {0};
    moveToSan(ctx.bestMove, san);
    printf("bestmove %s\n", san);
}

Move Format

UCI uses long algebraic notation:

e2e4      (e-pawn moves from e2 to e4)
e7e8q     (e-pawn promotes to queen)
e1g1      (white kingside castle)
e1c1      (white queenside castle)

Castling is encoded as king captures rook square in FEN, but as king two squares in UCI.

GUI Integration

Popular chess GUIs supporting UCI:

  • Arena Chess GUI
  • ChessBase
  • Lichess (online)
  • Cute Chess
  • BanksiaGUI

These GUIs handle:

  • Position visualization
  • Clock management
  • Opening books
  • Engine tournaments

Performance Optimizations

Compiler Optimizations

CFLAGS = -O3 -march=native -flto
  • -O3: Aggressive optimization (loop unrolling, inlining, etc.)
  • -march=native: Use CPU-specific instructions (POPCNT, BMI2)
  • -flto: Link-time optimization (whole-program analysis)

Bitwise Intrinsics

Modern CPUs have hardware instructions for bit manipulation:

Operation C Code Intrinsic
Count bits Loop __builtin_popcountll()
Find LSB Loop __builtin_ctzll()
Find MSB Loop __builtin_clzll()
// Count pieces on bitboard
int count = __builtin_popcountll(bitboard);

// Find first piece
int square = __builtin_ctzll(bitboard);

// Remove first piece
bitboard &= bitboard - 1;  // Clear LSB trick

Lookup Tables vs Computation

Pre-compute expensive operations:

// Pre-computed
Bitboard attacks = KNIGHT_MOVEMENT[square];

// vs On-the-fly (much slower)
Bitboard attacks = computeKnightAttacks(square);

Memory-speed tradeoff: Tables use ~1 MB but provide ~100x speedup.

Move Structure Size

sizeof(Move) = 32 bytes  // Compact representation

Keeping moves small improves cache locality:

  • 256 moves = 8 KB (fits in L1 cache)
  • Faster iteration during move ordering

Pre-compute occupancy masks once per position:

void computeOccupancyMasks(Board* board) {
    board->occupancyWhite = board->pawn_w | board->knight_w | ...;
    board->occupancyBlack = board->pawn_b | board->knight_b | ...;
    board->occupancy = board->occupancyWhite | board->occupancyBlack;
}

Avoids recomputing these ORs in every getRookAttacks() call.

King Square Caching

int whiteKingSq;  // Cached king position
int blackKingSq;

Finding the king requires scanning a bitboard (ctzll). Caching eliminates this:

// Cached access (O(1))
int kingPos = board.whiteKingSq;

// vs Scanning (O(1) but slower constant)
int kingPos = __builtin_ctzll(board.king_w);

Static Evaluation Only at Leaves

Evaluation is only called at depth 0 or terminal nodes:

if (depth == 0 || gameOver) {
    return evaluate(board, result);
}

Don’t waste time evaluating at every node - search deeper instead.

Partial Move Sorting

Instead of fully sorting 40 moves:

// Partial sort: only find best unmoved move
int select_move(Move moves[], int count) {
    int bestIndex = -1;
    int bestScore = MIN_INT;
    
    for (int i = 0; i < count; i++) {
        if (!moves[i].exhausted && moves[i].score > bestScore) {
            bestScore = moves[i].score;
            bestIndex = i;
        }
    }
    
    moves[bestIndex].exhausted = true;
    return bestIndex;
}

Complexity:

  • Full sort: $ O(n \log n) $ upfront
  • Partial sort: $ O(n) $ per move, but alpha-beta often cuts off early

If only 5 moves are tried before cutoff, partial sorting is ~8x faster.

Performance Metrics

Typical performance on modern hardware:

Metric Value
Nodes per second ~2-5 million
Depth reached (5s) 8-10 ply
Transposition hits ~40-60%
Avg branching factor ~35 moves
Memory usage ~20 MB

Scaling with Depth

Depth Nodes Time NPS
4 50,000 0.02s 2.5M
6 1,000,000 0.4s 2.5M
8 50,000,000 20s 2.5M
10 2,000,000,000 13min 2.5M

Each 2-ply depth increase multiplies search time by ~50x.

References

  1. Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 4th ed. (Pearson, 2020), Ch. 5 “Adversarial Search and Games” - the minimax model, the worst-case opponent assumption, and the rationale for adversarial search.

  2. Donald E. Knuth and Ronald W. Moore, “An Analysis of Alpha-Beta Pruning,” Communications of the ACM 18, no. 10 (1975) - the original analysis of alpha-beta pruning and the $ O(b^{d/2}) $ best-case bound under ideal move ordering.

  3. David Silver et al., “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” DeepMind (2017), sec. 2 “Self-play” - tabula rasa reinforcement learning from rules alone.

  4. David Silver et al., “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” DeepMind (2017), sec. 3 “Neural network architecture” - the joint policy-and-value network used in AlphaZero-style systems.

  5. Albert L. Zobrist, “A New Hashing Method with Application for Game Playing,” Technical Report 88, University of Wisconsin (1970) - the original XOR-based incremental hashing scheme for game-tree search.

  6. Larry Kaufman, “The Evaluation of Material Imbalances,” Chess Life, 1999 - a widely cited empirical reference for centipawn piece values in classical evaluation.

  7. Chess Programming Wiki, “Bitboards” - LERF square mapping, occupancy masks, bit iteration, and the standard advantages over mailbox board representations.

  8. Chess Programming Wiki, “Magic Bitboards” - pseudo-perfect hashing for slider attacks, relevant occupancy masks, and runtime table lookup.

  9. Chess Programming Wiki, “Material” - common centipawn material baselines and their practical use in evaluation.

  10. Chess Programming Wiki, “Piece-Square Tables” - per-square positional bonuses and mirrored tables for black.

  11. Chess Programming Wiki, “Minimax” - adversarial search semantics and the maximizing-versus-minimizing game model.

  12. Chess Programming Wiki, “Negamax” - the score-negation reformulation that collapses minimax into one recursive routine.

  13. Chess Programming Wiki, “Alpha-Beta” - pruning windows, cutoffs, and the dependence of search size on move ordering.

  14. Chess Programming Wiki, “Quiescence Search” - the horizon effect, standing pat, and tactical extension beyond nominal leaf depth.

  15. Chess Programming Wiki, “Transposition Table” - table structure, bound types, replacement policies, and caching repeated positions.

  16. Chess Programming Wiki, “Zobrist Hashing” - practical piece-square keys, castling and en passant state, side-to-move keys, and collision considerations.

  17. Chess Programming Wiki, “MVV-LVA” - ordering captures by victim value and attacker cost.

  18. Chess Programming Wiki, “Killer Move Heuristic” - prioritizing non-captures that previously caused beta cutoffs at the same depth.

  19. Chess Programming Wiki, “History Heuristic” - scoring moves by accumulated cutoff history across the tree.

  20. Chess Programming Wiki, “Repetitions” - why chess search is graph-shaped rather than tree-shaped and why repetition detection needs history.

  21. Chess Programming Wiki, “Fifty-move Rule” - the role of the halfmove clock and draw handling in engine search.

  22. FIDE, Laws of Chess, effective 1 January 2023, Art. 9.2 and Art. 9.3 - threefold repetition and the fifty-move rule as claimable draws.

  23. FIDE, Laws of Chess, effective 1 January 2023, Art. 9.6 - fivefold repetition and the seventy-five-move rule as automatic draws.

  24. Stefan Meyer-Kahlen, “Description of the Universal Chess Interface (UCI)” - command definitions for uci, isready, position, go, stop, and bestmove, plus the info depth … score … pv output format.

  25. Official nnue-pytorch repository, docs/nnue.md - HalfKP and related feature encodings, network layout, and the modern Stockfish NNUE training pipeline.

  26. Leela Chess Zero developer documentation - policy-and-value outputs, PUCT-driven Monte Carlo Tree Search, and the overall AlphaZero-style engine architecture.

Bibliography

Table of contents