Chess Engines
An introductory guide to chess engines: board representation, move generation, evaluation, search algorithms, and performance optimizations.

Introduction to Chess Engines
This article is an exposition and explanation on how chess engines work, particularly this project.
A [[chess engine]] is a computer program that analyzes chess positions and determines the best move through systematic evaluation. Modern chess engines combine efficient board representation, rapid move generation, sophisticated evaluation functions, and deep search algorithms to achieve superhuman playing strength.
Position-Based Evaluation Philosophy
Traditional chess engines, including this one, operate on a fundamental principle: they evaluate individual board positions as static snapshots, determining whether a position is “good” or “bad” through quantifiable metrics (material count, piece placement, king safety, etc.). The engine doesn’t inherently understand chess strategy or plans-it evaluates millions of positions and selects moves leading to the best-scored positions.
This engine uses hand-crafted evaluation based on:
- Material counting (queen = 9 pawns, rook = 5 pawns, etc.)
- Piece-square tables (e.g. knights in center are better than on rim)
- Static position features
In contrast, modern neural network engines like Stockfish NNUE (Efficiently Updatable Neural Network) and Leela Chess Zero use learned evaluation functions that can implicitly recognize patterns, piece coordination, and positional motifs that emerge from training. These neural approaches evaluate positions by recognizing high-dimensional patterns “learned” from millions of games, rather than explicitly coded heuristics. Note that NNUE in Stockfish replaces only the evaluation function while retaining classical search (alpha-beta with all its pruning and extensions), whereas Leela uses a pure neural approach with Monte Carlo Tree Search.2526
Engine Architecture Spectrum
| Engine Type | Evaluation Method | Training Approach | Example |
|---|---|---|---|
| Classical | Hand-crafted heuristics | Human chess knowledge coded | This engine, early Stockfish |
| Hybrid (NNUE) | Neural network + search | Pre-trained on positions with classical engine evaluations | Stockfish 12+ |
| Pure Neural | Deep neural network | Self-play from random positions (tabula rasa) | AlphaZero, Leela Chess Zero |
Stockfish (modern) combines the best of both worlds, it uses the efficient search techniques described in this document (alpha-beta, transposition tables, move ordering) and replaces the hand-crafted leaf evaluation with NNUE trained on billions of positions evaluated by classical Stockfish. Importantly, NNUE only handles position evaluation-the classical search algorithm, pruning heuristics, and move ordering remain largely hand-crafted. This “bootstrap” approach leverages human chess insight encoded in both the classical evaluation and search techniques to accelerate training.
AlphaZero and Leela Chess Zero take a more radical approach, they start from zero knowledge (only the rules of chess) and learn entirely through self-play reinforcement learning. AlphaZero’s neural network learns to evaluate positions and suggest moves simultaneously, discovering chess principles independently without human input. This achieved superhuman play in just hours of training, with a more complex and natural playstyle, though requiring massive computational resources.326
$ \theta $ Mathematical View of Chess
From a mathematical perspective, chess can be formalized as an exploration of state space transformations:
Move as a Function
A move $ m $ is a function that transforms one board state to another:
The set of all legal positions $ \mathcal{G} $ forms the state space (estimated at between $ 10^{43} $ and $ 10^{47} $ reachable positions).
State Space as Vector Space
This is a conceptual abstraction useful for understanding neural network representations. Classical engines (including this one) operate on discrete structures (bitboards, integers, arrays) rather than continuous vector spaces.
Each position can be conceptually represented as a vector in a high-dimensional space:
Where:
- $ p_i \in \{0, 1, 2, \ldots, 11\} $ represents piece type on square $ i $ (0-11 for the 12 piece types, or use -1/12 for empty)
- $ c \in \{0, 1, 2, \ldots, 15\} $ encodes castling rights
- $ e \in \{-1, 0, 1, \ldots, 63\} $ is the en passant square
- $ h \in \{0, 1, \ldots, 50\} $ is the halfmove clock
Neural network engines (NNUE, Leela) use similar but more sophisticated encodings as input to their networks. Classical engines usually hand-design features such as material, mobility, king safety, pawn structure, and piece-square bonuses, then combine them explicitly in code. NNUE-style engines still start from a discrete board state, but transform it into sparse input features that a neural network can update efficiently after each move; AlphaZero-style systems instead feed richer spatial encodings into a deeper policy-and-value network.42526
Move Generation as Set Mapping
Legal move generation is a set-valued function:
The search space is better modeled as a directed graph with positions as nodes and moves as edges. In practice, transpositions and repetitions mean it is not strictly acyclic unless history-sensitive draw handling is included:20212223
Evaluation as Projection
The evaluation function projects the high-dimensional state space onto the real line:
Neural network evaluations can be viewed as learned non-linear projections:
More concretely, $ f_\theta $ is a learned nonlinear function approximator. It maps an encoded position to a scalar score, a win-draw-loss estimate, or both a value and move priors depending on the engine architecture. That is a more accurate way to think about neural evaluation than a binary separator between “good” and “bad” positions, because engine evaluation is continuous, contextual, and strongly dependent on side to move, tactical threats, and long-range coordination.426
Core Components
| Component | Purpose | Complexity |
|---|---|---|
| Board Representation | Store position state | $ O(1) $ access |
| Move Generation | Generate legal moves | $ O(n) $ per position |
| Evaluation Function | Score positions | $ O(1) $ |
| Search Algorithm | Find best move | $ O(b^d) $ exponential |
| Transposition Table | Cache positions | $ O(1) $ lookup |
The engine operates in a think-evaluate-move cycle:
Position → Generate Moves → Evaluate → Search → Best Move
↑ ↓
└──────────────────────────────────────────────┘
This loop assumes the opponent will always choose the strongest available reply within the search horizon. That worst-case assumption is exactly what makes minimax-style search robust, but it also means the engine is not explicitly optimizing for human fallibility, time pressure, or psychologically awkward positions unless extra heuristics or opponent models are added.111
Bitboard Representation
What are Bitboards?
A bitboard is a 64-bit unsigned integer where each bit represents a square on the chessboard. This representation enables highly efficient bitwise operations for move generation and board queries.7
typedef uint64_t Bitboard;
Board Mapping
The chessboard is mapped to bits in little-endian rank-file mapping:
a8 b8 c8 d8 e8 f8 g8 h8 8 | 56 57 58 59 60 61 62 63
a7 b7 c7 d7 e7 f7 g7 h7 7 | 48 49 50 51 52 53 54 55
a6 b6 c6 d6 e6 f6 g6 h6 6 | 40 41 42 43 44 45 46 47
a5 b5 c5 d5 e5 f5 g5 h5 5 | 32 33 34 35 36 37 38 39
a4 b4 c4 d4 e4 f4 g4 h4 4 | 24 25 26 27 28 29 30 31
a3 b3 c3 d3 e3 f3 g3 h3 3 | 16 17 18 19 20 21 22 23
a2 b2 c2 d2 e2 f2 g2 h2 2 | 8 9 10 11 12 13 14 15
a1 b1 c1 d1 e1 f1 g1 h1 1 | 0 1 2 3 4 5 6 7
-------------------------
a b c d e f g h
Board Structure
The engine uses piece-centric bitboards where each piece type has its own bitboard:
typedef struct {
// White pieces
Bitboard pawn_w, knight_w, bishop_w, rook_w, queen_w, king_w;
// Black pieces
Bitboard pawn_b, knight_b, bishop_b, rook_b, queen_b, king_b;
// Derived occupancy masks
Bitboard occupancy; // All pieces
Bitboard occupancyWhite; // White pieces
Bitboard occupancyBlack; // Black pieces
// Game state
int turn; // WHITE=1 or BLACK=0
int castling; // Bitfield: KQkq
int epSquare; // En passant target square
int halfmoves; // 50-move rule counter
int fullmoves; // Move number
// Cached king positions
int whiteKingSq;
int blackKingSq;
// Zobrist hash
Bitboard hash;
// Attack bitboard
Bitboard attacks;
} Board;
Basic Bitboard Operations
Bitboards are useful because the primitive operations of chess become primitive operations of the CPU. Placing pieces, removing them, finding occupancies, generating pushes, and iterating through piece lists. Instead of looping over 64 squares and branching repeatedly, the engine can often express the same logic with a handful of branch-light integer operations.
Two patterns show up constantly:
- Masking: isolate a subset of squares with
& - Iteration by least-significant bit: repeatedly extract one piece with
ctzll, then clear it withbb &= bb - 1
LSB iteration is especially important because it turns a dense 64-square board scan into a loop over only the occupied squares.
| Operation | Formula | Purpose |
|---|---|---|
| Set bit | bb |= (1ULL << square) |
Place piece |
| Clear bit | bb &= ~(1ULL << square) |
Remove piece |
| Toggle bit | bb ^= (1ULL << square) |
Flip state |
| Test bit | (bb >> square) & 1 |
Check occupancy |
| Count bits | __builtin_popcountll(bb) |
Count pieces |
| LSB index | __builtin_ctzll(bb) |
Find first piece |
Bitwise Advantages
Space Efficiency: 12 bitboards (64 bytes) store the entire position, compared to 64-element arrays (256+ bytes).
Parallel Operations: A single instruction can operate on all 64 squares simultaneously:
// Check if any pawns are on the 7th rank
bool pawnsOnSeventh = (pawn_w & RANK_7) != 0;
// Get all white pieces in one operation
Bitboard whitePieces = pawn_w | knight_w | bishop_w | rook_w | queen_w | king_w;
Example: Initial Position
White pawns on rank 2:
Bitboard pawn_w = 0x000000000000FF00;
Binary representation:
0000000000000000 (rank 8)
0000000000000000 (rank 7)
0000000000000000 (rank 6)
0000000000000000 (rank 5)
0000000000000000 (rank 4)
0000000000000000 (rank 3)
1111111111111111 (rank 2) ← Pawns here
0000000000000000 (rank 1)
Move Generation
Move Structure
Each move is encoded in a compact structure:
typedef struct {
int fromSquare; // Origin (0-63)
int toSquare; // Destination (0-63)
int promotion; // Promoted piece type (-1 if none)
int castle; // Castling type (0 if not castling)
int validation; // LEGAL, ILLEGAL, or NOT_VALIDATED
int pieceType; // Moving piece type
int score; // Move ordering heuristic
bool exhausted; // Search tracking flag
} Move;
Move Generation Pipeline
- Generate Pseudo-Legal Moves
- Apply Move on Copy Board
- Check if King is Under Attack
- Filter Illegal Moves (leaving king in check)
- Return Legal Move List
Piece Movement Patterns
Non-Sliding Pieces
Pre-computed lookup tables for knights, kings, and pawn attacks:
// Knight moves from each square
Bitboard KNIGHT_MOVEMENT[64];
// King moves from each square
Bitboard KING_MOVEMENT[64];
// Pawn attacks (split by direction and color)
Bitboard PAWN_W_ATTACKS_EAST[64];
Bitboard PAWN_W_ATTACKS_WEST[64];
Knight Movement from square $ s $:
Where moves must respect file boundaries to prevent wrapping.
Pawn Movement Logic
Pawns have the most complex movement rules:
void pawnSingleAndDblPushes(Board board, Bitboard* single, Bitboard* dbl) {
// Single push: shift pawns one rank forward
*single = board.turn ? board.pawn_w << 8 : board.pawn_b >> 8;
*single &= ~board.occupancy; // Remove blocked squares
// Double push: pawns on starting rank move two squares
*dbl = board.turn ?
(board.pawn_w & PAWN_START_WHITE) << 16 :
(board.pawn_b & PAWN_START_BLACK) >> 16;
// Can only double-push if both squares are empty
*dbl &= ~board.occupancy; // Destination must be empty
// Shift back to intermediate square, intersect with single-push
// to ensure path is clear, then shift forward again
*dbl = board.turn ? *dbl >> 8 : *dbl << 8;
*dbl &= *single; // Square in between must be empty
*dbl = board.turn ? *dbl << 8 : *dbl >> 8; // Restore to destination
}
/* Implementation comment: The intermediate shifts ensure the square between
* the starting position and destination is empty. While this could be
* optimized with a direct mask, this approach clearly expresses the logic. */
Pawn promotions generate four moves per promoting pawn (queen, rook, bishop, knight):
if (isPromoting) {
int promotions[] = {QUEEN_W, ROOK_W, BISHOP_W, KNIGHT_W};
for (int i = 0; i < 4; i++) {
Move move = {from, to, promotions[i], 0, turn ? PAWN_W : PAWN_B};
// Validate and add move
}
}
Special Moves
Castling
Castling requires checking multiple conditions:
bool canCastle =
(castling_rights & castle_flag) && // Legal right exists
!(castle_path & occupancy) && // Path is clear
!isSquareAttacked(board, kingSquare) && // Not in check
!isSquareAttacked(board, transitSquare); // Don't castle through check
En Passant
En passant captures are enabled when an opponent pawn double-pushes:
if (pawn_moved && abs(toSquare - fromSquare) == 16) {
board.epSquare = (fromSquare + toSquare) / 2; // Set target square
}
Attack Detection
To determine if a square is attacked:
bool isSquareAttacked(const Board board, int square) {
Bitboard sqBb = SQUARE_BITBOARDS[square];
Bitboard pawn = board.turn ? board.pawn_b : board.pawn_w;
Bitboard king = board.turn ? board.king_b : board.king_w;
Bitboard knight = board.turn ? board.knight_b : board.knight_w;
Bitboard bishop = board.turn ? board.bishop_b : board.bishop_w;
Bitboard rook = board.turn ? board.rook_b : board.rook_w;
Bitboard queen = board.turn ? board.queen_b : board.queen_w;
// Check queen attacks (both diagonal and orthogonal)
while (queen) {
int sq = __builtin_ctzll(queen);
// Queens attack like bishops and rooks combined
Bitboard attacks = getBishopAttacks(sq, board.occupancy);
attacks |= getRookAttacks(sq, board.occupancy);
if (attacks & sqBb) return true;
queen &= queen - 1; // Remove this queen, check next
}
// Check bishop attacks (diagonal)
while (bishop) {
int sq = __builtin_ctzll(bishop);
Bitboard attacks = getBishopAttacks(sq, board.occupancy);
if (attacks & sqBb) return true;
bishop &= bishop - 1;
}
// Check rook attacks (orthogonal)
while (rook) {
int sq = __builtin_ctzll(rook);
Bitboard attacks = getRookAttacks(sq, board.occupancy);
if (attacks & sqBb) return true;
rook &= rook - 1;
}
// Check knight attacks
while (knight) {
int sq = __builtin_ctzll(knight);
if (KNIGHT_MOVEMENT[sq] & sqBb) return true;
knight &= knight - 1;
}
// Check pawn attacks
while (pawn) {
int sq = __builtin_ctzll(pawn);
if (board.turn) {
// Check if black pawn attacks this square
if (PAWN_B_ATTACKS_EAST[sq] & sqBb) return true;
if (PAWN_B_ATTACKS_WEST[sq] & sqBb) return true;
} else {
// Check if white pawn attacks this square
if (PAWN_W_ATTACKS_EAST[sq] & sqBb) return true;
if (PAWN_W_ATTACKS_WEST[sq] & sqBb) return true;
}
pawn &= pawn - 1;
}
// Check king attacks
while (king) {
int sq = __builtin_ctzll(king);
if (KING_MOVEMENT[sq] & sqBb) return true;
king &= king - 1;
}
return false;
}
Magic Bitboards
Sliding pieces (bishops, rooks, queens) require ray-based attack generation that depends on board occupancy. A rook on d4 has different moves when the board is empty versus when pieces block its rays.
The natural approach is, for each square and occupancy pattern, compute attacks on-the-fly with ray tracing. But ray tracing is expensive. A position may need to check thousands of sliding piece attacks during search.
The solution is to pre-compute all possible attack patterns and use pseudo-perfect hashing to retrieve them in $ O(1) $ time.
Magic Bitboard Technique
Magic bitboards use a pseudo-perfect hash function (commonly called “magic numbers”) to map occupancy patterns to pre-computed attack sets. While collisions are theoretically possible, magic numbers are carefully selected offline to eliminate all collisions for the relevant occupancy patterns.
For a slider on square $ s $ with occupancy $ occ $:
Where:
- $ \text{mask}[s] $ = relevant occupancy squares (excluding board edges)
- $ \text{magic}[s] $ = pre-computed magic number
- $ \text{bits}[s] $ = number of relevant bits
Magic Number Example
For a rook on e4 (square 28) the occupancy mask is:
. . . . . . . . 0 0 0 0 0 0 0 0
. . . . 1 . . . 0 0 0 0 1 0 0 0
. . . . 1 . . . 0 0 0 0 1 0 0 0
. . . . 1 . . . 0 0 0 0 1 0 0 0
. 1 1 1 . 1 1 . 0 1 1 1 0 1 1 0 ← Rook on e4
. . . . 1 . . . 0 0 0 0 1 0 0 0
. . . . 1 . . . 0 0 0 0 1 0 0 0
. . . . . . . . 0 0 0 0 0 0 0 0
This mask has 10 relevant bits, resulting in $ 2^{10} = 1024 $ possible occupancy patterns.
Note that the diagram shows the relevant occupancy mask, not the rook’s attack squares. Edge squares are deliberately excluded from the mask.
Edge squares don’t affect which squares are blocked. For example, on the e-file, if a piece is on e7, the rook is blocked before e8 regardless of whether e8 is occupied. So e8 never needs to be in the mask
The rook’s magic number for e4:
ROOK_MAGICS[28] = 0xd14880480100080ULL;
Lookup Tables
The engine pre-computes all attack patterns during initialization:
// Attack lookup tables
Bitboard BISHOP_ATTACKS[64][512]; // 64 squares × max 512 patterns
Bitboard ROOK_ATTACKS[64][4096]; // 64 squares × max 4096 patterns
| Piece | Max Relevant Bits | Max Patterns | Table Size |
|---|---|---|---|
| Bishop | 9 | 512 | 32 KB |
| Rook | 12 | 4096 | 256 KB |
| Total | 21 | 4608 | ~288 KB |
Initialization Process
void initBishopRookAttackTables() {
for (int square = 0; square < 64; square++) {
// Initialize rook attack tables
Bitboard rookMask = rookMovement(square); // Get relevant squares
int rookRelevantBits = ROOK_RELEVANT_BITS[square];
int rookVariations = 1 << rookRelevantBits; // 2^bits patterns
for (int i = 0; i < rookVariations; i++) {
// Generate specific occupancy pattern
Bitboard occupancy = occupancyMask(i, rookRelevantBits, rookMask);
// Compute magic index
Bitboard index = (occupancy * ROOK_MAGICS[square]) >> (64 - rookRelevantBits);
// Store pre-computed attacks
ROOK_ATTACKS[square][index] = rookAttacksOnTheFly(square, occupancy);
}
// Initialize bishop attack tables
Bitboard bishopMask = bishopMovement(square); // Get relevant diagonal squares
int bishopRelevantBits = BISHOP_RELEVANT_BITS[square];
int bishopVariations = 1 << bishopRelevantBits; // 2^bits patterns
for (int i = 0; i < bishopVariations; i++) {
// Generate specific occupancy pattern for diagonals
Bitboard occupancy = occupancyMask(i, bishopRelevantBits, bishopMask);
// Compute magic index
Bitboard index = (occupancy * BISHOP_MAGICS[square]) >> (64 - bishopRelevantBits);
// Store pre-computed attacks
BISHOP_ATTACKS[square][index] = bishopAttacksOnTheFly(square, occupancy);
}
}
}
Runtime Attack Generation
At runtime, attack generation is a simple lookup:
Bitboard getRookAttacks(int square, Bitboard occupancy) {
occupancy &= ROOK_MOVEMENT[square]; // Apply mask
occupancy *= ROOK_MAGICS[square]; // Multiply by magic
occupancy >>= 64 - ROOK_RELEVANT_BITS[square]; // Shift to index
return ROOK_ATTACKS[square][occupancy]; // Table lookup
}
The time complexity is $ O(1) $, in just a few CPU cycles.
Relevant Bits per Square
The number of relevant bits varies by square position:
int ROOK_RELEVANT_BITS[64] = {
12, 11, 11, 11, 11, 11, 11, 12, // Rank 1: corners need 12 bits
11, 10, 10, 10, 10, 10, 10, 11, // Rank 2-7: edges need 11, center 10
11, 10, 10, 10, 10, 10, 10, 11,
11, 10, 10, 10, 10, 10, 10, 11,
11, 10, 10, 10, 10, 10, 10, 11,
11, 10, 10, 10, 10, 10, 10, 11,
11, 10, 10, 10, 10, 10, 10, 11,
12, 11, 11, 11, 11, 11, 11, 12 // Rank 8
};
Corner squares need more bits because they have longer rays.
Why “Magic”?
The term “magic” comes from the fact that these multipliers act as pseudo-perfect hash functions. They map all relevant occupancy patterns to unique indices with no collisions for that specific set of patterns. While not mathematically perfect hashes (collisions could occur with other bit patterns), they’re carefully selected to be collision-free for chess move generation.
Finding magic numbers requires:
- Trying random 64-bit numbers
- Testing all possible occupancy combinations for a square
- Verifying no hash collisions occur among valid patterns
- Confirming all patterns map correctly to precomputed attacks
This is done once through trial and error, and the resulting magics are hard-coded into the engine.
Performance Impact
| Approach | Attack Generation Time | Memory Usage |
|---|---|---|
| Ray tracing | ~50-100 cycles | Minimal |
| Magic bitboards | ~5-10 cycles | ~288 KB |
| Speedup | ~10-20x faster | Acceptable |
Magic bitboards are the standard technique in all modern chess engines.8
Position Evaluation
Evaluation Function Goal
The evaluation function assigns a numerical score to a position from the current side’s perspective:
$$ \text{eval}(p) = \begin{cases} +\infty & \text{if } p \text{ is checkmate for player} \ -\infty & \text{if } p \text{ is checkmate for opponent} \ 0 & \text{if } p \text{ is drawn/equal} \
0 & \text{if } p \text{ favors player} \ < 0 & \text{if } p \text{ favors opponent} \end{cases} $$
Evaluation Components
The engine uses material + piece-square tables evaluation:
Material Values
Standard centipawn values (1 pawn = 100 centipawns)69:
int PIECE_HEAD_VALUES[] = {
100, // White pawn
320, // White knight
330, // White bishop
500, // White rook
900, // White queen
2000, // White king (UNUSED)
-100, // Black pawn
-320, // Black knight
-330, // Black bishop
-500, // Black rook
-900, // Black queen
-2000 // Black king (UNUSED)
};
King values are included in the array for structural completeness (matching the 12-piece enumeration) but are never used in actual evaluation since kings cannot be captured. The game ends before king material would factor into scoring.
These values reflect:
- Bishop pair bonus: Bishops slightly more valuable than knights
- Queen value: Less than two rooks to encourage trading
Piece-Square Tables
Piece-square tables (PST) encourage pieces to occupy strong squares10:
int PAWN_W_POS[] = {
0, 0, 0, 0, 0, 0, 0, 0, // Rank 1 (pawns can't be here)
5, 10, 10, -20, -20, 10, 10, 5, // Rank 2 (starting position)
5, -5, -10, 0, 0, -10, -5, 5, // Rank 3
0, 0, 0, 20, 20, 0, 0, 0, // Rank 4 (central control)
5, 5, 10, 25, 25, 10, 5, 5, // Rank 5 (advanced pawns)
10, 10, 20, 30, 30, 20, 10, 10, // Rank 6 (near promotion)
50, 50, 50, 50, 50, 50, 50, 50, // Rank 7 (almost promoting)
0, 0, 0, 0, 0, 0, 0, 0 // Rank 8 (promotion square)
};
Knight PST penalizes rim squares, rewards center:
int KNIGHT_W_POS[] = {
-50, -40, -30, -30, -30, -30, -40, -50, // Rank 1 (back rank bad)
-40, -20, 0, 5, 5, 0, -20, -40, // Rank 2
-30, 5, 10, 15, 15, 10, 5, -30, // Rank 3
-30, 0, 15, 20, 20, 15, 0, -30, // Rank 4 (ideal squares)
-30, 5, 15, 20, 20, 15, 5, -30, // Rank 5
-30, 0, 10, 15, 15, 10, 0, -30, // Rank 6
-40, -20, 0, 0, 0, 0, -20, -40, // Rank 7
-50, -40, -30, -30, -30, -30, -40, -50 // Rank 8 (back rank bad)
};
We encode this positions into the engines based in our knowledge and ideas of chess.
Evaluation Pseudocode
int evaluate(Board board, int gameResult) {
if (gameResult == DRAW) return 0;
if (gameResult == WHITE_WIN) return MAX_EVAL;
if (gameResult == BLACK_WIN) return MIN_EVAL;
int score = 0;
// Iterate through all piece types
for (int pieceType = 0; pieceType < 12; pieceType++) {
Bitboard pieces = *(&board.pawn_w + pieceType);
while (pieces) {
int square = __builtin_ctzll(pieces); // Get LSB position
// Add material value
score += PIECE_HEAD_VALUES[pieceType];
// Add positional value
score += PIECE_SQUARE_TABLES[pieceType][square];
pieces &= pieces - 1; // Clear LSB
}
}
return score;
}
Evaluation Symmetry
Black piece-square tables are vertically flipped versions of white tables:
// Black pawn values are white values mirrored and negated
int PAWN_B_POS[64] = mirror_and_negate(PAWN_W_POS);
Other Advanced Evaluation Techniques
More sophisticated engines include:
| Feature | Description |
|---|---|
| Mobility | Count legal moves |
| King safety | Pawn shield, attack zones |
| Pawn structure | Doubled, isolated, passed pawns |
| Bishop pair | Bonus for having both bishops |
| Rook on open file | Rooks on files without pawns |
| Endgame tables | Different PSTs for endgame |
Search Algorithm
Minimax Foundation
Chess engines use minimax - the assumption that both players play optimally. This engine implements the negamax variant, which simplifies implementation by treating both players symmetrically:
The negation (-negamax) switches perspective between players: each player maximizes their own score, which is the negative of their opponent’s score. This is mathematically equivalent to classical minimax (which alternates between max and min nodes) but requires writing only one function.11112
Alpha-Beta Pruning
Alpha-beta pruning eliminates branches that cannot influence the final decision:
Pruning condition: If $ \alpha \geq \beta $, stop searching - opponent won’t allow this position.
Root (α=-∞, β=+∞)
/ | \
Move A Move B Move C
/ \ | |
... ... ... ...
At Move A: after searching left child, α = 50
Right child returns -60
No update since -60 < 50
At Move B: Returns 70
Update α = 70 (new best)
At Move C: First child returns 40
Since 40 < α=70, skip remaining children
Opponent won't allow this worse position
Alpha-Beta Implementation
static int alphabeta(Board board, int depth, int alpha, int beta, SearchContext* ctx) {
ctx->nodesSearched++;
int origAlpha = alpha;
// Transposition table probe (check if we've seen this position)
TTEntry entry = getTTEntry(board.hash);
if (board.hash == entry.zobrist && entry.depth >= depth) {
if (entry.nodeType == EXACT) {
return entry.eval; // Exact previous evaluation
} else if (entry.nodeType == LOWER) {
alpha = max(alpha, entry.eval); // At least this good
} else if (entry.nodeType == UPPER) {
beta = min(beta, entry.eval); // At most this good
}
if (alpha >= beta) {
return entry.eval; // Cutoff
}
}
// Generate all legal moves
Move moves[256];
int moveCount = legalMoves(&board, moves);
// Terminal node check (checkmate/stalemate/draw)
int result = gameResult(board, moves, moveCount);
if (result) {
int eval = evaluate(board, result);
// Prefer faster mates
if (result != DRAW)
eval += (ctx->maxDepth - depth) * (board.turn ? 1 : -1);
return eval * (board.turn ? 1 : -1);
}
// Leaf node evaluation
if (depth == 0) {
return evaluate(board, result) * (board.turn ? 1 : -1);
}
// Move ordering (see next section)
score_moves(board, entry, moves, moveCount);
int eval = MIN_EVAL;
Move bestMove;
int nextMove;
// Iterate through moves in order
while ((nextMove = select_move(moves, moveCount)) != -1) {
if (moves[nextMove].validation == LEGAL) {
Board child = board;
pushMove(&child, moves[nextMove]);
// Recursive call with negated window
int childEval = -alphabeta(child, depth - 1, -beta, -alpha, ctx);
if (childEval > eval) {
eval = childEval;
bestMove = moves[nextMove];
if (depth == ctx->maxDepth) {
ctx->bestMove = bestMove; // Update root best move
}
}
alpha = max(alpha, childEval);
// Beta cutoff
if (alpha >= beta) {
break;
}
}
}
// Store in transposition table
addTTEntry(board, eval, bestMove, depth, beta, origAlpha);
return eval;
}
Search Tree Complexity
Without pruning, the search tree has branching factor $ b \approx 35 $ (average legal moves):
| Depth | Nodes (unpruned) | Alpha-Beta (est.) | Reduction |
|---|---|---|---|
| 4 | 1,500,625 | ~50,000 | 30x |
| 6 | 1,838,265,625 | ~1,000,000 | 1800x |
| 8 | ~$ 2.25 \times 10^{12} $ | ~$ 50 \times 10^6 $ | ~45000x |
Alpha-beta can achieve $ O(b^{d/2}) $ with perfect move ordering.213
Iterative Deepening
Rather than searching directly to the target depth, the engine searches depth 1, then depth 2, then depth 3, and so on until reaching the requested depth:
int search(Board board, int depth, SearchContext* ctx) {
memset(&ctx->bestMove, 0, sizeof(Move));
ctx->nodesSearched = 0;
int eval = 0;
for (int d = 1; d <= depth; d++) {
ctx->maxDepth = d;
eval = alphabeta(board, d, MIN_EVAL, MAX_EVAL, ctx);
if (ctx->onDepthComplete)
ctx->onDepthComplete(d, eval, ctx->nodesSearched);
}
return eval;
}
This looks wasteful — why repeat work already done at shallower depths? The answer is that the overhead is negligible. Because the search tree grows geometrically, the last iteration dominates:
For $ b \approx 35 $, the overhead is about 3%. In practice it’s closer to 20% because earlier depths are cheap but not free.
The real benefit comes from move ordering. After completing depth $ d $, the best move is stored in the transposition table. At depth $ d+1 $, score_moves() retrieves it and tries it first. This means the PV move is always searched before any other move — dramatically increasing the chance of early alpha-beta cutoffs. In the worst case (random move ordering) alpha-beta is $ O(b^d) $; with the TT move tried first it approaches the $ O(b^{d/2}) $ ideal.
The iterative deepening loop also enables per-depth info lines to be emitted over UCI, so GUIs can display search progress during a long think.
Mate Distance Bonus
When finding checkmate, the engine prefers faster mates:
if (result != DRAW) {
eval += (ctx->maxDepth - depth) * (board.turn ? 1 : -1);
}
This adds a bonus proportional to how quickly the mate occurs:
- Mate in 1 ply: bonus =
maxDepth - 1 - Mate in 10 ply: bonus =
maxDepth - 10
Ensures mate-in-2 is preferred over mate-in-5. In real engines this is usually implemented with dedicated mate scores such as MATE - ply, but the idea is the same: preserve the winning result while ordering faster mates above slower ones and slower losses above immediate ones.
Quiescence Search
Advanced engines extend search with quiescence search - searching only captures/checks at leaf nodes to avoid horizon effect:
if (depth == 0) {
return quiescence_search(board, alpha, beta);
}
This prevents bad evaluations when a capture is about to dramatically change the position.14
Transposition Tables
The Re-Search Problem
Chess positions can be reached via different move orders:
e4 → e5 → Nf3 → Nc6 (Position A)
vs
Nf3 → Nc6 → e4 → e5 (Same Position A)
Without caching, the engine would re-evaluate identical positions multiple times, wasting computation.15
Search can revisit the same position through reversible move sequences, so repetition handling matters. Transposition tables reduce duplicated work, but they do not by themselves solve repetition semantics. Engines need history-aware draw detection and the halfmove clock. The relevant rules are threefold repetition and the fifty-move rule for claimable draws, with fivefold repetition and the seventy-five-move rule treated as automatic draws under current FIDE rules.20212223
Zobrist Hash as Key
Table Entry Structure
typedef struct {
Bitboard zobrist; // Position hash (key)
int eval; // Stored evaluation
Move move; // Best move from this position
int depth; // Search depth when evaluated
int nodeType; // EXACT, LOWER, or UPPER bound
} TTEntry;
Node Types
| Type | Meaning | Usage |
|---|---|---|
| EXACT | Evaluation is exact | Use directly if depth sufficient |
| LOWER | Evaluation is lower bound ($ \alpha $ cutoff) | Update $ \alpha $ |
| UPPER | Evaluation is upper bound ($ \beta $ cutoff) | Update $ \beta $ |
When a node is stored:
void addTTEntry(Board board, int eval, Move move, int depth, int beta, int alpha) {
TTEntry entry;
if (eval <= alpha)
entry.nodeType = UPPER; // Opponent can do better
else if (eval >= beta)
entry.nodeType = LOWER; // We can do better
else
entry.nodeType = EXACT; // Evaluation within window
entry.eval = eval;
entry.move = move;
entry.depth = depth;
entry.zobrist = board.hash;
TT_TABLE[entry.zobrist % TT_SIZE] = entry;
}
Table Lookup
TTEntry getTTEntry(Bitboard zobrist) {
return TT_TABLE[zobrist % TT_SIZE];
}
Collision handling: Simple replacement scheme - new entries overwrite old ones. More sophisticated engines use:
- Two-tier tables (always replace vs depth-preferred)
- Aging (prefer recent positions)
- Bucket systems (multiple entries per hash)
Table Probing in Search
TTEntry entry = getTTEntry(board.hash);
if (board.hash == entry.zobrist && entry.depth >= depth) {
if (entry.nodeType == EXACT) {
return entry.eval; // Direct return
} else if (entry.nodeType == LOWER) {
alpha = max(alpha, entry.eval);
} else if (entry.nodeType == UPPER) {
beta = min(beta, entry.eval);
}
if (alpha >= beta) {
return entry.eval; // Cutoff
}
}
Depth check: Only use stored evaluation if it was computed at equal or greater depth.
Memory Usage
const int TT_SIZE = 100000;
TTEntry TT_TABLE[100000];
With roughly 40 bytes per entry, this table uses:
Modern engines use gigabytes of transposition table memory.
Performance Impact
| Metric | Without TT | With TT | Improvement |
|---|---|---|---|
| Nodes searched | 10,000,000 | 2,000,000 | 5x reduction |
| Search depth | 6 ply | 8+ ply | +2-3 ply |
| Move time | 10s | 2s | 5x faster |
Transposition tables are essential for modern chess engines.
Zobrist Hashing
Incremental Hash Requirement
A naive position hash (e.g., hashing all piece positions) would require $ O(n) $ computation per position. With millions of positions evaluated, this is too slow.
Solution: Zobrist hashing provides $ O(1) $ incremental updates.
Zobrist Technique
Pre-compute random 64-bit numbers for each:
- Piece type on each square:
PIECES[12][64] - Castling rights:
CASTLE[16](4 castling bits → $ 2^4 = 16 $ states) - En passant square:
ENPASSANT[64] - Side to move:
WHITE_MOVE
Bitboard PIECES[12][64]; // [piece_type][square]
Bitboard CASTLE[16]; // [castling_rights]
Bitboard ENPASSANT[64]; // [ep_square]
Bitboard WHITE_MOVE;
Hash Computation
The position hash is the XOR of all applicable components. This works because each factor that changes legal play must contribute its own random key: piece-square occupancy, castling rights, en passant state, and side to move. If any of those differ, the resulting signature should differ as well, which is exactly what the transposition table needs. In careful implementations, the en passant component is only hashed when an en passant capture is actually legal; otherwise two positions that are equivalent for repetition purposes can receive different keys.516
Where $ \bigoplus $ denotes XOR.
Bitboard hash(const Board board) {
Bitboard hash = 0LL;
// XOR in castling rights
hash ^= CASTLE[board.castling];
// XOR in all pieces
for (int i = 0; i < 12; i++) {
Bitboard bb = *(&board.pawn_w + i);
while (bb) {
int square = __builtin_ctzll(bb);
hash ^= PIECES[i][square];
bb &= bb - 1;
}
}
// XOR in en passant square
if (board.epSquare != -1) {
hash ^= ENPASSANT[board.epSquare];
}
// XOR in side to move
if (board.turn == WHITE) {
hash ^= WHITE_MOVE;
}
return hash;
}
Incremental Updates
When making a move, update the hash incrementally:
// Remove piece from origin square
board.hash ^= PIECES[pieceType][fromSquare];
// Add piece to destination square
board.hash ^= PIECES[pieceType][toSquare];
// If capture occurred, remove captured piece
if (capturedPiece != EMPTY) {
board.hash ^= PIECES[capturedPiece][toSquare];
}
// Update en passant
if (board.epSquare != -1) {
board.hash ^= ENPASSANT[oldEpSquare]; // Remove old
}
if (newEpSquare != -1) {
board.hash ^= ENPASSANT[newEpSquare]; // Add new
}
// Update castling rights
board.hash ^= CASTLE[oldCastling];
board.hash ^= CASTLE[newCastling];
// Toggle side to move
board.hash ^= WHITE_MOVE;
XOR properties make this work:
- $ a \oplus a = 0 $ (XORing twice removes)
- $ a \oplus 0 = a $ (identity)
- XOR is commutative and associative
Initialization
Random numbers are generated at engine startup:
void initZobrist() {
for (int i = 0; i < 12; i++) {
for (int j = 0; j < 64; j++) {
PIECES[i][j] = randomBitboard();
}
}
for (int i = 0; i < 64; i++) {
ENPASSANT[i] = randomBitboard();
}
for (int i = 0; i < 16; i++) {
CASTLE[i] = randomBitboard();
}
WHITE_MOVE = randomBitboard();
}
Bitboard randomBitboard(void) {
Bitboard r = 0;
for (int i = 0; i < 64; i++) {
Bitboard tmp = (Bitboard)rand() % 2;
r |= tmp << i;
}
return r;
}
Hash Collisions
With 64-bit hashes, collision probability is astronomically low:
For $ n = 10^9 $ positions (deep search):
Even with billions of positions, collisions are extremely rare (less than 3 in 10,000 searches). Modern engines accept this negligible risk, as verification costs would exceed collision costs.
Why Zobrist Works
Zobrist hashing has two critical properties:
- Uniqueness: Random XOR produces uniformly distributed hashes
- Incrementality: Move updates are $ O(1) $ XOR operations
This makes it the gold standard for chess position hashing.
Move Ordering
Why Move Ordering Matters
Alpha-beta pruning effectiveness depends on move order. Searching the best move first maximizes cutoffs:
Example: At depth 6 with $ b=35 $:
- Random order: ~1.8 billion nodes
- Perfect order: ~43,000 nodes
- ~42,000x speedup
Move Ordering Heuristics
The engine uses multiple heuristics to score moves:
void score_moves(const Board board, const TTEntry entry, Move moves[], int cmoves) {
Move pvMove = {0};
if (board.hash == entry.zobrist && entry.nodeType < UPPER)
pvMove = entry.move;
for (int i = 0; i < cmoves; i++) {
Move* move = &moves[i];
// 1. Hash move (best from previous iteration) gets highest priority
if (both_moves_are_equal(pvMove, *move)) {
move->score = MAX_MOVE_SCORE; // 1000
} else {
bool isCapture = enemyOccupancy & SQUARE_BITBOARDS[move->toSquare];
bool isEnPassant = board.epSquare == move->toSquare;
if (isEnPassant) {
move->score = PAWN_EXCHANGE_VALUE; // equivalent to pawn captures pawn
} else if (isCapture) {
// 2. MVV-LVA: pre-scaled victim table minus attacker table
move->score = VICTIM_PIECES[capturedPiece] - ATTACKING_PIECES[move->pieceType];
}
// Non-captures score 0 and are searched last
}
}
}
MVV-LVA (Most Valuable Victim - Least Valuable Attacker)
Prioritize captures by:
- Victim value (capturing queen > capturing pawn)
- Attacker value (pawn captures queen > queen captures pawn)
Victim values are pre-scaled (pawn = 1600, queen = 14400) and attacker values use a separate scale (pawn = 100, queen = 500), so high-value captures always outrank low-value ones regardless of attacker:
- Pawn captures queen: $ 14400 - 100 = 14300 $
- Queen captures pawn: $ 1600 - 500 = 1100 $
This encourages trading low-value pieces for high-value pieces.17
Using lookup tables (VICTIM_PIECES[] and ATTACKING_PIECES[]) avoids computing piece values inline and keeps the scoring branch-free. The large victim scale ensures capture ordering is dominated by the victim’s value, not the attacker’s.
Move Selection
Moves are selected using partial sorting - only the best unmoved move is found each iteration:
int select_move(Move moves[], int cmoves) {
int bestScore = 0;
int indx = -1;
for (int i = 0; i < cmoves; i++) {
if (!moves[i].exhausted && moves[i].score >= bestScore) {
bestScore = moves[i].score;
indx = i;
}
}
if (indx != -1) {
moves[indx].exhausted = true;
}
return indx;
}
This is more efficient than full sorting when alpha-beta cutoffs occur early.
Hash Move Priority
The hash move (best move from transposition table) gets highest priority:
if (entry.zobrist == board.hash) {
score += 10000; // Guaranteed first
}
This creates a positive feedback loop:
- Best move from previous search gets tried first
- Causes early cutoffs, confirming it’s strong
- Gets stored back in transposition table
- Prioritized again in future searches
Impact on Search Efficiency
$ \theta $ UCI Protocol
What is UCI?
Universal Chess Interface (UCI) is a standard protocol for communication between chess engines and GUIs. It uses text-based commands over stdin/stdout.24
Basic UCI Flow
GUI: uci
Engine: id name ChessEngine
Engine: id author Felipe Lecot
Engine: uciok
GUI: isready
Engine: readyok
GUI: ucinewgame
GUI: position startpos moves e2e4 e7e5 g1f3
Engine: (sets up position)
GUI: go depth 10
Engine: info depth 1 score cp 20 nodes 150
Engine: info depth 2 score cp 15 nodes 850
Engine: ...
Engine: info depth 10 score cp 45 nodes 4823199
Engine: bestmove g1f3 ponder d8f6
UCI Commands
| Command | Purpose | Example |
|---|---|---|
uci |
Initialize UCI mode | uci |
isready |
Synchronization | isready |
ucinewgame |
Start new game | ucinewgame |
position |
Set position | position startpos moves e2e4 |
go |
Start search | go depth 10 or go movetime 5000 |
stop |
Stop search | stop |
quit |
Exit engine | quit |
Position Command
position [startpos | fen <fenstring>] [moves <move1> <move2> ...]
Examples:
position startpos
position startpos moves e2e4 e7e5
position fen rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1
Go Command Options
go [searchmoves <moves>] [depth <d>] [nodes <n>] [movetime <ms>]
[wtime <ms>] [btime <ms>] [winc <ms>] [binc <ms>]
Common search modes:
go depth 10- Search to depth 10go movetime 5000- Think for 5 secondsgo wtime 300000 btime 300000- 5-minute time controlsgo infinite- Search untilstopcommand
Info Output
Engine reports search progress:
info depth <d> seldepth <sd> score <score> nodes <n>
time <ms> nps <nodes/sec> pv <move1> <move2> ...
Example:
info depth 10 score cp 45 nodes 4823199 time 2341 nps 2060000 pv e2e4 e7e5 g1f3
cp 45- Score of 45 centipawns (0.45 pawns)mate 3- Checkmate in 3 movesnps- Nodes per second (performance metric)pv- Principal variation (best line)
Engine Implementation
char *ENGINE_NAME = "Basic Chess Engine";
char *AUTHOR = "Felipe Lecot";
const int DEFAULT_DEPTH = 7;
int main(void) {
initBitboards();
initZobrist();
initMoveGeneration();
initEvaluation();
char input[2000];
Board board = {};
while (fgets(input, sizeof(input), stdin)) {
if (strncmp(input, "uci", 3) == 0)
printEngineInfo();
else if (strncmp(input, "isready", 7) == 0)
printf("readyok\n");
else if (strncmp(input, "ucinewgame", 10) == 0)
parsePosition("position startpos", &board);
else if (strncmp(input, "position", 8) == 0)
parsePosition(input, &board);
else if (strncmp(input, "go", 2) == 0)
parseGo(input, board);
else if (strncmp(input, "quit", 4) == 0)
break;
}
}
void printEngineInfo(void) {
printf("id name %s\n", ENGINE_NAME);
printf("id author %s\n", AUTHOR);
printf("uciok\n");
fflush(stdout);
}
static void onDepthComplete(int depth, int eval, int nodes) {
double ms = (double)(clock() - go_start) * 1000 / CLOCKS_PER_SEC;
printf("info depth %d score cp %d nodes %d time %.0f\n", depth, eval, nodes, ms);
}
void parseGo(char *command, Board board) {
int depth = DEFAULT_DEPTH;
char *depthStr = strstr(command, "depth");
if (depthStr) {
int parsed = atoi(depthStr + 5);
if (parsed > 0) depth = parsed;
}
SearchContext ctx = { .onDepthComplete = onDepthComplete };
int eval = search(board, depth, &ctx);
char san[6] = {0};
moveToSan(ctx.bestMove, san);
printf("bestmove %s\n", san);
}
Move Format
UCI uses long algebraic notation:
e2e4 (e-pawn moves from e2 to e4)
e7e8q (e-pawn promotes to queen)
e1g1 (white kingside castle)
e1c1 (white queenside castle)
Castling is encoded as king captures rook square in FEN, but as king two squares in UCI.
GUI Integration
Popular chess GUIs supporting UCI:
- Arena Chess GUI
- ChessBase
- Lichess (online)
- Cute Chess
- BanksiaGUI
These GUIs handle:
- Position visualization
- Clock management
- Opening books
- Engine tournaments
Performance Optimizations
Compiler Optimizations
CFLAGS = -O3 -march=native -flto
-O3: Aggressive optimization (loop unrolling, inlining, etc.)-march=native: Use CPU-specific instructions (POPCNT, BMI2)-flto: Link-time optimization (whole-program analysis)
Bitwise Intrinsics
Modern CPUs have hardware instructions for bit manipulation:
| Operation | C Code | Intrinsic |
|---|---|---|
| Count bits | Loop | __builtin_popcountll() |
| Find LSB | Loop | __builtin_ctzll() |
| Find MSB | Loop | __builtin_clzll() |
// Count pieces on bitboard
int count = __builtin_popcountll(bitboard);
// Find first piece
int square = __builtin_ctzll(bitboard);
// Remove first piece
bitboard &= bitboard - 1; // Clear LSB trick
Lookup Tables vs Computation
Pre-compute expensive operations:
// Pre-computed
Bitboard attacks = KNIGHT_MOVEMENT[square];
// vs On-the-fly (much slower)
Bitboard attacks = computeKnightAttacks(square);
Memory-speed tradeoff: Tables use ~1 MB but provide ~100x speedup.
Move Structure Size
sizeof(Move) = 32 bytes // Compact representation
Keeping moves small improves cache locality:
- 256 moves = 8 KB (fits in L1 cache)
- Faster iteration during move ordering
Legal Move Caching
Pre-compute occupancy masks once per position:
void computeOccupancyMasks(Board* board) {
board->occupancyWhite = board->pawn_w | board->knight_w | ...;
board->occupancyBlack = board->pawn_b | board->knight_b | ...;
board->occupancy = board->occupancyWhite | board->occupancyBlack;
}
Avoids recomputing these ORs in every getRookAttacks() call.
King Square Caching
int whiteKingSq; // Cached king position
int blackKingSq;
Finding the king requires scanning a bitboard (ctzll). Caching eliminates this:
// Cached access (O(1))
int kingPos = board.whiteKingSq;
// vs Scanning (O(1) but slower constant)
int kingPos = __builtin_ctzll(board.king_w);
Static Evaluation Only at Leaves
Evaluation is only called at depth 0 or terminal nodes:
if (depth == 0 || gameOver) {
return evaluate(board, result);
}
Don’t waste time evaluating at every node - search deeper instead.
Partial Move Sorting
Instead of fully sorting 40 moves:
// Partial sort: only find best unmoved move
int select_move(Move moves[], int count) {
int bestIndex = -1;
int bestScore = MIN_INT;
for (int i = 0; i < count; i++) {
if (!moves[i].exhausted && moves[i].score > bestScore) {
bestScore = moves[i].score;
bestIndex = i;
}
}
moves[bestIndex].exhausted = true;
return bestIndex;
}
Complexity:
- Full sort: $ O(n \log n) $ upfront
- Partial sort: $ O(n) $ per move, but alpha-beta often cuts off early
If only 5 moves are tried before cutoff, partial sorting is ~8x faster.
Performance Metrics
Typical performance on modern hardware:
| Metric | Value |
|---|---|
| Nodes per second | ~2-5 million |
| Depth reached (5s) | 8-10 ply |
| Transposition hits | ~40-60% |
| Avg branching factor | ~35 moves |
| Memory usage | ~20 MB |
Scaling with Depth
| Depth | Nodes | Time | NPS |
|---|---|---|---|
| 4 | 50,000 | 0.02s | 2.5M |
| 6 | 1,000,000 | 0.4s | 2.5M |
| 8 | 50,000,000 | 20s | 2.5M |
| 10 | 2,000,000,000 | 13min | 2.5M |
Each 2-ply depth increase multiplies search time by ~50x.