Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥

Kaggle Code Golf Competition

Avatar for Yuki Imajuku Yuki Imajuku
November 27, 2025
230

Kaggle Code Golf Competition

2025年11月27日に行われたMAP, Jigsaw, Code Golf 振り返り会 by 関東Kaggler会 (https://kanto-kaggler.connpass.com/event/375336/) にて発表した、NeurIPS 2025 - Google Code Golf Championship (https://www.kaggle.com/competitions/google-code-golf-2025) の振り返り資料です。

Avatar for Yuki Imajuku

Yuki Imajuku

November 27, 2025
Tweet

Transcript

  1. ࢀՃͨ͠ཧ༝ͱܦҢ ⚫︎ "3$"(*ʹ͋Δఔ౓ͷυϝΠϯ஌͕ࣝ͋ͬͨ ⚪︎ "#.$54Ͱѻ͓ͬͯΓλεΫͷงғؾ͕෼͔Δ ⚪︎ "3$(&/΍-"3$ͱ͍ͬͨ༗ӹσʔλͷଘࡏ ⚫︎ "*ΤʔδΣϯτͱͷ૬ੑ͕ྑͦ͞͏ͩͬͨ ⚪︎

    ݫີͳείΞܭࢉ͕Մೳ ⚪︎ ධՁ͕୹࣌ؒͰऴΘΔ ⚪︎ (16Λ࢖Θͳ͍ ⚫︎ ৆ۚ૯ֹʹ໨͕ᚶΜͩ "*ʹཔΓͭͭΏΔʙ͘ࢀՃ͢Δ͜ͱΛ໨ඪʹ5FBN݁੒ʂ
  2. ࢼͨ͜͠ͱʢ݄d݄ʣ "*ΤʔδΣϯτΛ࢖͏্ͰධՁؔ਺͕௒େࣄͳͷͰ੔උ ⚫︎ ຊ൪ͷධՁ؀ڥͱҰகͤ͞Δ ⚪︎ ഑෍͞Ε͍ͯΔධՁίʔυΛݩʹධՁؔ਺Λ࣮૷ ⚪︎ VWΛ࢖ͬͯ1ZUIPOόʔδϣϯΛἧ͑Δ ⚫︎ ධՁʹ͔͚Δ࣌ؒΛ୹͘͢Δ

    ⚪︎ ෳ਺ͷ໰୊ΛฒྻධՁʢέʔεฒྻ΋͋Δͱྑ͔͔ͬͨ΋ʁʣ ⚫︎ ෇ਵͯ͠ѹॖΞϧΰϦζϜ΋੔උ ⚪︎ ඪ४ϥΠϒϥϦ࢓༷ΛಡΈړͬͯෳ਺ѹॖΞϧΰϦζϜΛࢼ͢ ⚪︎ ݁Ռతʹ;PQGMJ͕࠷ڧͩͬͨʜʜ ⚫︎ ίʔυͷߋ৽Λ(JU)VC"DUJPOTͰ؅ཧ ⚪︎ ࣗಈͰ࠷ྑղΛߋ৽ˍެ։είΞूͱͷࠩ෼ΛࣗಈͰܭࢉ
  3. ϓϩϯϓτʢ$PEF(PMGͷςΫχοΫूʣ This document summarizes code implementation techniques for the ARC

    AGI Challenge, focusing on Python 3.11 code golf. The goal is to implement a function `p(grid: list[list[int]]) -> list[list[int]]` with the minimum number of characters, using only standard libraries. ### **1. Fundamental Grid Representations** Choosing the right data structure is a crucial initial step. The optimal choice depends on the task's nature. * **List of Lists (LoL):** The native format, ideal for row-based operations and slicing (`grid[y]`, `grid[y:y+h]`). However, pixel access (`grid[y][x]`) and nested loops can be verbose. * **Flattened List:** A 1D list representation is excellent for operations that scan the entire grid, as it allows for a single loop. Coordinates can be derived from an index `i` using `y = i // width` and `x = i % width`. * **Dictionary of Coordinates:** Best suited for sparse grids where most cells are the background color. It simplifies iterating over non-background objects. * **Complex Numbers for Coordinates:** A highly compact and powerful variation is to use complex numbers (`z = x + 1j*y`) as dictionary keys. This simplifies geometric operations: translation becomes `z + t`, 90° rotation is `z * 1j`, and neighbors are `z ± 1` and `z ± 1j`. ### **2. Core Grid Manipulations: The One-Liner Toolkit** Geometric transformations are fundamental in ARC. These can be implemented concisely using `zip` and slicing. | Transformation | Golfed Python 3.11 Code | Description | | :--- | :--- | :--- | | **Transpose** | `t = [*zip(*g)]` | Swaps rows and columns. | | **Rotate 90° Clockwise** | `r = [*zip(*g[::-1])]` | Achieved by a vertical flip followed by a transpose. | | **Rotate 90° Counter-Clockwise** | `r = [*zip(*g)][::-1]` | Achieved by a transpose followed by a vertical flip. | | **Rotate 180°** | `r = [r[::-1] for r in g[::-1]]` | A vertical flip followed by a horizontal flip of each row. | | **Horizontal Flip (Vertical Mirror)** | `m = [r[::-1] for r in g]` | Reverses the elements in each row. | | **Vertical Flip (Horizontal Mirror)**| `m = g[::-1]` | Reverses the order of rows. | ### **3. Data Interrogation and Analysis** Efficiently extracting features from the grid is a common first step. * **Finding Coordinates:** A list comprehension is the standard for finding the coordinates of pixels of a certain color. * All non-zero coordinates: `p=[(x,y) for y,r in enumerate(g) for x,c in enumerate(r) if c]`. * **Bounding Box Detection:** After getting the list of non-zero coordinates `p`, the bounding box can be found efficiently. * `xs, ys = zip(*p)` * `bbox = min(xs), min(ys), max(xs), max(ys)` * **Color Analysis:** The `collections.Counter` class is the most effective tool for analyzing color distributions. * To create a color frequency map: `C = Counter(itertools.chain.from_iterable(g))`. * To find the most common color (often the background): `C.most_common(1)[0][0]`. A shorter but potentially less efficient alternative for flat lists is `max(flat_list, key=flat_list.count)`. ### **4. Leveraging Standard Libraries for Advanced Patterns** **`itertools`: The Combination and Iteration Powerhouse** The `itertools` module provides highly optimized tools for complex iteration patterns. * `chain.from_iterable(g)`: The most efficient way to flatten a grid into a single iterable. * `product()`: Replaces nested `for` loops for grid traversal, saving characters from indentation: `for y,x in product(range(h), range(w)):`. * `groupby()`: Perfect for run-length encoding (RLE) or finding contiguous blocks of the same color in a row. Note that it only groups consecutive identical elements. * `permutations()` and `combinations()`: Essential for tasks involving combinatorial search, like finding color mappings or object pairings. * **Chunking Data:** Since `itertools.batched` is not available in Python 3.11, the idiom `zip(*[iter(flat_list)]*width)` is used to split a flattened list back into chunks (rows). **`collections`: Specialized Data Structures** This module offers high-performance container datatypes. * `Counter`: As mentioned, this is the go-to for counting hashable objects like colors. * `defaultdict(list)`: This is crucial for shifting from a pixel-based to an object-based representation. It allows grouping all pixel coordinates by their color with a simple list comprehension: `d=defaultdict(list);[d[c].append((x,y)) for y,r in enumerate(g) for x,c in enumerate(r) if c]`. ### **5. Leveraging Recursion** Probably the biggest boost is to take full advantage of recursion. Since you always have a mandatory `p` function, you'll always have an easy way to use it. Recursion can essentially replace a loop, and if you pass extra parameters in (and setting the default values), you have a lot of control over what gets passed through. In addition to looping, you can reuse structures in your code, such as iterating over the input matrix. You can also rotate or transpose your matrix each time, such that you perform an operation of each orientation. General recursion template for a set number of iterations: ```python p=lambda g,k=3:-k*i or p([[y for y in x]for x in zip(g[::-1])],k-1) ``` Remember you have the freedom to rearrange a lot of these elements depending on what you need, for example you can make the transformation on the way up the stack instead, or you can change the starting value or delta of `k`. You can change the behavior of the inner loop depending on the value of `k`, such that you have some special modification at the base case. This template alone is useful enough, especially so when it allows you to switchover to a shorter lambda definition. For example, you can write a flood fill like this: ```python p=lambda a,k=75:a*-k or p([[(v:=x or v&8,4-x//2)[k<1]for x in(8,)+r][:0:-1]for r in zip(*a)],k-1) ``` - Technique 1: The outer `p` function is the flood fill loop, which recurses until `k` reaches 0. `a*-k or ...` is falsy for `k≥0` (empty list), so recursion continues; when `k<0`, it’s truthy and returns the last built grid. Starting from `k=75` just guarantees enough passes. - Technique 2: Each step processes `for r in zip(*a)` (transpose = rotate 90°). The list `[(... ) for x in (8,)+r][:0:-1]` prepends a **sentinel `8`**, then **drops it and reverses**; over iterations, the transpose+reverse combo effectively sweeps from all four directions. - Technique 3: The expression `(v := x or v & 8)` is selected: - If `x==0`, it becomes `v & 8` → keeps `8` flowing through consecutive zeros. - If `x==3`, it becomes `3` → blocks `8` (since `3 & 8 == 0`), so the marker doesn’t pass walls. - The leading sentinel `(8,) + r` seeds the flow from the border side in that sweep. - Technique 4: Final remap `(v, 4 - x // 2)[k < 1]` is efficient: - For `k≥1`, it yields `v` (the flood marker). - For `k<1`, it yields `4 - x // 2` (the wall marker) to map `8 → 0` (outside-reachable zeros revert to zero), `3 → 3` (walls unchanged) and `0 → 4` (**enclosed** zeros become `4`). You can also try to use recursion to do things not only for flood fill but also for your task. ### **6. Regex for Pattern Matching and Extraction** You don't typically expect string manipulation to be helpful when transforming 2D matrices, but regex and the like can be extremely terse, mostly when combined with an `eval`. You don't even have to `import re` to take full advantage of some some of Python's string functions, such as `replace` or using `in`. With regex it's even better, allowing more complex transformations and conditions, as long as you can get your head around how many characters are between elements of a row (this is more effective on tasks where the input array is always the same size, so keep an eye out). I'd also like to give a quick shoutout to byte array literals of the form `b'abc'`, which can be a terse way of expressing integer lists. * **Turn the grid into text:** `str(g)` converts the list-of-lists to a Python literal string with consistent delimiters `", "`. `eval(str(g))` converts it back. When using `eval()` to reconstruct, `","` delimiters can be used to reduce the byte count. * **Regex scan-line test for a donut:** `re.split(r"([^0]), (0, )+\\1")` matches a substring like `"6, 0, 0, 6"` somewhere in the stringified grid. - Technique 1: `([^0])` captures a single non-zero digit (shorter than `[1-9]`). - Technique 2: `(0, )+` — a run of one or more `"0, "` entries (zeros in a row). - Technique 3: `\\1` — the same digit as the first capture (backreference). - Technique 4: `re.split` with capturing groups returns the captures in the result list; index `[1]` is the first captured digit. `int(...)` converts it, and `[[...]]` wraps to the required shape. * **Regex to replace the middle:** `re.sub('1, 0(?=, 1)','1,2',str(g))` replaces `"1, 0, 1"` with `"1, 2, 1"` throughout the grid string. - Technique 1: `(?=, 1)` matches the substring `"1, 0"` only when it is immediately followed by `", 1"`. The `(?=...)` is a positive lookahead that asserts `", 1"` comes next without consuming it. Thus the full logical target is `1, 0, 1`, but the replacement only touches the middle `0`. - Technique 2: Because the lookahead doesn’t consume the right-hand `1`, sequences like `1,0,1,0,1` yield multiple independent matches, correctly producing `1,2,1,2,1`. * **Regex for run-length:** `re.sub("(?<!1, )1(, 0)+, 1",lambda m:re.sub("0","27"[len(m[0])%2],m[0]),str(g)))` replaces runs of zeros between `1`s with `2` if the run length is even or `7` if odd. - Technique 1: `1(, 0)+, 1` matches exactly a `1` followed by one or more `", 0"` pairs, ending with `", 1"` (i.e., a horizontally bracketed zero-run). - Technique 2: `(?<!1, )` is a negative lookbehind that avoids starting the match immediately after another `"1, "`. This curbs overlapping starts on adjacent `... 1, 1 ...` and keeps the engine marching cleanly across the row. - Technique 3: The `lambda m: ...` replacement function computes the run length as `len(m[0])` (the full matched substring) and select between `"2"` and `"7"` for the replacement character. * **Regex for multiple lines:** `"m=re.findall(r"[^0], 2.{28}[^0]")[0]"` matches 2x2 grid with a top-right corner of color `2` (You need to extract the prefix and suffix separately to get the first row and second row). - Technique 1: `[^0]` matches any non-zero digit (the top edge pixel) and `2` matches the corner pixel itself. - Technique 2: `.{28}` matches any 28 characters (skip to the next row). This assumes a fixed grid width of 9, where each row contributes 9 characters with 2 for delimiters and "], [" between rows. 28 comes from 4 ("], [") + 8 x 3 (digits + ", "). * **Lookahead/lookbehind:** `(?=...)` and `(?<=...)` are zero-width assertions that check for a pattern ahead or behind the current position without consuming characters. Similarly, `(?<!...)` and `(?!...)` assert that a pattern does not appear before or after. * **Regex Conditional:** `(?(id/name)yes|no)` if group `id` or `name` matched, use `yes-pattern`, else `no-pattern`. * **Non-greedy Matching:** `*?`, `+?`, `??` and `x{m,n}?` match as few characters as possible, useful for extracting minimal substrings. * **Possessive Matching:** `*+`, `++`, `?+` and `x{m,n}+` prevent backtracking, which can be useful for performance in complex patterns. ### **7. Advanced Golfing Techniques and Idioms** To achieve the lowest character count, several advanced techniques are employed. * **Loop Flattening:** Instead of nested loops, use a single loop over a flattened range: `for i in range(h*w): y,x = i//w, i%w`. This saves a level of indentation. * **Walrus Operator (`:=`):** Use assignment expressions to reuse intermediate results within list comprehensions or other expressions. * **Boolean Arithmetic:** For integer or boolean values, `a*b` can replace `a and b`, and `a|b` can replace `a or b`. * **Ternary Operator Alternative:** The expression `[value_if_false, value_if_true][condition]` can be shorter than a full `if/else` statement. * **Assignment Tricks:** Use multiple assignment (`a=b=c=0`) and star assignment for list creation (`*A, = range(10)`). * **Bitwise Shortcuts:** `-~n` for `n+1` and `~-n` for `n-1` can save parentheses due to operator precedence. ### **8. Common ARC Patterns and Algorithms** * **Finite Domain:** The variety of inputs is limited, both by the number of test cases and the format of the inputs themselves. There are only 10 colors from 0-9, with black (0) often being used as a "null" colour. Input/output matrix sizes can often be static, or square, or limited in some other fashion. Some tasks may be less complex than initially presented, or may even have shortcuts. Sets are technically deterministically ordered for integers. * **Object Segmentation:** First, group coordinates by color using `defaultdict(list)`. Then, for each color group, run a connected components algorithm (like BFS with `collections.deque` or a recursive DFS) to identify distinct objects. * **Pattern Tiling:** Extract a repeating pattern (often using bounding box detection) and use the modulo operator to fill the output grid: `output[y][x] = pattern[y % pattern_h][x % pattern_w]`. * **Flood Fill:** A compact recursive implementation is often the shortest for code golf purposes. ```python # g: grid, x,y: start, c: target color, r: replacement def flood(g,x,y,c,r): if 0<=y<len(g) and 0<=x<len(g[0]) and g[y][x]==c and c!=r: g[y][x]=r for i,j in [(0,1),(0,-1),(1,0),(-1,0)]: flood(g,x+i,y+j,c,r) ``` ໿จࣈʢLUPLFOTʣ
  4. ࢼͨ͜͠ͱʢ݄ޙ൒ʣ ⚫︎ ৳ͼ೰ΉλεΫΛத৺ʹ"*ΤʔδΣϯτΛͻͨ͢Βճ͢ ⚪︎ $POUFYUΛվྑ ◼︎ "3$(&/͚ͩͰྑͦ͞͏ͩͬͨͷͰ-"3$ͱೖग़ྗྫΛ࡟আ ◼︎ -"3$ͷຒΊࠐΈಛ௃ྔͰࢉग़ͨ͠ྨࣅλεΫͷղ౴ίʔυ ◼︎

    ਖ਼نදݱ΍࠶ؼؔ਺Λ࢖͏Α͏໌ࣔ͢Δϓϩϯϓτ ⚪︎ 'SPNTDSBUDIͰॳظղ͕ͳ͍ঢ়ଶ͔Β$PEF(PMGͤ͞Δ ⚫︎ ਓྗͰؤுΔ ⚪︎ ಛʹਖ਼نදݱΛ࢖͏෦෼Λਓ͕ؒ୳ࡧ ⚪︎ ͍ͭͰʹ7JTVBMJ[FSΛ੔උͨ͠ ⚫︎ ม਺໊ΛϥϯμϜ୳ࡧͯ͠ѹॖޮ཰޲্ ⚪︎ λεΫ͕ѹॖίʔυ͕ͩͬͨɺ਺ඦ#ZUFT͘Β͍࡟Εͨ
  5. ײ૝ͱ·ͱΊ ⚫︎ "*ΤʔδΣϯτͷίϯςΩετ؅ཧ͸݁ߏେࣄ ⚪︎ ίʔυΰϧϑͷςΫχοΫΛೖΕΔͱ࢖ͬͯ͘ΕΔ ⚪︎ ˢҰํͰίϯςΩετ͕௕͗ͯ͢΋׬શʹ͏·͘͸ैͬͯ͘Εͳ͍ ⚪︎ ୳ࡧͷํ޲ੑΛίϯςΩετͰࢦఆͯ͋͛͠Δͱ͍͍͔΋ ⚪︎

    ݱঢ়ͷղΛͻͨ͢Βݟଓ͚ΔΑΓ·ͬ͞Βͳঢ়ଶ͔Β࢝ΊΔͷ΋ΞϦ ⚫︎ ʢࠓͷͱ͜Ζʣ୯७ͳQBZUPXJOͰ͸ͳͦ͞͏ ⚪︎ ίϯςΩετ΍ਓྗͷ٧ΊΈ͍ͨͳ෦෼ͷ޻෉΋େࣄ ⚪︎ ͱ͸͍͑--.ͷ"1*Λݺ΂͹ݺͿ΄Ͳੑೳ͸৳ͼ͍ͯ͘ͷ΋ࣄ࣮ ⚫︎ ٱʑʹ,BHHMFίϯϖʹࢀՃͰָ͖͔ͯͬͨ͠ ⚪︎ "*ͷ͓͔͛Ͱਓྗͷίετ͕ݮͬͯຊۀ͠ͳ͕ΒࢀՃ͠΍͘͢ͳͬͨ ⚪︎ νʔϜͰΏΔ͘ϫΠϫΠ͢Δͷ΋ָ͍͠