Activity

keschler

Turning Detection Into Game State

Card Metadata

I added a CARD_METADATA table covering all the Clash Royale cards, giving each one properties like elixir cost, placement type, air/ground flags, damage, and hit speed.

The bot needs to understand cards, not just recognize their names. I also added deploy_anywhere_cards.py to separate cards with unrestricted placement from those with normal rules — which came in handy later when building action masks.
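As a rough sketch, an entry could look something like this; the field names and the example stats are illustrative placeholders, not the project's exact values:

```python
from dataclasses import dataclass

# Illustrative shape of a CARD_METADATA entry; field names and the
# example stats are placeholders, not the project's exact values.
@dataclass(frozen=True)
class CardMeta:
    name: str
    elixir: int
    placement: str      # "troop", "building", or "spell"
    is_air: bool        # unit flies (ignores the river)
    targets_air: bool
    damage: int
    hit_speed: float    # seconds per attack

CARD_METADATA = {
    "knight": CardMeta("knight", 3, "troop", False, False, 167, 1.2),
    "fireball": CardMeta("fireball", 4, "spell", False, True, 572, 0.0),
}
```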

Overtime and Time Handling

Normal time and overtime behave differently, especially for elixir generation, so I updated time extraction to track them separately. The code now outputs the visible timer text, an overtime flag, and total remaining seconds. This also lets the enemy elixir estimator use the correct regeneration rate depending on whether we’re in single, double, or triple elixir.
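A minimal sketch of mapping phase to regeneration rate: the 2.8 s / 1.4 s / 0.93 s per-elixir times are the standard Clash Royale rates, while the phase boundaries here are my assumption, not necessarily how the bot decides the phase:

```python
# Standard per-elixir regeneration times; the phase boundaries below
# (last minute of regulation = double, late overtime = triple) are an
# assumption.
RATE = {"single": 1 / 2.8, "double": 1 / 1.4, "triple": 1 / 0.933}

def elixir_phase(remaining_seconds: int, overtime: bool,
                 overtime_elapsed: int = 0) -> str:
    if overtime:
        return "triple" if overtime_elapsed >= 60 else "double"
    return "double" if remaining_seconds <= 60 else "single"

def regenerate(estimate: float, dt: float, phase: str) -> float:
    """Advance an enemy elixir estimate by dt seconds, capped at 10."""
    return min(10.0, estimate + dt * RATE[phase])
```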

Action Space

The arena is represented as an 18 × 32 grid based on the KataCR board layout. Actions break down into waiting or playing a specific card at a chosen grid cell. The action space includes masks for walkable ground, the river, bridges, tower areas, own-side deployment, expanded deployment when princess towers fall, building footprints, and spell-specific restrictions.

Legal action masks matter a lot for reinforcement learning: the model only ever sees valid moves, instead of having to learn for itself that placing a troop in the river is illegal.
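As an illustration of what such a mask can look like (the concrete row indices for the river and "our half" are assumptions about the board layout):

```python
import numpy as np

H, W = 32, 18  # rows x cols of the KataCR-style grid

def deploy_mask(expanded: bool = False) -> np.ndarray:
    """Cells where a normal troop may be placed right now."""
    mask = np.zeros((H, W), dtype=bool)
    mask[17:, :] = True        # assumed: rows 17..31 are our half
    if expanded:               # an enemy princess tower fell
        mask[13:17, :] = True  # deployment extends toward the river
    mask[15:17, :] = False     # assumed river rows stay illegal
    return mask

# Flat action space: index 0 = wait, then one action per (hand slot, cell).
legal = np.concatenate(([True], np.tile(deploy_mask().ravel(), 4)))
```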

Card-Specific Deploy Masks

get_card_deploy_mask takes a card, looks it up in CARD_METADATA, and returns the correct placement mask for that specific card in the current game state. Troops use own-ground rules, buildings use footprint-aware rules, spells have their own masks, and so on. The bot can now answer “where can I actually play this card?” correctly.
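A sketch of that dispatch, reusing the CardMeta sketch from above; DEPLOY_ANYWHERE's contents and the three mask helpers are hypothetical stand-ins for the real implementations:

```python
import numpy as np

DEPLOY_ANYWHERE = {"miner", "goblin_barrel"}  # hypothetical contents

def full_arena_mask() -> np.ndarray:   return np.ones((32, 18), dtype=bool)
def own_ground_mask() -> np.ndarray:   return np.zeros((32, 18), dtype=bool)  # stub
def building_fit_mask() -> np.ndarray: return np.zeros((32, 18), dtype=bool)  # stub

def get_card_deploy_mask(card: str) -> np.ndarray:
    meta = CARD_METADATA[card]
    if meta.placement == "spell" or card in DEPLOY_ANYWHERE:
        return full_arena_mask()      # unrestricted placement
    if meta.placement == "building":
        return building_fit_mask()    # footprint must fit on walkable ground
    return own_ground_mask()          # regular troops: own side only
```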

Board Features

Static channels capture things that don’t change: walkable ground, the river, bridges, and tower sites for both sides. I also generated debug SVGs for these, which can be seen in the attachments. The board geometry is easy to get subtly wrong, and being able to visualize the masks saves a lot of debugging time.

Dynamic channels capture what’s happening right now: ally/enemy ground and air presence, HP mass, threat mass, and alive tower masks. Detected units are mapped from screen coordinates into the grid and spread over a 3 × 3 area rather than a single cell, which gives the model a smoother signal. Threat mass uses damage and hit speed from the card metadata to roughly estimate how dangerous a unit is.
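A sketch of the 3 × 3 spread and the threat estimate: the grid shape and the choice of damage ÷ hit speed as "threat" follow the description above, everything else (including the example unit) reuses the earlier CARD_METADATA sketch and is illustrative:

```python
import numpy as np

def add_unit(channel: np.ndarray, row: int, col: int, weight: float = 1.0):
    """Spread `weight` over a 3x3 neighborhood, clipped at the grid edges."""
    H, W = channel.shape
    r0, r1 = max(0, row - 1), min(H, row + 2)
    c0, c1 = max(0, col - 1), min(W, col + 2)
    channel[r0:r1, c0:c1] += weight / 9.0

# Threat mass as rough DPS (damage per second).
meta = CARD_METADATA["knight"]
threat = np.zeros((32, 18), dtype=np.float32)
add_unit(threat, row=20, col=8, weight=meta.damage / meta.hit_speed)
```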

Global Feature Vector

Not everything fits in a spatial channel, so there’s also a global feature vector covering: elixir for both sides, remaining time, the overtime flag, tower states and HP, and one-hot encodings of the cards in hand, the next card, and enemy cards seen so far.
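Roughly how such a vector could be assembled; NUM_CARDS, the field names on state, and the normalization constants are all assumptions:

```python
import numpy as np

NUM_CARDS = 110  # assumed number of distinct cards

def one_hot(indices, n: int) -> np.ndarray:
    out = np.zeros((len(indices), n), dtype=np.float32)
    out[np.arange(len(indices)), list(indices)] = 1.0
    return out

def global_features(state) -> np.ndarray:
    seen = np.zeros(NUM_CARDS, dtype=np.float32)
    seen[list(state.enemy_cards_seen)] = 1.0
    parts = [
        np.array([state.own_elixir / 10, state.enemy_elixir_estimate / 10,
                  state.remaining_seconds / 180, float(state.overtime)],
                 dtype=np.float32),
        np.asarray(state.tower_alive, dtype=np.float32),    # 6 flags
        np.asarray(state.tower_hp_norm, dtype=np.float32),  # 6 floats in [0, 1]
        one_hot(state.hand, NUM_CARDS).ravel(),             # 4 hand slots
        one_hot([state.next_card], NUM_CARDS).ravel(),
        seen,
    ]
    return np.concatenate(parts)
```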

Game State and Runtime Cleanup

I added proper dataclasses for the game state (raw detections, troop-to-health-bar matches, tower state, HUD state, the full GameState, and actions) so the code no longer passes loose dictionaries around.
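The rough shape, with illustrative field names:

```python
from dataclasses import dataclass, field

@dataclass
class Detection:
    cls: str
    conf: float
    box: tuple[float, float, float, float]  # x1, y1, x2, y2 in screen pixels
    track_id: int | None = None

@dataclass
class TowerState:
    alive: bool
    hp: int | None  # None when the HP text could not be read

@dataclass
class GameState:
    detections: list[Detection] = field(default_factory=list)
    towers: dict[str, TowerState] = field(default_factory=dict)
    own_elixir: float = 0.0
    remaining_seconds: int = 0
    overtime: bool = False
```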

Enemy Card Tracking

The YOLO runtime now keeps ByteTrack IDs, so the bot can follow a detected unit across frames instead of seeing it as new every time.

EnemyCardTracker waits for several confident frames before confirming a detection, maps detected units to card names, records inferred enemy plays, tracks which cards have been seen, subtracts elixir when a card is played, and regenerates the estimate over time. It also cleans up stale tracks once units disappear.
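A condensed sketch of that logic; the confidence threshold, frame count, staleness window, and starting elixir are made-up values, and it reuses the CARD_METADATA sketch from above:

```python
import time

CONFIRM_FRAMES = 5   # assumed confident frames needed before confirming
STALE_SECONDS = 3.0  # assumed window after which a track is dropped

class EnemyCardTracker:
    def __init__(self):
        self.pending = {}      # track_id -> consecutive confident frames
        self.last_seen = {}    # track_id -> timestamp
        self.seen_cards = set()
        self.elixir = 5.0      # assumed starting elixir

    def update(self, track_id: int, card: str, conf: float):
        self.last_seen[track_id] = time.monotonic()
        if conf < 0.6:         # not confident enough, don't count the frame
            return
        self.pending[track_id] = self.pending.get(track_id, 0) + 1
        if self.pending[track_id] == CONFIRM_FRAMES:  # confirmed enemy play
            self.seen_cards.add(card)
            self.elixir = max(0.0, self.elixir - CARD_METADATA[card].elixir)

    def prune(self):
        now = time.monotonic()
        for tid, t in list(self.last_seen.items()):
            if now - t > STALE_SECONDS:
                self.last_seen.pop(tid)
                self.pending.pop(tid, None)
```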

Where Things Stand

The full pipeline now runs: Screen → HUD extraction → YOLO detections → HP estimation → GameState → static/dynamic board features → global feature vector.

TODO for next time:

  • Integrate the estimated enemy elixir and the seen_enemy_cards into the main loop.
  • Decide which deck the bot should start with.
  • Decide which type of reinforcement learning to use.

The next devlog will be empty due to a Hackatime issue.

[4 attachments]
keschler
  • I connected the YOLO detector output to the main runtime.
  • For troop HP, I find the closest health bar to each troop and estimate the real HP as bar_fill × max_hp.
  • To do that, I first isolate the bar with a color mask, then measure how much of it is actually filled (see the sketch after this list).
  • If no bar is found, I fall back to the troop’s max HP.
  • I also cached the elixir digit templates instead of rebuilding them every frame, which removed some unnecessary overhead.
  • I changed the YOLO input pipeline to match the original KataCR setup more closely, which improved accuracy.
  • I also enabled GPU inference, which made the detector much faster.
  • One big bottleneck was still hand detection, which was taking about 0.5s per frame.
  • To improve that, I built a separate image recognition pipeline.
  • I recorded gameplay footage, ran my current detector on it to generate labels, and created two datasets: one for the next card and one for the current hand cards.
  • After cleaning the datasets, I trained MobileNetV3 models on both (see the second sketch after this list).
  • I also learned a lot more about image recognition models in general, especially how much model quality depends on the dataset.
  • Along the way I got a better understanding of common AI terms and training concepts like normalization, training loss, and validation metrics.
  • The results are mixed but promising.
  • The next-card model already works much better, while the hand model still needs more work.
  • Some cards, troops, and spells still do not have enough images, so I need to record more footage and extract more frames.
  • Even so, this new model brought hand recognition down to about 80ms per frame, which is a big improvement over the old approach.
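Here is the HP-bar sketch referenced above; the HSV range is a placeholder that would need tuning per bar color:

```python
import cv2
import numpy as np

def bar_fill(bar_crop_bgr: np.ndarray) -> float:
    """Fraction of a health bar crop that is filled, in [0, 1]."""
    hsv = cv2.cvtColor(bar_crop_bgr, cv2.COLOR_BGR2HSV)
    # keep only the bar color; this range is a placeholder to tune
    mask = cv2.inRange(hsv, (90, 120, 120), (130, 255, 255))
    cols = np.flatnonzero(mask.max(axis=0))
    if cols.size == 0:
        return 1.0  # no bar detected -> treat the troop as full HP
    return (cols[-1] + 1) / mask.shape[1]  # bars fill left to right

# estimated_hp = bar_fill(crop) * max_hp
```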
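And a minimal sketch of what fine-tuning MobileNetV3 for the card classifier could look like with torchvision; the class count and training details are assumptions:

```python
import torch
from torch import nn
from torchvision import models

NUM_CARDS = 110  # assumed number of card classes

# Start from ImageNet weights and swap the final layer for our classes.
model = models.mobilenet_v3_small(
    weights=models.MobileNet_V3_Small_Weights.IMAGENET1K_V1)
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, NUM_CARDS)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```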
[4 attachments]
keschler
  • cropped the card slots and matched them against card templates to detect the actual card (see the template-matching sketch after this list)
  • cropped the elixir number, matched it against digit templates, and estimated fractional elixir from the next bar
  • cropped the timer and matched the digits and colon to read the match time
  • cropped each tower HP area and matched the digits to read tower health values
  • reviewed source datasets and prepared a merged seed dataset
  • set up the repo structure, training environment, and detector scripts/configs
  • trained two detector baselines and added resume support for interrupted runs
    - enabled combined inference and class filtering across both detectors
    - the dataset was not very good -> added pre-annotation export so model predictions can bootstrap better labels
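A sketch of the shared template-matching step behind these bullets; the 0.8 threshold and the grayscale crops are assumptions:

```python
import cv2
import numpy as np

def best_match(crop_gray: np.ndarray,
               templates: dict[str, np.ndarray]) -> str | None:
    """Return the best-matching template name, or None below the threshold."""
    best_name, best_score = None, 0.8  # assumed minimum correlation
    for name, tmpl in templates.items():
        res = cv2.matchTemplate(crop_gray, tmpl, cv2.TM_CCOEFF_NORMED)
        score = float(res.max())
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```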
[2 attachments]
keschler

I’m working on my first project! This is so exciting. I can’t wait to share more updates as I build.

[1 attachment]