
Assetto Corsa Reinforcement Learning

15 devlogs
59h 20m 28s


A reinforcement learning agent (using the SAC algorithm) learns to drive a Formula 1 car around the Monaco GP circuit in Assetto Corsa. The agent controls steering, acceleration, and braking by interacting with the track, receiving feedback, and improving over time.

This project uses AI

GitHub Copilot for code completions, algorithm implementation checks (unfortunately, I do not have a degree in mathematics), and documentation generation (including docstrings).

Demo Repository


ved patel

Ok, now it’s working. At least, somewhat.

Finally, I implemented Torch multiprocessing. Essentially, “learning” is now asynchronous from “collecting”: the CPU is always running the model and playing actions in the AC environment, while the GPU is constantly updating the weights based on environment observations. This was SUCH A PAIN to set up!!!!! (Look at the diagram for a better visualization; in the diagram’s case, sampling would be done at the same time as training.)
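A minimal sketch of that producer/consumer split, using plain threads and a queue for brevity. The real setup uses torch.multiprocessing with shared model weights; everything named here (collector, learner, the transition dict) is illustrative, not the project's actual code.

```python
import queue
import threading

def collector(replay: queue.Queue, n_steps: int) -> None:
    """CPU side: roll out the policy and push transitions into the buffer."""
    for step in range(n_steps):
        # stand-in for model.act(obs) and env.step(action)
        transition = {"obs": step, "action": step % 3, "reward": 1.0}
        replay.put(transition)
    replay.put(None)  # sentinel: collection finished

def learner(replay: queue.Queue) -> int:
    """GPU side: pop transitions and run gradient updates concurrently."""
    updates = 0
    while True:
        batch = replay.get()
        if batch is None:
            break
        updates += 1  # stand-in for loss.backward(); optimizer.step()
    return updates

replay_buffer: queue.Queue = queue.Queue(maxsize=64)
t = threading.Thread(target=collector, args=(replay_buffer, 100))
t.start()
total_updates = learner(replay_buffer)
t.join()
print(total_updates)  # 100
```

The bounded queue is the important part: it applies backpressure so neither side races arbitrarily far ahead of the other.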

TORCHRL NEEDS BETTER SUPPORT FOR MULTIPROCESSING!!!!

Anyway, rewards are going up. So, there’s that. Hopefully, this is the run. Please, please, please.


Comments

ved patel 16 days ago

UPDATE: rewards have reached 399! It’ll hopefully only be a few weeks until this gets up and running!

ved patel

Ok, ok, I know the 10h looks crazy.

I REALLY, REALLY wanted a functional AI for this devlog, but I was unable to make that happen 😔.

Right now, it looks like the fundamental training script (train_core.py) is broken. It worked for car-racing, but it seems to have stopped working for AC (probably because of a wack previous commit).

The fundamental problem I’m having is that it’s just not learning. I tried SAC-BC, but it seems subtle differences between recording, training, and testing (SIM2REAL, but SIM2SIM in this case?) are enough for the model to freak out. Most of the time, it cannot turn in time, and it turns less than needed.

I decided pure BC will not happen, against my wishes. Instead, I’ll be sacrificing my computer for ANOTHER 3 days (total compute is probably ~7 days by now. Does Flavortown give electricity grants?).

Anyway, when I tried Monaco overnight, the rewards looked abysmal. It’s never been this bad before. It NEVER improved AT ALL and spiraled into a horrible, horrible policy. I think I fixed it. Maybe.

Also, I got lazy trying to record ~30 laps of the track EACH TIME I WANT TO TRAIN BC. So I found a workaround using the Assetto Corsa AI to drive around for me. Essentially, I used the AC AI to drive around to record demonstrations, which I will train MY AI on. Wow.
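A hypothetical sketch of that demonstration-harvesting loop: let the built-in AC AI drive, poll telemetry each tick, and log (observation, action) pairs for behavior cloning. `read_state` and its fields are stand-ins for the real socket, not the actual schema.

```python
def record_demonstrations(read_state, n_steps):
    """Collect (obs, action) pairs while the AC AI drives."""
    dataset = []
    for _ in range(n_steps):
        state = read_state()  # stand-in for reading telemetry over the socket
        obs = {"speed": state["speed"], "pos": state["pos"]}
        action = {"steer": state["steer"], "throttle": state["throttle"]}
        dataset.append((obs, action))
    return dataset

# fake telemetry source for illustration
ticks = iter(range(5))
fake_state = lambda: {"speed": 100.0, "pos": next(ticks),
                      "steer": 0.1, "throttle": 0.9}
demos = record_demonstrations(fake_state, 5)
print(len(demos))  # 5
```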

The goal was to take this semi-garbage AI and train it online to make a good AI.

However, Monaco’s AI is kinda horrible and kept driving into walls, so I’m training on Brazil for now, since it’s an easier track for the model to learn AND the Brazil AI doesn’t drive into walls.

(I’ll upload the gameplay of this heinous, atrocious, awful, terrible, appalling, vile, detestable AI later, but it’s worse than you think)

PLEASE MAKE THIS AI WORK!!!!!

I’VE NEVER TRIED THIS HARD TO MAKE SOMETHING WORK

PLEASEEEEEEEEEEEEEEEEEEEEEEEEEE

ved patel

Turns out what I said in my previous devlog was wrong. Assetto Corsa automatically maps controller inputs exponentially rather than linearly, which was causing some of my issues. Make sure your controller settings match mine to avoid the same problem (I’ll add another vid in the docs to fix this).
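Assuming the game applies a gamma-style curve `out = sign(x) * |x|**gamma` to the raw controller value (the exact curve here is a guess for illustration), the agent can pre-warp its linear steering command with the inverse curve so the in-game steering comes out linear:

```python
def linearize(target: float, gamma: float = 2.0) -> float:
    """Raw controller value to send so the in-game steering equals target."""
    sign = 1.0 if target >= 0 else -1.0
    return sign * abs(target) ** (1.0 / gamma)

def game_curve(raw: float, gamma: float = 2.0) -> float:
    """Stand-in for the exponential mapping the game applies internally."""
    sign = 1.0 if raw >= 0 else -1.0
    return sign * abs(raw) ** gamma

# round-trip: the game's curve applied to the pre-warped value gives the target back
print(round(game_curve(linearize(0.5)), 6))   # 0.5
print(round(game_curve(linearize(-0.3)), 6))  # -0.3
```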

I’m now using SAC-BC with this actor objective: E[Q(s,a) + H(π(·|s))] + λ · E[log π(a|s)]
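A numeric sketch of that objective using batch-mean estimates, where `alpha` is the entropy temperature (folded into the H term above) and `lam` weights the behavior-cloning log-likelihood over demonstration actions; all values below are made up for illustration.

```python
def sac_bc_actor_objective(q_values, entropies, demo_log_probs,
                           alpha=0.2, lam=1.0):
    """Batch mean of Q + alpha*H over policy samples, plus lam * mean BC log-prob."""
    sac_term = sum(q + alpha * h for q, h in zip(q_values, entropies)) / len(q_values)
    bc_term = sum(demo_log_probs) / len(demo_log_probs)
    return sac_term + lam * bc_term

# toy batch: two policy samples and two demonstration log-probs
print(round(sac_bc_actor_objective([1.0, 2.0], [0.5, 0.5], [-1.0, -3.0]), 6))  # -0.4
```

In practice the actor maximizes this (minimizes its negative); the BC term pulls the policy toward the demonstrated actions while Q and entropy drive online improvement.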

Currently training it and the results look promising so far. If this works out, I’ll move on to porting from offline RL to online RL.

ved patel

We’re using CRSfD for RL with human demonstrations (mostly pre-learning from recorded laps). I’m still testing it; if it doesn’t work, I’ll probably try SAC(λ), but that’s not preferred since it’s more complex and I’d have to rewrite a lot.

I also updated the docs and made major changes to the CLI so it actually works this time. Previously, the scripts weren’t showing up on PyPI for some reason; that’s fixed now, so it should work.

(I’m not sure why the agent doesn’t turn in the video. It’s not an env→AC issue; the agent is just outputting extremely small steering values, and I’m not sure what’s going on.)

ved patel

I linked another repo to this project, the actual AC app, which communicates with my training script.

Essentially, it creates a socket between the training script and the game, allowing information to be passed back and forth. The app takes an input indicating whether to reset the environment and outputs data such as speed, velocity, position, tire temperatures, and current vehicle damage.
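A hypothetical sketch of the message framing such a socket might use, assuming JSON payloads with a 4-byte length prefix; the field names below are illustrative, not the app's actual schema.

```python
import json
import struct

def encode_message(payload: dict) -> bytes:
    """Serialize a message: 4-byte big-endian length prefix + JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack("!I", len(body)) + body

def decode_message(data: bytes) -> dict:
    """Read the length prefix, then parse exactly that many JSON bytes."""
    (length,) = struct.unpack("!I", data[:4])
    return json.loads(data[4:4 + length].decode("utf-8"))

# game -> trainer: telemetry (field names are assumptions)
telemetry = {"speed_kmh": 212.4, "tyre_temp_c": [88, 90, 86, 87], "damage": 0.0}
assert decode_message(encode_message(telemetry)) == telemetry

# trainer -> game: reset command
reset_cmd = {"reset": True}
assert decode_message(encode_message(reset_cmd)) == reset_cmd
```

The length prefix matters because TCP is a byte stream: without it, a reader can't tell where one JSON message ends and the next begins.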

ved patel

I fixed several issues for shipping. These changes included adding a CLI for training, loading, and whatever the hell else someone might need, plus updating the documentation to match.

I also started training in AC. It’s taking a long time, so I’ll need to improve sample efficiency, likely by using RLHF.

ved patel

I created documentation using https://docusaurus.io/. The AC environment is almost complete; it’s now in the stress-testing phase until I’m confident it can run overnight (it will probably still break). Unfortunately, running multiple AC environments isn’t practical, since AC doesn’t easily support multiple clients. The simplest workaround would be a virtual machine, but that requires too much compute, so we’re sticking with a single environment.

ved patel

YAY! I finished car racing! Now I can start training with Assetto Corsa! The model works way better than expected. It’s a bit jerky right now, but that’s something I’ll fix later.

There was a big alpha issue that was stopping the model from learning anything useful, so I’m really glad I did this “test run” before moving to the 3D game.

I also saw a research paper on learning in 3D games where you first train the model to maximize pixel-wise change in a specific area, then train it on the actual reward. The researchers found this works much faster because the model already understands the 3D environment before reward training. I’ll most likely try this.
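The pretraining reward from that paper can be sketched as the mean absolute difference between consecutive frames inside a region of interest; this is my reading of the idea, with plain nested lists standing in for image arrays.

```python
def pixel_change_reward(prev_frame, frame, roi):
    """Mean absolute pixel change inside roi = (top, left, height, width).

    prev_frame / frame are 2D grayscale frames as lists of lists.
    """
    top, left, h, w = roi
    total = 0.0
    for y in range(top, top + h):
        for x in range(left, left + w):
            total += abs(frame[y][x] - prev_frame[y][x])
    return total / (h * w)

# one pixel changed by 10 in a 2x2 region -> mean change 2.5
a = [[0, 0], [0, 0]]
b = [[10, 0], [0, 0]]
print(pixel_change_reward(a, b, (0, 0, 2, 2)))  # 2.5
```

An agent maximizing this learns to make things happen on screen (move, steer) before any track-specific reward is introduced.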

Right now, I’m working on building the Gymnasium environment for Assetto Corsa and figuring out the reward function, since this one has to be custom. Ideally, I’ll have training running on AC by Sunday; we’ll see!
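One plausible shape for such a custom reward, assuming telemetry provides per-step lap-progress delta, speed, an off-track flag, and a damage delta; the weights here are made up for illustration, not the project's tuned values.

```python
def step_reward(progress_delta, speed_kmh, off_track, damage_delta):
    """Per-step reward sketch from AC telemetry (all weights are assumptions)."""
    reward = 100.0 * progress_delta   # main signal: progress along the lap
    reward += 0.01 * speed_kmh        # small shaping term: prefer carrying speed
    if off_track:
        reward -= 1.0                 # discourage leaving the track
    reward -= 10.0 * damage_delta     # hitting walls is heavily penalized
    return reward

# clean step: 0.2% lap progress at 150 km/h, on track, no new damage
print(round(step_reward(0.002, 150.0, False, 0.0), 6))  # 1.7
```

Using the *delta* of progress and damage (rather than absolute values) keeps the reward dense and prevents the agent from farming a static state.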

ved patel

WOW! Great progress. The agent is LEARNING!

I used a pretrained VAE for the encoder, training it to compress and decompress the game frames. The most compressed layer is halved, and the encoder output is fed as input to the model.

Next, I’ll try a target VAE, because the encoder is currently frozen (static), which prevents the model from updating its parameters during RL training. This will hopefully push performance to 800+ reward, but still, ~250 reward is excellent progress!
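A target encoder can borrow the soft (Polyak/EMA) update used for target Q-networks: the online encoder trains freely while a slow-moving copy feeds the policy, so the representation keeps learning without shifting abruptly. A minimal sketch, with plain lists standing in for parameter tensors:

```python
def polyak_update(target_params, online_params, tau=0.005):
    """In-place soft update: target <- (1 - tau) * target + tau * online."""
    for i, (t, o) in enumerate(zip(target_params, online_params)):
        target_params[i] = (1.0 - tau) * t + tau * o
    return target_params

target = [0.0, 0.0]
online = [1.0, 2.0]
polyak_update(target, online, tau=0.5)  # large tau just to make the change visible
print(target)  # [0.5, 1.0]
```

With a small tau (e.g. 0.005), the target tracks the online encoder over thousands of steps instead of jumping every update.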

ved patel

Ok, so it didn’t work yet. The model isn’t improving as fast as I want it to, but there’s a lot of promise in the new runs. Rewards are now increasing consistently since I fixed those pesky bugs.

I also made some progress on the actual Assetto Corsa app, which is in a separate GitHub repo: https://github.com/ved-patel226/AssetoCorsaRL-APP.

Currently, the app opens a socket and allows the user to reset the car to the track. I still need to figure out how to run multiple instances and, if possible, speed up the game tick rate, ideally by 9×.


Comments

ved patel 2 months ago

Update: we just broke the policy break-even point for the first time (positive rewards!)

ved patel

I completely refactored the codebase to make it easier to port to Assetto Corsa when the time comes, and I also started training on what is hopefully the final CarRacing model. If this goes well, I can start working on Assetto Corsa.

While it is training, I’m starting work on the Assetto Corsa app that will provide telemetry for me and the AI, and allow the car to be reset.

Track the run in the new Weights & Biases report: https://api.wandb.ai/links/ved-patel226-/uh789qod

ved patel

Training started working! However, rewards are going down 😞.

I implemented noisy layers and fixed those nasty environment bugs.

Parallel environments work now!!
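Noisy layers (NoisyNet-style exploration) give each weight a learned mean and a learned noise scale, with fresh Gaussian noise sampled per forward pass. A single-neuron, plain-Python sketch of the idea (the real layers are torch modules):

```python
import random

class NoisyNeuron:
    """One output unit with learnable weight means and noise scales."""

    def __init__(self, n_inputs, sigma0=0.5):
        self.w_mu = [0.1] * n_inputs                        # weight means
        self.w_sigma = [sigma0 / n_inputs ** 0.5] * n_inputs  # noise scales
        self.b_mu, self.b_sigma = 0.0, sigma0

    def forward(self, x, rng):
        # fresh noise every call: exploration comes from the weights themselves
        out = self.b_mu + self.b_sigma * rng.gauss(0, 1)
        for xi, mu, sigma in zip(x, self.w_mu, self.w_sigma):
            out += (mu + sigma * rng.gauss(0, 1)) * xi
        return out

rng = random.Random(0)
neuron = NoisyNeuron(n_inputs=3)
# two forward passes on the same input differ because noise is resampled
a = neuron.forward([1.0, 2.0, 3.0], rng)
b = neuron.forward([1.0, 2.0, 3.0], rng)
print(a != b)  # True
```

Because the noise scales are learned, the network can anneal its own exploration, which replaces hand-tuned action noise.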

ved patel

Instead of trying to write my own RL algorithms from scratch (which wasn’t fun), I’m pivoting to using TorchRL. It still gives me a lot of flexibility with my environments and isn’t as constricting as something like Stable Baselines.

I’m still getting this nasty error, though, and SAC has few examples. So I’ll have to rawdog it and hope for the best.

We’re still not in the Assetto Corsa realm yet. I’ll train a good model on CarRacing first, which is a simpler 2D version of racing, then try to port it to the almighty.

Here’s the progress on CarRacing so far:


Comments

ved patel 3 months ago

EDIT: working on parallel environments now! (this is the hard part)