Set the whole thing up for shipping by creating a showcase website with MLflow and SQLite on Director4.
I also did some more processing with my 9th training series, and I started work on checkpointing the models so I can test them on a second test range before deploying in real life.
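The checkpointing idea above can be sketched in PyTorch. This is a minimal illustration, not the project's actual code: `TinyNet`, the optimizer choice, and the file name are all placeholders.

```python
# Minimal checkpointing sketch (assumed PyTorch): save enough state to
# re-evaluate the model later on a second test range. TinyNet is a
# placeholder architecture, not the real trading model.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model = TinyNet()
optimizer = torch.optim.Adam(model.parameters())

# Bundle model weights, optimizer state, and progress into one file.
checkpoint = {
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
    "epoch": 10,
}
torch.save(checkpoint, "checkpoint.pt")

# Later (e.g. before testing on the second range), restore the weights.
restored = TinyNet()
state = torch.load("checkpoint.pt")
restored.load_state_dict(state["model_state"])
```

Saving the optimizer state alongside the weights also makes it possible to resume training, not just evaluate.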
I migrated all the training data from the 9th run series over into the PostgreSQL database on the central workstation I am using. There are some interesting things that I noticed in the results - for example, training with trading slippage performs so badly that it actually gets worse than its first-epoch performance, which is very odd.
I didn’t really get around to logging this devlog for a bit because I was so busy with other work…
In this one I reworked my stock asset list to use lower-absolute-price stocks because I want to deploy it soon, and I can only buy integer shares of stock. I also added slippage simulation to the training and testing pipeline to model the error caused by not being able to buy 10-decimal-place-exact allocations of stocks.
I ran some training runs using this new data, varying whether slippage is applied during training.
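The integer-share slippage idea can be sketched as follows. This is an illustrative stand-in for the pipeline, assuming slippage means the gap between target and realized portfolio weights; all names and numbers are made up.

```python
# Sketch of integer-share slippage: round each target dollar allocation
# down to a whole number of shares and measure the resulting weight error.
# Function, weights, and prices are illustrative, not the project's data.

def allocate_integer_shares(capital, target_weights, prices):
    """Return whole-share counts and the realized portfolio weights."""
    shares = [int(capital * w // p) for w, p in zip(target_weights, prices)]
    invested = [n * p for n, p in zip(shares, prices)]
    realized = [v / capital for v in invested]
    return shares, realized

capital = 10_000.0
target_weights = [0.5, 0.3, 0.2]
prices = [37.12, 8.45, 152.30]

shares, realized = allocate_integer_shares(capital, target_weights, prices)
# The gap between target and realized weights is the "slippage" that can
# be fed back into training and testing.
slippage = [t - r for t, r in zip(target_weights, realized)]
```

Since shares are floored, realized weights never exceed the targets, and the error is larger for higher-priced stocks - which is the motivation for the lower-price asset list.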
I completed and transferred over the data from my 8th training run, where I was trying out different convolutional dilations. The image shows the MLflow graphs of the results and some of my conclusions. I also switched the database backend from SQLite to PostgreSQL (this took a really long time due to Linux directory permissions and such) because SQLite was getting really slow with all my data.
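The backend switch boils down to handing MLflow a PostgreSQL URI instead of a SQLite one. A small sketch, where the host, database, and credentials are all placeholders:

```python
# Sketch of pointing MLflow's backend store at PostgreSQL instead of
# SQLite. Every value below (user, password, host, db name) is a
# placeholder, not the project's real configuration.

def tracking_uri(user, password, host, port, db):
    """Build a SQLAlchemy-style PostgreSQL URI for MLflow's backend store."""
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"

uri = tracking_uri("mlflow", "secret", "localhost", 5432, "mlflow_db")
# Then either call mlflow.set_tracking_uri(uri) in training code, or
# start the server with: mlflow server --backend-store-uri <uri>
# (previously the URI would have been e.g. "sqlite:///mlflow.db")
```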
I analyzed the results from my 6th large training run series and set up the parameters for the 7th.
I also wrote some code to allow for easier sequential training runs of so-called “sweeps” - now I can submit one job that runs multiple sweeps back-to-back, rather than having to manually start each one after the previous finished. At the end, I started the 7th training run series too (image is a GPU status from the terminal).
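The sequential-sweep idea can be sketched as a simple driver loop. Here each run is stubbed out as a tuple so the logic is self-contained; in the real project each iteration would presumably launch a full training run.

```python
# Minimal sketch of chaining sweeps inside one submitted job: each sweep
# finishes completely before the next starts. Names and configs are
# illustrative, not the project's actual sweep definitions.

def run_sweeps(sweeps):
    """Run each sweep's configs in order, one sweep at a time."""
    results = []
    for name, configs in sweeps:
        for cfg in configs:
            # Stand-in for launching a full training run with cfg.
            results.append((name, cfg))
    return results

sweeps = [
    ("dilation_sweep", [{"dilation": d} for d in (1, 2, 4)]),
    ("lr_sweep", [{"lr": lr} for lr in (1e-3, 1e-4)]),
]
order = run_sweeps(sweeps)
```

Grouping runs by sweep like this keeps the results organized while still needing only one job submission.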
Switched to using MLflow for enhanced logging and visualization. Also started using torch.compile and TensorFloat-32 (TF32) to make training significantly faster. Sorting out torch.compile Dynamo speculation-divergence issues took a while.
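Enabling these two speedups takes only a few lines in PyTorch 2.x. A sketch with a placeholder model (the flags shown are the standard PyTorch switches; the model itself is not the project's):

```python
# Sketch of enabling TF32 and torch.compile (PyTorch 2.x). The tiny
# Sequential model is a placeholder for the real network.
import torch
import torch.nn as nn

# TF32: faster float32 matmuls/convolutions on Ampere+ GPUs, at slightly
# reduced mantissa precision.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))
# torch.compile only wraps the model here; actual (Dynamo/Inductor)
# compilation is triggered lazily on the first forward call.
compiled = torch.compile(model)
```

Since compilation happens on the first call, graph-break and speculation issues (like the Dynamo divergence mentioned above) only surface once real inputs flow through.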
Finished a gigantic training run totaling 256 different models, and compiled/uploaded the data from it.
Also fixed my source code to retry after hitting GPU out-of-memory errors.
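A common way to implement this retry is to catch the CUDA out-of-memory RuntimeError, free cached memory, and try again. A self-contained sketch (the failing step and the cleanup hook are stubbed; the real code presumably calls something like `torch.cuda.empty_cache()`):

```python
# Sketch of retry-on-OOM. Checking for "out of memory" in the RuntimeError
# message is a common PyTorch pattern; whether the project does exactly
# this is an assumption.

def run_with_retries(step, max_retries=3, cleanup=None):
    """Call step(); on an out-of-memory RuntimeError, clean up and retry."""
    for attempt in range(max_retries):
        try:
            return step()
        except RuntimeError as e:
            if "out of memory" not in str(e) or attempt == max_retries - 1:
                raise
            if cleanup is not None:
                cleanup()  # e.g. torch.cuda.empty_cache() in the real code

# Demo: fail once with a fake OOM error, then succeed on the retry.
attempts = []
def flaky_step():
    attempts.append(1)
    if len(attempts) == 1:
        raise RuntimeError("CUDA out of memory.")
    return "ok"

result = run_with_retries(flaky_step)
```

Non-OOM errors are re-raised immediately so real bugs don't get silently retried.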
A new version of the model with an enhanced feature set is being GPU-trained right now.
Will take around 2 days to complete.
I’m working on my first project! This is so exciting. I can’t wait to share more updates as I build.