Koequet: Dataset Formatter

12 devlogs
10h 45m 36s

Initially made for the Honkai Star Rail datasets produced by AI Hobbyist, this abomination of a program should help organise datasets and also process poor-quality files!
Currently compatible with Parquet exporting (best HF format :D) and AI Hobbyist's random format (each .wav with a matching .lab, which is just a .txt, in a flat directory).
For a sample dataset produced from this program, see https://huggingface.co/datasets/CattoYT/honkers-railway!
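
To make the flat-directory layout concrete, here is a minimal sketch of pairing each .wav with its .lab transcript. This is not Koequet's actual code: the function name `pair_wav_with_lab` and the UTF-8 encoding of the .lab files are assumptions made for illustration.

```python
from pathlib import Path


def pair_wav_with_lab(dataset_dir):
    """Pair every .wav in a flat directory with its same-named .lab file.

    Returns (paired, missing):
      paired  -> list of (wav_path, transcript_text), sorted by file name
      missing -> list of .wav paths that have no matching .lab file
    """
    paired, missing = [], []
    for wav in sorted(Path(dataset_dir).glob("*.wav")):
        lab = wav.with_suffix(".lab")  # e.g. line_001.wav -> line_001.lab
        if lab.is_file():
            # Assumption: .lab files are plain UTF-8 text.
            paired.append((wav, lab.read_text(encoding="utf-8").strip()))
        else:
            missing.append(wav)
    return paired, missing
```

Returning the unmatched .wav files separately, rather than silently skipping them, makes it easy to see which clips still need a transcript.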

Demo Repository

Kaeniya

Bugfixes across the board because I was asked to reship this project. Also fixed the initial session creation and maybe fixed non-Windows installs?

Anyways, this is hopefully my last devlog for this project.

Kaeniya

Shipped this project!

Hours: 0.46
Cookies: 🍪 11
Multiplier: 23.31 cookies/hr

I hate Python, I hate MS Build Tools, but hey, now the packages should distribute correctly

Kaeniya

Corrected a dependency issue and made the README look better, thanks reviewer XD
I hate Python sometimes (always)

Kaeniya

After crying for however long I did, the first release of the now-dubbed Koequet is ready :D
PLEASE USE UV AND NOT THE STANDALONE EXE. IT'S 1.6 GIGS 😭

Anyways, the last 43 minutes were spent building that exe, plus a few bugfixes, the README, the license, and general prep for the first release. There's also a screenshot of my shitcode attempt to check sorting, before I realised later on that it gets checked anyway XDDDD
Either way, hey, the new banner looks good, right? M0nkrous kinda based
Anyways, first ship, let's go!
Next ship will be with a GUI, I hope

Kaeniya

Finally dealt with bad files, hopefully next time I make a devlog it won't be garbage :D
Anyways, the asset was made, now I need to find a way to export this.
All main functionality is there now at least

Kaeniya

Very sleepy, been waiting in the queue to interview for RED so not much progress done. Also kinda done with this project and Python as a whole. I miss Rust XD
Either way, patch notes:

  • properly implemented some kind of audio processing for files marked as bad quality
  • added another setting
  • fixed session and settings saving

Gonna ship after I make assets :D

Kaeniya

Massive refactor across the board with this one
Main changes were:

  • fixed session handling to make it a little more consistent throughout the project
  • fixed the exporter code so it now exports a Parquet file correctly
  • improved the session creation UI
  • removed the accidental dox from the Parquet file by switching to relative dirs
  • general minifixes
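
The relative-dirs fix can be sketched roughly like this: before a path is written into the exported file, it is rewritten relative to the dataset root, so nothing about the local machine's directory layout ends up in the Parquet file. This is only an illustration under assumptions; `to_relative` and its parameters are hypothetical names, not the project's actual exporter code.

```python
from pathlib import Path


def to_relative(audio_path, dataset_root):
    """Return audio_path relative to dataset_root as a POSIX-style string.

    Raises ValueError if audio_path is not inside dataset_root, so an
    absolute path can never slip into the export by accident.
    """
    root = Path(dataset_root).resolve()
    return Path(audio_path).resolve().relative_to(root).as_posix()
```

Using `as_posix()` keeps the stored paths identical whether the export ran on Windows or elsewhere.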

But now the project is in a functional state and can do its original job as intended :D
Likely will ship here and continue adding features.
Example of a Hugging Face dataset made from this project: https://huggingface.co/datasets/CattoYT/honkers-railway (hoyoverse no bully me pls)

Idk what to put for the attachment so I'm just going to put the raw views of the data there from https://parquet-viewer.com lol (the audio is there, it just doesn't display on this website)

Kaeniya

whole ton of refactoring across the board

  • decided to just use de-echo + de-reverb for the UVR model
  • improved compatibility across OSes (probably, idk, it's untested)
  • refactored ManualSorter so that:
    • you can now edit the transcripts using your default editor
  • began work on the exporter and un-ChatGPT-ing legacy code
  • switched from using keyboard to some weird-ass OS-specific shit so you actually have to be tabbed in (epic idiocy from me)
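
For the "edit the transcripts in your default editor" part, one common way to do it is sketched below. It is a guess at the general approach, not the ManualSorter code: it assumes "default editor" means the `VISUAL`/`EDITOR` environment variables, and the `notepad`/`nano` fallbacks and both function names are made up for this example.

```python
import os
import shlex
import subprocess


def editor_command(path, environ=None):
    """Build the argv list used to open `path` in the user's editor."""
    environ = os.environ if environ is None else environ
    editor = environ.get("VISUAL") or environ.get("EDITOR")
    if editor:
        # EDITOR may carry flags, e.g. "code --wait", so split it into argv.
        # Non-POSIX splitting on Windows keeps backslashes in paths intact.
        return shlex.split(editor, posix=(os.name != "nt")) + [str(path)]
    # Assumed fallbacks for when no editor is configured.
    return ["notepad" if os.name == "nt" else "nano", str(path)]


def edit_transcript(path):
    """Open the transcript in the editor, block until it exits, return the text."""
    subprocess.run(editor_command(path), check=True)
    with open(path, encoding="utf-8") as f:
        return f.read().strip()
```

Blocking until the editor exits matters here: a GUI editor that returns immediately (for example `code` without `--wait`) would make `edit_transcript` read the file before it has been saved.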

idk what to screenshot for the attachment icl
enjoy the very wip exporter code

Kaeniya

oh hey that looks a lot better
added:

  • percentage completion
  • transcription in the bottom
  • will add capability to edit the transcription

Maybe I'll get round to writing a GUI soon, probably will ship first then add the GUI later
maybe using gpui-components would be cool since Rust is epic and Zed is also epic

Kaeniya

So I went a little schizo over UVR
Tried all of their de-echo and de-reverb models to see if they would work on the type of voice processing that HSR does on some of the voice lines, turns out they were not good at all
rip 43 minutes of my life ig
Gonna try rnnoise next, see y'all later

Comments

Kaeniya, 3 months ago

rnnoise did nothing btw :D

Kaeniya

Finally made the TUI a little nicer, I really wanna try making a GUI but I'm terrified cuz I can't UI for shit
Either way, also started work on getting UVR going with this so it actually has a purpose

Kaeniya

Haven't written Python in ages, but I got the basis going for what I need to do
Currently trying to make the UI a little more tolerable, but as of now the functionality is 'there'.
After I fix the 'TUI' (it's so bad) I'll make it a little more general so you can just go through wav files and manually transcribe or pass them through parakeet/whisper, or just do more with it
idk, I should be preparing for school but hey, who doesn't want a better way to sort audio datasets lmao

Comments

Kaeniya, 3 months ago

WORST SS EVER LMAO

Kaeniya

I’m working on my first project! This is so exciting. I can’t wait to share more updates as I build.
