
Reddit Stories TikTok Generator

11 devlogs
19h 39m 1s

This is a fun project made to help my friend and me generate funny TikTok videos from Reddit stories. It is written in Python and uses the open-source Kokoro-82M model for text-to-speech and the WhisperX base model plus an alignment model for subtitle generation with word-level timestamps.


alexxino

I realized the project wasn’t really ready to ship, especially in terms of usability. So I set up a Docker image (the app can now run in Docker without installing all the required dependencies) and quickly coded a Gradio server to show a demo of the project. I also uploaded the repository to Hugging Face and set up a Hugging Face Space, so there is a hosted demo online that users can try quickly.

alexxino

After seeing the results on TikTok, I adjusted the video generator’s parameters, and it has now become a full-blown generator. I also made some structural changes to make the code cleaner and more readable, and fixed a few issues with subtitle timing. Below is a cropped example of the generator’s final result!

alexxino

I’m seeing the first results on TikTok and I’m changing the format of the video based on the analytics. I set up a new funny intro with an explosion sound effect and changed how audio is handled. I also implemented a function that adds a fade-in and a fade-out to the start and end of an audio clip using Pydub.
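
Pydub exposes fading through the `fade_in` and `fade_out` methods of `AudioSegment`. To show the idea without extra dependencies, here is a minimal sketch on a plain list of mono samples. The function name `apply_fades`, its parameters, and the linear ramp are all assumptions for illustration, not the project's actual code and not necessarily the curve Pydub applies.

```python
def apply_fades(samples, sample_rate, fade_ms):
    """Linearly fade the first and last `fade_ms` milliseconds of a mono clip.

    Hypothetical helper: `samples` is a list of floats, `sample_rate` is in Hz.
    """
    # Number of samples covered by each fade, capped at half the clip
    # so the fade-in and fade-out never overlap.
    n = min(len(samples) // 2, sample_rate * fade_ms // 1000)
    faded = list(samples)
    for i in range(n):
        gain = i / n  # ramps from 0.0 up towards 1.0
        faded[i] = samples[i] * gain            # fade-in from the start
        faded[-1 - i] = samples[-1 - i] * gain  # fade-out, mirrored from the end
    return faded


clip = [1.0] * 1000                      # one second of constant signal at 1 kHz
smooth = apply_fades(clip, 1000, 100)    # 100 ms fade on each side
```

Fading the edges avoids the audible click that can occur when a clip starts or stops abruptly.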

alexxino

I wrote a README and added a .env.example file to help others set everything up quickly.

alexxino

I’ve continued adding tweaks and quality-of-life features. For example, I added a way to build multiple videos with one call and to fetch a story from Reddit automatically from just its link. I also added support for more abbreviations and replacements for curse words, so that the algorithm doesn’t penalize the video and I don’t have to edit them manually.
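
The abbreviation and curse-word handling can be sketched as a small text-normalization pass that runs before the story is sent to the TTS model. The tables and the `normalize_story` function below are hypothetical; the devlog does not show the project's real word lists or implementation.

```python
import re

# Hypothetical tables; the project's real lists are not shown in the devlog.
ABBREVIATIONS = {"MIL": "mother-in-law", "BF": "boyfriend", "DM": "direct message"}
SOFTENED = {"damn": "darn", "hell": "heck"}


def _compile(words, flags=0):
    # \b keeps the match on whole words, so "hello" is not treated as "hell".
    return re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b", flags)


_ABBR_RE = _compile(ABBREVIATIONS)             # case-sensitive: "MIL", not "mil"
_SOFT_RE = _compile(SOFTENED, re.IGNORECASE)   # case-insensitive: "Damn", "damn"


def normalize_story(text):
    """Expand abbreviations, then swap curse words for milder ones."""
    text = _ABBR_RE.sub(lambda m: ABBREVIATIONS[m.group(1)], text)
    return _SOFT_RE.sub(lambda m: SOFTENED[m.group(1).lower()], text)


cleaned = normalize_story("My MIL sent a DM. Damn, hello!")
```

Matching abbreviations case-sensitively is a design choice in this sketch: it avoids rewriting ordinary lowercase words that happen to share letters with an abbreviation.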

alexxino

I spent some more time adjusting the video’s parameters and added a way to handle abbreviations (I had to work this out because of the TTS model). Now everything looks good and I have already started publishing some stories on TikTok! If you want to see the results, check out https://www.tiktok.com/@massive_ideas

alexxino

I made some more changes to the way the different parts of the videos are displayed and added support for abbreviations.

alexxino

I added the RedditFrameImage class, which scrapes the postfully.app tool (free, no login) to generate a fake Reddit post image that is displayed at the start of each video. I also improved audio generation by removing leading and trailing silence (so that all the timings match exactly), which let me sync every component. I spent some time adjusting the required parameters and the styles of the video’s components, and refactored part of the application’s structure.
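
Silence trimming can be illustrated with a short sketch on a plain list of samples. The `trim_silence` function and its `threshold` default are assumptions for illustration; the devlog does not say how the project implements this. Audio libraries offer helpers for the same job, for example `pydub.silence.detect_leading_silence` in Pydub.

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples quieter than `threshold`.

    Hypothetical helper: `samples` is a list of floats in the range [-1.0, 1.0].
    Quiet samples in the middle of the clip are kept.
    """
    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1
    end = len(samples)
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


trimmed = trim_silence([0.0, 0.001, 0.5, 0.0, -0.3, 0.0, 0.0])
```

Once the silence is removed, the clip's duration matches the spoken content, which makes it easier to line up subtitles and other video components with the audio.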

alexxino

I did some research to figure out a way to generate subtitles automatically, and I found out that OpenAI actually has an open-source model for speech recognition! I also looked for dedicated forced-alignment models (audio + transcript -> timestamps), but I stuck with OpenAI’s Whisper model (the base version, to keep it fairly lightweight) because I did not find any good alternatives and it actually performed pretty well. I also added WhisperX to get word-level timestamps (to keep engagement as high as possible!)
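
Word-level timestamps can then be turned into short on-screen captions. The sketch below assumes the aligned words arrive as dictionaries with `word`, `start`, and `end` keys (which is roughly the shape WhisperX returns after alignment) and that every word has both timestamps; the functions `group_words` and `srt_timestamp` and the three-word chunk size are hypothetical, not taken from the project.

```python
def group_words(words, max_words=3):
    """Group word-level timestamps into short caption chunks."""
    captions = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        captions.append({
            "start": chunk[0]["start"],   # caption appears with its first word
            "end": chunk[-1]["end"],      # and disappears after its last word
            "text": " ".join(w["word"] for w in chunk),
        })
    return captions


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


words = [
    {"word": "I", "start": 0.0, "end": 0.2},
    {"word": "told", "start": 0.2, "end": 0.5},
    {"word": "my", "start": 0.5, "end": 0.7},
    {"word": "boss", "start": 0.7, "end": 1.25},
]
captions = group_words(words, max_words=3)
```

Showing only a few words at a time is what produces the fast, word-by-word caption style common in short-form videos.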

alexxino

Now that I was able to generate CapCut drafts, I needed a way to turn text (the stories) into spoken audio. I did some research (looking for open-source, lightweight TTS models) and found Kokoro-82M, which satisfied all my needs and can be run directly on my machine. So now I can start from some text, generate the video’s audio, and then create a CapCut draft for a full video. In the end, I set up Kokoro TTS and adjusted some parameters for CapCut’s open API. The next step would be to generate subtitles for the video, maybe through automatic speech recognition or forced alignment?

alexxino

I finally found a library for working with the CapCut video editor programmatically, so I wrote the first part of the program: it generates a draft, adds the background video, a voice track, and some subtitles, and exports it to CapCut. I set up the foundation for creating the actual videos and got my first draft working!


Comments

alarixfr 3 months ago

cool project