
WikiAnalyser

9 devlogs
38h 55m 24s

WikiAnalyser is a lightweight C program that lets you quickly and easily gather data from every Wikipedia article, compile it, and turn it into interesting, useful, or random interactive graphs and spreadsheets. All in under 100 KB (excluding the 110 GB Wikipedia dump) (:

Repository


haletas88

I have been forgetting to make a devlog for this, but I have spent some time fixing a couple of things, some of which took way longer than they should have. I also added the page title as an automatic field and plan to add more metadata in the future. Next I am going to start working on a nice GUI, which should be the final thing this project needs now that the core logic is done.

Photo showing the getAutomaticField function. I had to show something, and the bug fixes, even though they were the main focus, are too small and scattered to show.

[1 attachment]
haletas88

Article structs are now created whenever the start of a new article is reached in the XML, and passed into OnArticle. Next I hope to parse the metadata at the start of every article and pass it in, so OnArticle can use the title, version, etc. Linux is now also supported [:

Images below show the implementation and the test XML file.

[3 attachments]
haletas88

Added a function to copy fields between articles, so that a different article struct can be used for each article while keeping the same fields.

Image below shows the implementation.

[2 attachments]
haletas88

Made a ParseArticles function that calls the Zig OnArticle function for every character in an article and will hopefully also switch between article structs.

Images below show the ParseArticles implementation as well as a simple OnArticle test that counts the number of e’s in an article.

[4 attachments]
haletas88

Created a Field function that finds a field given its name and type, which makes accessing fields far easier. Hopefully this is most of the Zig I need for now, and I can move on to making the runner, analyzer, and GUI next.

Photos below show the implementation of the Field function as well as a simple test with its output.

[3 attachments]
haletas88

Zig can now take in the C article struct as a parameter. This allows the current article to be edited from the Zig code, so all that’s left is providing the buffer as a parameter and creating a C function that calls the Zig function on every character of every article.

Images below show the Zig implementation of the struct as well as a test where article information is printed from Zig.

[5 attachments]
haletas88

Chose Zig as the language users will write their ForEachArticle function in. I have successfully linked Zig to my C project. Currently the Zig is built along with the C, but hopefully I will be able to build the Zig while the C is running and hot-swap the .dll and .lib.

Images below show the code to run a little Zig print function from C.

[6 attachments]
haletas88

Created a system that lets you add the data you want saved from every article into an article struct. This will later be hooked up to the GUI as well.

Note: you will need to open the first image in a new tab to actually see the code, or you can check the repo on GitHub.

[4 attachments]
haletas88

Finished implementing Cleanup with ADD, DELETE, and CLEAR operations for unwanted characters, strings, and containers. I plan to add presets next before working on the parser.

Images below show a test where Cleanup removes garbage text from a Hello World string, although real files won’t be this corrupted.

[3 attachments]