Activity

aryan

Shipped this project!

Hours: 54.66
Cookies: 🍪 1114
Multiplier: 20.38 cookies/hr

This was a project that started out with a random curious idea about how Akinator works. I initially had zero idea how mine would work, but I'm personally proud of myself for pulling it off! There were some implementations I came up with that I don't think could be found in any tutorial (like getAll(), getPopular(), getQuestion(), and the entire journey of debugging the question-asking flow). This was the first time I built something completely on my own, with zero tutorials and zero borrowed logic from other systems. There were a lot of moments where I was close to hardcoding some logic, but I never wanted to, and I didn't (which I'm glad about). These implementations taught me a lot about how to automate more dynamic things that can work on their own (like deciding what question to ask without hardcoding anything). It was very rewarding personally, more than a thousand cookies (doesn't mean I don't want cookies!) :D

aryan

SHIP LOGGG (attaching a video, testing 3 people: Narendra Modi, SRK, Kanye West)
mainly 2 changes, but a ton of reiterations: the SPARQL query and the question logic.
SPARQL wikidata query:
the new query, with political party/employer/special fields, an alive/dead field, and the new sitelink/follower filter logic, was too heavy and kept timing out on WDQS.
tried to optimise it a lot over multiple days (to avoid traffic) with some help from AI; it still timed out. almost gave up.
ran it on QLever (a faster wikidata endpoint), which worked first try!
it didn't have some popular qualifying people (e.g. Narendra Modi, Alysa Liu, SRK), so I ordered it to put certain occupations/descriptions with >60 sitelinks first (like actor, singer/rapper, PM/president, etc), then all the rest
Question asking logic:
technically worked right away with some small edits, but asked too many unnecessary questions (like asking "19th CM of Karnataka" or "nth X minister of XYZ" 4-5 times after already asking "prime minister of India", or asking about "British actor" and 5-6 more occupations for Shah Rukh Khan after some very obvious questions). added earlyWinner, which declares a character the winner once every description-based question about that character is answered "yes".
earlyWinner broke the game though: Kanye and Future are both rappers by description. Kanye is also a producer (Future is too, but not by wikidata description), so when "rapper" was answered yes to, Future's questions all got answered, Future won, and Kanye wasn't even asked about.
started asking for occupations after descriptions if people were still left, but only unique occupations (if Kanye is a rapper and Drake is a rapper, don't ask about rappers). but this caused characters with no unique occupations to get removed from the pool. bad idea :/
the compromise.js extraction was turning out to be very unreliable too: stray commas, dates, nouns not getting caught (rapper wasn't recognised as a noun!), so I dropped it.
then I realised the reason I was using descriptions in the first place: titles. Narendra Modi is a politician by occupation, and so is a local MLA. but Modi is prime minister of India, which is a title, not an occupation. so I made a hardcoded list of titles which get asked in place of descriptions, and if nobody has titles, it goes back to the original logic (getQuestion & getPopular)! this worked surprisingly well :D
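
the title-first fallback can be sketched roughly like this (all names here are illustrative, not the real implementation — including the guess that titles are matched against wikidata descriptions):

```javascript
// Hypothetical sketch of the title-first question selection.
// KNOWN_TITLES stands in for the hardcoded title list mentioned above.
const KNOWN_TITLES = ["prime minister of india", "president of the united states"];

function nextQuestion(people) {
  // ask about a title first if anyone in the remaining pool holds one
  for (const person of people) {
    const title = KNOWN_TITLES.find((t) => person.description.toLowerCase().includes(t));
    if (title) return { type: "title", text: `Is your character the ${title}?` };
  }
  // otherwise fall back to the original property-based logic
  // (getQuestion()/getPopular() in the real code)
  return { type: "property", text: "fallback to getQuestion()/getPopular()" };
}
```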

aryan
  • had to edit the SPARQL query a lot to try and optimise it so it works without timing out, but every time I've tried so far it times out. will do it at ~9AM whichever day I can (school), since it has always worked at 9. the new query has more complex filters and includes political party/employer too, which makes it computationally quite heavy.

  • worked on the HTML/CSS, drew the Japanese text and buttons in Procreate :) and the buttons switch between shadow and no shadow for a tactile feel

  • also the name is now Akipheus because orpheus

Attachment
Attachment
aryan
  • added complete logic for asking description-based questions. it's now a separate "case" where getQuestion() directly renders the question and yes/no buttons, and getDesc() directly gives an array of questions instead of properties/values.

  • getDesc returns an array of objects, where each object has the person's name, an array of questions (generated from organisations and nouns in the person's description, with organisation questions before noun questions), and a "yes" count set to 0 (more on this below)

  • the people in getDesc are ranked by a function called match(), which returns the ratio of a person's values found in getObj to their total number of values. this way getDesc does the ranking of questions, and the game logic can be dumb and just ask them in order.

  • the game logic asks questions person by person from getDesc: if all of a person's questions are answered "yes", that person wins. if a question is answered "no", it moves to the next person. if no person has all questions answered "yes", the person with the most yes-es wins (the yes count)
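
put together, the win logic above might look something like this sketch (names and shapes assumed from my description, not copied from the real code; `answers` stands in for the live yes/no input, which is actually async):

```javascript
// candidates mirror the getDesc() shape: a name, an ordered question
// list, and a running "yes" count.
function pickWinner(candidates, answers) {
  for (const person of candidates) {
    person.yes = 0;
    let allYes = true;
    for (const q of person.questions) {
      if (answers[q]) person.yes++;
      else { allYes = false; break; } // a "no" moves on to the next person
    }
    if (allYes) return person.name; // every question answered "yes": early win
  }
  // nobody got all yes-es: fall back to the highest yes count
  candidates.sort((a, b) => b.yes - a.yes);
  return candidates[0].name;
}
```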

  • also, the old game logic would've failed entirely: I wasn't using asynchronous functions before, so the code wouldn't wait for the user to answer anything. I added async functions and awaits everywhere a question is asked, and instead of adding new eventListeners everywhere, I added a function called waitForClick() which is used everywhere for the yes/no buttons.

NOTE: I used AI for waitForClick(), which is a very small function, when I didn't understand how to use async. but I later understood how it works completely. didn't use AI anywhere else :)
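
a waitForClick() along these lines resolves a Promise on whichever button is clicked, so the game loop can simply `await` the answer (sketched here against plain EventTargets so it's self-contained; the real one presumably listens on the DOM buttons):

```javascript
// Resolves true on a "yes" click, false on a "no" click, and removes
// both listeners either way so stale handlers don't pile up across questions.
function waitForClick(yesBtn, noBtn) {
  return new Promise((resolve) => {
    const onYes = () => { cleanup(); resolve(true); };
    const onNo = () => { cleanup(); resolve(false); };
    const cleanup = () => {
      yesBtn.removeEventListener("click", onYes);
      noBtn.removeEventListener("click", onNo);
    };
    yesBtn.addEventListener("click", onYes);
    noBtn.addEventListener("click", onNo);
  });
}
```

usage in the game loop is then just `const answer = await waitForClick(yesBtn, noBtn);`.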

  • changed cleaner.py logic to only add the field of work of a person, since it was messing up the occupations a lot.

will run everything tomorrow (since wikidata query works in the morning) and hopefully ship!

Attachment
Attachment
Attachment
aryan
  • created function getQuestion(), which judges which question would filter out the most people on average (across yes/no answers) and returns a property worth asking about
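
one way to read that "average of yes/no" scoring is as expected remaining pool size, which a sketch like this captures (function and field names are hypothetical, not the real getQuestion()):

```javascript
// For a property `prop`, a value splits the pool into y people who match
// ("yes" keeps them) and n who don't. Expected people left after asking is
// P(yes)*y + P(no)*n = (y*y + n*n)/total, smallest when the split is even.
function bestValue(people, prop) {
  const counts = {};
  for (const p of people)
    for (const v of p[prop] || []) counts[v] = (counts[v] || 0) + 1;
  const total = people.length;
  let best = null, bestScore = Infinity;
  for (const [value, y] of Object.entries(counts)) {
    const n = total - y;
    const score = (y * y + n * n) / total; // expected remaining pool size
    if (score < bestScore) { bestScore = score; best = value; }
  }
  return best;
}
```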

  • created function special(), which handles asking whether a person has a political party/employer. this needed separate logic since it has to decide which of the two (if either) is worth asking about.

  • created function generate() which combines everything to run the game. it displays the questions, works with yes/no button logic, and adds to “obj” (the filtering list).

  • if fewer than 7 people remain (on average across yes/no answers), it switches to descriptions instead of properties for questions, using compromise.js to get nouns (except those already in occupations) and organisations (if any).

  • HTML and small fixes

that's pretty much the entire game. gonna test/edit some stuff, add some styling, and ship tomorrow or the day after

Attachment
Attachment
Attachment
aryan

finally did something apart from database yay!
(very very sorry about the 11h devlog, but I promise it's worth it. some of the complicated stuff required multiple revisions which added up in the time)

changes:

  • created a function getAll() which takes an input like this: {citizenshipLabel: ["India", "NOT United States"], occupationLabel: ["actor", "NOT scientist"], …}, which makes it much easier for the game logic to add the yes/no answers. putting "NOT " before a value excludes it. the function converts this to an SQL query and returns all matching people. this is crucial for the logic! it returns an array of arrays, where each inner array is one person with their properties one by one (I initially expected the SQL to return an object, but it didn't, which took a lot of time to debug).
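
the filter-object convention can be sketched as a plain in-memory matcher (the real getAll() builds a query instead, but the required/"NOT " semantics are the same idea; every property is assumed to be an array of strings):

```javascript
// "NOT x" in a filter list excludes x; plain values are required.
function matches(person, filters) {
  for (const [prop, wanted] of Object.entries(filters)) {
    const values = person[prop] || [];
    for (const w of wanted) {
      if (w.startsWith("NOT ")) {
        if (values.includes(w.slice(4))) return false; // excluded value present
      } else if (!values.includes(w)) {
        return false; // required value missing
      }
    }
  }
  return true;
}

const getAllLocal = (people, filters) => people.filter((p) => matches(p, filters));
```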

  • added a property called "special" which tells if someone belongs to a political party or has an employer; the employer and political party properties were also added. helps filter much better. also added dead/alive as a property

  • added property P, a popularity score for each person. it uses followers plus a multiplied sitelink count, where the multiplier is influenced by "tiers" of followers. this fixes the problem where popular people with no follower count were considered unpopular.
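
the shape of P is roughly this; the tier cutoffs and multipliers below are made-up placeholders, only the idea (followers plus a tier-weighted sitelink count, leaning harder on sitelinks when follower data is missing) comes from the devlog:

```javascript
// Popularity score P = followers + multiplier * sitelinks, where the
// multiplier depends on the follower "tier". Cutoffs/multipliers here
// are illustrative placeholders, not the real values.
function popularity(followers, sitelinks) {
  let multiplier;
  if (followers >= 10_000_000) multiplier = 100_000;
  else if (followers >= 1_000_000) multiplier = 50_000;
  else multiplier = 200_000; // no/low follower data: lean harder on sitelinks
  return followers + multiplier * sitelinks;
}
```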

  • P was required for a function called getPopular(), which gives the most "popular" values of a property. it's kinda complicated; here's how it works if you're interested:

  1. Creates an array of objects called raw, pairing each person's value for the property with that person's P
  2. Creates an array called unique with the list of unique values from raw
  3. Creates an array called aggregate which, for each value in unique, adds up the P of all its occurrences in raw and computes a Bayesian average (a way to normalise data where a value has lots of unpopular occurrences or only a few ultra-popular ones)
  4. Returns the most popular value of the specified property among the people returned by getAll(), comparing with the filter object (the input to getAll) to make sure a value isn't repeated
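
steps 1-3 boil down to blending each value's summed P with the global mean, weighted by a prior count C, so a value seen a few times (or a pile of unpopular occurrences) doesn't dominate. a sketch, where C and the exact formula are my assumptions, not necessarily what getPopular() does:

```javascript
// raw: [{ value, P }] pairs, one per occurrence of a value.
// Returns the value with the highest Bayesian-averaged popularity.
function getPopularValue(raw, C = 3) {
  const globalMean = raw.reduce((s, r) => s + r.P, 0) / raw.length;
  const sums = {}, counts = {};
  for (const { value, P } of raw) {
    sums[value] = (sums[value] || 0) + P;
    counts[value] = (counts[value] || 0) + 1;
  }
  let best = null, bestScore = -Infinity;
  for (const value of Object.keys(sums)) {
    // Bayesian average: pad each value with C "virtual" entries at the
    // global mean, so low-count and low-P values both get pulled down
    const score = (C * globalMean + sums[value]) / (C + counts[value]);
    if (score > bestScore) { bestScore = score; best = value; }
  }
  return best;
}
```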

Also edited the flatten function to give every property as an array, making getPopular simpler to program. Previously, multiple-value properties (citizenship & occupation) were plain strings separated by commas; making everything an array made it more versatile.

Attachment
Attachment
aryan

Edited cleaner.py to utilise "batching", so it can process ~30 rows at a time instead of one by one, significantly increasing its speed

Attachment
Attachment
aryan

2 major things:

edited table: the old table (~2k people) missed out on a lot of popular people, for a few reasons:
Timothée Chalamet: no "occupation" listed, despite 2M followers and 84 sitelinks
Vivek Oberoi: no "followers" listed, 38 sitelinks
…and more for similar reasons.

to fix this, I added the following filters:
has 1M+ followers and 15+ sitelinks, OR has 2M+ followers (no sitelink requirement), OR has 35+ sitelinks (no follower requirement). and occupation is optional
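
those thresholds, expressed as a predicate (the real filtering happens inside the SPARQL query, but the logic is the same):

```javascript
// A person qualifies for the dataset if any of the three clauses holds.
// null stands for "no data on Wikidata" for that field.
function qualifies(followers, sitelinks) {
  const f = followers ?? 0, s = sitelinks ?? 0;
  return (f >= 1_000_000 && s >= 15) // popular and reasonably notable
      || f >= 2_000_000              // very popular, sitelinks optional
      || s >= 35;                    // very notable, followers optional
}
```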

cleaner.py: a script that uses an on-device LLM to return occupations and fields of work for people.
although wikidata provides occupations, they're often very niche/irrelevant, and a lot of them can break the game (example: Fernando Alonso's occupation is vegetarian, Narendra Modi is a bibliographer & writer; they're technically correct, but a player may answer incorrectly, and a large number of occupations per person can make the game inaccurate).

also, running the LLM required a LOT of debugging. one of the largest problems was that it started spitting out nonsense after one line of JSON (attaching images of the LLM's results and of the bad text). the problem was fixed by giving a very structured prompt, since it's an instruct model.

Attachment
Attachment
Attachment
aryan

the table formed had 10k+ rows for about 1.5k people, because multiple occupations/citizenships each created a new row. wrote a script called "flatten.py" to organise it so that all occupations of a person sit in one row, separated by commas.
this makes it easier to clean up
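
the reshaping flatten.py does can be sketched like this (in JS for consistency with the rest of these sketches; the real script is Python, and the field names here are assumptions):

```javascript
// rows: one row per (person, occupation) combination, as returned by the
// query. Group them back into one record per person, joining the
// multi-valued field with commas and dropping duplicates.
function flattenRows(rows) {
  const byPerson = new Map();
  for (const row of rows) {
    if (!byPerson.has(row.name))
      byPerson.set(row.name, { name: row.name, occupations: new Set() });
    byPerson.get(row.name).occupations.add(row.occupation);
  }
  return [...byPerson.values()].map((p) => ({
    name: p.name,
    occupations: [...p.occupations].join(", "),
  }));
}
```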

Attachment
aryan

to make the game, I needed organised data on people. I decided to use wikidata as it is extensive and has a lot of information on well-classified entities that can be queried using code (SPARQL).
initially I thought it'd be simple, as I could just send a SPARQL query to wikidata for every question in the game, but I later realised that'd be highly inefficient and unethical, as each question would sift through 120M entities, multiple times in a single game. to tackle this, I decided to query once and get a small list of people I want.

I decided on only using humans (not fictional characters), but even humans have 13M entries. An Akinator-clone game wouldn't need very niche people, so I decided to add a few "requirements": must have a citizenship, an occupation, 25+ sitelinks, and 1.5M+ social media followers. With this, I got a list of 10k people (which is actually a very small number).

Also, as I was querying this in the earlier stages, even with a limit of 5 people it hit a timeout (wikidata has a 60s limit). so I had to optimise it by removing labels (occupation wouldn't be "singer" but Q177220, since converting all the IDs to labels separately would've taken a lot of time). even then it took 20-30s for 10 people, and the full query would hit the timeout. However, when I tried running it in the morning (3AM UTC, outside the hours when most researchers are using wikidata), it returned all 10k people in 9s! that meant I could add the labels back, plus an optional "field of work", and still save a lot of time and increase accuracy

PS. ik 6h seems like a lot for just curating a database from a SPARQL query, but this was the first time I'd even heard of it; the syntax and logic were completely new to me, and adding filters and optimisation required edits back and forth. also, running this experimentally was done in a Jupyter notebook, which was tracked by hackatime but not uploaded to GitHub. I promise to post devlogs more regularly from now on :)

Attachment