Wiktionary Lexeme Bot

radman.siddiki shipped Wiktionary Lexeme Bot

4 months ago

Shipped this project!

Hours: 5.85

Cookies: 🍪 78

Multiplier: 13.27 cookies/hr

Members of the Wikimedia community have been working on a project called Abstract Wikipedia (the name will soon be replaced, but the new one has not been decided yet; see https://meta.wikimedia.org/wiki/Abstract_Wikipedia for details). As part of that effort, they have been adding lexicographical data to Wikidata lexemes: senses, examples and other information about each lexical entity. Bengali Wiktionary currently has only a little more than 107k entries and could benefit greatly from the work being done on Wikidata.

Last year, the Bengali Wiktionary community started using a MediaWiki template and a Lua module to render information about words automatically from the data stored in Wikidata lexemes. This project takes that a step further by automating entry creation, so editors can focus on editing Wikidata without duplicating their efforts on Wiktionary. The bot scans the Wikidata lexemes for ones that have senses in the bot operator’s target language and compares them against the full list of entries on the target Wiktionary to find which ones are missing. It then creates the missing entries using the same template and module that editors have been using.

This turned out to be a little harder than I expected: I quickly realized that the same lemma can appear in several lexemes and languages, so handling those cases took some extra effort. That said, I hope this helps people looking for a truly free Bengali dictionary, and that it can be deployed to other Wiktionaries in the future.
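The "which entries are missing" step described above boils down to a set difference. Here is a minimal sketch of that idea, not the bot's actual code: the function name and both input lists are hypothetical, and how the lists are obtained (dump, API, database) is left out.

```python
def find_missing_entries(lexeme_lemmas, existing_titles):
    """Return lemmas that have no page yet on the target Wiktionary."""
    existing = set(existing_titles)
    # De-duplicate the lemmas too: one lemma can come from several lexemes.
    return sorted(lemma for lemma in set(lexeme_lemmas) if lemma not in existing)


# Hypothetical inputs, for illustration only.
lemmas_with_senses = ["হিমধামা", "হুড়ুম্বি", "জল", "জল"]
existing_pages = ["জল", "আকাশ"]

missing = find_missing_entries(lemmas_with_senses, existing_pages)
print(missing)
```

Converting the page titles to a set keeps each lookup cheap, which matters when the Wiktionary already has over a hundred thousand entries.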

radman.siddiki worked on Wiktionary Lexeme Bot

4 months ago

1h 46m logged

It’s finally done! I just made some new entries on Bengali Wiktionary using the bot! Check out https://bn.wiktionary.org/wiki/হুড়ুম্বি, https://bn.wiktionary.org/wiki/হিমধামা etc. The full list of edits the bot has made so far (so the editors of Bengali Wiktionary can judge whether we want it to edit regularly) can be found here: https://bn.wiktionary.org/wiki/বিশেষ:অবদান/LexemeBot
The source code of the bot: https://gitlab.wikimedia.org/toolforge-repos/wiktlexbot (licensed under GPLv3)


radman.siddiki worked on Wiktionary Lexeme Bot

4 months ago

1h 40m logged

I have tweaked the schema again to make it easier to handle cases where a lemma appears in lexemes of several different languages. This makes showing section headings with the language’s name on Wiktionary much easier, because I have updated the Lua module used on the wiki so that it adds the language’s name automatically. However, it also means that putting the templates in the wrong order could produce more than one section heading with the same language’s name, or keep lemmas in the same language from being nested together.
The new schema for lemmas appearing in more than one language:
{
  "language": "lemma",
  "ids": [
    "Lx",
    "Ly"
  ]
}
The schema for lemmas appearing in one language only remains unchanged.
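To illustrate the ordering concern, here is a rough sketch of keeping lexemes of the same language adjacent before any templates are written. The input shape (pairs of language name and lexeme ID) and the function name are assumptions made for this example, not the bot's actual schema or code.

```python
def group_by_language(entries):
    """Group lexeme IDs by language, keeping first-seen language order.

    `entries` is an iterable of (language_name, lexeme_id) pairs; this
    shape is hypothetical and chosen only for the illustration.
    """
    grouped = {}
    for language, lexeme_id in entries:
        grouped.setdefault(language, []).append(lexeme_id)
    # Plain dicts preserve insertion order, so languages stay in the
    # order in which they were first encountered.
    return list(grouped.items())


entries = [("Bengali", "L1"), ("Assamese", "L2"), ("Bengali", "L3")]
ordered = group_by_language(entries)
print(ordered)
```

With the IDs grouped this way, each language would be emitted once, with all of its lexemes together, so a module that adds the heading automatically would not repeat it.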


radman.siddiki worked on Wiktionary Lexeme Bot

4 months ago

1h 8m logged

The lexeme ID-extracting script is now working! It extracts IDs from Wikidata’s GZIP-compressed JSON lexeme dumps and outputs them in a file with this schema:
[
  {
    "lemma": "lemma",
    "ids": [
      "Lx",
      "Ly"
    ]
  }
]
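A file in this shape could be produced by collecting the lexeme IDs under each lemma. The following is only a sketch under that assumption: the function name and the sample pairs are hypothetical, and the real script may build its output differently.

```python
import json
from collections import defaultdict


def group_ids_by_lemma(pairs):
    """Turn (lemma, lexeme_id) pairs into a list of {"lemma", "ids"} records."""
    grouped = defaultdict(list)
    for lemma, lexeme_id in pairs:
        if lexeme_id not in grouped[lemma]:  # skip repeated IDs
            grouped[lemma].append(lexeme_id)
    return [{"lemma": lemma, "ids": ids} for lemma, ids in grouped.items()]


# Placeholder pairs, for illustration only.
pairs = [("lemma-a", "L1"), ("lemma-b", "L2"), ("lemma-a", "L3")]
records = group_ids_by_lemma(pairs)

# ensure_ascii=False keeps non-Latin lemmas readable in the output.
print(json.dumps(records, ensure_ascii=False, indent=2))
```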


radman.siddiki worked on Wiktionary Lexeme Bot

4 months ago

0h 42m logged

I realized that using the Wikidata API to check which lexemes had senses for a specific language would be really inefficient, so I decided to use Wikidata’s JSON dumps instead. I have now added a Python script that streams those dumps without fully decompressing them first (because they can get very large), checks which lexemes have senses in the target language and logs their IDs.
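The streaming approach can be sketched with Python's standard gzip module, which decompresses on the fly while reading. This is not the script from the repository. It assumes the dump is a JSON array with one entity per line, and that each sense stores its glosses in a dictionary keyed by language code. The function names and the tiny sample file are made up for the demonstration.

```python
import gzip
import json
import os
import tempfile


def iter_entities(path):
    """Yield one entity dict per line without loading the whole dump."""
    with gzip.open(path, mode="rt", encoding="utf-8") as dump:
        for line in dump:
            line = line.strip().rstrip(",")
            if line in ("", "[", "]"):  # array brackets and blank lines
                continue
            yield json.loads(line)


def has_sense_in(entity, language_code):
    """True if any sense of the entity has a gloss in the given language."""
    return any(
        language_code in sense.get("glosses", {})
        for sense in entity.get("senses", [])
    )


def lexeme_ids_with_senses(path, language_code):
    return [e["id"] for e in iter_entities(path) if has_sense_in(e, language_code)]


# Hand-made sample in the assumed layout, for demonstration only.
sample = [
    {"id": "L1", "senses": [{"glosses": {"bn": {"language": "bn", "value": "example gloss"}}}]},
    {"id": "L2", "senses": [{"glosses": {"en": {"language": "en", "value": "example gloss"}}}]},
    {"id": "L3", "senses": []},
]
fd, sample_path = tempfile.mkstemp(suffix=".json.gz")
os.close(fd)
with gzip.open(sample_path, mode="wt", encoding="utf-8") as out:
    out.write("[\n" + ",\n".join(json.dumps(e) for e in sample) + "\n]\n")

found = lexeme_ids_with_senses(sample_path, "bn")
os.remove(sample_path)
print(found)
```

Because the generator yields one entity at a time, memory use stays roughly constant regardless of the dump's size.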


radman.siddiki worked on Wiktionary Lexeme Bot

4 months ago

0h 34m logged

So far, I have tested out the WikibaseIntegrator Python library (https://github.com/LeMyst/WikibaseIntegrator/) to figure out how to get lemmas and senses using it.

