Activity

radman.siddiki

Shipped this project!

Hours: 2.1
Cookies: 🍪 21
Multiplier: 10.16 cookies/hr

Well, I guess it’s done then! This has been a nice learning experience, frankly. I never knew that two different browser scripts could exchange messages with each other. It was also easy enough, TBH, because I had previously made a Firefox extension that blocks accidental presses of the . key on GitHub repository pages so that GitHub Codespaces would not be opened accidentally. I think the hardest part was deciding how the extension should function to best help editors without harming readers (by overpowering editors).

radman.siddiki

Users can now save their own customized edit summary as the default. Also, entry links now open in a new tab or window (as configured by the user at the browser level).

radman.siddiki

Arabic lemmas are now being normalized correctly! I have also made it so that editors can choose to read the full wikitext of the existing entry to make sure their edits do not leave readers with less information.


Comments

bhelavevicky66 3 days ago

nice

radman.siddiki

The extension is working! It even lets the user choose which part of the existing Wiktionary entry they want to replace! I am working on adding support for normalizing lemmas in Arabic because, for example, سَمِير in a Wikidata lexeme and سمیر in a Wiktionary entry refer to the same word.
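A minimal Python sketch of what this kind of lemma normalization could look like. It is not the extension's actual code: the function name, the letter-variant mapping and the choice to strip every combining mark are assumptions made for illustration, and the example assumes the second spelling uses the Farsi yeh (U+06CC).

```python
import unicodedata

# Hypothetical mapping of look-alike letters to one canonical Arabic letter.
LETTER_VARIANTS = {
    "\u06CC": "\u064A",  # Farsi yeh -> Arabic yeh
    "\u06A9": "\u0643",  # keheh -> Arabic kaf
}

TATWEEL = "\u0640"  # elongation character, carries no lexical meaning


def normalize_arabic_lemma(lemma: str) -> str:
    """Return a comparison key: no vowel marks, no tatweel, unified letters."""
    # NFC keeps precomposed letters (e.g. alef with madda) intact, so that
    # dropping combining marks only removes free-standing diacritics.
    composed = unicodedata.normalize("NFC", lemma)
    no_marks = "".join(
        ch for ch in composed if unicodedata.category(ch) != "Mn"
    )
    no_tatweel = no_marks.replace(TATWEEL, "")
    return "".join(LETTER_VARIANTS.get(ch, ch) for ch in no_tatweel)


# Vocalized form with Arabic yeh vs. unvocalized form with Farsi yeh.
wikidata_form = "\u0633\u064E\u0645\u0650\u064A\u0631"
wiktionary_form = "\u0633\u0645\u06CC\u0631"
print(normalize_arabic_lemma(wikidata_form) == normalize_arabic_lemma(wiktionary_form))  # True
```

The key is only meant for matching; the original spelling would still be what gets displayed or written back.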

radman.siddiki

Shipped this project!

Hours: 5.85
Cookies: 🍪 78
Multiplier: 13.27 cookies/hr

Members of the Wikimedia community have been working on a project called Abstract Wikipedia (the name will soon be replaced by a better one, but we do not know what it will be yet; see https://meta.wikimedia.org/wiki/Abstract_Wikipedia for details of the project). As part of that effort, they have been adding lexicographical data to Wikidata lexemes, which involves adding senses, examples and other data about different lexicographical entities. The Bengali Wiktionary project currently has only a little more than 107k entries and could greatly benefit from the work being done on Wikidata. Last year, the Bengali Wiktionary community started using a MediaWiki template and a Lua module to automatically render information about words from the data stored in Wikidata lexemes. This project aims to take that a step further by automating entry creation, so editors can focus on editing Wikidata and their efforts do not have to be duplicated. The bot scans Wikidata lexemes for ones that have senses in the bot operator’s target language and compares their lemmas against the full list of entries on the target Wiktionary to find which ones are missing. Then, it uses the same template and module that editors have been using to create the new entries on Wiktionary. This turned out to be a little harder than I expected: I quickly realized that many lemmas appear across different lexemes and languages, so handling them took some extra effort. That being said, I hope this helps people looking for a truly free Bengali dictionary, and that in the future it can be deployed to other Wiktionaries as well.
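The comparison step can be sketched roughly as follows. This is only an illustration, not the bot's code: the function name is hypothetical, and it assumes the lexeme records use the lemma/ids shape from the ID-extracting script and that the existing page titles have already been fetched into a collection.

```python
from typing import Iterable


def find_missing_lemmas(
    records: Iterable[dict], existing_titles: Iterable[str]
) -> list[dict]:
    """Return the records whose lemma has no entry on the target Wiktionary."""
    # A set makes each membership check constant-time, which matters when
    # the wiki has 100k+ entries and the dump has many lexemes.
    existing = set(existing_titles)
    return [record for record in records if record["lemma"] not in existing]


# Placeholder data, not real lexemes or page titles.
records = [
    {"lemma": "lemma-a", "ids": ["L1", "L3"]},
    {"lemma": "lemma-b", "ids": ["L2"]},
]
print(find_missing_lemmas(records, ["lemma-b"]))
```

Only the records returned here would then be turned into new entries with the template.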

radman.siddiki

It’s finally done! I just made some new entries on Bengali Wiktionary using the bot! Check out https://bn.wiktionary.org/wiki/হুড়ুম্বি, https://bn.wiktionary.org/wiki/হিমধামা etc. The full list of edits the bot has made so far (so the editors of Bengali Wiktionary can judge whether we want it to edit regularly) can be found here: https://bn.wiktionary.org/wiki/বিশেষ:অবদান/LexemeBot
The source code of the bot: https://gitlab.wikimedia.org/toolforge-repos/wiktlexbot (licensed under GPLv3)

radman.siddiki

I have tweaked the schema again to make it easier to handle cases where a lemma appears in lexemes of multiple languages. This makes showing section headings with the language’s name on Wiktionary much easier, as I have updated the Lua module used on the wiki so that it adds the language’s name automatically. However, that also means that putting the templates in the wrong order could lead to more than one section heading bearing the same language’s name, or to lemmas in the same language not being nested together.
The new schema for lemmas appearing in more than one language:
{
  "language": "lemma",
  "ids": [
    "Lx",
    "Ly"
  ]
}
The schema for lemmas appearing in one language only remains unchanged.

radman.siddiki

The lexeme ID-extracting script is now working! It extracts IDs from Wikidata’s GZIP-compressed JSON lexeme dumps and writes them to a file with this schema:
[
  {
    "lemma": "lemma",
    "ids": [
      "Lx",
      "Ly"
    ]
  }
]
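As a rough illustration of how records in this shape could be built, here is a small Python sketch. It assumes the script has already produced (lemma, lexeme ID) pairs; the function name and the placeholder values are hypothetical and not taken from the bot.

```python
import json
from collections import defaultdict
from typing import Iterable


def group_ids_by_lemma(pairs: Iterable[tuple[str, str]]) -> list[dict]:
    """Collect lexeme IDs under their lemma, keeping first-seen order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for lemma, lexeme_id in pairs:
        if lexeme_id not in grouped[lemma]:  # avoid listing an ID twice
            grouped[lemma].append(lexeme_id)
    return [{"lemma": lemma, "ids": ids} for lemma, ids in grouped.items()]


# Placeholder pairs; real input would come from the dump.
records = group_ids_by_lemma(
    [("lemma-a", "L1"), ("lemma-b", "L2"), ("lemma-a", "L3")]
)
# ensure_ascii=False keeps non-Latin lemmas readable in the output file.
print(json.dumps(records, ensure_ascii=False, indent=2))
```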

radman.siddiki

I realized that using the Wikidata API to check which lexemes had senses for a specific language would be really inefficient, so I have decided to use Wikidata’s JSON dumps instead. I have now added a Python script that streams those dumps without fully decompressing them first (because they can get very large), checks which lexemes have such senses and logs their IDs.
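A minimal sketch of this streaming approach, under a few assumptions: the dump is a JSON array with one entity per line (the usual layout of Wikidata JSON dumps), each sense keeps its glosses in a dictionary keyed by language code, and the function names are made up for this example rather than taken from the actual script.

```python
import gzip
import json
from typing import Iterable, Iterator


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one entity per line of a JSON array dump."""
    for line in lines:
        line = line.strip().rstrip(",")  # entity lines end with a comma
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        yield json.loads(line)


def ids_with_senses(lines: Iterable[str], language: str) -> Iterator[str]:
    """Yield IDs of lexemes with at least one sense glossed in `language`."""
    for entity in iter_entities(lines):
        senses = entity.get("senses", [])
        if any(language in sense.get("glosses", {}) for sense in senses):
            yield entity["id"]


def ids_from_dump(path: str, language: str) -> Iterator[str]:
    # gzip.open in text mode decompresses on the fly, so the full dump is
    # never written to disk or held in memory at once.
    with gzip.open(path, "rt", encoding="utf-8") as dump:
        yield from ids_with_senses(dump, language)
```

Something like `for lexeme_id in ids_from_dump("lexemes.json.gz", "bn"): print(lexeme_id)` would then log the matching IDs one at a time; the file name here is a placeholder.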
