SpecScraper is a simple Python CLI tool that scrapes the entire AQA A-level Physics specification from the AQA website and converts it to JSON.
I used AI to create the GitHub Actions file that builds the executables.
I realised that the text didn’t look very good on the website, so I decided to scrape the HTML directly instead of converting it to text first.
This worked well, and it now looks much better on the website.
This is the first step towards creating a to-do list from my A-level spec, and I’m really happy to have finished this part.
Next, I’m going to use the JSON I’ve created to build a to-do list app tailored to the subjects I’m taking.
h4 tags as well as h3 tags)
json to a json file
Although this parser was written specifically for AQA Physics, I tried it on Psychology, and it didn’t work because that specification doesn’t use tables.
Then I tried it on CS (which does use tables), but its tables have multiple rows, so when I ran the code it only picked up the first row of each table.
I need it to work for any AQA specification that uses tables, even ones with multiple rows.
Instead of:
subtopic: {"content": "", "opp": ""}
I should use:
subtopic: [{"content": "", "opp": ""}, {"content": "", "opp": ""}, {"content": "", "opp": ""}]
(that is, an array with one entry per content and opportunity row)
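A rough sketch of what the multi-row version could look like is below. It is not SpecScraper's actual code: it uses only Python's standard-library html.parser, the names TableRowParser and parse_subtopic_table are made up for illustration, and the sample table is invented rather than copied from an AQA page. It also assumes the first cell in a row is the content and the second is the opportunity.

```python
from html.parser import HTMLParser
import json


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            # Join the pieces and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows with only <th> cells stay empty and are skipped
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_subtopic_table(table_html):
    """Return one {"content", "opp"} dict per table row, not just the first."""
    parser = TableRowParser()
    parser.feed(table_html)
    entries = []
    for cells in parser.rows:
        # Assumption: column 1 is the content, column 2 the opportunity.
        opp = cells[1] if len(cells) > 1 else ""
        entries.append({"content": cells[0], "opp": opp})
    return entries


# Invented sample table, only to show the shape of the output.
SAMPLE_TABLE = """
<table>
  <tr><th>Content</th><th>Opportunities</th></tr>
  <tr><td>First row of content</td><td>First opportunity</td></tr>
  <tr><td>Second row of content</td><td></td></tr>
</table>
"""

subtopic = parse_subtopic_table(SAMPLE_TABLE)
print(json.dumps(subtopic, indent=2))
```

Keeping a list even for single-row tables means the to-do list app would only ever have to handle one shape of data.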
This was a really good hour, and I got a lot done.
I’ve started by trying to scrape the AQA A-level Physics specification (hoping this will work for all AQA specs once it fully works).
So far, I’ve implemented scraping the subtopics from a topic page: given the URL, it gets the content and opportunities for each one, converts them to JSON, and outputs the result.
The next step is to get all the topic URLs from the page and scrape all of them.
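One possible way to approach that next step is sketched below, again using only the standard library. Everything specific here is an assumption: the LinkCollector and collect_topic_urls names are hypothetical, the example.com URLs and the "subject-content" filter are placeholders, and the real AQA page may structure its links differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page they came from.
            url = urljoin(self.base_url, href)
            if url not in self.urls:  # keep first occurrence, preserve order
                self.urls.append(url)


def collect_topic_urls(index_html, base_url, must_contain):
    """Return the links whose URL contains the given substring."""
    collector = LinkCollector(base_url)
    collector.feed(index_html)
    return [url for url in collector.urls if must_contain in url]


# Invented index page, standing in for the real specification page.
SAMPLE_INDEX = """
<ul>
  <li><a href="subject-content/topic-1">Topic 1</a></li>
  <li><a href="/spec/subject-content/topic-2">Topic 2</a></li>
  <li><a href="subject-content/topic-1">Topic 1 again</a></li>
  <li><a href="/contact">Contact</a></li>
</ul>
"""

topic_urls = collect_topic_urls(
    SAMPLE_INDEX, "https://example.com/spec/", "subject-content"
)
for url in topic_urls:
    print(url)
```

Each URL in topic_urls could then be passed to the existing topic-page scraper in a loop.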