SpecScraper

Vulcan worked on SpecScraper

4 months ago

0h 25m logged

I realised that the text didn’t look good on the website, so I decided to scrape the HTML directly instead of converting it to plain text first.

This worked well: the content now looks much better on the website.
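To illustrate the difference, here is a minimal sketch that keeps the inner HTML of each table cell alongside the flattened text. It uses only Python's standard library; the class name, the sample markup, and the choice of html.parser are assumptions for illustration, not necessarily what SpecScraper itself uses.

```python
from html import escape
from html.parser import HTMLParser


class CellMarkupParser(HTMLParser):
    """Collects each <td> twice: as flattened text and as inner HTML."""

    def __init__(self):
        super().__init__()
        self.as_text = []
        self.as_html = []
        self._inside = False  # True while reading a <td>
        self._text = []
        self._html = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._inside = True
            self._text, self._html = [], []
        elif self._inside:
            # Keep the original markup of nested tags such as <ul> or <li>.
            self._html.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags (for example <br/>) have no separate end tag.
        if self._inside:
            self._html.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._inside:
            return
        if tag == "td":
            self.as_text.append("".join(self._text).strip())
            self.as_html.append("".join(self._html).strip())
            self._inside = False
        else:
            self._html.append(f"</{tag}>")

    def handle_data(self, data):
        if self._inside:
            self._text.append(data)
            self._html.append(escape(data, quote=False))


# Made-up sample cell, not taken from a real specification page.
SAMPLE_CELL = (
    "<table><tr><td><ul><li>first point</li>"
    "<li>second point</li></ul></td></tr></table>"
)

parser = CellMarkupParser()
parser.feed(SAMPLE_CELL)
print(parser.as_text[0])  # the list items run together
print(parser.as_html[0])  # the list structure is preserved
```

In the text version the two bullet points are joined into one string, while the HTML version keeps the list, which is why rendering the stored HTML can look better on a website.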

Vulcan shipped SpecScraper

4 months ago

Shipped this project!

Hours: 1.67

Cookies: 🍪 9

Multiplier: 5.48 cookies/hr

This is the first step towards creating a to-do list from my A Level spec, and I’m really happy I’ve finished this part.

Next, using the JSON I’ve created, I’m going to build a to-do list app tailored specifically to the subjects I’m taking.

Vulcan worked on SpecScraper

4 months ago

1h 6m logged

I made the scraper fully work for AQA Physics A level

I added code to get all the topics from a subject content page using the <a> tags and their href attributes.
Then I scraped each topic page individually using the topic scraping code I had before.
Then I realised some topics have sub-subtopics instead of just subtopics, so I added code to handle those cases (looking for h4 tags as well as h3 tags).
I made it save the JSON output to a .json file.
I tested the code on AQA Physics A level and it fully worked!
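The first and third steps above can be sketched roughly as follows. This is a standard-library sketch under assumptions: the class and function names, the example.com base URL, and the sample markup are all hypothetical, and the real scraper may use a different HTML library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TopicLinkParser(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


class HeadingParser(HTMLParser):
    """Records h3 (subtopic) and h4 (sub-subtopic) headings in page order."""

    def __init__(self):
        super().__init__()
        self.headings = []    # list of (tag, text) pairs
        self._current = None  # heading tag currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h3", "h4"):
            self._current = tag
            self._text = []

    def handle_data(self, data):
        if self._current:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.headings.append((tag, "".join(self._text).strip()))
            self._current = None


def topic_links(html, base_url):
    parser = TopicLinkParser(base_url)
    parser.feed(html)
    return parser.links


def headings(html):
    parser = HeadingParser()
    parser.feed(html)
    return parser.headings


# Made-up sample markup standing in for a subject content page and a topic page.
SAMPLE_INDEX = (
    '<ul><li><a href="topic-one">Topic one</a></li>'
    '<li><a href="topic-two">Topic two</a></li></ul>'
)
SAMPLE_TOPIC = (
    "<h3>Subtopic A</h3><h4>Sub-subtopic A1</h4><h4>Sub-subtopic A2</h4>"
)

print(topic_links(SAMPLE_INDEX, "https://example.com/spec/subject-content/"))
print(headings(SAMPLE_TOPIC))
```

A real subject content page will also contain navigation links, so the collected hrefs would likely need filtering before each topic page is scraped.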

Testing on another subject

Although this parser was written specifically for AQA Physics, I tried it on Psychology, and it didn’t work because that specification doesn’t use tables.
Then I tried it on CS (which does use tables), but its tables have multiple rows, so when I ran the code it only picked up the first row of each table.

Next steps

I need it to work for any AQA specification that uses tables (even if the tables have multiple rows).
Instead of

subtopic: {"content": "", "opp": ""}

I should do

subtopic: [{"content": "", "opp": ""}, {"content": "", "opp": ""}, {"content": "", "opp": ""}]

(so an array with one entry per content and opportunity row)
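A minimal sketch of how a multi-row table could be turned into that array shape, using only the standard library. The parser class, the helper name rows_to_entries, and the sample table are assumptions for illustration; only the "content" and "opp" keys come from the plan above.

```python
import json
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text pieces of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows that only contain <th> cells
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_to_entries(html):
    """Returns one {"content", "opp"} dict per table row."""
    parser = TableRowParser()
    parser.feed(html)
    return [
        {"content": row[0], "opp": row[1] if len(row) > 1 else ""}
        for row in parser.rows
    ]


# Made-up sample table with a header row and two data rows.
SAMPLE_TABLE = (
    "<table>"
    "<tr><th>Content</th><th>Opportunities</th></tr>"
    "<tr><td>Row one content</td><td>Row one opportunity</td></tr>"
    "<tr><td>Row two content</td><td></td></tr>"
    "</table>"
)

entries = rows_to_entries(SAMPLE_TABLE)
print(json.dumps({"subtopic": entries}, indent=2))
```

With a list per subtopic, a single-row table simply becomes a list of length one, so the same structure would cover both the Physics and CS layouts.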

Summary

This was a really good hour, and I got a lot done.

Vulcan worked on SpecScraper

4 months ago

0h 34m logged

I’ve started by trying to scrape AQA A level Physics (hoping the same code will work for all AQA specs once it fully works).

So far, I’ve implemented scraping of the subtopics from a topic page: given the URL, it gets the content and opportunities of each subtopic, converts them to JSON, and outputs the result.
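As a rough sketch of this step, the code below maps each h3 heading to the first content and opportunities cells that follow it and returns JSON. It assumes the page layout is an h3 followed by a two-column table; the class and function names and the sample markup are hypothetical, and the standard-library parser is a stand-in for whatever the project actually uses.

```python
import json
from html.parser import HTMLParser
from urllib.request import urlopen


class SubtopicParser(HTMLParser):
    """Maps each <h3> heading to the first two <td> cells that follow it."""

    def __init__(self):
        super().__init__()
        self.subtopics = {}
        self._heading = None  # text of the most recent h3
        self._capture = None  # "h3" or "td" while collecting text
        self._text = []
        self._cells = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h3", "td"):
            self._capture = tag
            self._text = []

    def handle_data(self, data):
        if self._capture:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._capture:
            return
        text = "".join(self._text).strip()
        self._capture = None
        if tag == "h3":
            self._heading = text
            self._cells = []
        elif self._heading is not None and len(self._cells) < 2:
            self._cells.append(text)
            if len(self._cells) == 2:
                self.subtopics[self._heading] = {
                    "content": self._cells[0],
                    "opp": self._cells[1],
                }


def parse_topic(html):
    parser = SubtopicParser()
    parser.feed(html)
    return parser.subtopics


def scrape_topic(url):
    """Fetches a topic page and returns its subtopics as a JSON string."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8")
    return json.dumps(parse_topic(html), indent=2)


# Made-up sample markup, not copied from a real specification page.
SAMPLE_PAGE = (
    "<h3>Subtopic A</h3><table>"
    "<tr><th>Content</th><th>Opportunities</th></tr>"
    "<tr><td><p>Content for A</p></td><td>Opportunity for A</td></tr>"
    "</table>"
)

print(json.dumps(parse_topic(SAMPLE_PAGE), indent=2))
```

Because only the first two cells after each heading are kept, this version reads a single row per table.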

The next step is to get all the topic URLs from the page and scrape all of them.
