I made the scraper fully work for AQA Physics A level
- I added code to get all the topics from a subject content page by reading the href of each a tag
- Then I scraped each topic page individually using the topic scraping code I had before
- Then I realised some topics have sub-subtopics instead of just subtopics, so I added code to account for those cases (looking for h4 tags as well as h3 tags)
- I made it write the scraped data to a JSON file
- I tested the code on AQA Physics A level and it fully worked!
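The steps above can be sketched roughly as follows. This is not the actual scraper code: it uses only Python's standard library, and the class names LinkCollector and HeadingCollector, the save_json helper, the example.com URL and the sample HTML are all made up for illustration. It assumes that topic links are plain a tags and that an h4 heading belongs to the nearest h3 above it.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


class HeadingCollector(HTMLParser):
    """Nests each h4 (sub-subtopic) under the h3 (subtopic) before it."""

    def __init__(self):
        super().__init__()
        self.tree = {}
        self._current_h3 = None
        self._open_tag = None  # "h3" or "h4" while inside a heading
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h3", "h4"):
            self._open_tag = tag
            self._text = []

    def handle_data(self, data):
        if self._open_tag:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._open_tag:
            return
        title = "".join(self._text).strip()
        if tag == "h3":
            self._current_h3 = title
            self.tree[title] = {}
        elif self._current_h3 is not None:
            self.tree[self._current_h3][title] = {}
        self._open_tag = None


def save_json(data, path):
    """Writes the scraped data to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)


# Made-up sample pages standing in for the real specification pages.
sample_index = '<ul><li><a href="/topic-1">Topic 1</a></li><li><a href="topic-2">Topic 2</a></li></ul>'
sample_topic = "<h3>3.1 Subtopic</h3><h4>3.1.1 First</h4><h4>3.1.2 Second</h4><h3>3.2 Other</h3>"

links = LinkCollector("https://example.com/subject-content/")
links.feed(sample_index)

headings = HeadingCollector()
headings.feed(sample_topic)
tree = headings.tree
```

In the real scraper each collected link would be fetched and parsed in turn, and save_json(tree, "physics.json") (file name assumed) would write the result to disk.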
Testing on another subject
Although this parser was written specifically for AQA Physics, I tried it on Psychology, and it didn't work because the Psychology specification doesn't use tables.
Then I tried it on CS (which does use tables), but its tables have multiple rows, so when I ran the code it only picked up the first row of each table.
Next steps
I need it to work for any AQA specification that uses tables, even when the tables have multiple rows.
Instead of
subtopic: {"content": "", "opp": ""}
I should do
subtopic: [{"content": "", "opp": ""}, {"content": "", "opp": ""}, {"content": "", "opp": ""}]
(so an array of each content and opportunity row)
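A minimal sketch of how the multi-row structure could be built, assuming each table row has the content in its first td cell and the opportunity text in its second, and that header rows use th cells. The RowCollector class and the sample table are hypothetical and use only the standard library, not the scraper's real code.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Turns every data row of a table into a {"content", "opp"} dict."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None  # cells of the row being read, None outside a row
        self._text = None   # text pieces of the cell being read, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._text is not None:
            # Collapse runs of whitespace inside the cell.
            self._cells.append(" ".join("".join(self._text).split()))
            self._text = None
        elif tag == "tr" and self._cells is not None:
            if self._cells:  # rows with no <td> cells (headers) are skipped
                content = self._cells[0]
                opp = self._cells[1] if len(self._cells) > 1 else ""
                self.rows.append({"content": content, "opp": opp})
            self._cells = None


# Made-up table with a header row and two data rows.
sample_table = """
<table>
  <tr><th>Content</th><th>Opportunities</th></tr>
  <tr><td>First row content</td><td>First row opp</td></tr>
  <tr><td>Second row content</td><td></td></tr>
</table>
"""

collector = RowCollector()
collector.feed(sample_table)
subtopic_rows = collector.rows  # one dict per row, in page order
```

A single-row table like the Physics ones would simply give a one-element list, so the same structure would cover both cases.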
Summary
This was a really good hour, and I got a lot done.