(This devlog is truncated, boring etc., read the full one in #h-channel)
Okay, this is going to be a long one: I didn't have flavourtown linked before now, so I was making notes in Slack, but here we go!
I created a menu system first, but this was a cinch, so I won’t go into detail.
First, I had to figure out how to make requests to a website. I'd never done anything like that before, so I referred heavily to Dotnetperls here (and still have throughout). It wasn't too hard in the end(0). I then learnt about the try block, which can be used to catch exceptions(1).
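The devlog doesn't show the request code itself, but the combination it describes — fetching a page and catching exceptions with a try block — looks roughly like this sketch (the URL and class name are placeholders, not from the project):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class FeedFetcher
{
    // Fetch the raw text of a page; a try block catches network failures
    // instead of letting them crash the program.
    static async Task<string> FetchAsync(string url)
    {
        using var client = new HttpClient();
        try
        {
            return await client.GetStringAsync(url);
        }
        catch (HttpRequestException ex)
        {
            Console.WriteLine($"Request failed: {ex.Message}");
            return string.Empty;
        }
    }

    static async Task Main()
    {
        string body = await FetchAsync("https://example.com/feed.xml");
        Console.WriteLine(body.Length);
    }
}
```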
The next day, I focused on learning XPath; I needed it to traverse XML documents and pull the titles of content out of RSS feeds, which I managed to do after a little while.
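For anyone following along, pulling titles out of an RSS feed with XPath can be sketched like this (the inline feed is a stand-in for a real one; the actual expression the project uses isn't shown in the devlog):

```csharp
using System;
using System.Xml;

class RssTitles
{
    static void Main()
    {
        // A tiny inline RSS document stands in for a downloaded feed.
        string rss = @"<rss><channel>
            <item><title>First post</title></item>
            <item><title>Second post</title></item>
        </channel></rss>";

        var doc = new XmlDocument();
        doc.LoadXml(rss);

        // XPath: select every <title> that sits under an <item>,
        // anywhere in the document.
        foreach (XmlNode title in doc.SelectNodes("//item/title"))
            Console.WriteLine(title.InnerText);
    }
}
```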
My next challenge was parsing article contents, since RSS feeds don't generally include the full article text.
I found a library for this: SmartReader. Now all I had to do was chop it(3)(4) down:
- Cycle through all characters in the string
- If it comes across `<`…
- If it then comes across `p`…
- If it then comes across `>`…
- record characters until it hits `</p>` in sequence, then save them to an array and `Array.Resize` the array to its length + 1 to make room for the next paragraph.
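The steps above can be sketched as a small scanner. This is my reconstruction of the described logic, not the project's actual code, and like the steps it's naive: matching `<` then `p` will also catch tags like `<pre>`.

```csharp
using System;

class ParagraphChopper
{
    // Collect the text between each <p ...> and </p> in `html`,
    // growing the output array by one per paragraph, as described above.
    static string[] ExtractParagraphs(string html)
    {
        string[] paragraphs = new string[0];
        int i = 0;
        while (i < html.Length)
        {
            // Start of a <p> tag (also matches <p class="...">,
            // and naively <pre> too — a known limitation of this sketch).
            if (html[i] == '<' && i + 1 < html.Length && html[i + 1] == 'p')
            {
                int open = html.IndexOf('>', i);        // end of the opening tag
                int close = html.IndexOf("</p>", open);  // matching close tag
                if (open == -1 || close == -1) break;

                string para = html.Substring(open + 1, close - open - 1);

                // Resize to length + 1 to make room for the next paragraph.
                Array.Resize(ref paragraphs, paragraphs.Length + 1);
                paragraphs[paragraphs.Length - 1] = para;
                i = close + 4; // skip past "</p>"
            }
            else
            {
                i++;
            }
        }
        return paragraphs;
    }

    static void Main()
    {
        foreach (var p in ExtractParagraphs("<div><p>Hello</p><p>World</p></div>"))
            Console.WriteLine(p);
    }
}
```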
From there, I had a list of edge cases to finish the parser.
To fix my whitespace problem(7), I used a try block: I try to get the length of the current paragraph once it's converted to a string, and if an exception is thrown, the variable that dictates whether the current paragraph gets written is set to false.
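That trick might look something like this — a guess at the pattern, with a `Trim` added so purely whitespace paragraphs are skipped too; the helper name and buffer type are assumptions:

```csharp
using System;

class WhitespaceFilter
{
    // Mirror the devlog's trick: measure the paragraph once it's a string,
    // and treat any exception (or an all-whitespace result) as "skip it".
    static bool ShouldWrite(char[] currentParagraph)
    {
        try
        {
            return new string(currentParagraph).Trim().Length > 0;
        }
        catch
        {
            return false; // error while measuring, so don't write it
        }
    }

    static void Main()
    {
        Console.WriteLine(ShouldWrite("Real text".ToCharArray()));
        Console.WriteLine(ShouldWrite("   ".ToCharArray()));
    }
}
```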
In(7) you can see information we don't want, like how long ago the article was posted, the author, etc. To filter this out, I decided to check whether there are additional tags immediately inside the p tag specifically, simply by detecting whether the next character was `<`. This also fixed my problem with captions appearing as text.
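That check is a one-liner in spirit. A hedged sketch, assuming the scanner knows the index of the opening tag's `>` (the function name and signature are mine, not the project's):

```csharp
using System;

class TagFilter
{
    // After an opening <p ...> tag ends at `open` (the index of its '>'),
    // treat the paragraph as metadata or a caption if the very next
    // character starts another tag.
    static bool IsRealParagraph(string html, int open)
    {
        int next = open + 1;
        return next < html.Length && html[next] != '<';
    }

    static void Main()
    {
        string meta = "<p><span>3 days ago</span></p>";
        string real = "<p>Actual article text.</p>";

        Console.WriteLine(IsRealParagraph(meta, meta.IndexOf('>'))); // False
        Console.WriteLine(IsRealParagraph(real, real.IndexOf('>'))); // True
    }
}
```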