Vrawler
Some helper functions that I use to scrape websites
Retrive by selectors
Used to retrive html elements using css selectors
- Function:
from_selector
- Arguments:
-
hdoc
: html document as a string (if parsed convert to str with parsed.str()
) -
selectors
: string of css selectors (Ex: #myid > p > a:nth-child(1)
)
-
Example
<html lang="en">
<head>
<title>Test file</title>
</head>
<body>
<div id="mydiv">
<a href="https://www.google.com/">Google</a>
<a href="https://vlang.io/"><span>V</span> lang</a>
</div>
</body>
</html>
import vrawler
fn main() {
// Suppose this index.html == above html
html_str := read_file('/home/scraped/mysite/index.html')
spn := vrawler.from_selector(html_str, '#mydiv > a:nth-child(2) > span')
println(spn)
}
// stdout
// [<span>V</span>]