A small web crawler named aranea (Latin for spider). https://www.bananas-playground.net/projekt/aranea/
Banana 7271145682 license change and new config value | 1 month ago
---|---
documentation | 1 month ago
lib | 1 month ago
storage | 2 years ago
.gitignore | 1 month ago
CHANGELOG | 1 month ago
COPYING | 1 month ago
LICENSE | 1 month ago
README.md | 1 month ago
VERSION | 2 years ago
cleanup.pl | 1 month ago
config.txt | 1 month ago
fetch.pl | 1 month ago
parse-results.pl | 1 month ago
setup.sql | 2 years ago
https://www.bananas-playground.net/projekt/aranea
A small web crawler named aranea (Latin for spider). The aim is to gather unique domains to show what is out there.
It starts with a given set of URLs and parses them for more URLs, which are stored and fetched in turn. -> fetch.pl
Each stored result (the saved response of a fetch) is parsed for further URLs to follow. -> parse-results.pl
After a run, cleanup gathers all unique domains into a table and removes URLs from the fetch table once enough have been collected. -> cleanup.pl
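To illustrate the parsing step, here is a minimal sketch in Python. It is not taken from aranea, which is written in Perl, and it may differ from what parse-results.pl actually does. The names `LinkCollector` and `extract_urls` are hypothetical, and the sketch assumes that only `href` values of `<a>` tags with an http or https scheme are of interest.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_urls(html, base_url):
    """Return absolute http(s) URLs found in html, first-seen order, no duplicates."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        # Resolve relative links against the URL the page was fetched from.
        absolute = urljoin(base_url, href)
        if urlparse(absolute).scheme in ("http", "https") and absolute not in found:
            found.append(absolute)
    return found
```

Called with the stored body of a page and the URL it came from, `extract_urls` returns the list of URLs that could be queued for the next fetch run.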
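The cleanup step can be sketched in a similar way. Again this is Python for illustration only, not the code of cleanup.pl. It works on plain lists instead of database tables, and it assumes that "enough" means a fixed maximum number of URLs per domain. That limit, and the names `unique_domains`, `prune_queue` and `max_per_domain`, are assumptions made for this example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def unique_domains(urls):
    """Return the sorted, de-duplicated host names that appear in urls."""
    hosts = {urlparse(u).hostname for u in urls}
    hosts.discard(None)  # URLs without a host part are ignored
    return sorted(hosts)


def prune_queue(urls, max_per_domain):
    """Keep at most max_per_domain URLs per host, preserving the input order."""
    counts = defaultdict(int)
    kept = []
    for url in urls:
        host = urlparse(url).hostname
        if host is None:
            continue
        if counts[host] < max_per_domain:
            counts[host] += 1
            kept.append(url)
    return kept
```

With a queue of three URLs on one host and one on another, `unique_domains` yields the two host names, and `prune_queue(queue, 2)` drops the third URL of the first host.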