A small web crawler named aranea (Latin for spider). https://www.bananas-playground.net/projekt/aranea/

Banana 7271145682 license change and new config value 1 month ago
documentation 7271145682 license change and new config value 1 month ago
lib 7271145682 license change and new config value 1 month ago
storage 24fb355861 fetch.pl 2 years ago
.gitignore 7271145682 license change and new config value 1 month ago
CHANGELOG 7271145682 license change and new config value 1 month ago
COPYING 7271145682 license change and new config value 1 month ago
LICENSE 7271145682 license change and new config value 1 month ago
README.md 7271145682 license change and new config value 1 month ago
VERSION 17aef3b5ab cleanup of the code and some paperwork 2 years ago
cleanup.pl 7271145682 license change and new config value 1 month ago
config.txt 7271145682 license change and new config value 1 month ago
fetch.pl 7271145682 license change and new config value 1 month ago
parse-results.pl 7271145682 license change and new config value 1 month ago
setup.sql cfdca6000e project cleanup and updated project website links 2 years ago

README.md

aranea

https://www.bananas-playground.net/projekt/aranea

A small web crawler named aranea (Latin for spider). The aim is to gather unique domains to show what is out there.

Fetch

Starting from a given set of URLs, it fetches each one and stores the result so that it can be parsed for more URLs. Those URLs are stored and fetched in turn. -> fetch.pl
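The fetch step can be illustrated with a short Python sketch. This is not the code in fetch.pl (which is Perl); the function names `download` and `fetch_urls`, the hashed file naming, and the `.result` suffix are all assumptions made for illustration.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable, Iterable, List


def download(url: str, timeout: float = 10.0) -> bytes:
    """Fetch one URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_urls(
    urls: Iterable[str],
    storage_dir: Path,
    downloader: Callable[[str], bytes] = download,
) -> List[Path]:
    """Fetch every URL and store each body as one file in storage_dir.

    The file name is a hash of the URL (a hypothetical naming scheme),
    so the same URL always maps to the same stored result.
    """
    storage_dir.mkdir(parents=True, exist_ok=True)
    stored = []
    for url in urls:
        try:
            body = downloader(url)
        except OSError:
            # Network and HTTP errors from urllib are OSError subclasses;
            # a failed URL is skipped so one bad host does not stop the run.
            continue
        target = storage_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".result")
        target.write_bytes(body)
        stored.append(target)
    return stored
```

Passing the downloader as a parameter keeps the storage logic testable without network access.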

Parse

Each URL result (the stored result from the call) is parsed for other URLs to follow. -> parse-results.pl
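As a rough illustration of the parse step, the following Python sketch pulls the links out of one stored HTML result. It is not the logic of parse-results.pl; the names `LinkCollector` and `extract_links` are hypothetical, and restricting the output to `http`/`https` links from `<a href>` attributes is an assumption.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute http(s) links from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                absolute = urljoin(self.base_url, value)
                if urlparse(absolute).scheme in ("http", "https"):
                    self.links.append(absolute)


def extract_links(html: str, base_url: str) -> List[str]:
    """Return the unique links found in html, in order of first appearance."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    seen = set()
    unique = []
    for link in collector.links:
        if link not in seen:
            seen.add(link)
            unique.append(link)
    return unique
```

The returned list is what a crawler would hand back to the fetch step as the next URLs to follow.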

Cleanup

After a run, cleanup gathers all the unique domains into a table. It also removes URLs from the fetch table that are no longer needed because enough have already been collected. -> cleanup.pl
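The cleanup idea can be sketched in Python as follows. This is not the code in cleanup.pl, and it works on plain lists instead of database tables. Reading "enough" as a per-domain cap is an assumption, as are the function name `cleanup` and the parameter `max_per_domain`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlparse


def cleanup(urls: Iterable[str], max_per_domain: int = 2) -> Tuple[List[str], List[str]]:
    """Return (unique domains, URLs kept after applying the per-domain cap)."""
    per_domain: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname  # lower-cased host, or None if missing
        if host is None:
            continue
        per_domain[host].append(url)

    unique_domains = sorted(per_domain)
    # Keep only the first max_per_domain URLs of each domain (assumed rule).
    kept = [
        url
        for host in unique_domains
        for url in per_domain[host][:max_per_domain]
    ]
    return unique_domains, kept
```

Capping the URLs per domain keeps a single large site from dominating the fetch queue, which fits the stated aim of gathering unique domains.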