The first ALIA Top End Symposium: Powering our Territory
Invisible web
Julie Adams
What is the invisible web?
- Also known as the 'deep web' or 'hidden web'
- Invisible to search engines
- How search engines work
- The world wide web (8 billion)
- Size does matter
Why
- Web is huge
- Constantly changing
- Cost
Need alternative strategies to search engines
- Think human
- Use invisible web directories
- Collect URLs
- Use favourites
- Go to the experts/source
- Search for portals
Crawlers suck
Probability of a crawler locating a web page = 40 percent
- Need alternative strategies to search engines
- Non-HTML file formats
Crawlers were originally designed for HTML text; they miss formats such as .pdf, .doc, .jpg, .mp3
- Solution
Search for specific formats; use format-specific search engines
- Dynamically generated pages
Spider traps; storage intensive; crawlers can't type queries into search forms
- Solution
Locate the source
- Password protected sites
Crawlers need passwords but can't type them in
- Solution
Locate the source, use libraries, register
- Large websites
Crawler does not go deep
- Solution
Use site-specific search engines; search for 'database'; hunt, don't gather
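
The four barriers above (non-HTML formats, dynamically generated pages, password protected sites, large websites) can be illustrated with a toy crawler. This is only a minimal sketch under stated assumptions, not how any real search engine is built: the in-memory `WEB` dictionary, the example.org URLs, the `dynamic` and `password` flags, and the `max_depth` limit are all hypothetical and invented for illustration.

```python
from collections import deque

# Hypothetical in-memory "web". Every URL and field here is invented
# for illustration; no real site is contacted.
WEB = {
    "http://example.org/": {
        "type": "text/html",
        "links": [
            "http://example.org/about.html",
            "http://example.org/report.pdf",
            "http://example.org/search?q=",
            "http://example.org/members/",
            "http://example.org/a/",
        ],
    },
    "http://example.org/about.html": {"type": "text/html", "links": []},
    "http://example.org/report.pdf": {"type": "application/pdf", "links": []},
    "http://example.org/search?q=": {"type": "text/html", "links": [], "dynamic": True},
    "http://example.org/members/": {"type": "text/html", "links": [], "password": True},
    "http://example.org/a/": {"type": "text/html", "links": ["http://example.org/a/b/"]},
    "http://example.org/a/b/": {"type": "text/html", "links": ["http://example.org/a/b/c.html"]},
    "http://example.org/a/b/c.html": {"type": "text/html", "links": []},
}


def crawl(start, max_depth=2):
    """Breadth-first crawl that records why each skipped page stays invisible."""
    indexed = []           # pages the crawler adds to its index
    skipped = {}           # url -> reason the page was not indexed
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        page = WEB[url]
        if page["type"] != "text/html":
            skipped[url] = "non-HTML format"
            continue
        if page.get("dynamic"):
            skipped[url] = "dynamically generated"
            continue
        if page.get("password"):
            skipped[url] = "password protected"
            continue
        indexed.append(url)
        for link in page["links"]:
            if link in seen:
                continue
            seen.add(link)
            if depth + 1 > max_depth:
                # The crawler does not follow links beyond max_depth.
                skipped[link] = "too deep"
                continue
            queue.append((link, depth + 1))
    return indexed, skipped


indexed, skipped = crawl("http://example.org/")
print("Indexed:", indexed)
for url, reason in skipped.items():
    print("Invisible:", url, "(" + reason + ")")
```

In this sketch, half of the eight pages never reach the index, and each one maps to a barrier listed above. A person who goes straight to the source can still reach all of them.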