Personal search engine with YaCy

May 25, 2024 | Updated May 29, 2024

What is YaCy?

YaCy is a free, open source, peer-to-peer distributed search engine. It can also be used as a personal, private search engine with some configuration. You can choose what it will crawl and when.

Why use a personal search engine?

A personal search engine powered by YaCy has several benefits. First of all, you have complete control of your data. You also choose what gets indexed. And if you secure your instance, you can make sure you’re the only one who can access it.

A personal search engine can be an interesting alternative to your browser’s bookmarks or a bookmarking service. Unlike a plain list of bookmarks, it lets you search through the contents of the indexed pages, and the search page offers filters for domain, language, and more.

Prerequisites

Installation

The website for YaCy provides installation documentation.

I find that it’s easiest to run in Docker.

docker run -d --name yacy \
-p 8090:8090 -p 8443:8443 \
-v yacy_search_server_data:/opt/yacy_search_server/DATA \
--restart unless-stopped --log-opt max-size=200m --log-opt max-file=2 \
yacy/yacy_search_server:latest

You should now be able to open http://localhost:8090 (or your server’s address on port 8090) and see your instance’s search page.

Configuration

If you want your instance to be private, update your configuration as follows:

  1. Go to [your yacy instance url]/ConfigAccounts_p.html and ensure that “Access only with qualified account” is selected. Set a new username and password for the administrator user.
  2. While still on ConfigAccounts_p.html, set “Protection of all pages” to “ON”.
  3. Go to /ConfigBasic.html and set the use case to “Search portal for your own web pages”.
  4. Go to /ConfigNetwork_p.html and select “Robinson Mode”, then “Private Peer”.

Crawling options

Here are just a few ways you can get pages to be crawled:

Bookmarklet

Add a bookmark with the following in the URL field:

javascript: (() => { window.open(`http://localhost:8090/Crawler_p.html?crawlingDomMaxPages=10000&range=wide&intention=&sitemapURL=&crawlingQ=on&crawlingMode=url&crawlingURL=${encodeURIComponent(window.location.href)}&crawlingFile=&mustnotmatch=&crawlingFile%24file=&crawlingstart=Neuen Crawl starten&mustmatch=.*&createBookmark=on&bookmarkFolder=/crawlStart&xsstopw=on&indexMedia=on&crawlingIfOlderUnit=hour&cachePolicy=iffresh&indexText=on&crawlingIfOlderCheck=on&bookmarkTitle=&crawlingDomFilterDepth=1&crawlingDomFilterCheck=on&crawlingIfOlderNumber=1&crawlingDepth=0`, "_blank"); })()

If YaCy is not running on the same machine as your browser, replace “localhost” in the bookmarklet code with your instance’s address.
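The bookmarklet’s long URL is easier to read and adjust when it is built from a parameter list. Here is a minimal sketch of that idea: the parameter names and values are copied from the bookmarklet, while buildCrawlUrl is a hypothetical helper name and not part of YaCy. URLSearchParams writes spaces as “+”, which should be equivalent in a query string, though I have not checked that against every YaCy version.

```javascript
// Hypothetical helper: builds the same crawl-start URL as the bookmarklet.
// All parameter names and values are copied from the bookmarklet's URL.
function buildCrawlUrl(yacyBase, pageUrl) {
  const params = new URLSearchParams({
    crawlingDomMaxPages: "10000",
    range: "wide",
    intention: "",
    sitemapURL: "",
    crawlingQ: "on",
    crawlingMode: "url",
    crawlingURL: pageUrl, // the page to crawl; encoded automatically
    crawlingFile: "",
    mustnotmatch: "",
    "crawlingFile$file": "",
    crawlingstart: "Neuen Crawl starten",
    mustmatch: ".*",
    createBookmark: "on",
    bookmarkFolder: "/crawlStart",
    xsstopw: "on",
    indexMedia: "on",
    crawlingIfOlderUnit: "hour",
    cachePolicy: "iffresh",
    indexText: "on",
    crawlingIfOlderCheck: "on",
    bookmarkTitle: "",
    crawlingDomFilterDepth: "1",
    crawlingDomFilterCheck: "on",
    crawlingIfOlderNumber: "1",
    crawlingDepth: "0", // 0 = only the given page, no links followed
  });
  const url = new URL("Crawler_p.html", yacyBase);
  url.search = params.toString();
  return url.toString();
}
```

Inside a bookmarklet, this would be called as window.open(buildCrawlUrl("http://localhost:8090", window.location.href), "_blank").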

Crawl all pages you visit

You can install a userscript in a userscript manager that will send the pages you visit to YaCy. The userscript can be found here.
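To give an idea of how such a userscript can work, here is a rough sketch. It is not the linked userscript. It assumes a userscript manager that provides GM_xmlhttpRequest (Tampermonkey and Violentmonkey do), an instance at http://localhost:8090, and that YaCy accepts a trimmed-down subset of the bookmarklet’s parameters, which is untested; if it does not, copy the full query string from the bookmarklet. The names YACY_BASE, shouldCrawl, and crawlUrl are made up for this example.

```javascript
// ==UserScript==
// @name        Send visited pages to YaCy (sketch)
// @match       *://*/*
// @grant       GM_xmlhttpRequest
// @connect     localhost
// ==/UserScript==

// Assumption: address of your YaCy instance.
const YACY_BASE = "http://localhost:8090";

// Only send http(s) pages, and never pages of the YaCy instance itself.
function shouldCrawl(pageUrl, yacyBase) {
  let page;
  try {
    page = new URL(pageUrl);
  } catch {
    return false; // not a parseable URL
  }
  if (page.protocol !== "http:" && page.protocol !== "https:") return false;
  return page.origin !== new URL(yacyBase).origin;
}

// Crawl-start URL with a subset of the bookmarklet's parameters
// (assumption: YaCy fills in defaults for the ones left out).
function crawlUrl(pageUrl, yacyBase) {
  const params = new URLSearchParams({
    crawlingMode: "url",
    crawlingURL: pageUrl,
    crawlingDepth: "0",
    crawlingstart: "Neuen Crawl starten",
    mustmatch: ".*",
    indexText: "on",
    indexMedia: "on",
  });
  const url = new URL("Crawler_p.html", yacyBase);
  url.search = params.toString();
  return url.toString();
}

// Runs only inside a userscript manager; skipped elsewhere.
if (typeof GM_xmlhttpRequest === "function" && typeof window !== "undefined") {
  const here = window.location.href;
  if (shouldCrawl(here, YACY_BASE)) {
    GM_xmlhttpRequest({ method: "GET", url: crawlUrl(here, YACY_BASE) });
  }
}
```

On a password-protected instance the request also has to authenticate, which this sketch does not handle.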

Import bookmarks

I was unable to find a way to import my bookmarks (Netscape style bookmarks.html files) from Firefox and Linkding, so I made a script to do it. The YaCy bookmark importer script is available on GitHub.
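The core of such an import is pulling the links out of the bookmarks file. The following is a simplified sketch of that step only, not the importer script itself. It assumes the usual Netscape layout, where every bookmark is an A element with a double-quoted HREF attribute; extractBookmarkUrls is a name made up for this example.

```javascript
// Extract the unique http(s) links from a Netscape-style bookmarks.html string.
function extractBookmarkUrls(html) {
  const urls = new Set(); // a Set drops duplicate bookmarks
  const anchor = /<a\s[^>]*?href="([^"]+)"/gi;
  for (const match of html.matchAll(anchor)) {
    // Exported files escape "&" as "&amp;" inside attribute values.
    const href = match[1].replace(/&amp;/g, "&");
    // Skip browser-internal entries such as Firefox's "place:" queries.
    if (/^https?:\/\//i.test(href)) urls.add(href);
  }
  return [...urls];
}
```

In Node.js, the file contents can be read with fs.readFileSync("bookmarks.html", "utf8") and passed to the function. Each resulting URL can then be submitted to YaCy as the crawlingURL of a crawl start, the same way the bookmarklet does.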

References