What is IC and how does it work?
Information Crawler (IC) is a self-hosted platform for the automated monitoring of blogs and specialist publications. It is designed for people who want to stay up to date with news from specific fields – such as data protection, IT law, or cybersecurity – without having to visit dozens of websites every day.
IC collects new articles automatically, summarises them using AI, and sends them as a personalised email digest – tailored to the topics that matter to you.
Register sources
Known blogs and news sources are added to the system. IC automatically detects RSS feeds or scrapes pages directly.
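Feed detection can be illustrated with a short sketch. It assumes that a source advertises its feed through a standard link tag with rel="alternate" in the page head. The names FeedLinkParser and discover_feeds are hypothetical and not taken from IC's code base. IC's actual detection logic may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects the URLs of <link rel="alternate"> tags that point to a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        rel_values = attributes.get("rel", "").lower().split()
        mime_type = attributes.get("type", "").lower()
        href = attributes.get("href", "")
        if "alternate" in rel_values and mime_type in FEED_TYPES and href:
            # Resolve relative feed paths against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return all feed URLs advertised in the given HTML (hypothetical helper)."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

If the returned list is empty, a system like IC would fall back to scraping the page directly.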
Automatic crawling
Every four hours, IC checks whether new articles have been published. Each new post is downloaded, cleaned, and stored.
AI analysis
A local language model creates a short summary (max. 150 words) and assigns up to three thematic tags, without any cloud services.
Subscriptions & matching
Users create subscriptions – for specific sources, tags, or free-text keywords. IC compares each new article with subscriptions using vector search.
Email digest
Matching articles are bundled into an HTML email and delivered at the chosen frequency (every four hours, daily, or weekly).
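The delivery frequencies can be pictured as a simple due check. The following is a minimal sketch under stated assumptions: the keys "4h", "daily" and "weekly" and the function digest_is_due are hypothetical, and IC may schedule digests differently.

```python
from datetime import datetime, timedelta

# Assumed mapping of the three delivery frequencies to time intervals.
INTERVALS = {
    "4h": timedelta(hours=4),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def digest_is_due(frequency, last_sent, now):
    """Return True if a digest should go out for this subscription.

    last_sent is None for a subscription that has never received a digest.
    """
    if last_sent is None:
        return True
    return now - last_sent >= INTERVALS[frequency]


if __name__ == "__main__":
    now = datetime(2024, 1, 8, 12, 0)
    print(digest_is_due("daily", datetime(2024, 1, 8, 8, 0), now))
```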
Self-hosted
All data stays on your own server. No tracking, no sharing with third parties.
Local AI
Summaries and embeddings are computed locally with Ollama – no requests to OpenAI or similar services.
Protected access
Registration is by invitation (voucher) only. All connections run over HTTPS.
Email only
IC sends only digest emails: no newsletters, no advertising, no tracking pixels.
IC uses two AI models that run entirely locally – via Ollama, without any connection to external services. The two models serve distinct purposes.
Each new article is converted into a numeric vector with 768 dimensions – a kind of mathematical fingerprint of its content. The same happens with the keywords of each subscription when they are saved. Both vectors are stored in Oracle 23ai.
When the digest is sent, Oracle computes the cosine distance between the article vector and the keyword vector of each subscription. Only articles whose distance falls below a configured threshold (COSINE < 0.45) count as a match, regardless of whether the exact terms appear in the text. This also surfaces thematically related articles that use different phrasing.
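The matching rule can be reproduced outside the database in a few lines. This sketch assumes the threshold applies to cosine distance, defined as 1 minus cosine similarity. The functions cosine_distance and is_match are illustrative and are not part of IC. In IC the comparison runs inside Oracle.

```python
import math

# Threshold named in the text; a lower distance means closer in meaning.
THRESHOLD = 0.45


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def is_match(article_vec, keyword_vec, threshold=THRESHOLD):
    """An article matches a subscription if its distance is below the threshold."""
    return cosine_distance(article_vec, keyword_vec) < threshold
```

Identical directions give a distance of 0 and orthogonal vectors a distance of 1, so a threshold of 0.45 admits articles that point in broadly the same direction as the subscription keywords.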
In parallel with embedding, a local language model (LLaMA 3.2, 3 billion parameters) generates a summary of up to 150 words and up to three thematic tags for each new article. The article text is truncated to 4,000 characters for this purpose, and the model responds in structured JSON format. If the analysis fails, the beginning of the original text is shown as a fallback.
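The truncation and fallback behaviour can be sketched as follows. The JSON keys "summary" and "tags", the function names, and the length of the fallback excerpt are assumptions made for illustration. The text above only states that the model answers in structured JSON and that the beginning of the original text is shown if the analysis fails.

```python
import json

MAX_INPUT_CHARS = 4000   # article text is truncated before analysis
MAX_SUMMARY_WORDS = 150  # upper bound for the summary
MAX_TAGS = 3             # upper bound for thematic tags


def prepare_input(article_text):
    """Truncate the article to the length passed to the language model."""
    return article_text[:MAX_INPUT_CHARS]


def parse_analysis(raw_response, article_text):
    """Parse the model's JSON answer; fall back to the start of the article.

    Assumes a response shaped like {"summary": "...", "tags": ["...", ...]}.
    """
    try:
        data = json.loads(raw_response)
        summary = data["summary"]
        tags = data.get("tags", [])
        if not isinstance(summary, str) or not summary.strip():
            raise ValueError("summary missing or empty")
        if not isinstance(tags, list):
            raise ValueError("tags must be a list")
    except (ValueError, KeyError, TypeError):
        # Analysis failed: show the beginning of the original text instead.
        # The excerpt length used here is an assumption.
        excerpt = " ".join(article_text.split()[:MAX_SUMMARY_WORDS])
        return {"summary": excerpt, "tags": [], "fallback": True}

    words = summary.split()
    return {
        "summary": " ".join(words[:MAX_SUMMARY_WORDS]),
        "tags": [str(tag) for tag in tags[:MAX_TAGS]],
        "fallback": False,
    }
```

Enforcing the word and tag limits after parsing keeps the digest consistent even if the model ignores the limits stated in its prompt.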
All AI computations run on your own server – no article text, subscription keyword, or user profile ever leaves your infrastructure.
IC consists of several components: a Next.js frontend; a FastAPI backend for authentication, crawling, and digest logic; and Oracle 23ai as the database for vector-based semantic search and all application data.