

4 days ago
Potentially that would be a good application of federation and distributed computing.
An Internet Archive-like distributed tool that then feeds into local tokenization and indexing.
Alternatively, a centralized service that generates indices, which are then queried locally, would save a lot of energy.
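To make the "generate centrally, query locally" idea concrete, here is a rough sketch and not a real design. The function names, the JSON format and the tiny whitespace tokenizer are all my own assumptions: one side builds an inverted index once and publishes it, and each client loads it and answers queries without redoing the indexing work.

```python
import json
from collections import defaultdict

def build_index(docs):
    """Hypothetical central step: map each token to the ids of the docs containing it."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # Naive whitespace tokenizer, assumed here purely for illustration.
        for token in set(text.lower().split()):
            index[token].append(doc_id)
    return json.dumps(index)  # the serialized index a central service would publish

def query_local(serialized_index, terms):
    """Hypothetical local step: load the published index and AND the query terms."""
    index = json.loads(serialized_index)
    postings = [set(index.get(term.lower(), [])) for term in terms.split()]
    return sorted(set.intersection(*postings)) if postings else []

docs = {"a": "federated search index", "b": "local search tool"}
published = build_index(docs)
hits = query_local(published, "local search")
```

The expensive part (crawling and tokenizing) happens once, and the local query is just a dictionary lookup plus a set intersection.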
There are various levels of AI here.
Storing embeddings/vectors in a search index can make your searches smarter and more relevant. Embeddings place related concepts closer together than pure keyword approaches can, which, if done well, increases retrieval quality.
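A toy illustration of that difference, with hand-made 3-dimensional vectors standing in for real embeddings (the numbers are invented for the example; a real system would get them from an embedding model):

```python
import math

# Invented vectors for illustration only, not output from any real model.
EMBEDDINGS = {
    "car won't start": [0.9, 0.1, 0.0],
    "engine fails to turn over": [0.8, 0.2, 0.1],
    "start a sourdough culture": [0.1, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def keyword_overlap(query, doc):
    """Count of shared whitespace-separated words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def rank(query, score):
    """Rank every other document by the given scoring function, best first."""
    docs = [d for d in EMBEDDINGS if d != query]
    return sorted(docs, key=score, reverse=True)

query = "car won't start"
by_keyword = rank(query, lambda d: keyword_overlap(query, d))
by_vector = rank(query, lambda d: cosine(EMBEDDINGS[query], EMBEDDINGS[d]))
```

Keyword matching puts the sourdough document first because it shares the word "start", while the vector ranking puts the engine document first even though it shares no words with the query.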
RAG tools and AI searches are just a layer on top of your index. When done well these can be really useful in annotating your results and speeding up finding things.
That’s useful when you’re searching for, say, an error message and the AI is able to iterate on keywords, skim a GitHub issue about it, and skip to the resolution.
Similarly, it’s good when you’re researching something but don’t have the exact words: AI search can iterate to capture your intent, then run several queries based on that.
I don’t find the hallucination problem significant in practice with a lot of AI search tools, but I have found AI is vulnerable to certain types of SEO spam that a human would never fall for.
As an example, most companies have a “comparison to” or “alternatives to” blog post. The AI does not critically weigh the fact that a service is hosting a blog post shilling its own product. So asking a search AI for options actually gives poor-quality results, because it returns the shilled posts that rank first in search.
AI search also adds an additional silent layer of filtering, which you need to be conscious of.
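The check a human does without thinking could be roughly sketched like this. It is a crude heuristic under my own assumptions, not something any search tool is known to do; the marker list, the vendor table and the example.* domains are all made up:

```python
from urllib.parse import urlparse

# Assumed title phrases that suggest a comparison-style post.
COMPARISON_MARKERS = ("alternatives to", "comparison to", " vs ")

def is_self_promotional(url, title, vendor_domains):
    """Flag comparison posts hosted on the domain of a vendor in the same category.

    vendor_domains is a hypothetical mapping of product name -> company domain.
    """
    lowered = title.lower()
    if not any(marker in lowered for marker in COMPARISON_MARKERS):
        return False
    host = urlparse(url).hostname or ""
    return any(
        host == domain or host.endswith("." + domain)
        for domain in vendor_domains.values()
    )

vendors = {"Acme": "acme.example", "Foo": "foo.example"}
flagged = is_self_promotional(
    "https://blog.acme.example/alternatives-to-foo",
    "Top 5 alternatives to Foo",
    vendors,
)
```

Here `flagged` is true because a vendor is hosting the comparison itself; the same title on an unrelated review site would pass. A real filter would need a far better vendor list than a hard-coded table.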