Did you know? There is a tool called Reader that converts any URL into an input format better suited to large language model (LLM) processing. It’s like putting a “smart coat” on web pages, making them easier for models to understand and use. Moreover, this service is completely free!

Today I will walk you through how to use this tool.

How to use

Using Reader is very simple: just add the prefix https://r.jina.ai/ to any URL. For example, to convert https://en.wikipedia.org/wiki/Artificial_intelligence into an input more suitable for language model processing, you only need to access:

r.jina.ai/https://en.…
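
The prefix rule can also be applied from a script. The following is a minimal Python sketch, not an official client: the helper names reader_url and fetch_for_llm are hypothetical, and it assumes the service answers a plain GET request with UTF-8 text.

```python
from urllib.request import urlopen

READER_PREFIX = "https://r.jina.ai/"  # the prefix described above


def reader_url(target_url: str) -> str:
    """Return the Reader address for target_url by prepending the prefix."""
    return READER_PREFIX + target_url


def fetch_for_llm(target_url: str, timeout: float = 30.0) -> str:
    """Download the converted page text (performs a network request).

    Assumption: the response body is UTF-8 encoded text.
    """
    with urlopen(reader_url(target_url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_for_llm("https://en.wikipedia.org/wiki/Artificial_intelligence") would then return the converted page as a string that can be passed to a language model.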

Moreover, Reader also has a live demo that you can try for yourself.

Change log

Reader has recently added a new feature: support for image reading. It captions every image at the specified URL and, if an image is missing an alt tag, adds one that starts with Image [idx]: as a replacement. In this way, the downstream language model can interact with the images during reasoning, summarization, etc.
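
To show how a downstream step might pick up these captions, here is a small Python sketch. It rests on an assumption the text does not confirm: that Reader output represents images in Markdown syntax, ![alt](url). The function name list_images is hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: images appear in Markdown form, ![alt text](image URL).
IMAGE_PATTERN = re.compile(r"!\[(?P<alt>[^\]]*)\]\((?P<src>[^)\s]+)\)")


def list_images(reader_text: str) -> List[Tuple[str, str]]:
    """Return (alt text, image URL) pairs found in the given text."""
    return [
        (match.group("alt"), match.group("src"))
        for match in IMAGE_PATTERN.finditer(reader_text)
    ]
```

The alt texts collected this way could be handed to the model alongside the page text, or filtered by the Image [idx]: prefix to find the images that Reader captioned itself.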

Installation guide

If you want to run this project yourself, you will need the following tools:

  • Node v18 (Note: Node version cannot exceed 18, otherwise the build may fail)
  • Firebase CLI (installed via npm install -g firebase-tools)

For the backend, you need to go into the backend/functions directory and install the npm dependencies:

git clone git@github.com:jina-ai/reader.git
cd reader/backend/functions
npm install

Mode selection

Reader provides several different modes to suit different usage scenarios:

  1. Standard mode: Just add https://r.jina.ai/ before the URL. This method is simple and straightforward and suitable for most situations.
  2. Streaming mode: If you find that the results in standard mode are not complete enough, you can try streaming mode. It waits for the page to fully render before serving the content. You can enable streaming mode by setting the request header:
     curl -H "Accept: text/event-stream" https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page
     This way, data is streamed in chunks, with each subsequent chunk containing more complete information. The last chunk usually provides the most complete and final result. This is useful for downstream systems that require immediate content delivery or want to process data in chunks to interleave input/output and model processing times.
  3. JSON mode: Although this mode is still in its early stages and the output JSON is not particularly “useful”, it provides three fields: url, title, and content. You can select this output format by setting the request header:
     curl -H "Accept: application/json" https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page
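
The three modes differ only in the Accept header, which the following Python sketch makes explicit. It is an illustration under stated assumptions rather than an official client: build_request and last_sse_data are hypothetical helper names, and the stream parser follows the generic Server-Sent Events framing ("data:" lines ended by a blank line) because the exact payload of each Reader chunk is not described here.

```python
from urllib.request import Request

READER_PREFIX = "https://r.jina.ai/"

# Accept header per mode; None means no Accept header is set (standard mode).
ACCEPT_BY_MODE = {
    "standard": None,
    "streaming": "text/event-stream",
    "json": "application/json",
}


def build_request(target_url: str, mode: str = "standard") -> Request:
    """Build (but do not send) a Reader request for the given mode."""
    if mode not in ACCEPT_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    request = Request(READER_PREFIX + target_url)
    accept = ACCEPT_BY_MODE[mode]
    if accept is not None:
        request.add_header("Accept", accept)
    return request


def last_sse_data(lines):
    """Return the data payload of the last complete event in an SSE stream.

    "data:" lines accumulate until a blank line closes the event; other
    fields are ignored. Returns None if no complete event was seen.
    """
    last = None
    current = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if current:
                last = "\n".join(current)
                current = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space
            current.append(value)
    return last
```

To send a request, pass the result of build_request to urllib.request.urlopen. In streaming mode, the decoded response lines can be fed to last_sse_data to keep only the final, most complete chunk. In JSON mode, the body can be parsed with json.load; whether url, title, and content sit at the top level of the object or inside a wrapper is not stated above, so check an actual response first.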

Summary

This tool is very useful for developers who want to improve the quality of language model input, especially when processing web content. Reader makes it easier to convert web page content into a format suitable for language model processing, thereby improving model performance and the quality of output results.
