Tips for a good search
DocSearch can work with almost any website, but we've found that some site structures yield more relevant results or faster indexing. On this page we'll share some tips on how to make the most of DocSearch.
Use a sitemap.xml
If you provide a sitemap in your configuration, DocSearch will use it to go directly to the pages to index. Pages are still crawled, which means we extract every compliant link.
We highly recommend you add a sitemap.xml to your website if you don't have one already. This will not only make the indexing faster, but also give you more control over which pages to index.
Sitemaps are also considered good practice for other aspects, including SEO (more information on sitemaps).
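To make the role of the sitemap concrete, here is a minimal sketch of how a list of page URLs can be read from a sitemap.xml. It is not DocSearch's own crawler code: the sitemap content, the example.com URLs, and the `sitemap_urls` helper are assumptions made for illustration. Only the `urlset`/`url`/`loc` structure and the namespace come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap content; a real one is usually served at /sitemap.xml.
SITEMAP_XML = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/</loc></url>
  <url><loc>https://example.com/docs/getting-started/</loc></url>
  <url><loc>https://example.com/docs/api/</loc></url>
</urlset>
"""

# Namespace defined by the sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text.strip())
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


urls = sitemap_urls(SITEMAP_XML)
for url in urls:
    print(url)
```

Because the sitemap lists pages explicitly, removing a URL from it is one way to influence which pages are visited first, which is the kind of control the paragraph above refers to.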
Structure the hierarchy of information
DocSearch works better on structured documentation. Relevance of results is based on the structural hierarchy of content. In simpler terms, it means that we read the <h1>, ..., <h6> headings of your page to guess the hierarchy of information. This hierarchy brings contextual information to your records.
Documentation usually starts by explaining generic concepts and then goes deeper into specifics. This is represented in your HTML markup by the hierarchy of headings you're using. For example, concepts discussed under an <h4> are more specific than concepts discussed under an <h2> on the same page. The sooner the information comes up within the page, the higher it is ranked.
DocSearch uses this structure to fine-tune the relevance of results as well as to provide potential filtering. Documentation that follows this pattern often has better relevance in its search results.
Finding the right depth of your documentation tree and how to split up your content are two of the most complex tasks. For large pages, we recommend having four levels (from lvl0 to lvl3). We recommend at least three different levels.
Note that you don't have to use <hX> tags and can use classes instead (e.g., <span class="title-X">).
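The way headings give context to each record can be sketched as follows. This is an illustration only, not DocSearch's extraction code: the mapping of <h1> through <h4> to lvl0 through lvl3, the `HierarchyParser` class, the record shape, and the sample page are all assumptions. Each paragraph becomes a record that carries the headings it sits under.

```python
from html.parser import HTMLParser

# Assumed mapping from heading tags to hierarchy levels lvl0..lvl3.
HEADING_LEVELS = {"h1": 0, "h2": 1, "h3": 2, "h4": 3}


class HierarchyParser(HTMLParser):
    """Build one record per paragraph, tagged with its enclosing headings."""

    def __init__(self):
        super().__init__()
        self.hierarchy = [None] * len(HEADING_LEVELS)
        self.records = []
        self._current = None  # tag being read: a heading or "p"
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS or tag == "p":
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        text = "".join(self._buffer).strip()
        if tag in HEADING_LEVELS:
            level = HEADING_LEVELS[tag]
            self.hierarchy[level] = text
            # A new heading invalidates every deeper level.
            for deeper in range(level + 1, len(self.hierarchy)):
                self.hierarchy[deeper] = None
        elif text:
            record = {f"lvl{i}": h for i, h in enumerate(self.hierarchy)}
            record["content"] = text
            self.records.append(record)
        self._current = None


# Hypothetical page used only to exercise the parser.
PAGE = """
<h1>Guides</h1>
<h2>Installation</h2>
<p>Install the package.</p>
<h3>Using a CDN</h3>
<p>Add a script tag.</p>
<h2>Configuration</h2>
<p>Set your API key.</p>
"""

parser = HierarchyParser()
parser.feed(PAGE)
for record in parser.records:
    print(record)
```

In this sketch, the paragraph under "Using a CDN" keeps "Guides" and "Installation" as context, while the paragraph under "Configuration" drops the deeper "Using a CDN" level because a new <h2> started.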
Set a unique class to the element holding the content
DocSearch extracts content based on the HTML structure. We recommend that you add a custom class to the HTML element wrapping all your textual content. This will help narrow selectors to the relevant content.
Such a unique identifier makes your configuration more robust because it ensures that only relevant content is indexed. We found that this is the most reliable way to exclude content in headers, sidebars, and footers that is not relevant to the search.
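The effect of scoping extraction to a single wrapper can be sketched like this. It is an illustration under assumptions, not DocSearch's implementation: the `docs-content` class name, the `ContentExtractor` class, and the sample page are hypothetical. Text is collected only while the parser is inside the element carrying the chosen class.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContentExtractor(HTMLParser):
    """Collect text only inside the element carrying a given class."""

    def __init__(self, content_class):
        super().__init__()
        self.content_class = content_class
        self.depth = 0  # nesting depth inside the wrapper; 0 means outside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.content_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


# Hypothetical page: only <main class="docs-content"> holds documentation text.
PAGE = """
<header><nav>Home | Docs | Blog</nav></header>
<aside class="sidebar">Table of contents</aside>
<main class="docs-content">
  <h1>Getting started</h1>
  <p>Install the package<br>then import it.</p>
</main>
<footer>Copyright notice</footer>
"""

extractor = ContentExtractor("docs-content")
extractor.feed(PAGE)
print(extractor.chunks)
```

The navigation, sidebar, and footer text never reach `chunks`, because none of those elements sits inside the wrapper with the unique class.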