Excluding documents
Prevent pages from being indexed by the Crawler by adding exclude rules. You can exclude specific domains, subdomains, URLs, metadata, and more.
Last updated
Crawler Only
Unlike Search settings, exclude rules keep documents out of the search index entirely. Changes to the rules therefore require a re-index and do not take effect in real time. We generally recommend filtering documents via Search settings instead.
To start, head to the Crawler rules section.
Click the ‘Create rule’ button in the top right, then choose the ‘Exclude’ option.
Select the field for your exclude rule (e.g. URL, domain, directory). The fields in the dropdown are generated based on the data structure defined in your Schema.
Select an operator to define the condition (e.g. contains, equals, does not equal, etc.) for the exclude rule.
Enter a value for the field that you want to exclude. This may be a directory name, a domain, a full URL, or something else that defines the content you want to exclude. See the examples below.
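To make the field / operator / value model concrete, here is a minimal Python sketch of how such a rule could be evaluated against a crawled document. It is not the Crawler's implementation. The `ExcludeRule` class, the lowercase field names such as `url`, the operator table, and the assumption that one matching rule is enough to exclude a document are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

# Hypothetical operator table; keys mirror the labels in the operator dropdown.
OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "equals": lambda field_value, rule_value: field_value == rule_value,
    "does not equal": lambda field_value, rule_value: field_value != rule_value,
    "contains": lambda field_value, rule_value: rule_value in field_value,
    "ends with": lambda field_value, rule_value: field_value.endswith(rule_value),
}


@dataclass(frozen=True)
class ExcludeRule:
    field: str     # Schema field to test, e.g. "url" (hypothetical name)
    operator: str  # one of the OPERATORS keys
    value: str     # the value typed into the rule's text field

    def matches(self, document: Dict[str, str]) -> bool:
        # Missing fields are treated as empty strings (assumption).
        field_value = document.get(self.field, "")
        return OPERATORS[self.operator](field_value, self.value)


def is_excluded(document: Dict[str, str], rules: Iterable[ExcludeRule]) -> bool:
    # Assumption: a single matching rule keeps the document out of the index.
    return any(rule.matches(document) for rule in rules)


rules = [
    ExcludeRule("url", "equals", "www.example.com/articles/bad-news.html"),
    ExcludeRule("url", "ends with", ".pdf"),
]
print(is_excluded({"url": "www.example.com/articles/bad-news.html"}, rules))   # True
print(is_excluded({"url": "www.example.com/files/report.pdf"}, rules))         # True
print(is_excluded({"url": "www.example.com/articles/good-news.html"}, rules))  # False
```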
Exclude rules may take anywhere from 20 minutes to a few hours to process.
To exclude a specific URL:
Select 'URL' from the first dropdown.
Select 'Equals' from the second dropdown.
Enter the full URL in the third field, e.g. "www.example.com/articles/bad-news.html".
Click 'Create Rule'.
You can exclude specific sections of your website to ensure they don't appear in search results.
For example, if you don't want to index the blog content under "www.example.com/news/blog", you can create a rule that excludes every page whose second directory is named "blog":
Select 'Second directory' (also referred to as 'dir2') from the first dropdown.
Select 'Equals' from the second dropdown.
Enter the directory name (e.g. "blog").
Click 'Create Rule'.
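Going by the example above, 'Second directory' refers to the second path segment after the domain. The Python sketch below assumes every path segment counts as a directory position (including a trailing file name) and uses hypothetical `dir1`, `dir2`, ... keys, so the Crawler's own extraction may differ. It shows one consequence of a positional rule: it matches any URL whose second segment is `blog`, not only URLs under `/news/`.

```python
from urllib.parse import urlparse


def directory_fields(url: str) -> dict:
    """Map each path segment to a hypothetical dir1, dir2, ... key."""
    # urlparse only finds the host when a scheme is present, so add one
    # to bare values such as "www.example.com/news/blog".
    if "://" not in url:
        url = "https://" + url
    segments = [s for s in urlparse(url).path.split("/") if s]
    return {f"dir{i}": segment for i, segment in enumerate(segments, start=1)}


def excluded_by_blog_rule(url: str) -> bool:
    # Exclude rule: Second directory | Equals | blog
    return directory_fields(url).get("dir2") == "blog"


for url in [
    "www.example.com/news/blog/first-post.html",  # excluded
    "www.example.com/shop/blog/sale.html",        # also excluded: dir2 is "blog"
    "www.example.com/blog/news/update.html",      # indexed: dir2 is "news"
    "www.example.com/news/archive.html",          # indexed: dir2 is "archive.html"
]:
    print(url, "excluded" if excluded_by_blog_rule(url) else "indexed")
```

If only the blog under `/news/` should be dropped, a narrower rule such as URL | Contains | "www.example.com/news/blog/" may be a better fit.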
You can exclude specific file types from your search results by targeting the filename extension at the end of the URL e.g. PDF, DOC.
For example, to exclude all PDFs from your search results:
Select the URL field.
Select 'Ends with'.
Add ‘.pdf’ in the text field.
Click 'Create Rule'.
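An 'Ends with' rule tests only the final characters of the URL. The Python sketch below assumes a plain, case-sensitive suffix comparison on the full URL string. How the Crawler treats letter case and query strings is not stated here, so read the last two cases as things to check on your own site rather than as documented behavior.

```python
def ends_with_rule(url: str, suffix: str = ".pdf") -> bool:
    # Exclude rule: URL | Ends with | .pdf (case-sensitive comparison is an assumption)
    return url.endswith(suffix)


for url in [
    "www.example.com/reports/annual.pdf",       # excluded
    "www.example.com/reports/annual.html",      # indexed
    "www.example.com/reports/ANNUAL.PDF",       # indexed if matching is case-sensitive
    "www.example.com/reports/annual.pdf?dl=1",  # indexed: the URL ends in "?dl=1"
]:
    print(url, "excluded" if ends_with_rule(url) else "indexed")
```

If your site links to files like the last two, an additional rule per variant (for example 'Ends with' '.PDF') may be needed.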
To make only a specific section of your website searchable, use the 'Does not equal' operator to exclude any content that is not within that section. For example, to index only the documentation within "www.example.com/docs", set up the following rule:
Select 'First directory' from the first dropdown.
Select 'Does not equal'.
Add ‘docs’ in the text field.
Click 'Create Rule'.
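Because this rule excludes everything whose first directory is not `docs`, it can also catch pages you might not expect, such as the homepage. The Python sketch below assumes that a URL with no path has an empty first directory. Like the rule, it ignores the domain. The helper names and the empty-value behavior are assumptions, not documented Crawler behavior.

```python
from urllib.parse import urlparse


def first_directory(url: str) -> str:
    # Add a scheme so urlparse separates the host from the path.
    if "://" not in url:
        url = "https://" + url
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else ""  # assumption: no path -> empty value


def excluded(url: str) -> bool:
    # Exclude rule: First directory | Does not equal | docs
    return first_directory(url) != "docs"


for url in [
    "www.example.com/docs/getting-started",   # indexed
    "www.example.com/docs",                   # indexed
    "www.example.com/",                       # excluded: no first directory
    "www.example.com/docs-archive/old-page",  # excluded: "docs-archive" != "docs"
    "www.example.com/blog/docs",              # excluded: first directory is "blog"
    "blog.example.com/docs/intro",            # indexed: the domain is not checked
]:
    print(url, "excluded" if excluded(url) else "indexed")
```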