Using transformers to create meaningful search indices in Statamic

May 9th, 2025

Statamic comes with great support for site-wide search, and is loaded with different options and customisations to really help you help your users find what they’re looking for.

When your site’s Blueprints are simple, like a Text field here or a Markdown field there, searching text is easy. The content is right there, ready to index, and you can just tell Statamic “hey, there it is, go index it and be awesome”. And it does.

What happens when your site’s Blueprints are beyond simple fields?

Like when you’re using a Bard field? How does Statamic know what to do with that structured version of your words?

Or what happens when you take a page builder approach with a Replicator field, stored as a weird nested array containing a whole heap of content and configuration options?

And what if a page’s layout is made up of its own content plus other related content (such as referenced Entries)? How does Statamic make sure the complete content and context of the page gets indexed?

Nuts, right? I’ve just been working on a project that ticks every one of these boxes though, and needed site-wide search.

Statamic’s index and search is only as good as the data you give it:

  • for your Bard content, Statamic needs to know how your content and each Set actually reads, and how it needs to be indexed, and

  • for your advanced pages with a Replicator or referenced Entries, Statamic needs to know the whole picture of your rendered page.

There are two ways to approach this, and which one you pick depends on what you’re trying to do.

If you have a Bard field as part of your Blueprint, you can use Jack Sleight’s incredible Distill addon.

If you have a more advanced use case, with Replicators (or Grids) and referenced Entries, you may be able to use Distill, but you can also render the page and extract the content.

How to transform content for Statamic’s search index

When you configure your search index, you’ll no doubt be specifying which fields you want to have included in your index.

All of this is in your config/statamic/search.php file, under the indexes property.

Let’s say you want to include your Title in the index. You can simply specify the field as part of your fields array:

'indexes' => [

    'default' => [
        // ...
        'fields' => ['title'],
        // ...
    ],
]

You can use transformers to instruct Statamic to index a transformed version of a field. As a trivial example from the docs, let’s index the title field to have its first character in uppercase:

'indexes' => [

    'default' => [
        // ...
        'fields' => ['title'],
        'transformers' => [
            'title' => function ($value) {
                return ucfirst($value);
            }
        ],
        // ...
    ],
]

These can be defined inline in your config file, or have their logic extracted to a separate transformer class too. As transformers could get lengthy, reusable or complex, this isn’t a bad idea:

'indexes' => [

    'default' => [
        // ...
        'fields' => ['title'],
        'transformers' => [
            'title' => \App\SearchTransformers\TitleTransformer::class
        ],
        // ...
    ],
]

Within your class:

namespace App\SearchTransformers;

class TitleTransformer
{
    public function handle($value, $field, $searchable)
    {
        // $value is the current value
        // $field is the index from the transformers array
        // $searchable is the object that $value has been plucked from

        return ucfirst($value);
    }
}

You can see here that we have the $value, but also the $field and $searchable - handy for complex use cases. The $searchable parameter is also available when creating this inline in your config file too.
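Because a transformer class is plain PHP, it can be exercised on its own before you wire it into the index. The snippet below is only a sketch: it repeats the TitleTransformer logic without the namespace so it runs as a single file, and it passes null for $searchable, which is an assumption that only holds for transformers that ignore that argument.

```php
<?php

// Same logic as the TitleTransformer above, with the namespace omitted
// so the snippet can run as a single standalone file.
class TitleTransformer
{
    public function handle($value, $field, $searchable)
    {
        return ucfirst($value);
    }
}

// Passing null for $searchable is only safe because this particular
// transformer never touches it.
$indexed = (new TitleTransformer)->handle('hello world', 'title', null);

echo $indexed, PHP_EOL; // Hello world
```

A transformer that reads from $searchable would need a real Entry (or a test double) instead of null.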

These will get run whenever your Entry is updated, or you manually re-index the search index.

Check out the docs for further details on these. That’s a quick crash course in transformers though.

Using Distill to create indexable content

If we have a simple Bard field, we can use Jack Sleight’s Distill addon within our transformer. Let’s create a new transformer:

namespace App\SearchTransformers;

use JackSleight\StatamicDistill\Facades\Distill;

class BardTransformer
{
    public function handle($value, $field, $searchable)
    {
        return Distill::text($searchable->augmentedValue($field));
    }
}

Now why are we using the $searchable here?

When using Distill, we need to pass it a value that knows about its field, so that the distiller knows how to process and handle that Bard content. The $value itself is just the raw value. The augmentedValue call gets us a Value object that wraps that raw value along with its Field instance.

What is really cool here is that the augmented value can be for a single Bard field… or even a Grid or Replicator that contains nested Bard fields.
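Registering this transformer follows the same pattern as the title example earlier. A minimal sketch, assuming your Bard field’s handle is content (a hypothetical handle; swap in your own):

```php
'indexes' => [

    'default' => [
        // ...
        // 'content' is a hypothetical handle; use your own Bard field's handle.
        'fields' => ['title', 'content'],
        'transformers' => [
            'content' => \App\SearchTransformers\BardTransformer::class,
        ],
        // ...
    ],
]
```

The key in the transformers array needs to match a handle listed in fields, since that is what arrives in the transformer as $field.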

The trouble here is that our page may be more than just Bard, and context is important for search. If the user needs to end up on a certain page for their search query, the index needs to know about it. But content may be in different fields… different types… presented to the user in a different way… or even have relationships with other Entries that get rendered to the user in a certain way. While this approach is great for a site that is purely made up of Bard (or a Replicator in a Page Builder setup), any content that is only pulled in from referenced Entries when the page renders won’t get indexed.

Rendering the entry to create indexable content

Let’s say your page is made up of the editable Entry itself, plus includes key content from other Entries that may (or may not) have their own viewable page too. What I mean here is that these referenced Entries could be another page, but also could be a bit of an internal database - like a Lookup - that may not have explicit URLs themselves but get used on different pages.

In this instance, the related Entries should also be included in the search index.

This approach is basically going to render the page, get the HTML, and strip the tags away to get the plain text. But because this is just to get content, we want it to be a bare-bones approach to our content.

namespace App\SearchTransformers;

use Illuminate\Support\Str;
use Statamic\View\View;

class PageTransformer
{
    public function handle($value, $field, $searchable)
    {
        // Augment the entry so the view has everything it needs to render
        $data = $searchable->toAugmentedArray();

        $view = (new View)
            ->template('search/_default')
            ->layout('search/_layout')
            ->with($data);

        // Render, strip the HTML tags, and collapse the leftover whitespace
        return Str::squish(strip_tags($view->render()));
    }
}

So what is this doing?

  1. it creates an augmented array of the searchable - your main Entry - which includes everything that is needed to actually render the page, then

  2. it creates a view, and sets the template and layout, and passes the augmented data to the View, then

  3. it renders the view, strips the tags, and squishes the unnecessary spaces

We’re then left with a string of content that is now just that: content. Words. Searchable. Indexable.
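To see what the strip-and-squish step does to markup, here is a small standalone sketch. It uses only built-in PHP functions as a stand-in for Str::squish, so it runs outside of Laravel; the toPlainText helper and the sample HTML are made up for illustration and are not part of Statamic.

```php
<?php

// Stand-in for Str::squish(strip_tags(...)) using only built-in functions.
function toPlainText(string $html): string
{
    $text = strip_tags($html);

    // Collapse every run of whitespace (spaces, tabs, newlines) to one space.
    return trim((string) preg_replace('/\s+/u', ' ', $text));
}

// Hypothetical rendered output from the bare-bones template.
$html = "<article>\n  <h1>Opening hours</h1>\n  <p>We are open   <strong>daily</strong>.</p>\n</article>";

echo toPlainText($html), PHP_EOL; // Opening hours We are open daily.

// Tags are removed without adding whitespace, so adjacent blocks can fuse.
echo toPlainText('<p>One</p><p>Two</p>'), PHP_EOL; // OneTwo
```

The second call is worth keeping in mind: if your rendered markup has no whitespace between block elements, words from neighbouring elements can end up joined in the index.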

Note that this setup requires a few things. Firstly, the template and layout.

The layout - search/_layout.antlers.html - is easy: it just outputs the template content. Its purpose is just that - output content, no strings attached.

{{ template_content }}

The reason for this being so bare bones is that we don’t want to include things like the nav or footer - these should not be indexed - so this approach strips away the usual layout structure (your head, body, etc) and just simply outputs the rendered template.

The template - search/_default.antlers.html - is a little more complex. This is the template that needs to drive the render of the page. Basically it needs to hook into the main rendering of your page, as if it were for a user to actually see. You want your Page Builder to run. You want your related Entries to be called and rendered. You want only the content that is actually indexable to be visible in the markup, just like a user would see.

How that needs to work is up to you and your site.

If you’re using a Page Builder, it could be as simple as including that partial (assuming you have it as a partial… it’s a good idea for reusability):

{{ partial:page_builder }}

Images: an opinionated approach

Depending on your configuration, you may also want/need to disable Glide processing. I’ve done this by passing an _is_text_only variable to the View, and in my image partial (all of my image handling is centralised to one reusable partial), if this variable exists, then the partial renders nothing. This did help with the overall performance especially on image-heavy pages.

I like to prefix special variables with an underscore to avoid collisions with field handles. But that’s just me.

namespace App\SearchTransformers;

use Illuminate\Support\Str;
use Statamic\View\View;

class PageTransformer
{
    public function handle($value, $field, $searchable)
    {
        $data = $searchable->toAugmentedArray();

        // Flag for partials: skip anything that is not indexable text
        $data['_is_text_only'] = true;

        $view = (new View)
            ->template('search/_default')
            ->layout('search/_layout')
            ->with($data);

        return Str::squish(strip_tags($view->render()));
    }
}

Then in my image partial:

{{ if !_is_text_only }}
Do things to render with Glide
{{ /if }}

This approach could be used for more than just Glide: it works for anything whose output you don’t care about as far as your search index goes.
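For completeness, the page transformer gets registered the same way as the title example earlier. This is only a sketch: it assumes the transformer is attached to a field with the handle page_builder, which is a hypothetical name. Since PageTransformer ignores $value and renders from $searchable, any handle from your own Blueprint that you list in fields should do.

```php
'indexes' => [

    'default' => [
        // ...
        // 'page_builder' is a hypothetical handle; use one from your Blueprint.
        'fields' => ['title', 'page_builder'],
        'transformers' => [
            'page_builder' => \App\SearchTransformers\PageTransformer::class,
        ],
        // ...
    ],
]
```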


Both of these approaches to creating meaningful search indices in Statamic have their value for different use cases.

Perhaps your site is simple, and you just want to extract all of a Bard field’s content.

Or perhaps you have some complex and relational setup that needs to have much more context added to each index entry.

Either way, using transformers as part of your search index configuration is an incredibly powerful way to boost the context, quality and contents of your search. Just because the content is stored in a structured way, or is related to other content, doesn’t mean it is too hard to get it into your site’s search index.
