Site Search and Relevance

Time

Friday, 1:00 pm CDT - Friday, 1:45 pm CDT

Description

Submitted by Martin Anderson-Clutz

Site Search and Relevance - For many sites, internal search is critical functionality and heavily relied upon by visitors. But often, there's little consensus about how and when to test the quality of results, or the best ways to optimize the results on an ongoing basis. I'll share my own thoughts on the subject but would love for this to be more of a BOF-style open discussion.

Transcript

MARTIN ANDERSON:
Alright, so I will confess that I had originally not realized this was going to be more of a BOF session, so I do have a set of slides that we could go through, or we can just keep it very conversational. It's kind of up to you guys. I was meaning to cover a few best practices and then maybe talk about some different ways that you can manipulate search results. But since it's a smaller group, why don't we go around and everybody introduce yourselves and talk about your experience with site search and Drupal to date. I can start. My name is Martin Anderson-Clutz. I work at Northern, which was formerly Digital Echidna, and we tend to use Search API and Solr for most of our site search solutions. So that's what I'm most familiar with, but I'm definitely interested to hear what other people's experiences are. When each of us is done, why don't we try to hand off to the next person? So, Ralph, why don't you go ahead?

RALPH:
Alright. I'm Ralph, I don't have much experience with (UNKNOWN) yet, I just wanted to listen along and get some ideas. And I'll hand it over to Noel.

NOEL:
That's perfect, because I'm in exactly the same boat. I don't have a lot of experience; actually, I only have Drupal 7 experience as far as search is concerned. But we are working on Drupal 8 sites and will be migrating sites from 7 to 8 in the very near future. I'm just interested in hearing what everybody else is doing with search. I really feel like I need to know a little more about it and what the options are. Now I'll hand it over to Jack.

JACK FRANKS:
Thank you. Jack Franks, primarily a back-end Drupal developer, doing a whole bunch of stuff, almost entirely in Drupal 8 now, so that's good. I love playing with Search API and doing interesting things with complex indexes and full-text searches and things like that. Most of my experience there is with Solr, but the big project I'm working on is all hosted on AWS, so we're about to be putting that in Elasticsearch, which I don't have a lot of experience with. But yeah, I just like playing with search. It's fun to do interesting things with it. I'll pass off to Maurizio.

MAURIZIO HERNANDEZ:
Hello, everyone. This is Maurizio Hernandez, (UNKNOWN) and other places online. I have had some experience, particularly with Solr, both in Drupal 7 and in Drupal 8. I wish I could say they were good experiences. For the most part, people were expecting Google-like results from Solr, especially around synonyms in my case: synonym substitution, bringing back basically the same results for different terms. And even though I have been able to configure synonyms and so on, the results are not exactly the same. That's been an interesting ongoing conversation with clients: whether to accept that or keep fighting. So, yeah, I came to hear about experiences from others and maybe adopt some of those into my own projects. I'll pass it on to Terry.

TERRY SCHILLING:
Sorry, I didn't have the microphone on. I'm Terry Schilling. I have a small hosting company and I do some Drupal websites. I'm interested in implementing Solr search because the default search functionality on the Drupal 7 and Drupal 6 sites really didn't cut it. I'm also interested in seeing what people do to improve results because, for instance, the search on d.o (Drupal.org) really doesn't help a lot sometimes. And I'm interested to see if Solr can correct some of those problems, especially on a smaller site.

MARTIN ANDERSON:
Terry, do you want to pick who's next?

TERRY SCHILLING:
The only problem is I wasn't here at the beginning, so I don't know who's already done it. So,

MARTIN ANDERSON:
That's fair enough. What do we say, Aaron, do you want to go next?

AARON GRANT:
Sure. Aaron Grant, representing Box Studios, a small firm in Chicago. We've done plenty of sites with Solr search, using Search API to make that connection. And we're always looking for ways to make it more useful for users. It's always a challenge, especially as we get into more granular content that is fed by Paragraphs and other entity-driven components of a larger page.

MARTIN ANDERSON:
Cool. Daniel.

DANIEL FICKER:
Sure, yeah, I'm Dan Ficker, and currently I work for Pantheon as a customer support engineer, basically. So, if you use Pantheon and have questions, you might get me in a chat or a ticket, as well as many of my teammates. I haven't done too much with search recently. I have a few sites that I play around with, but not many sites that I actively maintain; I'm more helping customers with their sites. But I have used search in the past, and I'm mostly interested in listening and maybe contributing a little bit about what my experience has been. So...

MARTIN ANDERSON:
Sounds great. And Brian.

BRIAN SMITH:
Hi, I'm Brian Smith. I work for Reaching Across Illinois Library System. And yeah, I'm interested in what Aaron said earlier: really interested in search for dealing with granular content.

MARTIN ANDERSON:
So, I mentioned, I think before a few of you had come in, that I had started one of those scratchpad-style shared note-taking spaces. Can everybody see that in the chat, or does it only show stuff that was posted after you got in?

JACK FRANKS:
I don't see too much chat, so I think it's only after you get in.

MARTIN ANDERSON:
OK. So let me post that again. Feel free to add notes about anything you heard that you want to remember, or any ideas you think of. It's a shared document. So, since it sounds like we've got a few people here who might be interested in hearing best practices, why don't we start there? And as we go through, if people have thoughts like "I'd do it differently this way because of that," let's definitely keep it conversational. So...

SPEAKER:
Totally. I say go for it. If you've prepared stuff, show it off. If we want to interrupt, we'll interrupt.

MARTIN ANDERSON:
Good stuff. That's what I like to hear. I'll give a shout-out to the Drupal Marketing Initiative for making the slide deck, which I thought was kind of nice. If you're here, I probably don't have to convince you that search is important. But from a UX standpoint, it's often used by people who don't have the domain knowledge to understand the main navigation of the site. If they're looking for something in particular, they may just think, "I'm not going to bother trying to understand the structure of the site. I'm going to hit search and try to find what I'm looking for." So that's one place where search is something users are going to particularly rely on. And I saw this tweet today and thought it was relevant, so I just dropped it into the presentation, because you can work your butt off to build awesome functionality and great content for your website, but if people can't find it, then it's kind of a wasted effort.

So, definitely make sure that not only is there compelling content, but also that people are able to find it. As I mentioned at the top, my experience is more with Search API, and particularly Search API with Solr, so the content I have ready is really more about that. But if people have experience they want to share or questions they want to ask about core search, then I think we should try to make some time for that as well. So, starting with fields, one thing to keep in mind is that there are different types of fields in Solr. Which ones get indexed for relevance, versus which ones are available for things like facets, is partly determined by how you tell Search API to index them. Fulltext fields are the ones that are going to be indexed for relevance. Other fields, particularly strings or integers, are going to be available for facets. And within the fields interface in Search API, if you go down to the bottom, there's a collapsed accordion, and if you open it up, it lists all of the different data types.

It'll give you a little bit of an explanation of each. The full list is at least twice as long as this, because it has all these other ones with exceptions and so on, but I think this list is a good starting point for understanding the basic ones. So, as it says here, fulltext fields are the ones that are available for search, which, for our discussion today, really means they contribute to relevance, whereas the other ones are more for different kinds of filtering and so on. One thing you'll probably want to do is boost important fields. Title is definitely one: usually a match in the title is a pretty good signifier that the content is relevant to somebody's search. So you're going to want to boost the title, but there might be other ones. For example, if you've got an e-commerce site and people are going to come in and search by product ID, like the SKU, then you may want to boost that field even higher, because an exact match there is a pretty strong indicator that that's exactly what they're looking for.
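To make the effect of field boosts concrete, here is a minimal Python sketch under stated assumptions: the field names (`title`, `sku`, `body`) and boost values are hypothetical, and the score is a plain boosted term count, not Solr's actual similarity (recent Solr versions use BM25 by default). It only illustrates that the same match counts for more in a boosted field, and that non-fulltext fields, such as a content type string used for facets, don't contribute at all.

```python
import re
from collections import Counter

# Hypothetical per-field boosts, in the spirit of Search API's Boost dropdown.
# Only fulltext fields are listed; string/integer fields used for facets
# (e.g. a content type) are deliberately left out of relevance scoring.
FULLTEXT_BOOSTS = {"title": 5.0, "sku": 8.0, "body": 1.0}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(doc: dict, query: str) -> float:
    """Boosted term count: each query-term occurrence in a field adds that field's boost."""
    terms = tokenize(query)
    total = 0.0
    for field, boost in FULLTEXT_BOOSTS.items():
        counts = Counter(tokenize(doc.get(field, "")))
        total += boost * sum(counts[term] for term in terms)
    return total


docs = [
    {"title": "Swim lessons", "body": "Lessons for all ages.", "type": "page"},
    {"title": "Pool hours", "body": "Swim lessons run daily. Swim times vary.", "type": "page"},
]
ranked = sorted(docs, key=lambda d: score(d, "swim lessons"), reverse=True)
print([d["title"] for d in ranked])  # ['Swim lessons', 'Pool hours']
```

With every boost left at 1.0, both pages would score 3.0 for this query; the title boost is what separates them.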

So, within the fields interface, you've got a Boost dropdown, and 1.0 is the default. If there are certain fields you feel should be really weak, you can set them to a lower number, while a higher number boosts a match in that field in terms of how it contributes to the overall relevance score of a particular result. One thing I've really liked using in more recent builds of Search API is the Rendered HTML output field. Instead of going in field by field, telling Search API that this is a field, make it full text, and a bunch of other settings, you can basically say: take the rendered node of my content and index the whole thing. Somebody brought up Paragraphs earlier, and this is a really great way to get complex content types indexed without a lot of effort. You basically say, yep, just index the whole thing. Then, if you want, you can layer on boosts for certain semantic tags.

One of the downsides of this approach is that you lose the ability to say that a match in one particular field should boost higher. But theoretically you could render the whole thing and additionally index that field on its own, and boost it that way. So, again, it can be a bit of a shortcut. Let's go to the next slide. For each of your content types, it gives you the option to choose which view mode you want to use. So you could have, say, a search index view mode, and configure it to show basically everything but pull out one or two fields that you don't want indexed for relevancy. Again, to me this is a huge timesaver in terms of getting everything set up. Referenced entities can be a bit unintuitive because, by default, for something like taxonomy terms or users, what gets indexed is actually the ID as opposed to, in the case of a taxonomy term, the actual label.

That works well if you want to use it as a facet. But oftentimes people will say, "I had this page tagged as, whatever, environmental issues, but when I search environmental issues, it doesn't come up as one of the results." That's probably because what Solr has indexed is actually the ID value of that taxonomy term. So if you want it to count toward relevance, you actually have to add the term name as a field. Let's see, I think I've got it in here; just give me a second. When you're adding a field, you need to dig into that field, go to the referenced entity, and then add its name, if you want that to contribute to relevance for the term name. Obviously this is a taxonomy example, but the same could go for usernames or a variety of other things where you may want to index the name of the referenced entity, or even another field of the referenced entity. As you can see, it lists all of the different fields.
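A minimal sketch of that fix, assuming a hypothetical in-memory taxonomy lookup and a hypothetical `field_tags_name` field: keep the raw term IDs for faceting, and index the resolved labels as fulltext so that a search for the tag name can match. In the Search API UI the equivalent is adding the term's name through the referenced entity, which Search API records as a property path such as `field_tags:entity:name`.

```python
# Hypothetical taxonomy lookup: term ID -> term label.
TAXONOMY = {12: "Environmental issues", 34: "Water quality"}


def build_index_fields(node: dict) -> dict:
    """Build the field values to send to the search index for one node."""
    term_ids = node.get("field_tags", [])
    return {
        "title": node["title"],
        # Raw IDs: fine for facets, useless for matching the words a visitor types.
        "field_tags": term_ids,
        # Resolved labels as fulltext, so "environmental issues" can match.
        "field_tags_name": " ".join(TAXONOMY[tid] for tid in term_ids),
    }


node = {"title": "River cleanup day", "field_tags": [12, 34]}
fields = build_index_fields(node)
print(fields["field_tags"])       # [12, 34]
print(fields["field_tags_name"])  # Environmental issues Water quality
```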

But that's a good thing to know in terms of making sure the right information from that entity is actually what's getting indexed. So those are the different field pieces: what fields you could index and some strategies for that. Does anybody have thoughts on fields, things that are cool, horror stories, any of that, before we move on to talking about processors?

AARON GRANT:
I've got one. Going back to your comment about the rendered HTML output and Paragraphs: I tried for the life of me to do it the field-centric way, where we defined the specific fields to be indexed. And I was banging my head against the wall for several hours before I realized that using a display mode and the HTML output was way, way more streamlined for our particular purposes.

MARTIN ANDERSON:
Nice. OK, if there are no other comments, let's keep going with processors. For anybody who's not familiar, a processor basically manipulates either the content being indexed or the search query the user passes in, but oftentimes it will do both, so that you're comparing apples to apples. Processors are on a separate tab in your Search API index configuration, and we'll get to screenshots in a second. Alright, the first one is the HTML filter. A lot of times you're going to be indexing content that has HTML tags in it, and the HTML filter essentially strips all of those out. That makes things like your search excerpts look a bit nicer. If you're doing things like phrase matching, it can also make that work better: if part of the match is, let's say, inside a bold tag, then without this filter it may not match properly.

So, particularly for body content or the rendered HTML output, I find that this works really well. The other part of this configuration is that you can say matches inside certain tags should get a relevance boost. For example, if there's an H1 or H2 tag in there, a match inside it should probably be more significant for relevance than a match in the regular body text. So, here's what that processor configuration looks like. You can enable it for every field, but I find there are probably only one or two fields where you really need it. You have the option to say, for images, whether it should index the alt attribute or the title attribute. And here's where you can set the boost for different tags. I think these are basically the default values I tend to use, and we'll talk about that a bit later on. Next, Ignore characters basically strips out certain characters, like apostrophes or different kinds of punctuation, that might otherwise stop indexed content that should be a match from matching up with, say, a phrase search or other kinds of queries.
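The HTML filter idea, stripping tags while giving matches inside certain tags extra weight, can be sketched with Python's standard `html.parser`. This is an illustration, not Search API's implementation, and the tag boost values are hypothetical. Note that once the tags are gone, "swim" and "lessons" end up as adjacent tokens even when one of them was wrapped in a bold tag, which is why phrase matching works better after filtering.

```python
from html.parser import HTMLParser

# Hypothetical tag boosts, similar in spirit to the HTML filter's per-tag settings.
TAG_BOOSTS = {"h1": 5.0, "h2": 3.0, "strong": 2.0, "b": 2.0}
# Void elements never get a closing tag, so don't track them as "open".
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class BoostingHTMLFilter(HTMLParser):
    """Strip markup, recording each word with the highest boost of its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; ignore stray closing tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        boost = max((TAG_BOOSTS.get(t, 1.0) for t in self.open_tags), default=1.0)
        for word in data.split():
            self.tokens.append((word.lower(), boost))


def filter_html(html: str) -> list[tuple[str, float]]:
    parser = BoostingHTMLFilter()
    parser.feed(html)
    parser.close()
    return parser.tokens


print(filter_html("<h2>Swim lessons</h2><p>Book <strong>early</strong> today</p>"))
# [('swim', 3.0), ('lessons', 3.0), ('book', 1.0), ('early', 2.0), ('today', 1.0)]
```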

You can also use processors to support search in other languages. For example, if you're indexing French content, it'll have a lot of accented characters, but on a mobile device it's really annoying to type those accented characters. So oftentimes a native French speaker doing a search on a French site will use unaccented characters, and the search needs to be able to match the unaccented characters to the accented ones. Transliteration is a good way to do that. As for Ignore characters, I find that one pretty harmless, so I tend to apply it pretty broadly and usually stick with the defaults. But there have been a couple of times where I've had to customize it more because a client had very particular needs in terms of what should be matched. The next one is Highlight. I really like using this one because it's a good way to get that Google style of result, where you've got the title and then an excerpt below, and it will actually bold where the search terms are matched within the content of the result.
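Ignore characters and Transliteration both normalize text so that what visitors type lines up with what was indexed, which only works if the same normalization runs on both the content and the query. A rough Python sketch, assuming a hypothetical set of ignored characters and using Unicode NFKD decomposition to drop accents (Drupal's own transliteration service is more thorough than this, for example with non-Latin scripts):

```python
import re
import unicodedata

# Hypothetical "ignore characters" set: apostrophes and common punctuation.
IGNORED = re.compile(r"['\u2019`\".,;:!?()]")


def transliterate(text: str) -> str:
    """Fold accented Latin letters to their base letters (é -> e, ô -> o)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(text: str) -> str:
    """Run on both the indexed content and the query, so they compare equally."""
    return transliterate(IGNORED.sub("", text)).lower()


indexed = normalize("Événements d'été à l'hôtel de ville")
query = normalize("evenements dete")
print(indexed)  # evenements dete a lhotel de ville
print(all(word in indexed.split() for word in query.split()))  # True
```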

The one caveat with highlighting is that when you're using stemming, which depending on your version of Solr may be on by default, the highlighted output can be a little bit weird. So what I tend to do is make an aggregated field, which is an option in Search API, give it a copy of what's in the rendered HTML output, and set its type to Fulltext unstemmed. That way you get more natural-looking output, without the HTML. As for what stemming does, I think it works best in English, though it may work in other languages; I can't recall. It takes words like swimming, swimmer or swims and reduces them down to the base word, swim, so that it can match all of those forms. So if somebody searches for, say, swim location, it can match content that mentions swim locations, swims at this location, those kinds of things. So it's a way of getting matches that aren't an exact letter-by-letter match.
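To make stemming concrete, here is a deliberately tiny suffix-stripping stemmer in Python, plus a highlighter that bolds words whose stem matches a query word's stem. Both are illustrative assumptions: real Solr analyzers use proper algorithms such as Porter or Snowball, and this suffix list only handles the swim examples from the talk.

```python
import re

# Toy suffix list, longest first, tuned only for the examples in this talk.
SUFFIXES = ("ming", "mer", "ing", "er", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def highlight(text: str, query: str, tag: str = "strong") -> str:
    """Wrap each word whose stem matches the stem of any query word."""
    query_stems = {toy_stem(w) for w in re.findall(r"\w+", query)}

    def wrap(match):
        word = match.group(0)
        return f"<{tag}>{word}</{tag}>" if toy_stem(word) in query_stems else word

    return re.sub(r"\w+", wrap, text)


print([toy_stem(w) for w in ("swimming", "swimmer", "swims", "swim")])
# ['swim', 'swim', 'swim', 'swim']
print(highlight("The swimmer swims at this location", "swim locations"))
# The <strong>swimmer</strong> <strong>swims</strong> at this <strong>location</strong>
```

Because matching happens on stems, the bolded words won't always be the exact words the visitor typed; building excerpts from an unstemmed copy of the content, as described above, avoids that.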

So stemming matches in terms of the base word, or words referring to the same concept, right? So, here's what the Highlight options look like. You can say when to highlight and the size of the excerpt that should be returned. If you want to use something other than strong tags, you can do that, or maybe you want to add a class; you definitely have some options. Again, I tend to go with basically the defaults on these, but you may want to play around with them and figure out what works best for your particular site. Another thing you can play around with is parse modes, and the part that's a little bit confusing to me is that this is actually something you configure as part of your results view, not as part of the search index. It makes sense in a way, but it seems weird that it impacts relevance while being part of the view configuration. The default it uses, I think, is the DisMax parser, which is pretty fault tolerant.

So if people put in weird modifiers, it's basically going to ignore them. You can change it to Direct query, which allows a user to put a plus in front of a word to only show results that match that word, or a minus to say "don't show me results that have this other word." I think you can do double quotes and some of those other things too. But it's easy to break: if something is off in the query the user provides, it'll just return no results, even though with something pretty close, like changing one modifier, they would see a bunch of results. So depending on the sophistication of your users, something like Direct query could be beneficial, but you might want to make it available as an advanced search or something like that. There's also an option to use fuzziness as part of the matching algorithm, so you can say, let's say, up to three letters could be off per word. If you have a lot of issues with typos and the like, this can be a way to make sure that you still get results.
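Fuzzy matching is usually defined in terms of edit distance: how many single-character insertions, deletions or substitutions turn one word into another. Here is a small Python sketch of Levenshtein distance and a fuzzy-match check. The `max_edits` default of 2 mirrors Solr's `term~N` fuzzy syntax, which allows at most 2 edits. Lucene's fuzzy queries also count a swap of two adjacent characters as a single edit by default; this plain Levenshtein version does not.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query_term: str, indexed_term: str, max_edits: int = 2) -> bool:
    """True if the terms are within max_edits edits of each other."""
    return edit_distance(query_term.lower(), indexed_term.lower()) <= max_edits


print(edit_distance("kitten", "sitting"))         # 3
print(fuzzy_match("recieve", "receive"))          # True (two substitutions)
print(fuzzy_match("swim", "slim", max_edits=0))   # False
```

This also shows where false positives come from: within two edits, "swim" matches unrelated words like "slim" and "skim" (one edit each).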

But obviously you'll get some false positives in there as well. Then sloppiness is a term for, if you're doing, let's say, a phrase match, allowing a certain number of words in between the words of the phrase you're trying to match. And the last thing I'll mention here, which we'll see in a second, is that as you change the parse mode in the view configuration, it gives you a description of how that mode behaves. So, here are the options that come up in Search API for your fulltext search filter. You can see them there, and as you switch to any one of them, it puts a description down here. So if it feels like you're not getting good results, or maybe they're too stringent or not stringent enough, you can play around with different parse modes and see whether they better align with what you're hoping to get out of your search results.
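Sloppiness (slop) can be sketched roughly as: the phrase words must appear in order, with at most `slop` extra words between them in total. This Python sketch is a simplification assumed for illustration: Lucene's phrase slop is a positional edit distance, so it also tolerates some reordering of the words, which this version does not.

```python
def phrase_matches(tokens: list[str], phrase: list[str], slop: int = 0) -> bool:
    """In-order phrase match allowing up to `slop` extra words inside the phrase."""
    if not phrase:
        return True
    for start, token in enumerate(tokens):
        if token != phrase[0]:
            continue
        pos = start
        for word in phrase[1:]:
            try:
                # Earliest next occurrence keeps the matched span as short as possible.
                pos = tokens.index(word, pos + 1)
            except ValueError:
                break
        else:
            extra_words = (pos - start) - (len(phrase) - 1)
            if extra_words <= slop:
                return True
    return False


tokens = "swim lessons for adult beginners".split()
print(phrase_matches(tokens, ["swim", "lessons"]))            # True
print(phrase_matches(tokens, ["swim", "beginners"], slop=2))  # False
print(phrase_matches(tokens, ["swim", "beginners"], slop=3))  # True
```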

Any questions or comments about the processors? Sounds like we're good. Alright, let's talk about different strategies for manually curating results. Notionally, you want to do as much as possible to get the overall algorithm or configuration as tight as possible, in terms of tuning the right fields and so on. But inevitably, I feel like you're going to end up with a site owner who says, "When I search for this query, I want this other page to be on top, and it never is. It's third or something. How do I make sure it gets on top?" So I wanted to talk about some different strategies for manipulating those results. One of them is built into Search API; it's, again, one of the processors: type-specific boosting. The idea is that, again, if you've got an e-commerce site, you can say, "I want matches within products to be above any other kind of content, because I really want to make sure people are directed towards products."

So you have a set of options for each bundle or content type being indexed, and you can affect the weighting. Again, this is a processor, and you can see that for each of your bundles you can use the default, or weight it down or up, depending on how you want it to influence that type of content within your search results. Alright. There are also ways to boost recent content. If your site has a lot of news content, let's say, you may want more recent results weighted higher than older content. I put a couple of links in here, and I can make these slides available; maybe I'll put a link into the Drutopia pad, because I'll confess I've never actually had to do this myself, so I don't have a graphic to show you. But so far, the references I've come across seem to involve writing some custom code and changing the (UNKNOWN). So by all means, if you try this out, I'd love to hear about your experience and what it was like to set up.
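Type-specific boosting and boosting recent content both come down to adjusting a base relevance score. A Python sketch under stated assumptions: the bundle weights are hypothetical, the factors are combined multiplicatively (one common choice; Solr also supports additive boost functions), and the recency factor reuses the shape of the Solr function query `recip(ms(NOW, <date field>), 3.16e-11, 1, 1)`, where `recip(x, m, a, b)` computes `a / (m * x + b)` and 3.16e-11 is roughly one over the number of milliseconds in a year.

```python
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000  # 31,536,000,000 ms


def recip(x: float, m: float, a: float, b: float) -> float:
    """Same formula as Solr's recip(x, m, a, b) function: a / (m * x + b)."""
    return a / (m * x + b)


# Hypothetical bundle weights, like the Type-specific boosting processor settings.
BUNDLE_BOOSTS = {"product": 2.0, "article": 1.0, "page": 0.5}


def final_score(base_score: float, bundle: str, age_ms: float) -> float:
    """Combine relevance, a per-bundle boost and a recency decay multiplicatively."""
    recency = recip(age_ms, 3.16e-11, 1, 1)  # 1.0 when brand new, about 0.5 after a year
    return base_score * BUNDLE_BOOSTS.get(bundle, 1.0) * recency


print(final_score(3.0, "product", 0))                      # 6.0
print(round(final_score(3.0, "article", MS_PER_YEAR), 2))  # 1.5
```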

Another strategy we've used a few times is a search keywords field. Basically, you're giving the editor the chance to say, "These are the keywords we think are highly relevant to this piece of content," which can help rank it higher for those search terms. It also gives you an opportunity if, let's say, the organization has internal terminology it's required to use, but people outside the organization, like stakeholders coming to the site, are always searching with terminology the organization is, let's say for legal reasons, unable to use. Search keywords allow you to map those together to some degree. I wouldn't say it's a true fix for the synonyms problem that was mentioned earlier, but it's a potential way to help mitigate it. Unfortunately, you have to do it on a node-by-node basis, as opposed to a true synonym fix, which would say that wherever this word gets indexed, the content should also show up as relevant for this other word.

You can do that in the Solr configuration; there's a synonyms text file. But so far I have yet to come across anything within Drupal that has an elegant way of managing synonyms and passing them through to the Solr server. I think I saw one module where you could define them and it would export text files that you would then have to FTP to your Solr server, but that's a bit janky in terms of having a decent workflow. So the search keywords field, again, is a way to manually assign relevance to different terms. You can boost it, because a match there is almost like curated relevance, so it should probably be weighted higher than a regular body match. Typically, you're going to want to hide that field from the view mode: somebody looking at the article, or whatever the piece of content is, typically shouldn't see the search keywords that were manually assigned. It's really something that you want to keep hidden.

But you do want it to get indexed within your search index.
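The Solr synonyms text file uses a simple line format: comma-separated words are treated as equivalents, and `a, b => c` maps the left-hand terms to the right-hand one. Generating that file from editor-managed data is the easy half of the workflow problem described above; getting it onto the Solr server, and reloading the core so the change is picked up, is the janky part. A Python sketch, with hypothetical synonym data (the `couch`/`ipod` entries follow the examples in Solr's sample `synonyms.txt`):

```python
# Hypothetical editor-managed synonym data.
EQUIVALENT_GROUPS = [
    ["couch", "sofa", "divan"],
    ["bylaw", "ordinance"],
]
ONE_WAY = {"ipod": ["i-pod", "i pod"]}  # left-hand terms are rewritten to "ipod"


def synonyms_txt(groups: list[list[str]], one_way: dict[str, list[str]]) -> str:
    """Render synonym data in Solr's synonyms.txt line format."""
    lines = ["# Generated file; edit the source data, not this file."]
    lines += [", ".join(group) for group in groups]
    lines += [f"{', '.join(sources)} => {target}" for target, sources in one_way.items()]
    return "\n".join(lines) + "\n"


print(synonyms_txt(EQUIVALENT_GROUPS, ONE_WAY), end="")
# # Generated file; edit the source data, not this file.
# couch, sofa, divan
# bylaw, ordinance
# i-pod, i pod => ipod
```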

BRIAN SMITH:
So, is this search keywords field going to show up in the edit mode for a node, for instance, or is this all in Search API?

MARTIN ANDERSON:
This would show within the edit form for the node. The other thing to mention is that if you're using Highlight, the keywords may show up in the search excerpt as well, because, again, they're part of the content that's been indexed. So keep that in mind. A couple of other caveats about this approach: it's a way to add relevance but not take it away, so it can't be used to say, "I don't want this node to show up for X search." The other thing is that there's no real ability to manually rank results. You can say, "I want these three pieces of content to show up for this keyword," and depending on your boost settings they may show up as the top three, but you don't have the ability to say, "I want this one first, this one second and this one third." So, a caveat there. Here's an example of what that looks like. I've also seen us implement this using just a plain text area, where you can comma-separate the keywords or put them on separate lines, keeping it fairly free-form.

I don't think it matters too much, apart from the fact that, as I said, if it's going to show up in your search excerpt, you may want to think about the best formatting for that. I think there's one more. Search Overrides is a module that's actually designed to let editors manually curate those results. It gives you an interface that can be used directly within your node edit form to promote or demote that piece of content for specific searches, and for promoted content, it gives you the ability to manually rank it. I think I've got a bit of a demo we can do quickly. So, if I go in here... yeah, this is the one. OK, let's go to our homepage and do a search for "test". We get results. This is typical of a development site that I work on: there's lots of test content. But let's say this was a production site and we wanted this Covid testing content to appear first.

So let's go in here and edit that content. You can see we've got the Search Overrides tab over here, and we can say that for searches for "test", we want this promoted. And, for whatever reason, let's say it's showing up for searches for the word "squid". Actually, that's wrong; I meant to say we should exclude it, so we can enter "squid" and it will show up there as excluded. So, again, it gives you the ability to either promote or exclude. Let's go ahead and save our piece of content. Oh no. Live demos are always exciting. Let's search for "test" again. We've got an error; I suspect that's why it's not working. Let's try that one more time. Come back here, search again. OK, let's look at the record. So that's why: there's already promoted content for this search, and it's showing those ahead because they had been promoted earlier. So we can go ahead and manually rank those to say we want the Covid one first.

We want this changed one second. Let's go ahead and save that, and save the node. And if we go back to the search results, we've got our Covid testing first, our changed one second, and then our test article third. So those are the three promoted ones, in the order we manually specified. Again, it's probably not something you need on every site, and it's probably the kind of thing some clients could overuse instead of writing good content or using keywords in the proper semantic ways. But if the client positively, absolutely has to have these five nodes in this particular order for a search, because it's the one that everybody comes to their site and searches for, it's a nice one to have available. One thing I'll mention, though, is that this module relies on a feature in Solr that was released in version 4.7. So I know at this point it doesn't actually work on either Acquia or Pantheon.
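The promote and exclude behaviour from the demo can be sketched as a post-processing step on a ranked list of node IDs. This is a hypothetical Python illustration of the idea, not how the Search Overrides module or Solr actually implements it: promoted nodes go first in the editor's chosen order, excluded nodes are removed, and everything else keeps its relevance order.

```python
def apply_overrides(results, query, promoted, excluded):
    """Apply per-query overrides to a relevance-ranked list of node IDs.

    promoted: query -> list of node IDs in the editor's chosen order
    excluded: query -> collection of node IDs that must not appear
    """
    key = query.strip().lower()
    blocked = set(excluded.get(key, ()))
    # Promoted nodes are pinned to the top even if they ranked low (or not at all).
    pinned = [nid for nid in promoted.get(key, []) if nid not in blocked]
    rest = [nid for nid in results if nid not in blocked and nid not in pinned]
    return pinned + rest


promoted = {"test": [9, 7, 2]}  # hypothetical node IDs, in the editor's chosen order
excluded = {"squid": {3}}
print(apply_overrides([5, 3, 9, 7], "Test", promoted, excluded))   # [9, 7, 2, 5, 3]
print(apply_overrides([5, 3, 9, 7], "squid", promoted, excluded))  # [5, 9, 7]
```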

I keep hearing that Acquia is close to rolling out Solr 7 as an option, at which point you'll be able to use this module. But I haven't heard anything recently about Pantheon offering a newer version of Solr. I forget who mentioned working at Pantheon. Was it Jack?

DANIEL FICKER:
It was me. It was me, Dan.

MARTIN ANDERSON:
Oh, Dan yeah.

DANIEL FICKER:
Yeah. I know that some people within Pantheon are working on that, but I don't know that we have a timeline for exactly when this module will be working.

MARTIN ANDERSON:
And again, if it's something you absolutely have to have for your site, because it's critical functionality, theoretically you could also use one of the hosted Solr services. Depending on how much volume you need, they're probably not going to be too expensive. So that might be another option if you're using a platform that doesn't have a new enough version of Solr and you really need that functionality. But as I say, it should probably be more of a last resort anyway. Alright. There are a couple of other concepts I wanted to throw out there, and again, they're open for discussion. I'd encourage people to think of relevance as a process and not a destination. It's not a matter of setting up your site, checking that you're getting good search results, and then never having to think about search again; you really need to revisit it on a periodic basis. Ideally monthly, but maybe quarterly, go into your analytics and see which site searches people are doing most often.

Go onto your site and actually run those searches, and then have some kind of subject matter expert look at the results and say, yeah, those are probably the best results that we have. Or maybe they'll say, no, the thing that's down in eighth place is probably what people are actually looking for when they run that search. So I think having that subject matter expert is an important piece of it, because they may have a better sense of what content is on the site for each type of stakeholder, and what those stakeholders are looking for. The other thing I've seen some companies do is add a little survey so that they can collect data on the search results. Then, as they tweak things, they can see whether the satisfaction ratings for particular searches go up or down. So there are definitely different approaches to implementing this idea. But again, to me, the important idea is to think of search as something you should continue to look at, evaluate, and tweak on an ongoing basis.
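If you do collect survey ratings, a before-and-after comparison per query can show whether a tweak helped. The sketch below assumes a hypothetical data shape: each period is a dict mapping a search query to a list of numeric ratings (for example on a 1 to 5 scale). It only compares queries rated in both periods; with small samples, a change of a fraction of a point may just be noise.

```python
from statistics import mean


def satisfaction_change(before, after):
    """Return {query: mean(after) - mean(before)}, rounded to 2 decimals.

    before/after map a search query to a list of survey ratings. Queries
    missing from either period, or with no ratings, are skipped.
    """
    return {
        query: round(mean(after[query]) - mean(before[query]), 2)
        for query in before.keys() & after.keys()
        if before[query] and after[query]
    }
```

A positive value means the average rating for that search went up after the change.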

And then I think the last thing I was going to mention here is to think about providing context-specific searches. So, again, you can have your overall site search, but you may want to have a product search for your e-commerce site. Or if it's a municipal site, you may want to provide a search specifically for meetings and agendas, with different filtering options. And maybe for each meeting the results should display links to the agenda document, the minutes document, the video recording, and those other kinds of things. By making those searches more specialized, you can provide a lot of extra value to the user by making it easy to connect to the richer aspects of what they're trying to accomplish. So anyway, those were some of the ideas that I wanted to share. But if anybody has thoughts on any of those, or additional things they do to make search work better, or even if people are like, this is the problem that I run into every time and I haven't been able to figure it out.
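In Drupal, a context-specific search like this would normally be a dedicated Search API view with its own filters or facets. To show what it boils down to at the Solr level, here is a sketch that restricts results to meetings with filter queries (`fq`) and requests the fields a meeting result might link to. Every field name here (`ss_type`, `ss_committee`, `its_year`, the URL fields) is hypothetical; Search API Solr generates its own prefixed field names, so check your schema before borrowing any of this.

```python
from urllib.parse import urlencode


def meeting_search_params(text, committee=None, year=None):
    """Build Solr query-string parameters for a meetings-only search.

    All field names are hypothetical placeholders.
    """
    # Always restrict to meeting documents.
    params = [("q", text), ("fq", "ss_type:meeting")]
    if committee:
        # Optional filter: exact committee name, quoted for multi-word values.
        params.append(("fq", f'ss_committee:"{committee}"'))
    if year:
        params.append(("fq", f"its_year:{int(year)}"))
    # Return the fields a meeting result could link to.
    params.append(("fl", "id,ss_title,ss_agenda_url,ss_minutes_url,ss_video_url"))
    return urlencode(params)
```

Each filter is a separate `fq` parameter, which lets Solr cache the filters independently of the main query.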

I definitely want to open up the floor to whatever people want to talk about.

TERRY SCHILLING:
Well, Martin, I had a question for you about the rendered HTML output field. Is that a full-page render, or is it per node? I'm thinking about blocks that would be on every page, or most pages, on a site. Would those show up in that rendered HTML output? If it's the whole page, then the keywords in those blocks might show up on every node. I don't quite understand how that works.

MARTIN ANDERSON:
So, if anybody has tested this and had a different experience, feel free to chime in. But I believe it's what would be in your Twig template as the node content. So it's essentially the output of that node Twig template. So it wouldn't really include blocks per se, is my understanding, unless you were using a Layout Builder type solution, in which case you might get inline blocks or other blocks placed into your layout.

JACK FRANKS:
Yeah, that's right. It's only the entity content that gets indexed like that. A good way to test what gets rendered, and therefore what would be indexed, is to create a view whose display mode is rendered entity output. You'll see that when it renders all of the nodes matching your view criteria, it doesn't render the whole page. It just renders the actual entities, and that's what gets indexed. So it doesn't care about whatever is on the page surrounding it. It's only indexing entity content.
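To illustrate the point: the input is only the entity's rendered markup, the same thing a "rendered entity" display in Views produces, so blocks elsewhere on the page never reach it. The sketch below reduces such markup to plain text with Python's standard `html.parser`, as a rough stand-in for the tag stripping that Search API's HTML filter processor performs. It is not that processor's implementation, and the sample markup is made up.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def indexable_text(rendered_entity_html):
    """Turn one entity's rendered HTML (not the full page) into plain text."""
    parser = _TextExtractor()
    parser.feed(rendered_entity_html)
    parser.close()
    return " ".join(parser.parts)
```

Because only the entity markup is passed in, sidebar or footer blocks could only end up in the index if they were rendered as part of the entity itself, for example through Layout Builder.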

MARTIN ANDERSON:
Any other questions, comments?

TERRY SCHILLING:
I guess that raises the question then, Jack, and maybe others: what would happen if you have a decoupled front end and you're not using Twig templates?

JACK FRANKS:
The search index is based on Drupal's rendered output. So if you're using a decoupled front end, Search API doesn't care. It's going to use whatever Drupal says the entity looks like when it's rendered.

MARTIN ANDERSON:
It might also be worth mentioning, because I don't think I drew attention to it when we were looking at that slide, that you can tell Search API which user role to use when it renders that output. So you could theoretically index content that isn't publicly available, or vice versa. Alright, I think we're almost out of time here. We can probably go over until people start showing up for the next session. But again, the floor is open to whatever people want to talk about.
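A toy model of why the render role matters: whatever that role is allowed to see is what becomes indexed text. The dictionary keys below (`published`, `preview_roles`) are hypothetical stand-ins for Drupal's real entity access checks, which this sketch does not reproduce. One consequence worth keeping in mind: if you render as a privileged role, non-public text can surface in search results unless access is also checked when results are displayed.

```python
def items_to_index(items, role):
    """Return the items that would render for `role`.

    Hypothetical rule: published items are visible to every role; unpublished
    items only to roles listed in the item's 'preview_roles'.
    """
    return [
        item
        for item in items
        if item["published"] or role in item.get("preview_roles", ())
    ]
```

With one published and one unpublished item, rendering as `anonymous` indexes only the published one, while rendering as a hypothetical `editor` role indexes both.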

BRIAN SMITH:
Hi, yeah, I'm seeing that there's a Search API Attachments contrib module that looks like it's currently in beta. Do you have any experience with that? Because we have a lot of content where, despite my pleading that, hey, this should be a web page, (UNKNOWN) it's like, we're sticking a PDF into this node.

MARTIN ANDERSON:
Yeah. So actually, overall I've found that the Search API Attachments module works pretty well. With older versions of Solr, you may need to have Tika available. That's another Apache utility that actually does the work of extracting all that content. But I've found that with more recent versions of Solr, you can use Solr's built-in extraction. I think it basically has a lot of that Tika-type extraction capability built into Solr. So from a management standpoint, it's easy to just say, use what Solr has, and in my experience it works pretty well out of the box.

JACK FRANKS:
Yeah, the Search API Attachments module has a list of extraction libraries you can use to do that. Tika is one of them, and the built-in Solr extractor is another. For what we're indexing, I actually haven't noticed much of a difference between one and the other, except that, of course, you have to set up Tika. With the built-in Solr extractor, all you do is enable the feature and it just works.
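For a sense of what the built-in Solr option involves, Solr's extraction handler (Solr Cell) accepts a file POSTed to `/update/extract`, and `extractOnly=true` asks it to return the extracted text instead of indexing the file. The sketch below only prepares such a request with Python's standard library and does not send it. It assumes the extraction handler is enabled on the core, the core URL is hypothetical, and it is not how Search API Attachments itself is implemented.

```python
import urllib.request
from urllib.parse import urlencode


def build_extract_request(core_url, file_bytes, content_type="application/pdf"):
    """Prepare (but do not send) a POST asking Solr Cell for extracted text only."""
    params = urlencode({"extractOnly": "true", "wt": "json"})
    return urllib.request.Request(
        f"{core_url}/update/extract?{params}",
        data=file_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen()` would send it; against a running core with extraction enabled, the JSON response should contain the extracted text.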

MARTIN ANDERSON:
OK, thank you.

TERRY SCHILLING:
Yeah. Could you gentlemen put links into the Drutopia pad for that module, and also anything you might have about Tika? Because I'm totally unfamiliar with it.

MAURIZIO HERNANDEZ:
Also, Martin, could you please share this later? Those are very useful resources.

MARTIN ANDERSON:
Yeah, for sure. I'll make sure I put a link to those notes into the Drutopia pad as well.

MAURIZIO HERNANDEZ:
Thank you.

MARTIN ANDERSON:
Yeah, no problem. I know synonyms were brought up early on. Has anybody tried to solve that, in terms of making synonyms work with Drupal and Search API or Solr?
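To make the idea concrete, the sketch below parses a simplified version of Solr's `synonyms.txt` format (comma-separated lines are equivalent terms; `a => b` replaces `a` with `b`; lines starting with `#` are comments) and expands single-word query terms. It is only an illustration of query-time expansion, not Solr's synonym filter: it ignores multi-word synonyms, which Solr handles with graph-aware filters and which are often where synonym setups get tricky.

```python
def parse_synonyms(text):
    """Parse simplified Solr-style synonym rules into {term: set_of_terms}."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            # Explicit mapping: each left-hand term is replaced by the right side.
            left, right = line.split("=>", 1)
            targets = {t.strip().lower() for t in right.split(",") if t.strip()}
            for src in left.split(","):
                src = src.strip().lower()
                if src:
                    mapping.setdefault(src, set()).update(targets)
        else:
            # Equivalent terms: each one expands to the whole group.
            terms = {t.strip().lower() for t in line.split(",") if t.strip()}
            for term in terms:
                mapping.setdefault(term, set()).update(terms)
    return mapping


def expand_query(query, mapping):
    """Expand each whitespace-separated word into a sorted list of alternatives."""
    return [sorted(mapping.get(word, {word})) for word in query.lower().split()]
```

With the rules `couch, sofa, settee` and `tv => television`, the query "TV couch" expands to `[["television"], ["couch", "settee", "sofa"]]`.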

MAURIZIO HERNANDEZ:
My experience has been tricky, in the sense that (UNKNOWN). So far I get somewhat what I'd expect, but it's not quite there. It's been a little tricky.

MARTIN ANDERSON:
What version of Solr have you been using?

MAURIZIO HERNANDEZ:
I think it was 8.

MARTIN ANDERSON:
OK.

MAURIZIO HERNANDEZ:
Yeah, well, that's a project that has been going for a while, so I think we started with 6, or maybe even before 6, but at the moment it's 8.

MARTIN ANDERSON:
Alright. Well, I think what I'm going to do is stop this presentation.

SPEAKER:
Thanks for putting it together and putting it on. Sure, yeah, you guys enjoyed it? Yeah.

MAURIZIO HERNANDEZ:
Thank you very much.

MARTIN ANDERSON:
So, yeah, by all means, keep an eye on that Drutopia pad and I'll get a link in there. And if any of you have other thoughts you want to drop in there, please share. So, thanks, everyone.

SPEAKER:
Good job Martin, thanks.

MidCamp 2021
