
Is there a way to use the Ingest Attachment plugin with Elastic App Search?


I'm working on a portal that hosts multiple types of documentation (HTML, PDF, PPTX, DOCX) and makes them all searchable in one place.

We could achieve this using the "standard" out-of-the-box Elasticsearch and the Ingest Attachment plugin, but I'm no data scientist and know very little about writing Elasticsearch queries, so our search results are not great.

I've been advised to use App Search (part of Elastic Enterprise Search) instead of trying to tune my basic queries, but it seems I can't use the Ingest Attachment plugin with it. As a result, I can't simply send base64-encoded content to the documents API and expect that content to be indexed.

Is there any way around this, or can I not use App Search for this at all?


The answer is to extract the attachment content before it reaches App Search: either use the attachment ingest pipeline, as suggested in this blog post, or, if your backend is in Java like mine, use Apache Tika to extract the content from attachments yourself.
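For the attachment-pipeline route, one option (an assumption on my part, not necessarily what the blog post does) is to call Elasticsearch's `_ingest/pipeline/_simulate` API with an inline `attachment` processor. That returns the extracted text without indexing anything, and you then index the text into App Search. The sketch below only builds the request with the JDK's `java.net.http` client (Java 11+); the base URL is a placeholder, and depending on your Elasticsearch version the attachment processor may require the Ingest Attachment plugin to be installed.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Base64;

// Minimal sketch: run the "attachment" ingest processor on one file via the _simulate API,
// so Elasticsearch extracts the text but stores nothing in an index.
public class AttachmentSimulate {

    // JSON body for POST _ingest/pipeline/_simulate with an inline, single-processor pipeline.
    static String simulateBody(byte[] fileBytes) {
        // Base64 output only uses [A-Za-z0-9+/=], so it can be embedded in a JSON string as-is.
        String data = Base64.getEncoder().encodeToString(fileBytes);
        return "{\"pipeline\":{\"processors\":[{\"attachment\":{\"field\":\"data\"}}]},"
                + "\"docs\":[{\"_source\":{\"data\":\"" + data + "\"}}]}";
    }

    // esBaseUrl is a placeholder (e.g. "http://localhost:9200"); add authentication as your cluster requires.
    static HttpRequest simulateRequest(String esBaseUrl, byte[] fileBytes) {
        return HttpRequest.newBuilder(URI.create(esBaseUrl + "/_ingest/pipeline/_simulate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(simulateBody(fileBytes)))
                .build();
    }
}
```

Send it with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; the extracted text is in the response under `docs[0].doc._source.attachment.content`, which you can read with whatever JSON library you already use.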

I used Tika to extract the HTML content (it's actually very straightforward):

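```java
static String getContent(String htmlContent) throws TikaException, SAXException, IOException {
    // Encode explicitly as UTF-8 rather than relying on the platform default charset
    try (InputStream input = new ByteArrayInputStream(htmlContent.getBytes(StandardCharsets.UTF_8))) {
        // -1 disables BodyContentHandler's default 100,000-character write limit
        ContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        new HtmlParser().parse(input, handler, metadata, new ParseContext());
        return handler.toString(); // plain text of the HTML body
    }
}
```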

For PDF files I was already using Apache PDFBox to extract some other properties, so the text came for "free". The same goes for Office files, although those require Apache POI.
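Whichever extractor you use (Tika, PDFBox, POI, or the attachment pipeline), the idea is the same: send App Search the extracted plain text as an ordinary field instead of the base64 file. Below is a minimal sketch using only the JDK's `java.net.http` client (Java 11+), targeting App Search's `/api/as/v1/engines/{engine}/documents` endpoint with a private API key as a Bearer token. The base URL, engine name, key, and the `title`/`body` field names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Minimal sketch: index already-extracted text into App Search as a plain text field.
// Engine name, private API key, and the "title"/"body" field names are hypothetical.
public class AppSearchIndexer {

    // Escapes a Java string as a JSON string literal (surrounding quotes included).
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Other control characters must be \\u-escaped in JSON
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // The documents endpoint takes a JSON array; "id" is App Search's document identifier.
    static String documentJson(String id, String title, String text) {
        return "[{\"id\":" + jsonString(id) + ",\"title\":" + jsonString(title)
                + ",\"body\":" + jsonString(text) + "}]";
    }

    static HttpRequest indexRequest(String baseUrl, String engine, String privateKey,
                                    String id, String title, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/as/v1/engines/" + engine + "/documents"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + privateKey)
                .POST(HttpRequest.BodyPublishers.ofString(documentJson(id, title, text)))
                .build();
    }
}
```

For example, pass the string returned by `getContent(...)` as `text` and send the request with `HttpClient`. The hand-rolled escaping only keeps the sketch dependency-free; in a real backend a JSON library such as Jackson is the safer choice.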
