Geoportal Harvester

04-28-2014 06:28 AM
StephenCoppola
New Contributor III
Question 1:  Can the Geoportal harvester be customized to extract metadata from MS Word, PowerPoint, and PDF files?
Question 2:  If so, which profile do these documents need (ISO, etc)?
Question 3:  Are there existing tools or examples that couple Geoportal Server with textual documents?

Thank you for your time and suggestions
1 Reply
MartenHogeweg
Esri Contributor

Yes! You can extend the harvester as explained in this example:

Extending the Web Harvester · Esri/geoportal-server Wiki · GitHub

Using Apache Tika, for example, you could extract the document information from various file types.

You would generate the metadata for these docs yourself, in your preferred profile (question 2). I've been using Dublin Core, as there is typically only limited information available when indexing docs.
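
As a rough illustration of that step, here is a minimal Python sketch that turns a dictionary of extracted document properties into a simple RDF-wrapped Dublin Core record. The `build_dublin_core` function, the property names, and the sample file path are all hypothetical. The dictionary stands in for whatever an extractor such as Apache Tika would return, and whether Geoportal Server's Dublin Core profile accepts exactly this layout is an assumption you would need to check against your installation.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Use readable prefixes when the tree is serialized.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("rdf", RDF_NS)

# The subset of Dublin Core elements this sketch knows how to write.
DC_ELEMENTS = ("title", "creator", "subject", "description",
               "date", "format", "identifier", "language")


def build_dublin_core(props):
    """Build a Dublin Core record (as an XML string) from extracted properties.

    `props` maps Dublin Core element names to a string or a list of strings.
    It is a stand-in for the output of a document extractor (an assumption).
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description")

    # Use the identifier, if present, as the rdf:about attribute.
    identifier = props.get("identifier")
    if isinstance(identifier, str) and identifier:
        desc.set(f"{{{RDF_NS}}}about", identifier)

    for name in DC_ELEMENTS:
        values = props.get(name)
        if not values:
            continue  # skip properties the extractor could not provide
        if isinstance(values, str):
            values = [values]
        for value in values:
            element = ET.SubElement(desc, f"{{{DC_NS}}}{name}")
            element.text = value

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical properties for a single PDF document.
    sample = {
        "identifier": "file:///docs/report.pdf",
        "title": "Annual Report",
        "creator": "Jane Doe",
        "format": "application/pdf",
        "subject": ["hydrology", "planning"],
    }
    print(build_dublin_core(sample))
```

Properties that are missing are simply left out of the record, which matches the situation where only limited information is available for a document.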

I have some code that indexes docs that I could post on GitHub (question 3). Perhaps something to work on together?
