With sufficiently sophisticated analysis, search relevance can arguably be inferred even from very "messy" data sets. In enterprise search, this is increasingly the case, with natural language processing and machine-learning technologies helping to mine vast amounts of data from diverse repositories across the enterprise. Even so, the search ecosystem benefits immensely from underlying governance practices that enforce policies and lifecycles for data within its native repositories. These practices keep the underlying corpus of data cleansed and prepped for search, which greatly improves performance and outcomes.
Search performance is proportional to data quality
No search tool is powerful enough to filter out dirty or irrelevant data entirely, particularly when that data is unstructured and imbued with human-generated meaning. The axiom repeated ad nauseam in statistics holds true: garbage in, garbage out. Because most search tools on the market provide an ecosystem of connectors to access data in its native repositories (as opposed to aggregating and storing the data themselves), the onus is on the enterprise to ensure that the native repositories are operating as sources of truth rather than sources of confounding data.
For federated search tools to achieve optimal performance, attention must still be paid to the governance of the individual content repositories that the search tools are connected to, so that the data being searched and accessed is relatively clean and managed to begin with. Failure to manage these repositories in their native state diminishes the true potential of enterprise-wide search and discovery by inflating the volume of irrelevant or outdated data, increasing the processing power and cost required, and increasing the risk of unauthorized data access. Adoption of an enterprise search and analysis tool ideally begins with an audit of the systems to be connected, ensuring that governance policies, lifecycles, and controls are already established and being properly executed within each repository.
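As a rough illustration of what such a pre-connection audit might check, the Python sketch below flags repositories that lack basic governance measures. The `Repository` fields, the `audit_repositories` helper, the 0.2 staleness threshold, and the sample repositories are all hypothetical, chosen only for illustration; a real audit would draw on each system's own administrative reporting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repository:
    """Hypothetical summary of one content repository's governance state."""
    name: str
    retention_policy: Optional[str]  # e.g. "7 years", or None if undefined
    access_controls_enabled: bool
    stale_fraction: float  # share of items past their lifecycle, 0.0 to 1.0


def audit_repositories(repos, max_stale_fraction=0.2):
    """Return a dict mapping repository name to a list of audit findings.

    Repositories with no findings are omitted from the result.
    The 0.2 default threshold is an arbitrary illustrative value.
    """
    findings = {}
    for repo in repos:
        issues = []
        if not repo.retention_policy:
            issues.append("no retention policy defined")
        if not repo.access_controls_enabled:
            issues.append("access controls not enabled")
        if repo.stale_fraction > max_stale_fraction:
            issues.append("too much outdated content")
        if issues:
            findings[repo.name] = issues
    return findings


# Made-up sample data: one well-governed and one neglected repository.
repos = [
    Repository("hr_share", "7 years", True, 0.05),
    Repository("legacy_wiki", None, False, 0.60),
]
report = audit_repositories(repos)
for name, issues in report.items():
    print(name, "->", "; ".join(issues))
```

Only repositories with findings appear in the report, so an empty result would suggest that the systems are ready to be connected under these (simplified) criteria.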
Another challenge with enterprise search is maintaining the security and access control settings that may exist within native content repositories. Search results need to reflect individual user roles and permissions without displaying inappropriate or sensitive results to unauthorized users. Housekeeping is required first. When implementing a search and discovery tool, attention must be paid to the native security and permission settings within the individual data platforms and repositories that the search tool is connected to. Although some search tools have excellent functionality for inheriting and maintaining these controls, the controls themselves must already be in place in order to be passed on. Due diligence is required in this regard because many organizations still struggle to coordinate governance policies across their respective systems.
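The idea of search results respecting inherited permissions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular product's mechanism: the `Document` type, the `trim_results` function, and the group names are hypothetical, and permissions are assumed to arrive from the native repository as a simple set of allowed groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical search hit carrying permissions from its source system."""
    doc_id: str
    title: str
    allowed_groups: frozenset  # groups permitted in the native repository


def trim_results(results, user_groups):
    """Keep only documents the user may see, preserving ranking order.

    A document with no allowed groups is hidden from everyone: if the
    native repository defines no controls, there is nothing to inherit,
    so this sketch denies access by default.
    """
    user_groups = set(user_groups)
    return [doc for doc in results if doc.allowed_groups & user_groups]


# Made-up ranked results for illustration.
ranked = [
    Document("d1", "Salary bands", frozenset({"hr"})),
    Document("d2", "Holiday calendar", frozenset({"hr", "staff"})),
    Document("d3", "Untagged legacy file", frozenset()),
]
visible = trim_results(ranked, {"staff"})
print([doc.doc_id for doc in visible])
```

The third document shows why housekeeping comes first: without permissions set at the source, the search layer can only guess, and the safe guess is to withhold the result.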
Why Content Needs to Be Secure, IT0014-003196 (January 2017)
Software Market Forecast Report: Information Management, 2015–20, IT0014-003162 (December 2016)
Paige Bartley, Senior Analyst, Information Management