What do you know today? Knowledge Focus

Data Harvesting

Kapow Technologies

There are more than 165 million websites on the public web today, a number growing at an annual rate of 35%, and a very large number of corporate websites on the private web. Together, this web-based content represents an enormous repository of information with significant potential business value for companies that are able to access it.

Forward-thinking companies today are finding ways to tap into this vast repository of web information and apply it to everyday business tasks, getting more out of their existing application infrastructure and gaining a competitive edge. They are becoming smarter, sooner.

Examples of Web Intelligence
* Competitive and Market Data
* Pricing Intelligence
* Government Agency Information
* Public Domain Records
* Internal Operational Data

Most data is accessible from a database, but it has greater value when you see it on a web page "in the wild". There you can understand the context of the data and, more importantly, see how it relates to other pieces of data that collectively add more value. Kapow harvests web intelligence by interacting with the web interface, collecting web content and converting it into a structured form that can be written to a database or used by an application.
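To make the idea of "converting web content into a structured form" concrete, here is a minimal, generic sketch. It is not Kapow's API or its visual scripting environment. It assumes a page containing a simple two-column pricing table; the sample HTML, the table layout and the names `TableHarvester`, `harvest_prices` and `store` are hypothetical, invented only for illustration. The sketch uses Python's standard `html.parser` and `sqlite3` modules to turn table rows into typed records and write them to a database.

```python
# Illustrative sketch only; this is NOT Kapow's API. All names and the sample
# page below are hypothetical.
from html.parser import HTMLParser
import sqlite3


class TableHarvester(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, if any
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def harvest_prices(html):
    """Turn a (product, price) HTML table into (str, float) records."""
    parser = TableHarvester()
    parser.feed(html)
    parser.close()
    _header, *body = parser.rows  # assumes the first row holds column headings
    return [(name, float(price.lstrip("$"))) for name, price in body]


def store(records):
    """Write the structured records to an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (product TEXT, price REAL)")
    conn.executemany("INSERT INTO prices VALUES (?, ?)", records)
    conn.commit()
    return conn


SAMPLE_PAGE = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>$19.99</td></tr>
  <tr><td>Gadget</td><td>$5.00</td></tr>
</table>
"""

if __name__ == "__main__":
    conn = store(harvest_prices(SAMPLE_PAGE))
    for row in conn.execute("SELECT product, price FROM prices"):
        print(row)
```

Real pages are rarely this tidy. Production harvesting has to handle navigation, logins, malformed markup and changing layouts, which is the kind of work the tools described on this page are aimed at.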

In addition to internal databases, valuable business data often resides as web-based content or in other semi-structured or unstructured forms, such as PDF documents, Excel spreadsheets, or XML files. Because these sources are not structured databases, they typically lack APIs that enable access and reuse, locking important data away from applications and users.

Web harvesting applies equally well to web applications behind the firewall. Kapow makes it very easy to create composite data models that bring together data from public and private web sources and combine it with data from internal SQL databases.

The possibilities are endless. The ability to access and use any web-based data source automatically, on demand or in batch mode, and on a massive scale changes the playing field.

Large-scale collection of web intelligence requires a deep understanding of the technologies and tools that developers use to create web pages, and sophisticated capabilities for interacting with those pages to collect the data you need. We incorporate numeric and text string search, advanced relational page navigation methods, table handling, and user-defined rules that define relationships between elements of HTML content. It is even possible to open PDF attachments on a website and search them as if they were web pages.

The Data Collection Edition allows you to access all of these data types using our powerful visual scripting environment. Web content and data that you collect can then be written to a database, published as a web service, or transformed into a format suitable for use by other applications. And, to enable the execution of robots from within your Java or .NET applications, we provide a full set of C# and Java APIs for developers.

Contact Us

Contact Knowledge Focus on:
Tel: 0861-KFOCUS
Int: +27 (0) 12-460-6240
Fax: 086-545-1793
or e-mail: info@kfocus.co.za

MAP