Project at a glance
Market data with a clear economic value are spread across multiple sources such as print media, web documents, blogs and social media. In the COMET (cross-media extraction of unified high-quality marketing data) project we develop key technologies for combining and analysing these heterogeneous and multimodal data.
Automated data consolidation, classification and sentiment analysis components will support the extraction of marketing information which will help decision makers to optimise their brand and marketing strategies.
Project
COMET: Cross-media extraction of unified high-quality marketing data
Team
Kuntschik Philipp
Funding
Media Focus Schweiz GmbH, Innosuisse (Commission for Technology and Innovation CTI)
Duration
September 2013 – May 2015
Starting situation
In many media channels, e.g. print and online media, blogs and social media, market-relevant data can be found that reflect the public perception of products, their strengths and weaknesses, as well as the success of PR and marketing strategies. Manually evaluating these data sets is often not possible due to the increasing number of content sources. Therefore, in practice, business and web intelligence technologies are frequently used to automatically extract decision-relevant information from these sources.
Project goal
The COMET project develops technologies for determining, consolidating, combining, and classifying heterogeneous, multimodal data from a wide range of sources, with the aim of automatically recognising and extracting trade-relevant content and thereby making it usable.
Status
- 2 June 2014 – the COMET research paper Linked Enterprise Data for Fine Grained Named Entity Linking and Web Intelligence was presented at the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS’14) in Thessaloniki, Greece.
- 28 May 2014 – the project team submitted the second project report to Innosuisse (Commission for Technology and Innovation CTI).
- 26 April 2014 – the article Enriching Semantic Knowledge Bases for Opinion Mining in Big Data Applications was accepted for publication in Knowledge-Based Systems.
- 27 March 2014 – COMET progress meeting in Zurich.
- 8 January 2014 – the project team set up weekly meetings to coordinate the development of the COMET XML exchange format and its web API.
- 30 November 2013 – the project team submitted the first project report to Innosuisse (Commission for Technology and Innovation CTI).
- 2 September 2013 – COMET opening meeting in Zurich.
- 1 September 2013 – official project start.
Implementation
The developed technologies support domain experts in adding relevant articles by making the results of an automated content analysis of each article available to them. Among other things, this analysis includes (i) the sentiment (positive versus negative perception) towards products, persons, or companies in the article, (ii) an automatic classification of the article into familiar product segments, and (iii) automatically generated metadata. Furthermore, similar articles are recognised automatically and, where annotations already exist, supplied with them.
This procedure can considerably increase the efficiency and effectiveness of evaluation processes as well as the data quality of the results. The annotated, decision-relevant documents are automatically aggregated and made available to decision makers in the form of statistics and classical press reviews, which help them optimise their PR, marketing, and branding strategies.
Although most of the relevant documents are available in electronic form, COMET uses a cross-media infrastructure that captures both electronic documents and classical print media. Print media are scanned and then converted into text documents by means of OCR (optical character recognition). Recognition errors often occur during this step, which presents a considerable challenge for the project. Similar problems arise from typing errors, non-standardised spelling and abbreviations, as well as navigation and advertising elements on websites.
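One common way to soften the effect of such recognition errors is approximate string matching against a list of known names. The following is only a minimal sketch of that idea, not the method used in COMET: the vocabulary `KNOWN_NAMES`, the function `normalise_token`, and the similarity cutoff of 0.8 are assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical vocabulary of known names (illustrative only).
KNOWN_NAMES = ["Media Focus", "Innosuisse", "COMET"]


def normalise_token(token, vocabulary=KNOWN_NAMES, cutoff=0.8):
    """Map a possibly misrecognised token to the closest known name.

    Returns the token unchanged if no vocabulary entry is similar enough.
    The cutoff of 0.8 is an assumed value, not a tuned one.
    """
    matches = get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


if __name__ == "__main__":
    # Typical OCR confusions: "I" read as "l", "i" read as "1".
    for raw in ["lnnosuisse", "Med1a Focus", "Zurich"]:
        print(raw, "->", normalise_token(raw))
```

A fixed cutoff trades recall against false corrections; a real system would presumably need to tune it per source and combine it with context.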
Results
A core element of the COMET project is therefore the development of algorithms that minimise the negative effects of these disturbances (OCR errors, typing errors, non-standardised spelling, and navigation or advertising elements). Afterwards, the cleaned-up document is searched for target objects (persons, organisations, products, etc.), and the sentiment related to them as well as the sentiment of the document itself is determined. Subsequently, the article is classified by means of the available text blocks and objects. Automating these working steps could considerably increase the productivity as well as the quality of the results of the business process.
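The sequence of steps described above (find target objects, determine target and document sentiment, classify the article) can be sketched as a toy pipeline. This is a deliberately simple lexicon-based illustration and not the COMET implementation: the word lists, the target name "acme", the segment keywords, and all function and class names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# All lexicons below are illustrative assumptions, not COMET resources.
POSITIVE = {"good", "excellent", "strong"}
NEGATIVE = {"bad", "poor", "weak"}
TARGETS = {"acme"}  # hypothetical target object (e.g. a company name)
SEGMENTS = {
    "telecom": {"phone", "network"},
    "food": {"chocolate", "cheese"},
}


@dataclass
class Analysis:
    targets: Dict[str, int] = field(default_factory=dict)  # target -> score
    document_sentiment: int = 0
    segment: Optional[str] = None


def score_tokens(tokens):
    """Count positive words minus negative words."""
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def analyse(text):
    """Run target detection, sentiment scoring and segment classification."""
    result = Analysis()
    all_tokens = []
    for sentence in re.split(r"[.!?]+", text):
        tokens = re.findall(r"\w+", sentence.lower())
        if not tokens:
            continue
        score = score_tokens(tokens)
        result.document_sentiment += score
        # Attribute the sentence score to every target it mentions.
        for token in tokens:
            if token in TARGETS:
                result.targets[token] = result.targets.get(token, 0) + score
        all_tokens.extend(tokens)
    # Classify into the segment with the most keyword hits, if any.
    hits = {seg: sum(t in kws for t in all_tokens) for seg, kws in SEGMENTS.items()}
    best = max(hits, key=hits.get)
    result.segment = best if hits[best] > 0 else None
    return result


if __name__ == "__main__":
    print(analyse("The Acme phone is excellent. The network is weak."))
```

Scoring at sentence level keeps a negative remark about one topic from being attributed to a target mentioned in a different sentence, which is one reason target sentiment and document sentiment can differ.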
Besides the current staff of the University of Applied Sciences of the Grisons listed above, the following former project staff and external people were involved in the project:
- Kevin Schnell
- Alistair Buxton
- Seraina Lutz
- Peter Metzner
- Daniel Streiff
- Fabian Odoni
Research Blog
Our research blog at blog.semanticlab.net summarises articles, methods, and applications that are relevant to the COMET project.
Publication
“Linked Enterprise Data for Fine Grained Named Entity Linking and Web Intelligence” by Albert Weichselbraun, Daniel Streiff and Arno Scharl
Parties involved
The COMET project is funded by the Swiss Federal Department of Economic Affairs (FDEA) through Innosuisse (Commission for Technology and Innovation, CTI). The project partners include the University of Applied Sciences of the Grisons (Swiss Institute for Information Research, SII) and Media Focus, a market research company that determines independent marketing performance metrics.