How to reach the language technology (LT) market

Language technology (LT) provides the means by which communication and information systems, services and products fulfill the needs of multilingual text-processing environments. Language technology has been reborn twice during the last forty years. It started as machine translation in the 1950s; then interest turned to computational linguistics, and the notion of natural language processing was introduced in the 1970s. These decades, however, were devoted mainly to research. Commercial success arrived in the 1980s, when the proliferation of personal computers and the major word-processing applications created a need for simpler language applications such as spell-checking, grammar checking, thesaurus and hyphenation support. This need created the language industry as a sub-segment of the software industry, but it happened mainly in the Western hemisphere. In what follows, we list some points that have to be taken into consideration if we want to reach the LT market in the rest of the world.

1000 languages

There are far more than 1000 languages in the world, 160 of which are commercially interesting. The languages of our geographical area do not belong to the "most supported" ones. The big challenge is to describe these not-yet-supported languages with a system that can be used for several purposes and can reach new application areas.

Save 500 billion ECU

A 1991 study [Bossard, S., Etude Approfondie sur les conséquences du développement des produits de l'ingénierie linguistique, DG XIII Study ML-77] showed that in Europe white-collar workers spend up to 20 % of their time reading, writing, storing, retrieving and otherwise processing documents. It was estimated that appropriate use of language technology could reduce the costs of document creation and handling by some 500 billion ECU. Localization is another influence on LT. Software tools supporting office work (word processing, spreadsheets, business graphics and charts) have reached a mature stage in which they are complex. Often they can be sold only to native speakers, because other users cannot handle their complexity. Big software producers have therefore started to localize their products, that is, to translate the menu structure, the error messages and the documentation of the software. Some functions, however, are tied to the target language, mainly in the well-known word processors, desktop publishing systems and presentation systems, which have normally been supported by the tools listed below:

Application areas

*    Spell-checking. Usually wordlist-based methods.
*    Hyphenation. Usually based on the same wordlist as for spell-checking.
*    Thesaurus. Usually a mono-lingual set of words.
*    Grammar Checking. Usually based on heuristic pattern-matching.
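
The wordlist-based approach behind the first items can be sketched roughly as follows. This is a minimal illustration in Python, not the method of any particular product: the tiny `WORDLIST` and the function names `is_known`, `edits1` and `suggest` are assumptions made up for the example.

```python
import string

# Hypothetical toy word list; a real product would ship a large lexicon.
WORDLIST = {"language", "technology", "market", "spell", "check", "checking"}


def is_known(word):
    """A word is accepted only if it appears literally in the list."""
    return word.lower() in WORDLIST


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggest(word):
    """Propose listed words that are one edit away from an unknown word."""
    if is_known(word):
        return []
    return sorted(edits1(word.lower()) & WORDLIST)
```

Every inflected or derived form has to be listed explicitly here, which is why the approach becomes hard to maintain for languages with rich morphology.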

A 1993 EU survey [Hearn, M. & J. Freijser: The 1993 Language Engineering Directory, INK Luxembourg] identified more than 1100 language technology products, including many new products in the areas of speech recognition and synthesis, optical character recognition, document storage and retrieval, computer-assisted language learning, translation and others. The number of potential office and business applications is increasing, and new applications need new types of linguistic support. To mention just some areas where linguistic applications are becoming more and more important:

* Indexing. Intelligent data base indexing.
* Search. "Noiseless" free text search.
* Replace. Intelligent 'find and replace' in word-processing.
* Extraction. Linguistic support for "real" automatic extraction and document-indexing.
* Electronic secretary. Selection and categorization of faxes and emails by reading them.
*  OCR. High level correction in OCR.
*  Hand-writing. Language oracle in hand-writing recognition.
*  Segmentation. Correct segmentation of non-segmented input in spoken language systems.
*  Dictionaries. Intelligent dictionary look-up.
*  Translation. Support of the translator's activity by workbenches, etc.
*  Alignment. Synchronized handling of different language versions of translated documents.

From the above list it is clear that wordlist-based, non-linguistic software solutions cannot be applied. If, however, the above modules are based on different linguistic and/or software strategies, their integration into the office and business automation systems of the near future would cause a needless duplication of very similar resources. Consequently, only high-level language engineering programs are able to cope with the problems listed.
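
The point about avoiding duplicated resources can be illustrated with a small sketch in which one shared analysis component serves two different applications. It is only a toy under stated assumptions: the `STEMS` lexicon, the `SUFFIXES` list and the functions `analyze`, `spell_ok` and `search` are hypothetical, and naive suffix stripping stands in for a real morphological analyzer.

```python
# Hypothetical toy lexicon and suffix list (English-like, illustration only).
STEMS = {"index", "search", "translate", "document"}
SUFFIXES = ["", "s", "es", "ed", "ing", "d"]


def analyze(word):
    """Return the (stem, suffix) analyses that the toy lexicon licenses."""
    w = word.lower()
    results = []
    for suffix in SUFFIXES:
        if suffix and not w.endswith(suffix):
            continue
        stem = w[:len(w) - len(suffix)] if suffix else w
        if stem in STEMS:
            results.append((stem, suffix))
    return results


def spell_ok(word):
    """Application 1: spell-checking accepts any analyzable word form."""
    return bool(analyze(word))


def search(query, text):
    """Application 2: search matches words that share a stem with the query."""
    query_stems = {stem for stem, _ in analyze(query)}
    hits = []
    for token in text.split():
        token_stems = {stem for stem, _ in analyze(token)}
        if query_stems & token_stems:
            hits.append(token)
    return hits
```

Both applications call the same `analyze` function, so the lexicon is described once and reused. A toy like this overgenerates (it would accept some non-words), which a real linguistic description would have to rule out.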

Linguistic software technology

Linguistically sound algorithms and data. A linguistically motivated system does not have to be slow.

Computationally effective implementation. No tricks, no hacking.

Multi-linguality. The application has to be as language-independent as an application can be. This is the key to multi-linguality.

Platform-independence. It is possible, but there are no 100 % perfect

Application-independence. Not for word-processors only, but for a wider application area.

Automation with constraints. The right balance between automatic methods and user interaction has to be found. Warning: "A pen is not an alternative to the writer."

Fulfill the users' expectations. The users' expectations can be influenced by good examples and demonstration systems.

Re-usability. Existing linguistic resources are at your disposal, but you have to find out where they are.

Quality of design. Portability and modularity.

LT business strategy

Put greater emphasis on business. The leaders of small LT enterprises are usually research-oriented. Such a company is, however, a profit-oriented enterprise that is expected to make money for its owners. A separate sales or business manager is needed who can pay more attention to profit-making business issues.

Supporting political climate. The current political leaders of the former Eastern-Bloc countries want their countries to join the European Union. In the (highly unlikely) case of a change in the opposite direction, all research and activities related to EU countries could be forced to end. In such circumstances an LT company would have to operate only in its home country, which is a relatively small and saturated market.

Supporting economic environment. CEE countries are still in a recession, and their economies are ruled mainly by monetary arguments. The government has been following a restrictive monetary policy in order to finance the debts of the state. The will to revitalize the economy, and the first signs of it, are now appearing: export-oriented activities are centrally supported and funded. The stakeholder role of government has been strengthened.

Understand the business

The company must understand the industry in which it operates. It is not in the proofing-tools industry; it is in the language industry. The large number of possible applications of its core technology follows from that fact.

Advertising. Do not promise that the product solves everything; say that it is just a tool.

Protect home market share. According to Porter's diamond principle [David, F. R. Concepts of Strategic Management, Macmillan (1991)], being successful on the home market is often a prerequisite for international success.

Transform into trans-national. To move to other countries the LT company needs the knowledge of local linguistics experts. Experts in their mother tongue are needed, but there is also a need for business partners in those countries. Alliances would be desirable with potential business partners in the targeted countries.

Keep in touch with major purchasers. As has been mentioned, the cash-generating business is the royalty income from big software companies that incorporate the technology of the LT firm in question. If they turn away from the company, it could run into financial trouble. In terms of Porter's so-called five forces model [Grant, M. Contemporary Strategy Analysis. Blackwell (1992)], the bargaining power of the main purchasers is very strong.

Polish the current image. This image has already attracted business; losing it could damage the most important partner links.

Avoid losses in staff. The employees of a company of this type are hard to replace. Computational linguistics is a highly specialized field, and it takes time to train a new specialist. If two or three of the employees left the company, it would endanger the success of the research projects.

Standards. Keep an eye on the emerging standards.

Funding. Keep a balance between academic grants and income from products sold.

Gábor Prószéky, President of MorphoLogic in Budapest