WP VI: Cross-disciplinary applications

The general approach for each of the four cross-discipline research programs in this Work Package is (1) to evaluate known techniques for modelling information and services in each domain, (2) to analyse the strengths and weaknesses of these techniques relative to the characteristics of the domains, (3) to feed these results into the four basic research work packages, (4) to search for common improvements in methods and techniques, (5) to develop a common platform for the development of information and services, and (6) to relate the new model and approach to contemporary industrial practice.

Goals: The aim is to select, for each of the cross-discipline research projects (health informatics, bio-informatics, and learning with ICT), specific issues from the basic research components (language technology, conceptual modelling, information service engineering, information resource management, and Web application engineering), and to concentrate on and elaborate these issues as described above. This will provide focused knowledge exchange as well as an objective test bed for the approach and techniques supporting ontology and Web service engineering.

This work package contains the following tasks.

Task VI.1: In health informatics, the concentration will be on information modelling / ontologies related to electronic patient records, and on workflow modelling and user interface design, in order to provide better work support for health workers.

Task VI.2: In bio-informatics, the focus is on ontologies of the genome, to support retrieval of information from bio-informatics texts. This will involve an investigation of the use of ontologies for integrating the immense amounts of data in biological databases.

Task VI.3: In learning with ICT, the concentration is on articulation of knowledge. Learning is seen as a process that leads to a shared model of knowledge about the domain being studied.

Task VI.4: In information security, the concentration is on characterization of knowledge to be protected, and on characterization of illegal information processes. Role modelling, modelling of an actor’s knowledge of both the information and the other actors, as well as information access rules, are of special interest in this domain.

Task VI.5: In digital libraries, the focus is on modelling the overall structure of the libraries, as well as particular items, to enable effective retrieval and possibly also more advanced information services on top of the libraries.

Task VI.6: The relevant language technology issues in this context are analysis of natural language with the purpose of meaning extraction from large text corpora, and cognitive and linguistic methods for semantic enrichment of models.

Basic research issues: Because of the diversity of this work package, these are discussed field by field below.

Bio-informatics: Molecular biology is an information-rich and web-oriented research area. Today most of the relevant data are available through the web. This includes genome sequences, protein sequences, protein structures, interaction data, microarray data, publications and experimental data. Data are continuously updated, and a large variety of small, specialised databases is the rule rather than the exception. The research focus in bio-informatics is therefore increasingly moving towards the semantic web. A relevant example is the Gene Ontology project. Organised as a consortium (http://www.geneontology.org), the project is working on a structured, precisely defined, common, controlled vocabulary for describing the roles of genes and gene products in any organism. The vocabulary is organised into three categories: biological process, molecular function and cellular component. Currently more than 10,000 terms are defined. However, the Gene Ontology (GO) project covers only a very small part of all vocabularies used in molecular biology, and it has been argued that GO is more of a nomenclature or controlled vocabulary for molecular biology than a full-fledged gene ontology. Improved solutions and standards are strongly needed in most areas; in particular, standards for data retrieval and integration are almost non-existent. Natural language processing is also important, in particular for automatic data retrieval from publications.

Health informatics: Health services depend on – and generate – huge amounts of information in their operation, e.g., information about patients, diseases, and treatments, as well as budgets, personnel information etc. With traditional IT support, the effective generation, use, and communication of this information has been a significant problem, yielding health services that are sub-optimal in terms of cost vs. stakeholder satisfaction. For instance, most doctors spend only a fraction of their time treating patients, with more being spent on administrative activities and information search. The semantic web has the potential to create more effective IT support for the health services, building on meta-models / ontologies for patients, diseases, treatments, doctors’ fields of expertise etc. With systems that understand the meaning of the information they store, it becomes easier to retrieve the right information for a particular need. With systems that also understand the work context where the information is needed (e.g., having active workflow models for doctors, nurses, and patients), it becomes possible to support health workers in an even more timely manner, providing just the information that is needed for every task. Such systems can also generate much of the required information about actions taken, thus relieving health workers of some of their administrative burdens and giving them more time for actual contact with the patients.

E-learning: A key issue in pedagogy is individualization, i.e., adapting the teaching to the needs of various learners. In many cases, however, IT-supported education has so far focused mostly on porting existing courses with traditional teaching methods onto the web, just making non-individualized teaching even more widely available. The semantic web has the potential to enable more intelligent e-learning applications that provide individualization without a prohibitive increase in manpower. Some preliminary ideas:

    Make models of subjects, in terms of what knowledge the subject comprises

    Make models of courses or teaching/learning resources, in terms of what subjects they address (learning goals, topic matter, skills), as well as available teaching methods

    Make models of each student, i.e., a defined and gradually updated profile showing her/his background knowledge, short-term and long-term learning needs, preferences in terms of teaching methods, and constraints, e.g., in terms of time and money. All of this may vary greatly between, e.g., a full-time student taking a full course and an industry consultant seeking an urgent update on a specific topic.

If these representations are semantically interoperable, it should be possible for an e-learning application to match them and package an optimal learning process for each student, including guidelines on how to evaluate whether the individual learning needs are being met.
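The matching idea can be illustrated with a minimal sketch in Python. It assumes, purely for illustration, that course and student models have already been reduced to sets of topic labels drawn from a shared vocabulary; the class names, fields, scoring rule, and example data are all hypothetical and not part of the proposal.

```python
from dataclasses import dataclass


@dataclass
class Course:
    name: str
    topics: set[str]          # subjects the course addresses
    prerequisites: set[str]   # knowledge the course assumes
    method: str               # teaching method, e.g. "lecture"
    hours: int                # time the course requires


@dataclass
class Student:
    name: str
    known: set[str]               # background knowledge
    needs: set[str]               # learning needs
    preferred_methods: set[str]   # preferred teaching methods
    max_hours: int                # time constraint


def rank_courses(student, courses):
    """Return names of feasible courses, best match first.

    Illustrative scoring rule (an assumption): one point per learning
    need covered, plus half a point if the teaching method is preferred.
    """
    scored = []
    for course in courses:
        if not course.prerequisites <= student.known:
            continue  # student lacks the required background
        if course.hours > student.max_hours:
            continue  # course does not fit the time constraint
        covered = course.topics & student.needs
        if not covered:
            continue  # course addresses none of the learning needs
        bonus = 0.5 if course.method in student.preferred_methods else 0.0
        scored.append((len(covered) + bonus, course.name))
    # Highest score first; ties broken alphabetically by course name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


# Hypothetical example data.
courses = [
    Course("Full ontology course", {"rdf", "owl", "description-logic"},
           {"logic"}, "lecture", 120),
    Course("OWL crash course", {"owl"}, {"rdf"}, "self-study", 10),
    Course("RDF primer", {"rdf"}, set(), "self-study", 8),
]
consultant = Student("consultant", {"rdf"}, {"owl"}, {"self-study"}, 16)
print(rank_courses(consultant, courses))
```

In a real setting the set intersections would be replaced by reasoning over the subject, course, and student models, but the overall shape (filter by constraints, then rank by coverage of needs) would likely be similar.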

Information security: In current IS development, security issues are often overlooked in the analysis phase, partly due to pressure from short lead times, and partly because mainstream IS engineers lack the competence to use methods for secure systems engineering, which tend to be heavyweight and require advanced mathematical knowledge. The semantic web has the potential to address some of these difficulties through reuse of models. For instance, one can:

    Model the organization, its information / IT assets, and the security goals for these assets (both existing assets and planned assets).

    Model threats / attacks, ranging from technically sophisticated hacker attacks through script-kiddie attacks to misuse committed by insiders, as well as physical sabotage and social engineering attacks.

    Model requirements that address various threats, and their links to IS architectures, products, and design mechanisms to ensure various levels of security.

With semantic interoperability, a development tool should be able to match these various models to save work for the development project, for instance by expressing security requirements more quickly and with a higher degree of completeness than before. For example, given an organization with some existing and some planned information assets, the tool could indicate which threats must be looked into, which requirements can be expressed to deal with those threats, and which designs / products exist to satisfy such security requirements.
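As a rough illustration of such matching, the following Python sketch chains three small lookup tables standing in for the asset, threat, and requirement models. All asset names, threats, requirements, and designs are hypothetical placeholders, and a plain dictionary lookup is assumed in place of real ontology-based matching.

```python
# Hypothetical mini-models; every name below is illustrative only.
ASSET_THREATS = {
    "patient-database": ["insider misuse", "sql injection"],
    "web-portal": ["sql injection", "social engineering"],
}
THREAT_REQUIREMENTS = {
    "insider misuse": ["log all record access"],
    "sql injection": ["validate all user input"],
    "social engineering": ["train staff in security awareness"],
}
REQUIREMENT_DESIGNS = {
    "log all record access": ["audit-trail component"],
    "validate all user input": ["parameterized queries", "input filter"],
    "train staff in security awareness": [],
}


def security_checklist(assets):
    """Map each relevant threat to its requirements and candidate designs."""
    checklist = {}
    for asset in assets:
        # Unknown assets simply contribute no threats.
        for threat in ASSET_THREATS.get(asset, []):
            for requirement in THREAT_REQUIREMENTS.get(threat, []):
                designs = REQUIREMENT_DESIGNS.get(requirement, [])
                checklist.setdefault(threat, {})[requirement] = designs
    return checklist


for threat, requirements in security_checklist(["web-portal"]).items():
    print(threat, "->", requirements)
```

The point of the sketch is only the chaining of models (assets to threats to requirements to designs); with semantically interoperable models, each table could come from a different, independently maintained source.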

Digital libraries: In digital libraries it is of interest to model both the overall library structure and, in more detail, the various items in the library, to facilitate effective retrieval of information. Moreover, one can envision that the digital libraries of the future will offer their users not only information but also information-related services. The natural medium for accessing such digital libraries will be the web, meaning that the core technologies of the semantic web (information modelling, meta-modelling, information service engineering) are highly relevant to this application area.

Natural language technology: Natural language is the most intuitive mode for humans to access information. Hence there are strong couplings between the field of natural language technology (NLT) and the semantic web. NLT can enrich the semantic web by making web applications accessible via natural language interfaces. Conversely, NLT can gain from the semantic web and ontology engineering, which can provide semantic models for natural language grammars, both for specific fields of expertise and in more general cases. Since language changes over time, model management becomes a key issue: the language models must be easy to update, yet older versions must also be kept in order to understand documents written in the past. NLT is also highly relevant for concept extraction, i.e., in the process of establishing semantic models.