Analysis of Audio Documents
Development of Methods for Analysis, Indexing and Search of Audio Documents in Multimedia Databases
Tech Area / Field
- INF-SIG/Sensors and Signal Processing/Information and Communications
- INF-SOF/Software/Information and Communications
8 Project completed
Senior Project Manager
Ryzhova T B
Belarussian State University, Belarus, Minsk
- Fraunhofer Institut Digitale Medientechnologie, Germany, Ilmenau
- CNRS / Universite Joseph Fourier / CLIPS, France, Grenoble
Project summary
Project purpose: creation of statistical methods for acoustic signal analysis and development of methods for segmentation, indexing and retrieval of audio documents in multimedia databases.
Methods for the analysis, indexing and retrieval of audio documents are developing rapidly. Significant results in this area have been achieved through the development and implementation of new methods and algorithms. The most successful approaches to indexing audio and video content apply probabilistic graphical models such as hierarchical hidden Markov models, together with neural networks and support vector machines. The results of these investigations were presented at international conferences: CBMI 2005, Riga, Latvia, June 21-23, 2005; RIAO 2004, Avignon, France, April 26-28, 2004; ICASSP 2004, Montreal, May 17-21, 2004; the Seventh International Symposium on Signal Processing and its Applications, Paris, France, July 1-4, 2003; and the ISCA Tutorial and Research Workshop "Voice Quality: Functions, Analysis and Synthesis", Geneva, Switzerland, August 27-29, 2003.
There are several reasons for this rapid development.
First, national broadcasting companies accumulate many hours of recorded TV and radio programs every year, as the number of TV and radio channels grows and high-capacity storage becomes available. Multimedia databases are also growing rapidly as new digital information is added. For instance, the BBC TV archive for the last 45 years comprises 300,000 hours of national program recordings, and the radio program archive for 60 years comprises about 400,000 hours.
Second, a huge amount of audio and video information in digital form is available over the Internet through broadcasting channels and private or professional databases. Users today face so much multimedia content, offered by different providers, that effective access to this almost unlimited amount of data is hard to imagine. Managing such huge volumes of data has revealed difficulties in storing and searching information in databases. To overcome these difficulties, new tools capable of indexing information are being developed.
Third, besides the problems of indexing and of designing the architecture of such databases, another vital problem arises, information retrieval: how to formulate a query effectively and how to find the necessary information quickly. The information sought can be text, images, video, sound, music or speech. Preliminary indexing is necessary to speed up the processing of any kind of query.
Fourth, processing audio documents is more difficult than accessing text data. Although text search must cope with spelling variations and offer approximate matches to users, it is easier to find a name or a string of words in a text than to identify a speaker, a word, or a whole sentence drawn from a large vocabulary in an audio recording. Besides, listening to audio recordings takes more time than reading a text. It is therefore important to be able to access the required parts of a document directly, rather than listening to the whole recording to find the desired information.
Fifth, algorithms for indexing acoustic signals in real time at minimal hardware cost have become feasible thanks to the rapid increase in the computational performance of modern digital signal processing equipment, to fast digital signal processing algorithms, and to fast algorithms for computing convolutions in different algebraic structures.
Acoustic signals, being a basic communication channel, play a most important role in multimedia data systems. Audio sequences contain various types of data, such as speech, music and environmental sounds. With the rapid development and widespread adoption of information technologies, the problem arises of efficiently transmitting huge amounts of audio documents over data channels, as well as of storing them in local and distributed databases. Effective use of this information requires efficient means of context search. Building such systems calls for a comprehensive approach that includes analysis, segmentation, indexing and retrieval of audio data. These systems must provide users with effective tools for finding, viewing, indexing, restoring and searching information according to the user's query in real time.
Accordingly, within the project framework, statistical methods based on the structural risk minimization criterion are to be developed for training indexing systems, and new models of the variability of acoustic signals based on hidden Markov processes and support vector machines are to be created. Hybrid indexing systems are to be developed that have advantages over systems based on hidden Markov models alone: more precise modeling of acoustic signal characteristics, better context sensitivity and greater effectiveness.
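As an illustration of how a hybrid system of this kind might combine the two model families, the sketch below decodes per-frame class scores with the Viterbi algorithm. This is only one possible coupling and is an assumption, not a method described in the project: the function name `viterbi`, the two states (0 = speech, 1 = music), the "sticky" transition probabilities and the per-frame posteriors (standing in for calibrated outputs of a frame classifier such as a support vector machine) are all hypothetical.

```python
import math

def viterbi(frame_scores, log_trans, log_init):
    """Return the most likely state sequence for a sequence of frames.

    frame_scores[t][s] is a log-score of state s at frame t (here a
    hypothetical calibrated classifier output); log_trans[i][j] is the
    log-probability of moving from state i to state j.
    """
    n_states = len(log_init)
    best = [log_init[s] + frame_scores[0][s] for s in range(n_states)]
    back = []  # back[t-1][j] = best predecessor of state j at frame t
    for scores in frame_scores[1:]:
        prev, best, pointers = best, [], []
        for j in range(n_states):
            i_best = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(i_best)
            best.append(prev[i_best] + log_trans[i_best][j] + scores[j])
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical per-frame posteriors for (speech, music); frame 2 is a
# likely misclassification inside a speech segment.
posteriors = [(0.8, 0.2), (0.9, 0.1), (0.4, 0.6), (0.9, 0.1),
              (0.2, 0.8), (0.1, 0.9), (0.3, 0.7)]
frame_scores = [[math.log(p) for p in frame] for frame in posteriors]
log_trans = [[math.log(0.9), math.log(0.1)],   # segments tend to persist
             [math.log(0.1), math.log(0.9)]]
log_init = [math.log(0.5), math.log(0.5)]

framewise = [max(range(2), key=lambda s: f[s]) for f in posteriors]
path = viterbi(frame_scores, log_trans, log_init)
print(framewise)  # [0, 0, 1, 0, 1, 1, 1]
print(path)       # [0, 0, 0, 0, 1, 1, 1]
```

With these numbers the frame-by-frame decision flips to music for a single frame, while the decoded path keeps one speech segment followed by one music segment, which is the kind of context sensitivity a purely frame-level classifier lacks.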
The theoretical results of the project will be used to create effective algorithms for segmentation, indexing and search of audio documents. The practical results will be directed towards developing a software-based system for context search of information in multimedia databases and towards experimental testing of the system's capabilities.
Impact of the proposed project on progress in the research field.
In the course of the project, new methods, algorithms and software are to be developed that implement a systematic approach to acoustic signal processing technology and create the necessary basis for developing highly effective tools for audio indexing in multimedia databases.
The research group will be formed of specialists from Belarussian State University (BSU). The group will include experts in systems and devices for reconnaissance, detection, tracking and recognition of radar targets, and in aiming, target designation and self-homing systems. In the course of the project, BSU specialists are to develop methods, algorithms and software for analysis, indexing and retrieval of audio documents in multimedia databases.
Project participants have significant experience in developing applied signal processing systems and in creating software for indexing and recognition of speech signals. The results of their research were published in international journals (IEEE Transactions on Signal Processing) and presented at international conferences: RIAO 2004, Avignon, France, April 26-28, 2004; Biosignal 2004, Brno, Czech Republic; MEDICON and HEALTH TELEMATICS 2004, Ischia, Italy; ISSPA 2003, Paris, 2003; APBME 2003, Osaka, 2003; VOQUAL'03, Geneva, Switzerland, August 27-29, 2003.
In the framework of the proposed project the following tasks are to be fulfilled:
- Development of new effective statistical methods for acoustic signal analysis in audio indexing tasks, based on support vector machines and hidden Markov models.
- Development of new methods and algorithms for audio signal segmentation and grouping and for text-independent speaker recognition in local and distributed databases.
- Development of a software suite for analysis, search, indexing and visualization of audio information in multimedia databases.
Application of project results.
Scientific results obtained in the course of the project can be used for:
- Development of new technologies that would provide effective analysis, access and retrieval of information for the needs of digital TV production.
- Development of hardware and software tools for access control and tracking systems.
Meeting ISTC Goals and Objectives
Carrying out the project will make it possible to:
- Redirect the activity of the BSU research group, previously engaged in developing systems and devices for reconnaissance, detection, tracking and recognition of radar targets, and aiming, target designation and self-homing systems, towards peaceful tasks related to the development of new technologies for information context retrieval.
- Broaden international integration and extend scientific relations in order to apply the achieved results in partnership projects.
- Form, from the scientific staff participating in the project, the core of a future research company working in the IT market in the field of information retrieval.
Scope of work: The duration of the project will be 36 months. The whole project is divided into 3 connected tasks. The overall estimated effort for the project is 4680 person-days.
Role of foreign collaborators. Cooperation with the project collaborators will lead to a fruitful exchange of scientific information within the scope of the project research, objective appraisal and review of the scientific results, and active participation in improving and applying the achieved results.
Technical approach and methodology. Up-to-date digital signal processing methods will be used for analysis, indexing and context retrieval of audio documents: wavelet transforms, hidden Markov models (including hierarchical hidden Markov models), and support vector machines. New statistical methods for acoustic signal analysis in audio indexing tasks will be developed; effective audio signal processing methods using wavelet transforms will be developed and investigated; audio signal classification algorithms using support vector machines and hierarchical hidden Markov models will be developed. New algorithms for segmentation, indexing and retrieval of audio documents in local and distributed databases will be created.
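To make the wavelet-based analysis step more concrete, the following minimal sketch computes sub-band energies of one audio frame with a Haar wavelet decomposition. It rests on assumptions not stated in the project: the Haar wavelet is chosen only because it is the simplest one, and the function names `haar_step` and `subband_energies` are hypothetical. Energies of this kind could serve as a feature vector for a frame classifier such as a support vector machine.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail).

    Samples are processed in pairs; a trailing unpaired sample is dropped.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def subband_energies(frame, levels):
    """Detail energy at each level (finest first), then the energy of the
    final approximation. The frame length should be a multiple of 2**levels."""
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies

# A rapidly alternating frame puts its energy in the finest detail band,
# a constant frame puts it in the final approximation band.
fast = [1.0, -1.0] * 4
slow = [1.0] * 8
print(subband_energies(fast, 3))
print(subband_energies(slow, 3))
```

Because the transform is orthonormal, the sub-band energies add up to the energy of the frame itself, so the feature vector describes how that energy is distributed across scales.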