Laboratory for Intelligent Multimedia Processing (IMP)


Director: Dr. Mohammad Mehdi Homayounpour


 

 

History:

The Laboratory for Intelligent Multimedia Processing (IMP), formerly named the Laboratory for Intelligent Sound and Speech Processing (LISSP), was founded in November 1996 by Dr. Mohammad Mehdi Homayounpour.

 

Motivation:

Research on and development of techniques for processing multimedia data are increasingly important. The mission of IMP is to develop sophisticated algorithms for processing multimedia data and to build intelligent multimedia systems.

 

Outstanding features:

The knowledge and experience accumulated in the laboratory over the 18 years since its establishment enable it to develop multimedia systems using state-of-the-art techniques. The laboratory has been a pioneer among Iranian university laboratories in most areas of speech and text processing, especially Farsi speech and text.

 

Research focus/goals:

At present, the laboratory pursues both basic and applied research projects, with emphasis on the following topics:

Speech processing and recognition:

In this field we investigate the representation, modeling, and classification of speech signals for phoneme recognition, isolated and continuous speech recognition, voice activity detection, voiced/unvoiced detection, and word spotting. Some of our research areas and completed projects in this field are:

  • Bernoulli versus Markov: Investigation of state transition regime in switching-state-model based acoustic modeling
  • Time-Inhomogeneous Hidden Bernoulli Model: An alternative to Hidden Markov Model for automatic speech recognition
  • Refining phoneme Segmental Boundaries using Support Vector Machine
  • Phoneme recognition using ART 2A neural network and HMM
  • Farsi continuous speech segmentation into voiced/unvoiced segments, syllables and phonemes
  • A Comparison of some speech/silence detection techniques in radio broadcasting using Support Vector Machines
  • Robust voice activity detection based on short time features of audio frames and spectral pattern of vowel sounds
  • Phonetic transcription and time alignment of Farsi speech databases using HMMs
  • Speech segmentation using forced alignment technique
  • Speaker-independent Farsi digit recognition using predictive neural networks
  • Speaker-independent isolated Farsi digit recognition over telephone using HMM
  • Triangulation of Dynamic Bayesian Networks for isolated digit recognition
  • Farsi connected digit recognition over telephone using the embedded re-estimation method in an HMM-based system
  • Continuous Farsi number recognition over telephone using HMM/MLP hybrid system
  • Farsi continuous speech recognition using a SOM-HMM hybrid technique
  • Design and development of an IVR system
  • A neural network based local SNR estimation for estimating spectral masks
  • Some hybrid methods for feature robustness in speaker identification in adverse conditions
  • Two new weighting methods for combination of probabilities in robust multi-band Speech recognition

 

Speaker Recognition:

Speech and keystroke behavior are two important biometrics that have attracted considerable research attention in recent decades. Different aspects of speaker identification and verification have been studied in our laboratory, including the following research topics:

  • Speaker verification using neural networks and genetic algorithms
  • Speaker verification and identification over telephone using a hybrid model of HMM and GMM
  • Speaker recognition in adverse conditions
  • Comparison of some frame- and utterance-level score normalization methods for improving the performance of speaker verification and identification systems over telephone lines
  • Speaker model and decision threshold updating in speaker verification
  • Efficient Hybrid GMM/SVM Classifier for open-set text independent speaker identification
  • Speaker identification using supervised and unsupervised neural networks
  • Study of the effects of speech coding and speech transfer over internet on speaker identification
  • Performance evaluation of linear prediction, Fourier, Wavelet, and Wigner-Ville time frequency speech representations for speaker verification
  • Performance improvement of speaker verification using verbal information
  • Audio-Visual speaker Identification using dynamic facial movements and utterance phonetic content
  • Speaker tracking using Eigen Decomposition and an index tree of reference models
  • Text Independent speaker verification using variational Gaussian mixture model
  • A real-time trained system for robust speaker verification using relative space of anchor models
  • Discrimination of voices of twins for speaker verification
  • Hybrid score normalization schemes in discriminative and generative classifiers and their application to text-independent speaker verification
  • Using linear and non-linear dimension reduction techniques for performance improvement in speaker verification
  • Speaker verification using binary tree of fuzzy support vector machine
  • Verification of user identity using user's voice and keystroke behavior
  • Robust speaker recognition using speech spectral peaks in autocorrelation domain
  • Robust speaker verification based on multi-stage vector quantization of MFCC parameters on narrow-bandwidth channels
  • Robust speaker recognition against channel distortion and bandwidth reduction

 

Speech synthesis and Text to speech:

Speech synthesis and text-to-speech (TTS) systems have already been developed for many languages. One focus of the IMP laboratory has been to develop algorithms and techniques for Farsi text to speech. A TTS system comprises text processing modules, a prosody module, and a speech synthesis module. Our work on text processing and prosody modeling is described in the Prosody Modeling and Natural Language Processing sections; our activities related to Farsi speech synthesis are as follows:

  • Farsi speech synthesis, using Harmonic Plus Noise Model (HNM)
  • Farsi speech synthesis using Hidden Markov Model (HMM)
  • Farsi speech synthesis using Unit Selection technique
  • Improvement of formant synthesizer and TD-PSOLA for Farsi speech synthesis
  • Estimation of Farsi speech synthesis parameters using machine learning methods
  • A study of Farsi language problems for Farsi speech synthesis
  • Prosody modeling for Farsi text to speech
  • Speech unit selection and production of unseen synthesis units for Farsi speech synthesis
  • Automatic determination of target cost in unit selection speech synthesis

 

Prosody Modeling:

Prosody modeling is one of the important modules of a text-to-speech system. The following research topics have been considered in our laboratory for the development of Farsi text-to-speech systems:

  • Pitch contour modeling using Tilt method for Farsi text to speech
  • Pitch contour modeling using Fujisaki method for Farsi text to speech
  • Duration modeling using MARS technique
  • Energy modeling using piecewise modeling method for Farsi text to speech
  • Tilt Model parameter estimation using neural networks, MARS and Support Vector Machine for pitch contour modeling in Farsi text to speech

 

Language Identification:

Identifying the speaker's language is necessary in multilingual services such as spoken machine translation, interactive voice response systems, customer relationship management, and audio indexing. The following topics have been considered for research in our laboratory:

  • Automatic identification of some Iranian spoken languages using statistical methods
  • A hybrid model of vector quantization and Support Vector Machine for language identification
  • Performance improvement in language identification using GMM-SVM hybrid method
  • Improvement of language identification performance using a generalized phone recognizer
  • Improvement of language identification performance using an aggregated phone recognizer
  • Using a probabilistic characteristic vector based on both phonetic and prosodic features for language identification

 

Speech coding:

Many speech coding systems have been developed, but the development of very low bit rate coders is still an important research area. We are also interested in this area and have carried out the following work:

  • Design and development of a very low bit rate speech coding system using speech recognition and synthesis techniques
  • Improved ITU-T P.563 non-intrusive speech quality assessment method covering VoIP conditions

 

Age interval and gender identification:

Many services may be age- and gender-dependent, so recognizing the age interval and gender of users can help provide better services. We have investigated the following research topics for developing algorithms for better gender and age interval recognition:

  • Age Identification using neural networks, Gaussian mixture model and Support Vector Machine
  • Age interval and gender identification using GMM and MLP neural network
  • Speaker age interval and gender identification based on jitters, shimmers and mean MFCC using supervised and unsupervised discriminative classification methods
  • Gender identification using Support Vector Machines
  • Performance Improvement in Automatic Gender Identification Using Hierarchical Clustering
  • Home-Robot Speaker Gender Identification

 

Speaker diarization:

Speaker diarization is the process of segmenting speech files by speaker, tracking a certain speaker within a given file, and speaker tying (tracking a given speaker across different files that may have been recorded in different environments). Our work in this area includes:

  • Speech segmentation using a modified Bayesian Information Criterion
  • Unknown Multi-speaker speech clustering using Bayesian Information Criterion as a cluster validity score
  • Speech overlap detection using spectral features and its application in speech indexing
  • Estimating the number of speakers for speaker tracking using a hybrid of Bayesian Information Criterion and Ant Colony clustering methods
  • Unknown multi-speaker speech clustering based on a Novel Clustering Criterion

 

Speech Enhancement:

Speech enhancement is an active branch of speech processing that aims to improve the quality of noisy speech by reducing its noise. Our most important contributions in this field include the following research activities:

  • Speech enhancement using a hybrid of spectral subtraction and genetic programming techniques
  • Reduction of musical noise in spectral subtraction by increasing the noise spectrum estimate

 

Audio Processing:

Audio signals may include speech, music, silence, environmental noise, etc. We have worked on several research topics, including indexing audio files into speech, music, and silence/noise segments. Some of these topics are:

  • Music genre recognition using machine learning methods
  • Speech/music recognition using support vector machines
  • Automatic speech versus music recognition for radio broadcasting
  • A comparison of SVM and its hybrid with VQ in speech-music detection
  • Speech/Music Detection in Radio Broadcasting Using Support Vector Machines
  • Speech and Environmental Noise Activity Detection in In/Out-door Robots

 

 

Vibration Analysis:

Monitoring and diagnosis of mechanical systems such as generators, turbines, railways, trains, and automobile engines are important needs that can be addressed using digital signal processing and computational intelligence techniques. Our laboratory has recently become interested in this field; one of its contributions is:

  • Condition monitoring of electrical machines using neural nets

 

 

 

Computer Aided Language Learning:

One application of speech technology is to help people learn new languages (improving the pronunciation of phonemes and words, or prosody), to help children learn their mother tongue better, or even to help people with speech disorders. Our major contributions in this field are:

  • Using speech technology for computer aided language learning
  • Automatic pronunciation evaluation for computer aided language learning
  • Using speech technology to determine the degree of nativeness of a language learner

 

 

Pattern Recognition:

Other research topics in our laboratory that fall under the broader field of pattern recognition are as follows:

  • A study on performance and convergence rate of approximative reasoning in Dynamic Bayesian Networks
  • A Bayesian Network based approach for data classification using structural learning
  • Improvement of Fuzzy clustering methods for clustering of noisy databases
  • Robust weighted fuzzy C-Means clustering
  • Protein secondary structure prediction using machine learning techniques
  • A novel hybrid GMM/SVM architecture for protein secondary structure prediction
  • Intrusion detection in computer and network systems using Gaussian Mixture Models
  • Intrusion detection using Gaussian Mixture Model and its combination with Support Vector Machine

 

 

Natural Language Processing:

This section presents some of our research interests and completed projects in natural language processing. These topics are typically needed for developing systems for text-to-speech conversion, machine translation, text classification, information retrieval, text summarization, and similar applications:

  • Part of speech tagging in Farsi
  • Farsi Text Normalization
  • Farsi named entity recognition
  • Word Sense Disambiguation of Farsi homographs using thesaurus and corpus
  • Letter to Sound conversion of Farsi texts using rule based and statistical methods
  • Letter to Sound System for Farsi Language using neural networks and CART trees
  • Letter to sound of Farsi named entities
  • Text to Phoneme using neural networks
  • Speech act recognition in Farsi texts
  • Emotion detection in Farsi texts
  • Keystroke saving in typing of Farsi texts using statistical language modeling and semantic information
  • Improving Farsi multiclass text classification using a Thesaurus and two-stage feature selection
  • Modeling of morphological knowledge using link grammar
  • Persian Text Normalization using classification tree and Support Vector Machine
  • Using semantic knowledge for topic classification
  • Using decision list for Farsi word sense disambiguation
  • Using Thesaurus to improve multiclass text classification
  • Detection of KasreEzafe using probabilistic context free grammar

 

Software Defined Radio (SDR):

Our laboratory is interested in several aspects of SDR, including automatic modulation recognition, signal presence and active channel detection, center frequency estimation, phase and frequency synchronization, channel equalization, baud rate detection, and channel coding/decoding. Some of the completed projects are:

  • Symbol rate detection using instantaneous frequency, robust against frequency offset and pulse shaping filter characteristics
  • Automatic modulation type recognition in presence of noise using Support Vector Machine and Particle Swarm Optimization

 

Equipment:

  • Several high-performance multi-core computers and servers, a scanner, and a printer
  • Some important speech databases necessary for different speech processing applications
  • Some important lexicons and text corpora for development of Natural language processing systems
  • A database of modulated signals for research and development in Software defined radio and cognitive radio

 

Active research projects:

  1. Tokenization and Normalization in Persian Language
  2. Letter to sound conversion in Persian language using Persian Language orthographic and phonetic characteristics
  3. Named Entity Recognition
  4. Persian Homograph Disambiguation using Persian Wordnet Relatives and Thesaurus
  5. Text-Independent Voice Conversion
  6. Optimization of Speech and Speaker Recognition Algorithms for Execution on Hardware with Limited Memory and Computational Power
  7. Performance Improvement in Speed and Accuracy in Brain-Computer Interfaces
  8. Farsi Named Entity Recognition based on Named Entity Characteristics in Farsi Texts
  9. Analysis of Modulation Type and Modulation Characteristics of digital Signals using Digital Signal Processing and Intelligent Techniques
  10. Robust Automatic Modulation Recognition using Supervectors
  11. Detection and extraction of multicarrier signals, equalization and symbol extraction
  12. Using semi-supervised methods in extractive speech summarization
  13. Computer aided phoneme pronunciation and prosody learning
  14. Active Noise Cancellation in Dynamic Environment Using Neural Networks

 

Projects:

  1. Recognition of continuous and discrete Farsi speech
  2. A study of Farsi natural language processing difficulties in conversion of Farsi text to speech and presentation of some solutions
  3. Design and implementation of Farsi speech recognition and synthesis for very low bit rate vocoders
  4. Design and implementation of software for voice conversation over the internet at very low bit rates
  5. Farsi Text To Speech conversion: improvement of synthesized speech quality
  6. Improvement of phoneme recognition for word spotting in telephone conversations
  7. A study of the current state of Farsi Text To Speech
  8. Authoring a research document on Text To Speech
  9. New algorithms for indexing of audio documents
  10. Speaker identification for authentication of identity of users in Web services
  11. Automatic detection of modulation type of telecommunication signals
  12. Movement control via voice
  13. Design and development of tools, instructions and test data for evaluation of text processing and speech synthesis systems

 

People (Education, Specialty):

Lab supervisor: 

  • Dr. Mohammad Mehdi Homayounpour, Associate Professor

 

 

PhD students:

  • Mohammad Ali Keyvanrad
  • Mehdi Khademian
  • Abbas Khosravani

 

 

MSc students:

  • Mohammad Amin Mehralian
  • Hadi Valipour
  • Najme Eslami
  • Fateme Aliakarian
  • Zeinab Tahajodi
  • Fateme Gholamalian
  • Hadi Hosseini
  • Hamid Reza Hakimdavoudi
  • Amir Badamchi
  • Amir Namavar
  • Amin Naemi
  • Hoda Sadat Jafari
  • Sara Sadeghi