SpeechLogic & NISLab, Nordtalk LD/HD
Measuring transaction success in spoken dialogue information systems
Hans Dybkjær, SpeechLogic™, Prolog Development Center A/S
Laila Dybkjær, NISLab, University of Southern Denmark

Assessing results?
• Subjective listening
  – Fine and important
  – Not suitable for contracts
  – Not suited for tracing progress
  – Very dependent on the mood of the caller
• Transcript walkthroughs
  – Fine; provides many observations
  – Not suitable for contracts
  – Not suited for tracing progress
• Transaction coding
  – Suitable for contracts
  – Suitable for tracing progress?
• A huge amount of work...

Project and partners
• Holiday Account ("FerieKonto") spoken dialogue service via the telephone
• September 2001 – December 2002
• Supported by the Danish government
• Three Danish partners:
  – NISLab, SDU
  – Prolog Development Center A/S (PDC)
  – ATP-huset (hosts FerieKonto and other funds)
• Employers pay DKK 700 million to FerieKonto per year
• About selected "general information" in the old touch-tone system per year
• Philips Speech Processing is a sub-contractor to PDC

Facts on FAQ
• Phase 1, called "Vejled", in operation since September
• Phase 2, FAQ, in operation from mid-December 2002
• Dialogue model
  – About 40 A4 pages
  – 80 semantic concepts in input
  – 100+ different information stories in output
  – About 800 (full) words in vocabulary
  – About 2500 grammar lines
• Context-free grammar with synthesized attributes
• 450 pre-recorded phrases, many of them long
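The slide says only that the grammar is context free with synthesized attributes. As a minimal illustration, assuming a toy English grammar and made-up concept labels such as `ROLE=employee` (all hypothetical; the real grammar has about 2500 lines and 80 concepts), each rule below returns a semantic value computed from its children, which is what "synthesized" means. Unlike the deployed recognizer, which always maps input to something in-grammar, this text parser returns `None` for out-of-grammar input.

```python
from typing import Optional

# Hypothetical terminals and the semantic concept each one synthesizes.
ROLE_WORDS = {
    "employee": "ROLE=employee",
    "unemployed": "ROLE=unemployed",
    "student": "ROLE=student",
}


def parse_role(tokens: list[str], i: int) -> Optional[tuple[str, int]]:
    """role -> 'employee' | 'unemployed' | 'student'.

    Returns (synthesized concept, next token position) or None.
    """
    if i < len(tokens) and tokens[i] in ROLE_WORDS:
        return ROLE_WORDS[tokens[i]], i + 1
    return None


def parse_utterance(text: str) -> Optional[str]:
    """utterance -> 'i' 'am' ['a' | 'an'] role | role.

    The utterance's attribute is synthesized by copying up the role's attribute.
    """
    tokens = text.lower().split()
    i = 0
    if tokens[:2] == ["i", "am"]:
        i = 2
        if i < len(tokens) and tokens[i] in ("a", "an"):
            i += 1
    result = parse_role(tokens, i)
    if result is None:
        return None
    concept, end = result
    # Require the whole utterance to be consumed (no trailing words).
    return concept if end == len(tokens) else None


print(parse_utterance("I am an employee"))  # ROLE=employee
```

The concept, not the word string, is what a dialogue model would act on, so "I am an employee" and "employee" both yield the same semantic value.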

Characteristics
• The system takes the initiative and guides the user
  – The user may take the initiative and control the system
• Barge-in, i.e. the user may interrupt the system
  – But we do not know where, i.e. for long outputs we do not know how much of the logged output the user actually heard
• Whatever the user says is recognised as something within the system vocabulary and grammar
• No sound output is logged, only user input

Transactions
• No clear definition of a transaction
• One dialogue may be one transaction (e.g. ticket reservation or train information)
• One dialogue may contain several different transactions (e.g. frequently asked questions)
• A simple way of looking at transactions:
  – Start
  – End (success, failure)
• Relate these to dialogue acts
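The start/end view can be made concrete by treating each call as an ordered sequence of tag labels (a simplifying assumption; real annotations are attached to turns). In this sketch the hypothetical `Transaction` and `segment` names open a transaction on `start` and close it on `success` or `fail`; the handling of unmatched tags is also an assumption, noted in the comments.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transaction:
    start_index: int                 # position of the 'start' tag in the call
    outcome: Optional[str] = None    # 'success', 'fail', or None if never closed
    events: List[str] = field(default_factory=list)  # tags seen in between


def segment(tags: List[str]) -> List[Transaction]:
    """Split one call's tag sequence into start ... success/fail transactions."""
    transactions: List[Transaction] = []
    current: Optional[Transaction] = None
    for i, tag in enumerate(tags):
        if tag == "start":
            # Assumption: a new 'start' before an end leaves the old transaction open.
            if current is not None:
                transactions.append(current)
            current = Transaction(start_index=i)
        elif tag in ("success", "fail"):
            # Assumption: an end tag with no open transaction is ignored.
            if current is not None:
                current.outcome = tag
                transactions.append(current)
                current = None
        elif current is not None:
            current.events.append(tag)
    if current is not None:
        transactions.append(current)  # call ended with the task still open
    return transactions
```

For example, `segment(["start", "check", "repair", "success"])` yields one transaction with outcome `"success"` and events `["check", "repair"]`. Tags outside any transaction (such as an initial offer/reject exchange) are simply not attached to one.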

Examples
• Success: U: What is your fax number? S: Fax number...
• Failure: U: What is your fax number? S: Address...
• Wrong = unwanted reply: S: Do you want our address? U: No. S: Our address is... (the user gets unwanted information; not a transaction)
• Wrong = erroneous information: S: Fax number (actually PDC's fax is )
• ('Wrong' is outside the transaction scheme)

Dialogue acts (act: example)
• Offer/question: Should I repeat the address?
• Information
• Feedback: If you are an employee…
• Accept: Yes
• Reject: No thanks
• Selection: Employee
• Other: Who is most beautiful in this country?

Resulting tag set (tag: explanation, type)
• accept: user accepts system offer
• repair: corrections (type "|")
• other: unclear or null
• offer: system offers information to user
• reject: user rejects offer
• select: user selects from offer list
• check: system gives explicit feedback
• discard: discard call (type n/a)
• fail: transaction ends in failure (type "-")
• start: new task initiated
• success: transaction ends in success (type "-")
• wrong: system responds with wrong information or topic...
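A fixed tag set lends itself to an enumeration, so that annotation files can be validated before counting. The sketch below encodes the twelve tags with their explanations; the `from_label` and `parse_tags` helpers and the three groupings are hypothetical additions, with `OUTSIDE_SCHEME` reflecting the Examples slide's remark that 'wrong' is outside the transaction scheme.

```python
from enum import Enum
from typing import List


class Tag(Enum):
    """Transaction tag set from the slide; value = (label, explanation)."""
    ACCEPT = ("accept", "User accepts system offer")
    REPAIR = ("repair", "Corrections")
    OTHER = ("other", "Unclear or null")
    OFFER = ("offer", "System offers information to user")
    REJECT = ("reject", "User rejects offer")
    SELECT = ("select", "User selects from offer list")
    CHECK = ("check", "System gives explicit feedback")
    DISCARD = ("discard", "Discard call")
    FAIL = ("fail", "Transaction ends in failure")
    START = ("start", "New task initiated")
    SUCCESS = ("success", "Transaction ends in success")
    WRONG = ("wrong", "System responds with wrong information or topic")

    def __init__(self, label: str, explanation: str) -> None:
        self.label = label
        self.explanation = explanation

    @classmethod
    def from_label(cls, label: str) -> "Tag":
        for tag in cls:
            if tag.label == label:
                return tag
        raise ValueError(f"unknown tag: {label!r}")


# Hypothetical groupings for downstream counting.
BOUNDARY_TAGS = {Tag.START, Tag.SUCCESS, Tag.FAIL}  # open/close a transaction
OUTSIDE_SCHEME = {Tag.WRONG}                        # 'wrong' is outside the scheme
CALL_LEVEL = {Tag.DISCARD}                          # applies to the whole call


def parse_tags(line: str) -> List[Tag]:
    """Parse whitespace-separated labels, rejecting anything not in the tag set."""
    return [Tag.from_label(word) for word in line.split()]
```

Rejecting unknown labels early is a cheap guard when a single coder annotates hundreds of calls by hand.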

An example dialogue
• S 0.1: Welcome to FerieKonto's speech-based guidance system. - Here you can get standard information about the payment of holiday pay. - You can always ask for help or repetition.
• S 0.2: Say whether you - are an employee or unemployed in Denmark - have deregistered from the civil registry and live abroad - or have left the labour market because of age or health. - PAUSE5000
• U 0.1: I am an employee
• S: If you are an employee in Denmark or unemployed and have a FerieKonto certificate
• S: you must fill it in and send it to us. - Besides yourself, your employer, your unemployment insurance fund or the social services department must sign. - PAUSE200
• S: If the FerieKonto certificate is correctly filled in when we receive it, we transfer the holiday pay to your account. - The processing time is about five working days. - However, you can receive your holiday pay at the earliest one month before your first day of holiday. - PAUSE1000
Annotations: (Opening) (Offer) (Select) (Feedback) (Information) Start Success (Information)
• S: Do you want to hear the information again? - PAUSE5000
• U 1.1: no thanks
• S: Do you want to hear FerieKonto's address? - PAUSE5000
• U 2.1: no thanks
• S 3.1: Do you want to go back to the start of the general guidance? - PAUSE5000
• Event 1: nothing recorded
• S: Sorry. - I did not hear what you said. - Say 'yes' to start over or 'goodbye' to leave the system. - PAUSE5000
• U 4.1: no thanks
• S: Sorry. - The system can give information about payment if you are an employee or unemployed, have deregistered from the civil registry, or have left the labour market because of age or health. You can also ask for FerieKonto's address. - Say one of these options if you want to continue, or say 'goodbye' to finish. - PAUSE5000
• Event 2: nothing recorded
• S: Goodbye and thank you.
• Event 3: hangup
Annotations: Offer Reject Offer Reject Offer (Other) Repair Reject Repair (Other) (Closing) (Other) (End)

Another example dialogue
• S: PAUSE Welcome to FerieKonto's automatic guidance. - You can let the system suggest topics, or you can try in your own words. - You choose topics by speaking to the system. - Choose between the topics: - PAUSE200 - 'employee'. - PAUSE200 - 'on leave'. - PAUSE200 - 'living abroad'. - PAUSE200 - Or 'left the labour market' because of 'age' or 'health'. - PAUSE500 - Say a topic, say 'repeat' or ask for 'more options'?
• U 0.1: I am a student
• S: Did you ask about - Main menu
• U 1.1: student
• S: If you are a student and receive SU (the Danish state education grant), you can have your holiday pay paid out even if you do not take time off from your studies. - But if you have a job alongside your studies, you must take time off from that job to have the holiday pay paid out.
• S: If you have a job, it is your employer who must sign your FerieKonto certificate; otherwise it is the social services department in your municipality.
• S: PAUSE500
Annotations: (Opening) Start Success Offer Check Repair (Information) (Request) (Information)

Another, continued
• S: Is there anything else you would like to ask about?
• U 2.1: do I have to include Saturdays
• S: From the holiday year 2002, which starts on 1 May, the holiday week has been changed to 5 holiday days, so Saturdays no longer count as holiday.
• S: Do you want to know about 'the holiday year'?
• U 3.1: no goodbye
• S: Goodbye and thank you for your call.
• Event 1: nothing recorded
• Event 2: disconnect
Annotations: Offer Start Success (Request) (Information) Offer (Reject + request) (Feedback) (Other) (Other) Success Start

Transaction annotated data
• Dataset:
  – Vejled: a few thousand calls
  – About 500 FAQ test calls
• Test: 225 calls in three batches, March–May 2002
  – Batch 1: primarily developers
  – Batches 2 and 3: "invited" test persons
• Operation: 217 calls over one week, September 2002
  – Real customers with real problems

Annotation
• Transcribed using Philips Transcription Station
  – Then transformed to XML and web format
• Markup done using an annotation tool developed by PDC
  – The interface is a browser window
  – Annotation files are stored in XML
• All dialogues annotated by the same experienced coder, using the same coding scheme throughout
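The slides do not give the XML schema of the annotation files. Assuming a hypothetical layout with one `<call id="...">` element per call and one `<tag turn="...">label</tag>` element per annotation, the tags can be read back per call with the standard library's `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical annotation file contents; the real schema is not shown.
SAMPLE = """<calls>
  <call id="c001">
    <tag turn="U 0.1">start</tag>
    <tag turn="S 0.3">success</tag>
  </call>
  <call id="c002">
    <tag turn="S 1.1">offer</tag>
    <tag turn="U 1.1">reject</tag>
  </call>
</calls>"""


def load_tags(xml_text: str) -> Dict[str, List[str]]:
    """Return {call id: [tag labels in document order]}."""
    root = ET.fromstring(xml_text)
    result: Dict[str, List[str]] = {}
    for call in root.iter("call"):
        # Skip empty <tag/> elements; strip whitespace around labels.
        result[call.get("id")] = [t.text.strip() for t in call.iter("tag") if t.text]
    return result
```

Keeping the per-call tag lists in document order preserves the start/end structure needed for counting transactions.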

Results table
• Columns: Test1, Test2, Test3, Total, SetA
• Rows (tag counts): accept, discard, fail, offer, other, reject, repair, start, success, wrong
• Summary rows: calls with fail, total number of calls, transaction success percent, smooth call percent
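The summary rows are not defined on the slide. As a sketch under explicit assumptions (transaction success percent = success / (success + fail); a call is "smooth" if it has no fail, repair or wrong tag; calls tagged discard are excluded), the figures for one batch could be computed like this. The function name `summarize` and the input shape are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Assumption: a call is "smooth" if it contains none of these tags.
PROBLEM_TAGS = {"fail", "repair", "wrong"}


def summarize(calls: Dict[str, List[str]]) -> Dict[str, float]:
    """Compute the summary rows for one batch of annotated calls."""
    # Assumption: calls tagged 'discard' are left out of all figures.
    kept = [tags for tags in calls.values() if "discard" not in tags]
    counts = Counter(tag for tags in kept for tag in tags)
    ended = counts["success"] + counts["fail"]
    n_calls = len(kept)
    smooth = sum(1 for tags in kept if not PROBLEM_TAGS & set(tags))
    return {
        "calls": n_calls,
        "calls_with_fail": sum(1 for tags in kept if "fail" in tags),
        "transaction_success_pct": 100.0 * counts["success"] / ended if ended else 0.0,
        "smooth_call_pct": 100.0 * smooth / n_calls if n_calls else 0.0,
    }
```

Counting per transaction and per call separately matters: a call with one failed and one successful transaction lowers the success percent only partly but is never counted as smooth.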

Results comments
• Higher transaction success in test dialogues than in operational calls
• Primary causes of failure in the test sets:
  – Dialogue model
  – Language model
• These causes were corrected before operation
• Difference in user groups
  – Test users follow the dialogue; they only have artificial problems
• Primary causes of failure in operational calls:
  – Real customers ask for information that is not covered
  – Typical questions to be covered by FAQ
• Problem with callers hanging up without saying anything in the dialogue

Smooth dialogues
• A more precise overview of problems, their causes and their seriousness
  – The same topic may have both fail and success in the same call
  – Few or many repairs
  – Distinction between unwanted and erroneous information
  – Erroneous information is unacceptable (tomorrow is Friday, phone )
  – Information other than what was asked for may be more or less serious (fax instead of phone, fax instead of )
  – Mistaking a yes for a no is usually not so serious (repairable) but can be a nuisance
  – Misrecognitions
  – Information blocks may contain more than was asked for
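A finer per-call profile needs more than the tag label. The sketch below assumes each tag is also annotated with a topic (a hypothetical extra field; the tag set above has none) and reports the number of repairs, the number of wrong replies, and the topics that have both a fail and a success within the same call. The name `profile_call` is illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Each event is (tag, topic); topic is a hypothetical extra field, None if unknown.
Event = Tuple[str, Optional[str]]


def profile_call(events: List[Event]) -> dict:
    """Summarize one call's problems beyond a plain success/fail count."""
    outcomes: Dict[str, Set[str]] = defaultdict(set)
    repairs = 0
    wrong = 0
    for tag, topic in events:
        if tag == "repair":
            repairs += 1
        elif tag == "wrong":
            wrong += 1
        elif tag in ("success", "fail") and topic is not None:
            outcomes[topic].add(tag)
    # Topics where the same call contains both a failed and a successful transaction.
    mixed = sorted(t for t, seen in outcomes.items() if seen == {"success", "fail"})
    return {"repairs": repairs, "wrong": wrong, "mixed_topics": mixed}
```

Distinguishing unwanted from erroneous information would need a further subtype on `wrong`; this sketch deliberately counts them together.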