Speech Coach

Project information

Good speeches can stay relevant for decades or even centuries. A good speech can motivate people and leave a lasting impression on their minds and hearts. A good speech is centered on its substance, but its delivery is what makes it great.

English is an acquired language for most people in India. Very few use it to communicate with family members at home, with friends at school or college, or even with colleagues at the workplace.

Employers at multinational corporations in India place a high priority on oral communication in English.

However, many people are afraid of speaking in front of a crowd, hesitant to take part in group interactions and nervous during interviews. Others have the confidence but lack speaking skills and a convenient platform to practice.


Our framework uses free-to-use and open-source products from the speech technology domain.

The framework lets users focus on individual vocal elements of speech delivery, such as intensity, rate of speech, pitch variation, pauses and filler words.

The purpose of this paper is to introduce a web-based framework, tailored and tested for the Indian accent/dialect, that provides a platform to rehearse speeches, gain confidence through practice, improve oratory skills and deliver an articulate speech.

Overview of system architecture

Methodology: process and technologies used

Results from experimental setup

1. Graphs of pitch, intensity and the transcript, aligned with time. You can zoom in to individual words, zoom out to whole sentences and scroll across the entire speech.

  • Intensity or loudness in decibels.
  • Frequency or pitch in Hertz.
  • Transcript aligned to time.

2. Table of vocal element values such as speaking duration, number of pauses and filler words, articulation rate and intonation.

3. Transcript of the speech, with filler words and mispronounced or incorrectly recognized words highlighted in red.
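The pitch and intensity graphs in item 1 come from splitting the recording into short frames and measuring each one. The references list Praat and its Python interface Parselmouth, which do this far more robustly; the standard-library sketch below only illustrates the idea and is not the project's implementation. The frame length (40 ms), hop (10 ms), pitch search range (75-500 Hz) and the assumption that samples are sound pressures in pascals (so that levels come out in dB re 20 µPa) are all choices made for this example.

```python
import math


def frames(samples, sample_rate, frame_s=0.04, hop_s=0.01):
    """Yield (centre_time_s, frame) for overlapping analysis frames."""
    size = round(frame_s * sample_rate)
    step = round(hop_s * sample_rate)
    for start in range(0, len(samples) - size + 1, step):
        yield (start + size / 2) / sample_rate, samples[start:start + size]


def intensity_db(frame, ref_pa=2e-5):
    """RMS level in dB re 20 micropascals (assumes samples are in pascals)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(max(rms, 1e-12) / ref_pa)


def pitch_hz(frame, sample_rate, fmin=75.0, fmax=500.0):
    """Crude autocorrelation pitch estimate; None if the frame is silent."""
    energy = sum(x * x for x in frame)
    if energy == 0:
        return None
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Energy-normalized autocorrelation; for a clean periodic signal
        # it is largest at a lag of one period.
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else None


def track(samples, sample_rate):
    """Return (time_s, intensity_db, pitch_hz) rows, one per frame."""
    return [(t, intensity_db(f), pitch_hz(f, sample_rate))
            for t, f in frames(samples, sample_rate)]
```

Plotting the second and third columns against the first gives intensity and pitch curves over time. For a 200 Hz sine wave with an RMS of 0.02 Pa, every frame reports 60 dB and 200 Hz. Real speech also needs voicing decisions and octave-error handling, which this sketch omits.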

Usage: Learning from your own mistakes and from other orators.

You can view the results for a pre-recorded speech and for the speech you have just recorded at the same time, either to improve your own speech or to learn from the results of someone else’s speech.
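Learning from mistakes starts with the highlighted transcript described in item 3. The page does not say how mispronounced words are detected; one plausible approach, assumed here, is to compare the recognizer's transcript with the script the speaker intended and flag any recognized word that does not line up with it, plus any filler word. The sketch uses Python's `difflib.SequenceMatcher` for the alignment; the `FILLERS` set and the helper names are hypothetical.

```python
import difflib
import html

FILLERS = {"um", "uh", "er", "ah", "hmm"}  # hypothetical filler list


def normalize(word):
    """Lower-case a word and strip surrounding punctuation for matching."""
    return word.lower().strip(".,!?;:")


def red(word):
    """Wrap a word in an HTML span colored red."""
    return f'<span style="color:red">{html.escape(word)}</span>'


def highlight(script, recognized, fillers=FILLERS):
    """Return HTML for the recognized transcript, marking filler words and
    words that do not line up with the intended script in red."""
    ref = [normalize(w) for w in script.split()]
    hyp = recognized.split()
    hyp_norm = [normalize(w) for w in hyp]
    matcher = difflib.SequenceMatcher(a=ref, b=hyp_norm, autojunk=False)
    # Collect the positions of recognized words that match the script.
    matched = set()
    for block in matcher.get_matching_blocks():
        matched.update(range(block.b, block.b + block.size))
    out = []
    for i, word in enumerate(hyp):
        if hyp_norm[i] in fillers or i not in matched:
            out.append(red(word))
        else:
            out.append(html.escape(word))
    return " ".join(out)
```

For example, `highlight("we choose to go to the moon", "we um choose to goal to the moon")` marks "um" and "goal" in red. Words the speaker skipped entirely never appear in the recognized transcript, so this approach cannot highlight them.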

Usage: Working on vocal elements of speech.

  • Intensity refers to the softness or loudness of the speech. A speech delivered too softly tends to make the speaker seem less credible, while one delivered too loudly can displease listeners and cause them to lose track of the message.
  • The speed at which one speaks is referred to as the rate of speech. A higher rate of speech tends to signal a well-prepared, informed and zealous speaker, whereas a lower rate tends to signal the opposite.
  • Variation in pitch is one way to display enthusiasm or emphasize important messages. Intonation helps engage the audience and express emotions more effectively.
  • Pauses, when used correctly, ensure smoother transitions between different parts of a speech. An unintended pause can cause the audience to lose track of the speaker’s message.
  • Filler words (such as “um” and “uh”) should be avoided when delivering a speech, since they suggest a lack of confidence and can damage the speaker’s credibility.
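The vocal elements above can be turned into numbers and checked automatically. The sketch below assumes a list of recognized words with start and end times, plus per-frame pitch and intensity values; it computes the kind of values shown in the results table and flags any that fall outside a target range. The `Word` record, the 0.5 s pause threshold, the filler list and every range in `TARGETS` are hypothetical placeholders, not values from the project. Articulation rate is taken here as words per minute excluding pauses (it is more commonly measured in syllables per second), intonation is summarized as the standard deviation of pitch in semitones, and mean intensity is a plain average of frame levels in dB.

```python
import math
import statistics
from dataclasses import dataclass

# Hypothetical settings; the project's actual values are not given.
FILLERS = {"um", "uh", "er", "ah", "hmm"}
PAUSE_THRESHOLD_S = 0.5
TARGETS = {  # (low, high) ranges, placeholders to be tuned
    "mean_intensity_db": (60.0, 75.0),           # too soft / too loud
    "articulation_rate_wpm": (120.0, math.inf),  # slower reads as less prepared
    "pitch_variation_st": (2.0, math.inf),       # below this sounds monotone
    "fillers_per_min": (0.0, 2.0),
    "longest_pause_s": (0.0, 2.0),
}


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def vocal_elements(words, pitches_hz, intensities_db):
    """Compute vocal-element values; assumes at least one voiced pitch frame."""
    duration = words[-1].end - words[0].start
    gaps = [b.start - a.end for a, b in zip(words, words[1:])]
    pauses = [g for g in gaps if g >= PAUSE_THRESHOLD_S]
    fillers = sum(1 for w in words if w.text.lower().strip(".,!?") in FILLERS)
    phonation_s = duration - sum(pauses)
    voiced = [p for p in pitches_hz if p]  # None or 0 marks unvoiced frames
    # The reference pitch only shifts the semitone values, so the spread
    # does not depend on it; the median is used for readability.
    ref = statistics.median(voiced)
    semitones = [12 * math.log2(p / ref) for p in voiced]
    return {
        "speaking_duration_s": duration,
        "pauses": len(pauses),
        "longest_pause_s": max(pauses, default=0.0),
        "filler_words": fillers,
        "fillers_per_min": fillers / (duration / 60),
        "articulation_rate_wpm": len(words) / (phonation_s / 60),
        "pitch_variation_st": statistics.pstdev(semitones),
        "mean_intensity_db": statistics.fmean(intensities_db),
    }


def feedback(metrics, targets=TARGETS):
    """Return a note for every metric outside its (low, high) range."""
    notes = []
    for key, (low, high) in targets.items():
        value = metrics[key]
        if value < low:
            notes.append(f"{key} is low: {value:.1f} < {low}")
        elif value > high:
            notes.append(f"{key} is high: {value:.1f} > {high}")
    return notes
```

Keeping the ranges in one dictionary makes it easy to tune them per speaker or per type of speech without touching the checking logic.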

References

  • “Google Speech Recognition API.” [Online].
  • Anonymous, A Primer on Communication Studies. Creative Commons, 2012.
  • “my-voice-analysis,” my-voice-analysis, 2019. [Online].
  • S. Weinberger, Speech Accent Archive. George Mason University, 2015. [Online].
  • Y. Jadoul, B. Thompson, and B. de Boer, “Introducing Parselmouth: A Python interface to Praat,” Journal of Phonetics, vol. 71, pp. 1-15, 2018.
  • P. Boersma and D. Weenink, Praat: doing phonetics by computer [Computer program], Version 6.0.37, 2018. Retrieved 3 February 2018.
  • A. Pettarin, “A Practical Introduction To The aeneas Package,” Albertopettarin.it, 2020. [Online].
Team

Adhish Deshpande

Innovation Engineer

REDX WeSchool, Mumbai

Rohit Pandharkar

Chief Lab Mentor