Build Your Own Voice Calculator: A Beginner’s Guide
Creating a voice calculator is a rewarding beginner project that combines basic programming, speech recognition, and simple math parsing. This guide walks you through a minimal working Voice Calculator using web technologies (JavaScript, HTML) and the browser’s Web Speech API so you can speak calculations and get results quickly.
What you’ll build
A web page that:
- Listens for spoken math expressions (e.g., “twenty three plus seven”, “five times six”).
- Converts speech to text.
- Parses the text into a mathematical expression.
- Evaluates the expression safely and displays the result.
Tools and prerequisites
- Basic HTML, CSS, and JavaScript knowledge.
- A modern browser with Web Speech API support (Chrome/Edge).
- A text editor. A local web server is optional; you can open the HTML file directly in the browser.
Core approach (high level)
- Capture speech with the Web Speech API (SpeechRecognition).
- Normalize the recognized text (lowercase, remove filler words).
- Convert number words to digits and map spoken operators to symbols.
- Validate and evaluate the parsed expression safely.
- Display the result and handle edge cases (errors, unsupported phrases).
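The normalization step is easy to get subtly wrong: if filler words are removed as plain substrings, "and" is also removed from inside "thousand". Here is a minimal sketch of whole-word matching. The helper name stripFillers and the short filler list are hypothetical, chosen only for illustration:

```javascript
// Hypothetical helper for illustration; the filler list is an assumption.
const FILLERS = ['what is', 'calculate', 'please', 'and'];

function stripFillers(text) {
  // \b anchors each filler to word boundaries, so substrings of longer
  // words (the "and" inside "thousand") are left alone.
  const pattern = new RegExp('\\b(' + FILLERS.join('|') + ')\\b', 'g');
  return text
    .toLowerCase()
    .replace(pattern, ' ')   // replace with a space so neighbors do not merge
    .replace(/\s+/g, ' ')    // collapse runs of whitespace
    .trim();
}

console.log(stripFillers('What is two thousand and five plus one'));
// prints "two thousand five plus one"
```

Replacing each filler with a space rather than an empty string keeps the neighboring words separated; the extra whitespace is collapsed afterwards.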
Minimal working example
Create a file named index.html and paste the following code:
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Voice Calculator</title>
<style>
body { font-family: system-ui, Arial; padding: 24px; max-width: 720px; margin: auto; }
button { padding: 12px 18px; font-size: 16px; }
#transcript, #result { margin-top: 12px; font-size: 18px; }
.error { color: #c00; }
</style>
</head>
<body>
<h1>Voice Calculator</h1>
<button id="start">Start Listening</button>
<div id="transcript">Transcript: <span id="txt">—</span></div>
<div id="result">Result: <span id="res">—</span></div>
<script>
// Feature-detect
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
if (!SpeechRecognition) {
  document.getElementById('res').textContent = 'SpeechRecognition not supported in this browser.';
  document.getElementById('res').classList.add('error');
  document.getElementById('start').disabled = true;
} else {
  const rec = new SpeechRecognition();
  rec.lang = 'en-US';
  rec.interimResults = false;
  rec.maxAlternatives = 1;
  document.getElementById('start').addEventListener('click', () => rec.start());
  rec.addEventListener('result', (ev) => {
    const text = ev.results[0][0].transcript;
    document.getElementById('txt').textContent = text;
    const resEl = document.getElementById('res');
    resEl.classList.remove('error'); // clear error styling left by a previous attempt
    try {
      const expr = parseSpokenExpression(text);
      const value = evaluateExpression(expr);
      resEl.textContent = value;
    } catch (err) {
      resEl.textContent = 'Error: ' + err.message;
      resEl.classList.add('error');
    }
  });
  rec.addEventListener('end', () => { /* ready for next */ });
}
// Basic number-word to numeric mapping
const SMALL = {
  'zero':0,'one':1,'two':2,'three':3,'four':4,'five':5,'six':6,'seven':7,'eight':8,'nine':9,
  'ten':10,'eleven':11,'twelve':12,'thirteen':13,'fourteen':14,'fifteen':15,'sixteen':16,'seventeen':17,'eighteen':18,'nineteen':19
};
const TENS = { 'twenty':20,'thirty':30,'forty':40,'fifty':50,'sixty':60,'seventy':70,'eighty':80,'ninety':90 };
function wordsToNumber(words) {
  // Supports numbers like "one hundred twenty three" or "forty five"
  const parts = words.split(/[\s-]+/).filter(Boolean);
  if (parts.length === 0) throw new Error('Missing number.');
  let total = 0, current = 0;
  parts.forEach(p => {
    if (SMALL[p] !== undefined) current += SMALL[p];
    else if (TENS[p] !== undefined) current += TENS[p];
    // "hundred" multiplies the group so far; a bare "hundred" counts as 100
    else if (p === 'hundred') current = (current || 1) * 100;
    // "thousand" closes the current group and moves it into the total
    else if (p === 'thousand') { total += (current || 1) * 1000; current = 0; }
    else throw new Error('Unknown number word: ' + p);
  });
  return total + current;
}
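function normalize(text) {
  return text.toLowerCase()
    // Match fillers as whole words so "and" is not stripped out of "thousand"
    .replace(/\b(what is|calculate|equals|equal to|please|hey|ok|and)\b/g, ' ')
    // Keep letters, digits, decimal points, and operator symbols the recognizer may return
    .replace(/[^a-z0-9\s.+\-*\/×÷]/g, ' ')
    .replace(/\s+/g, ' ').trim();
}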
function parseSpokenExpression(text) {
  const cleaned = normalize(text);
  // Operator mapping; the symbol keys cover transcripts such as "5 + 2"
  const ops = {
    'plus':'+','add':'+','added to':'+','+':'+',
    'minus':'-','subtract':'-','less':'-','-':'-',
    'times':'*','multiplied by':'*','multiply':'*','x':'*','*':'*','×':'*',
    'divided by':'/','over':'/','divide':'/','/':'/','÷':'/'
  };
  // Handles a single "number operator number" expression.
  // Build the regex from the ops keys, longest first, escaping regex metacharacters.
  const opKeys = Object.keys(ops)
    .sort((a, b) => b.length - a.length)
    .map(k => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&').replace(/ /g, '\\s+'));
  const re = new RegExp(`^(.+)\\s+(${opKeys.join('|')})\\s+(.+)$`);
  const m = cleaned.match(re);
  if (!m) throw new Error('Could not parse expression. Try "five plus two" or "twenty three divided by seven".');
  const left = m[1].trim();
  const opWord = m[2].replace(/\s+/g, ' ');
  const right = m[3].trim();
  const op = ops[opWord] || (() => { throw new Error('Unsupported operator: ' + opWord); })();
  const leftNum = isNaN(Number(left)) ? wordsToNumber(left) : Number(left);
  const rightNum = isNaN(Number(right)) ? wordsToNumber(right) : Number(right);
  // Spaces keep a negative right operand from fusing with the operator ("5 - -2")
  return `${leftNum} ${op} ${rightNum}`;
}
function evaluateExpression(expr) {
  // Very simple safety: only digits, operators, decimal points, parentheses, and spaces allowed
  if (!/^[0-9+\-*\/().\s]+$/.test(expr)) throw new Error('Unsafe expression.');
  // eslint-disable-next-line no-eval
  const val = eval(expr);
  if (!Number.isFinite(val)) throw new Error('Result is not finite.');
  return val;
}
</script>
</body>
</html>
How it works (brief)
- SpeechRecognition captures spoken text.
- normalize() removes filler words and punctuation.
- parseSpokenExpression() maps spoken operators to symbols and converts number words to numeric values.
- evaluateExpression() checks the expression string against a whitelist of allowed characters, then uses eval for simplicity (acceptable for this tutorial; see the safety notes).
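The word-to-number conversion is the least obvious of these steps. One common approach keeps a running group that "hundred" multiplies and "thousand" flushes into a total. The standalone sketch below illustrates that idea; the function name wordsToNumberSketch and the trimmed word maps are hypothetical and cover only the words used in the examples:

```javascript
// Illustrative sketch; the maps are trimmed to the words used below.
const UNITS = new Map(Object.entries({ one: 1, two: 2, three: 3, five: 5, seven: 7 }));
const TENS_WORDS = new Map(Object.entries({ twenty: 20, forty: 40 }));

function wordsToNumberSketch(words) {
  let total = 0;   // completed thousands
  let current = 0; // group currently being built (0-999)
  for (const w of words.trim().split(/[\s-]+/)) {
    if (UNITS.has(w)) current += UNITS.get(w);
    else if (TENS_WORDS.has(w)) current += TENS_WORDS.get(w);
    else if (w === 'hundred') current = (current || 1) * 100;
    else if (w === 'thousand') { total += (current || 1) * 1000; current = 0; }
    else throw new Error('Unknown number word: ' + w);
  }
  return total + current;
}

console.log(wordsToNumberSketch('one hundred twenty three')); // 123
console.log(wordsToNumberSketch('two thousand forty-five'));   // 2045
```

For "one hundred twenty three" the group goes 1, 100, 120, 123. For "two thousand forty-five" the word "thousand" moves 2000 into the total before the group restarts at 40 and ends at 45.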
Extensions and improvements
- Support larger numbers, decimals, negatives, parentheses, and multi-term expressions.
- Replace eval with a proper math expression parser (e.g., math.js) for security and richer features.
- Add continuous listening and UI indicators (listening, processing).
- Add voice output using SpeechSynthesis for spoken results.
- Improve language support and fuzzy matching for more natural phrases.
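As one example of these extensions, spoken results can be added with the browser's SpeechSynthesis interface. The sketch below is a starting point under stated assumptions, not a finished feature: describeResult and speakResult are hypothetical helper names, and rounding to four decimals is an arbitrary choice to keep long fractions short when read aloud.

```javascript
// Hypothetical helpers for illustration.
function describeResult(value) {
  // Round to 4 decimals so long fractions are not read out in full.
  const rounded = Math.round(value * 10000) / 10000;
  return `The answer is ${rounded}`;
}

function speakResult(value) {
  // Guard: do nothing where speech synthesis is unavailable.
  if (typeof window === 'undefined' || !('speechSynthesis' in window)) return false;
  const utterance = new SpeechSynthesisUtterance(describeResult(value));
  utterance.lang = 'en-US';
  window.speechSynthesis.speak(utterance);
  return true;
}

console.log(describeResult(2.5)); // The answer is 2.5
```

In the demo page, speakResult(value) could be called right after the result is written to the page.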
Safety notes
- The demo uses a basic safety regex and eval — adequate for controlled inputs but not production-safe. Use a proper parser or sandboxed evaluator for real deployments.
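For the four basic operators, eval can be avoided without pulling in a library. The sketch below is one possible approach and assumes the parser has been changed to produce an array of tokens (numbers alternating with operator symbols) instead of a string. The name evaluateTokens is hypothetical, and parentheses are not handled.

```javascript
// Illustrative sketch: evaluates tokens such as [2, '+', 3, '*', 4] without eval.
const APPLY = new Map([
  ['+', (a, b) => a + b],
  ['-', (a, b) => a - b],
  ['*', (a, b) => a * b],
  ['/', (a, b) => a / b],
]);

function evaluateTokens(tokens) {
  // Expect number, operator, number, ... which is always an odd count.
  if (tokens.length === 0 || tokens.length % 2 === 0) throw new Error('Malformed expression.');
  tokens.forEach((t, i) => {
    const ok = i % 2 === 0 ? typeof t === 'number' && Number.isFinite(t) : APPLY.has(t);
    if (!ok) throw new Error('Unexpected token: ' + t);
  });
  // Pass 1: resolve * and / immediately, defer + and -.
  const pending = [tokens[0]];
  for (let i = 1; i < tokens.length; i += 2) {
    const op = tokens[i];
    const rhs = tokens[i + 1];
    if (op === '*' || op === '/') {
      pending[pending.length - 1] = APPLY.get(op)(pending[pending.length - 1], rhs);
    } else {
      pending.push(op, rhs);
    }
  }
  // Pass 2: resolve + and - from left to right.
  let result = pending[0];
  for (let i = 1; i < pending.length; i += 2) {
    result = APPLY.get(pending[i])(result, pending[i + 1]);
  }
  if (!Number.isFinite(result)) throw new Error('Result is not finite.');
  return result;
}

console.log(evaluateTokens([2, '+', 3, '*', 4]));          // 14
console.log(evaluateTokens([10, '-', 4, '/', 2, '-', 1])); // 7
```

Because only known operators are ever looked up and applied, no part of the transcript is executed as code. Division by zero is reported through the same "not finite" check the demo uses.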
That’s it — a working starter you can run in your browser. Modify the parsing logic to expand supported phrases and operators as you learn.