If you have more than a handful of legacy .DOC files to convert to .DOCX, doing it manually in Word is not a reasonable option. This guide covers every working method for batch conversion, with real command-line examples and honest tradeoffs.
DOC is the binary format used by Office 97-2003. DOCX is the Open XML format that has been the default since Office 2007. Despite that nearly 20-year gap, enterprises still hold enormous archives of .DOC files.
Here are five methods that actually work at scale.
LibreOffice's headless mode is the most practical batch converter for most use cases. It runs on Linux, Mac, and Windows with no Word license required.
```bash
# Ubuntu/Debian
sudo apt-get install libreoffice

# macOS (Homebrew)
brew install --cask libreoffice

# Windows: download from libreoffice.org
```

```bash
# Convert all .doc files in a directory to .docx
libreoffice --headless --convert-to docx *.doc

# Convert to a specific output directory
libreoffice --headless --convert-to docx --outdir /output/folder *.doc

# Convert recursively (bash, Linux/Mac)
find /input -name "*.doc" -exec libreoffice --headless --convert-to docx --outdir /output {} \;
```
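The recursive `find` command writes every result into a single `/output` folder, so two files with the same name in different subfolders would overwrite each other. Below is a minimal Python sketch of one way around that, assuming you want the input tree mirrored under the output root. The helper name `plan_mirrored_outputs` is hypothetical and not part of any library.

```python
from pathlib import Path


def plan_mirrored_outputs(input_root: str, output_root: str) -> list[tuple[Path, Path]]:
    """Pair each .doc file with an output directory that mirrors its subfolder."""
    src_root = Path(input_root)
    dst_root = Path(output_root)
    plan = []
    for doc_file in sorted(src_root.rglob("*")):
        # Compare the suffix case-insensitively: "*.doc" globbing is
        # case-sensitive on Linux and would miss REPORT.DOC.
        if doc_file.is_file() and doc_file.suffix.lower() == ".doc":
            relative_parent = doc_file.parent.relative_to(src_root)
            plan.append((doc_file, dst_root / relative_parent))
    return plan
```

Each `(source, output_dir)` pair can then be passed to the same `libreoffice --headless --convert-to docx --outdir <output_dir> <source>` command, after creating `output_dir`.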
```python
import subprocess
from pathlib import Path


def batch_convert_doc_to_docx(input_dir: str, output_dir: str) -> dict:
    """
    Batch converts all .doc files in input_dir to .docx in output_dir.
    Returns dict with success/failure counts and any error messages.
    """
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)

    doc_files = list(input_path.glob("*.doc"))
    results = {"success": 0, "failed": 0, "errors": []}

    for doc_file in doc_files:
        expected = output_path / f"{doc_file.stem}.docx"
        try:
            result = subprocess.run(
                ["libreoffice", "--headless", "--convert-to", "docx",
                 "--outdir", str(output_path), str(doc_file)],
                capture_output=True,
                text=True,
                timeout=60  # 60 seconds per file
            )
            # LibreOffice may exit with 0 even when it writes nothing,
            # so also confirm that the output file exists.
            if result.returncode == 0 and expected.exists():
                results["success"] += 1
            else:
                results["failed"] += 1
                reason = result.stderr.strip() or "no output file produced"
                results["errors"].append(f"{doc_file.name}: {reason}")
        except subprocess.TimeoutExpired:
            results["failed"] += 1
            results["errors"].append(f"{doc_file.name}: timeout")

    return results


# Usage
results = batch_convert_doc_to_docx("/path/to/doc/files", "/path/to/output")
print(f"Converted: {results['success']} | Failed: {results['failed']}")
```
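The Python wrapper converts one file at a time. If throughput matters, one common approach is to run several LibreOffice processes side by side, each with its own user profile directory passed via `-env:UserInstallation`. The sketch below assumes that separate profiles are enough to keep the instances from contending with each other on your system. The function names and the default of four workers are illustrative choices, not recommendations from LibreOffice.

```python
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(doc_file: Path, output_dir: Path, profile_dir: Path) -> list[str]:
    """Build a LibreOffice command that uses its own user profile directory."""
    return [
        "libreoffice",
        # Assumption: one profile per worker avoids instances sharing a lock.
        f"-env:UserInstallation={profile_dir.resolve().as_uri()}",
        "--headless", "--convert-to", "docx",
        "--outdir", str(output_dir), str(doc_file),
    ]


def convert_chunk(doc_files: list[Path], output_dir: Path) -> int:
    """Convert one worker's share of files; returns how many outputs exist."""
    converted = 0
    with tempfile.TemporaryDirectory() as profile:
        for doc_file in doc_files:
            cmd = build_command(doc_file, output_dir, Path(profile))
            try:
                subprocess.run(cmd, capture_output=True, timeout=120)
            except (subprocess.TimeoutExpired, OSError):
                continue  # skip files that hang or cannot be launched
            if (output_dir / f"{doc_file.stem}.docx").exists():
                converted += 1
    return converted


def convert_parallel(input_dir: str, output_dir: str, workers: int = 4) -> int:
    """Split the .doc files across workers and return the number converted."""
    files = sorted(Path(input_dir).glob("*.doc"))
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks = [files[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda chunk: convert_chunk(chunk, out), chunks))
```

Threads are sufficient here because the heavy work happens in the child processes. Memory use grows with the worker count, so it is worth starting low and measuring.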
Performance: LibreOffice processes roughly 10-30 DOC files per minute depending on file size and server specs. For 1,000 files, expect 30-100 minutes.
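The estimate follows directly from the throughput range. As a small worked example (the function name `estimate_minutes` is just for illustration), dividing the file count by the fast and slow rates gives the best and worst case:

```python
def estimate_minutes(
    file_count: int,
    files_per_minute: tuple[float, float] = (10, 30),
) -> tuple[float, float]:
    """Return (best_case, worst_case) wall-clock minutes for a batch."""
    slow_rate, fast_rate = files_per_minute
    return file_count / fast_rate, file_count / slow_rate


best, worst = estimate_minutes(1000)
print(f"{best:.0f}-{worst:.0f} minutes")  # prints "33-100 minutes"
```

Passing a different rate pair, such as `(5, 15)`, gives the equivalent range for slower converters.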
Limitations:
If you're on Windows and have Word installed, COM automation produces the highest-fidelity output. Word converts its own files, so formatting, styles, and layout are preserved exactly.
```python
import win32com.client
from pathlib import Path


def batch_convert_doc_docx_com(input_dir: str, output_dir: str) -> dict:
    """
    Batch converts DOC to DOCX using Word COM automation.
    Requires Windows + Microsoft Word installed.
    """
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    results = {"success": 0, "failed": 0, "errors": []}

    # wdFormatXMLDocument = 12 (DOCX format)
    DOCX_FORMAT = 12

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        for doc_file in input_path.glob("*.doc"):
            output_file = output_path / (doc_file.stem + ".docx")
            doc = None
            try:
                doc = word.Documents.Open(str(doc_file.resolve()))
                doc.SaveAs2(str(output_file.resolve()), FileFormat=DOCX_FORMAT)
                results["success"] += 1
            except Exception as e:
                results["failed"] += 1
                results["errors"].append(f"{doc_file.name}: {e}")
            finally:
                # Close without saving so a failed file does not stay open in Word
                if doc is not None:
                    doc.Close(SaveChanges=0)  # 0 = wdDoNotSaveChanges
    finally:
        # Always quit, otherwise a hidden WINWORD.EXE process keeps running
        word.Quit()

    return results


results = batch_convert_doc_docx_com(r"C:\input", r"C:\output")
print(f"Converted: {results['success']} | Failed: {results['failed']}")
```
Performance: ~5-15 files per minute (Word must open each file). Slower than LibreOffice for large batches.
Advantages:
For simple DOC files, the python-docx library combined with the olefile library can extract text and basic formatting without any external applications. This is the right approach if you're on a Linux server with no Word or LibreOffice available.
```bash
pip install python-docx olefile
```

```python
import re

import olefile
from docx import Document


def extract_doc_text(doc_path: str) -> str:
    """
    Extracts raw text from a .doc file using OLE stream parsing.
    Preserves basic paragraph structure, not complex formatting.
    """
    with olefile.OleFileIO(doc_path) as ole:
        if not ole.exists('WordDocument'):
            return ""
        data = ole.openstream('WordDocument').read()
    # The WordDocument stream mixes text with binary structures, so this is
    # only a basic extraction. For production use, consider the 'antiword' CLI tool.
    text = data.decode('latin-1', errors='ignore')
    # Keep printable ASCII plus line breaks. Non-ASCII characters are lost and
    # stray bytes from binary structures can leak through (crude, but simple).
    return ''.join(c for c in text if 32 <= ord(c) < 127 or c in '\n\r\t')


def simple_doc_to_docx(input_path: str, output_path: str):
    text = extract_doc_text(input_path)
    doc = Document()
    # Word's binary format ends paragraphs with '\r', so split on both
    for paragraph in re.split(r'[\r\n]+', text):
        if paragraph.strip():
            doc.add_paragraph(paragraph.strip())
    doc.save(output_path)
```
Honest caveat: This method extracts text content but loses most formatting. Use it only if you need the text content and don't care about layout. For professional documents where formatting matters, use LibreOffice or COM automation.
If you have Word installed on Windows and prefer not to use Python, PowerShell achieves the same result as COM automation with a simpler script:
```powershell
# Batch-Convert-Doc-to-Docx.ps1
param(
    [string]$InputDir = "C:\input",
    [string]$OutputDir = "C:\output"
)

New-Item -ItemType Directory -Force -Path $OutputDir | Out-Null

$word = New-Object -ComObject Word.Application
$word.Visible = $false

# -Filter "*.doc" can also match .docx files, so keep only exact .doc extensions
$docFiles = Get-ChildItem -Path $InputDir -Filter "*.doc" |
    Where-Object { $_.Extension -eq ".doc" }
$success = 0
$failed = 0

try {
    foreach ($file in $docFiles) {
        $outputPath = Join-Path $OutputDir ($file.BaseName + ".docx")
        try {
            $doc = $word.Documents.Open($file.FullName)
            $doc.SaveAs([ref]$outputPath, [ref]12)  # 12 = wdFormatXMLDocument
            $doc.Close()
            $success++
            Write-Host "Converted: $($file.Name)"
        }
        catch {
            $failed++
            Write-Host "Failed: $($file.Name) - $($_.Exception.Message)"
        }
    }
}
finally {
    # Quit Word even if the loop is interrupted, so no hidden process lingers
    $word.Quit()
}

Write-Host "Done. Success: $success | Failed: $failed"
```
Run with: powershell -ExecutionPolicy Bypass -File Batch-Convert-Doc-to-Docx.ps1 -InputDir "C:\docs" -OutputDir "C:\converted"
For teams without developer resources, several online tools support batch conversion:
| Tool | Batch limit | Privacy | Cost |
|------|-------------|---------|------|
| CloudConvert | 25 files/day free | Files processed server-side | Free tier / $9/mo |
| Zamzar | 5 files at once | Files processed server-side | Free tier / $24/mo |
| ILovePDF | 10 files per operation | Files processed server-side | Free tier / $6/mo |
| Smallpdf | 2 operations/day free | Files processed server-side | Free tier / $9/mo |
Privacy consideration: All online tools upload your files to their servers. For legal documents, HR records, or other sensitive content, a local solution (LibreOffice, COM, or python-docx) is the safer choice.
For a batch of 500 DOC files, typical times:
| Method | Time | Platform | Word License | Formatting Fidelity |
|--------|------|----------|--------------|---------------------|
| LibreOffice headless | 15-50 min | Any OS | No | Good (95%+) |
| COM Automation | 35-100 min | Windows | Yes | Excellent (99%+) |
| python-docx + olefile | 2-5 min | Any OS | No | Text only |
| PowerShell + Word | 35-100 min | Windows | Yes | Excellent (99%+) |
| Online tools | N/A at 500 scale | Any (browser-based) | No | Good (varies) |
For most enterprise use cases, LibreOffice headless is the right default: free, cross-platform, fast, and good-enough formatting fidelity. Switch to COM automation only when you need Word-perfect output or have complex formatting requirements.
If your DOC files contain VBA macros, converting to DOCX will strip them, because the Open XML family stores macros only in the macro-enabled DOCM variant. To preserve macros:

```bash
# LibreOffice: convert to DOCM instead of DOCX
libreoffice --headless --convert-to "docm:MS Word 2007 XML VBA" *.doc
```

Or with Python COM:

```python
# FileFormat 13 = wdFormatXMLDocumentMacroEnabled (DOCM)
# output_file should end in .docm rather than .docx here
doc.SaveAs2(str(output_file.resolve()), FileFormat=13)
```
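Rather than converting every file to DOCM, you may prefer to use DOCM only for files that actually carry macros. The sketch below uses `olefile` to look for a top-level `Macros` storage, which is where Word's binary format is understood to keep its VBA project. Treat that as an assumption to verify on your own archive. Both function names are hypothetical.

```python
def has_vba_storage(stream_paths: list[list[str]]) -> bool:
    """True if any OLE stream path sits under a top-level 'Macros' storage.

    Assumption: Word's binary format keeps its VBA project under a
    storage named 'Macros'.
    """
    return any(
        len(parts) > 0 and parts[0].lower() == "macros"
        for parts in stream_paths
    )


def target_extension(doc_path: str) -> str:
    """Pick .docm for files that appear to carry macros, else .docx."""
    import olefile  # third-party: pip install olefile

    with olefile.OleFileIO(doc_path) as ole:
        # listdir() returns each stream as a list of path components
        return ".docm" if has_vba_storage(ole.listdir()) else ".docx"
```

The returned extension can drive both the output file name and the choice of save format (12 for DOCX, 13 for DOCM in the COM examples).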
See our full guide on macro-safe document conversion for a deep-dive on macro preservation across formats.
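COM automation can open password-protected DOC files if you supply the password:

```python
doc = word.Documents.Open(str(doc_file.resolve()), PasswordDocument="yourpassword")
```

LibreOffice is less straightforward. The command below is sometimes quoted, but `--passwd` does not appear among the options we know from `libreoffice --help`, so confirm it against your installed version before relying on it. Note that `--infilter` names the input format, which for binary DOC files is `MS Word 97` rather than a DOCX filter. If the command fails, fall back to COM automation for protected files.

```bash
libreoffice --headless --convert-to docx --infilter="MS Word 97" --passwd "yourpassword" file.doc
```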
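LibreOffice cannot repair a damaged file from the command line, but `--norestore` disables its crash-recovery step, so a file that crashed a previous run does not trigger a document-recovery attempt on the next start:

```bash
libreoffice --headless --convert-to docx --norestore file.doc
```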
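Before blaming corruption, it can help to check what a failing file really is. Legacy archives often contain RTF or DOCX files saved with a `.doc` extension, and genuine binary DOC files start with the 8-byte OLE compound file signature. Here is a minimal sketch using only the standard library; the function name and the category labels are illustrative.

```python
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def classify_doc(path: str) -> str:
    """Guess what a .doc file really is from its first bytes."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    if header == OLE_SIGNATURE:
        return "ole"      # genuine binary Word container
    if header.startswith(b"{\\rtf"):
        return "rtf"      # RTF saved with a .doc extension
    if header.startswith(b"PK"):
        return "zip"      # probably a DOCX renamed to .doc
    if not header:
        return "empty"    # zero-byte file
    return "unknown"
```

Files classified as `empty` or `unknown` are the ones worth setting aside for manual inspection. The other categories are formats that converters can usually open.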
LibreOffice is open-source, processes files locally (nothing uploaded to external servers), and is trusted by government agencies worldwide including the German federal government. It is safe for sensitive legal documents.
The DOC format supports features that DOCX doesn't (and vice versa). In practice, 95%+ of content is preserved correctly by LibreOffice. Complex custom styles, certain OLE objects, and some legacy form fields may render differently. COM automation via Word has 99%+ fidelity.
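Fidelity itself is hard to measure automatically, but a cheap structural check catches the worst failures, such as truncated or empty outputs. A DOCX file is a ZIP archive whose main body lives in `word/document.xml`, so a sketch like the following (the function name is hypothetical) can flag outputs that are not even well-formed packages. It says nothing about whether the formatting matches the original.

```python
import zipfile


def is_plausible_docx(path: str) -> bool:
    """Cheap structural check: a readable ZIP containing the main document part."""
    try:
        with zipfile.ZipFile(path) as archive:
            names = set(archive.namelist())
            return "word/document.xml" in names and "[Content_Types].xml" in names
    except (zipfile.BadZipFile, OSError):
        # Not a ZIP archive, or the file is missing or unreadable
        return False
```

Running this over the output folder after a batch gives a short list of files to re-convert or inspect by hand.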
DOCM to DOCX conversion strips macros — this is by design (DOCX cannot hold macros). If you need to preserve macros, keep the output format as DOCM. See our macro preservation guide for a full breakdown.
Yes. LibreOffice headless runs without a GUI and works in CI/CD environments. Docker image: linuxserver/libreoffice or unoconv/unoconv.
Last updated: April 2026