You hit save on a report, attach the file to an email, and send it off. What you might not realize is that the file carries a digital footprint of exactly who wrote it, how long they spent editing it, and even the names of previous drafts. This is especially true if you use open-source office suites like LibreOffice or Apache OpenOffice, which are built on the OpenDocument Format (ODF). While these tools offer great freedom, their default settings often embed more personal data than most users intend to share.
Cleaning this metadata isn't just about being tidy; it’s about protecting your privacy and complying with regulations like GDPR. Whether you are sending a contract to a client, submitting a resume, or publishing a public report, hidden fields can reveal sensitive context. The good news is that scrubbing an ODF file doesn't require expensive enterprise software. You can do it manually through the application menus, use browser-based tools for a quick fix, or dive into the XML structure for total control.
What Is Inside an ODF File?
To understand what you are deleting, you need to know where the data lives. An OpenDocument file, whether it is a .odt text document, .ods spreadsheet, or .odp presentation, is technically a ZIP archive containing several XML files. When you unzip one of these files, you will find specific components responsible for storing metadata.
- meta.xml: This is the primary hub for document properties. It stores the author's name (`dc:creator`), the initial creator, creation dates, editing duration, and revision cycles.
- content.xml: While mostly holding the visible text, this file also contains tracked changes, comments, and annotations, which include author names and timestamps.
- settings.xml: Stores application-specific settings and user preferences applied to the document.
- manifest.rdf: Handles resource description framework (RDF) metadata, which can sometimes contain additional identity links.
When people talk about "cleaning" a file, they usually mean stripping out the PII (Personally Identifiable Information) from meta.xml and removing the edit history from content.xml. If you leave these alone, anyone with access to the file can inspect them and see details you thought were private.
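To see what a particular file actually carries, you can read the package with Python's standard library. The following is a minimal sketch, not a complete inspector: the function name `read_odf_metadata` is hypothetical, and it assumes an unencrypted package with a `meta.xml` at the archive root.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument specification.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_odf_metadata(path):
    """Return (member_names, fields) for an ODF package.

    `fields` maps a prefixed tag such as 'dc:creator' to its text.
    Hypothetical helper for illustration; repeated tags overwrite each other.
    """
    with zipfile.ZipFile(path) as package:
        members = package.namelist()
        root = ET.fromstring(package.read("meta.xml"))

    fields = {}
    meta_block = root.find("office:meta", NS)
    if meta_block is None:
        return members, fields

    # Map namespace URIs back to their conventional prefixes for display.
    prefixes = {uri: prefix for prefix, uri in NS.items()}
    for child in meta_block:
        if child.tag.startswith("{"):
            uri, _, local = child.tag[1:].partition("}")
            key = f"{prefixes.get(uri, uri)}:{local}"
        else:
            key = child.tag
        # Custom properties share one tag, so include their name in the key.
        if key == "meta:user-defined":
            name = child.get(f"{{{NS['meta']}}}name")
            key = f"{key}[{name}]"
        fields[key] = child.text or ""
    return members, fields
```

Calling `read_odf_metadata("report.odt")` on a file of your own returns the list of archive members and a dictionary of the properties stored in meta.xml, including any custom properties, which makes it easy to compare a document before and after cleaning.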
Manual Cleaning via LibreOffice and OpenOffice Menus
The most accessible way to remove metadata is using the built-in features of the office suite itself. Both LibreOffice and Apache OpenOffice provide similar dialogs, though the exact wording may vary slightly depending on your version.
Start by opening your document and navigating to File > Properties. This dialog box is divided into several tabs, each handling different types of data.
- The General Tab: Here you will see statistics like "Total Editing Time" and "Revision Number." Click the Reset button to zero out these counters. Crucially, look for a checkbox labeled Apply User Data. Unchecking this ensures that the document does not pull the current user's profile information (like your name and initials) when saved next time.
- The Description Tab: Clear out any text in fields such as Title, Subject, Keywords, and Comments. These are often filled in automatically or left over from templates.
- The Custom Properties Tab: This is a common hiding spot for sensitive info. Templates or add-ins might inject custom fields here, such as project codes, client IDs, or internal reference numbers. Delete any entries you don't want to share.
After adjusting these settings, click OK. However, changing properties is only half the battle. You must also address structural traces like comments and tracked changes, which reside in the content stream rather than the property sheet.
Removing Hidden Structural Data
Metadata isn't limited to the property sheet. If you have collaborated on a document, it likely contains layers of hidden interaction data that survive a simple property reset.
Tracked Changes and Comments: In Writer or Calc, open the change manager. Apache OpenOffice and older LibreOffice releases list it as Edit > Changes > Accept or Reject, while recent LibreOffice versions place it under Edit > Track Changes > Manage. Review all pending changes and accept or reject each one. Resolving a change removes the visible markup, but depending on your version a record of *who* made it may still linger in the file, so confirm afterward that no residual annotations remain. Similarly, check for comments by right-clicking any comment indicator and selecting Delete Comment. Comments store the author's name and timestamp directly in the XML.
Hidden Content: Authors often hide paragraphs or sheets during drafting. In Writer, enable View > Hidden Paragraphs to see if any confidential notes are lurking. In Calc, check Format > Sheet > Hide/Show to ensure no hidden sheets containing raw data or formulas are attached to the file. These elements travel with the document even if they aren't visible on screen.
Versions: LibreOffice has a feature called Versions (found under File > Versions). This saves snapshots of the document inside the main file. If you have used this feature, older versions containing earlier drafts or deleted sections are embedded in the package. Open the Versions dialog and delete any stored backups before sharing the file.
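If you would rather confirm the result than trust the menus, a short script can count what is left in the saved file. This is a rough sketch under several assumptions: `audit_hidden_data` is a hypothetical name, the tracked-change check only covers the `text:changed-region` elements used by text documents (spreadsheets record changes differently), and the version check assumes LibreOffice stores snapshots under a `Versions/` folder with a `VersionList.xml` index, which you should verify against your own files.

```python
import zipfile
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DC = "http://purl.org/dc/elements/1.1/"


def audit_hidden_data(path):
    """Summarize leftover collaboration data in an ODF package.

    Hypothetical helper: reports counts only and never modifies the file.
    """
    with zipfile.ZipFile(path) as package:
        names = package.namelist()
        root = ET.fromstring(package.read("content.xml"))

    def count(namespace, local_name):
        return sum(1 for _ in root.iter(f"{{{namespace}}}{local_name}"))

    # dc:creator inside content.xml appears in comments and change records.
    authors = sorted({el.text for el in root.iter(f"{{{DC}}}creator") if el.text})

    return {
        "comments": count(OFFICE, "annotation"),
        "tracked_changes": count(TEXT, "changed-region"),
        "hidden_paragraph_fields": count(TEXT, "hidden-paragraph"),
        "authors": authors,
        # Assumed layout for the File > Versions feature.
        "version_entries": [
            n for n in names
            if n.startswith("Versions/") or n == "VersionList.xml"
        ],
    }
```

A fully cleaned text document should report zero comments and tracked changes, an empty author list, and no version entries. Anything else is a cue to go back through the steps above.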
Preventing Metadata Accumulation
Cleaning a file after the fact is reactive. A better approach is to stop the metadata from accumulating in the first place. You can configure your office suite to minimize the data it writes to every document.
Go to Tools > Options > LibreOffice > User Data (or Tools > Options > OpenOffice.org > User Data). Here you can define the default author name, initials, and company. For maximum privacy, you can leave these fields blank or fill them with generic pseudonyms. Any new document created will inherit these values instead of your real identity.
Additionally, check the security settings. Under Tools > Options > LibreOffice > Security > Options, look for a setting labeled Remove personal information on saving. Enabling this option instructs the software to strip user names from changes and comments and reset editing statistics every time you save. Note that this setting anonymizes the data but does not always delete the structural objects themselves (like the comment box), so a manual inspection is still recommended for high-stakes documents.
Automated and Browser-Based Solutions
If you are dealing with hundreds of files, or if you simply want a faster workflow without navigating multiple menus, automated tools offer a significant advantage. Batch processing scripts using Python libraries like odfpy can programmatically strip metadata, but they require technical setup.
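As a rough idea of what such a batch script involves, here is a sketch that uses only Python's standard library rather than odfpy, so it makes no claims about odfpy's API. The names `scrub_meta_xml`, `scrub_file`, and `scrub_folder` are hypothetical, the list of tags to drop is an assumption you should adjust, and the code only touches meta.xml in unencrypted files. Comments and tracked changes in content.xml are left alone.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# Keep the familiar prefixes when the XML is written back out.
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)

# Assumed drop list: identity, timing, description, and custom properties.
DROP = {f"{{{NS['dc']}}}{n}" for n in
        ("creator", "date", "title", "subject", "description")}
DROP |= {f"{{{NS['meta']}}}{n}" for n in
         ("initial-creator", "creation-date", "editing-duration",
          "editing-cycles", "printed-by", "print-date", "keyword",
          "user-defined")}


def scrub_meta_xml(data):
    """Return meta.xml bytes with the elements listed in DROP removed."""
    root = ET.fromstring(data)
    block = root.find("office:meta", NS)
    if block is not None:
        for child in list(block):
            if child.tag in DROP:
                block.remove(child)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def scrub_file(src, dst):
    """Copy an ODF package to dst, replacing only meta.xml.

    Entries are written in their original order with their original
    compression, so an uncompressed leading mimetype entry stays that way.
    src and dst must be different paths.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == "meta.xml":
                data = scrub_meta_xml(data)
            zout.writestr(info, data)


def scrub_folder(folder, out_dir):
    """Scrub every .odt, .ods, and .odp file in folder into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(folder).glob("*.od[tsp]")):
        target = out / path.name
        scrub_file(path, target)
        written.append(target)
    return written
```

Writing cleaned copies to a separate output folder, instead of overwriting the originals, keeps a fallback if a file turns out not to open afterward.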
For most users, a browser-based tool provides the best balance of speed and privacy. Unlike desktop software that requires installation, online utilities run directly in your web browser. The critical factor here is ensuring the tool processes files locally. You want a solution where the document never leaves your device, meaning no data is uploaded to a remote server for processing.
For instance, Vaulternal's Metadata Remover operates entirely client-side using WebAssembly. This means the heavy lifting of unzipping the ODF package, parsing the XML, and rewriting the files happens within your browser's memory. This architecture is particularly useful for legal professionals, consultants, or journalists who handle sensitive drafts and cannot risk uploading confidential files to third-party servers. It supports ODT, ODS, and ODP formats alongside standard Office files, allowing you to strip core properties, custom fields, and optionally tracked changes in a single step.
Advanced Manual Cleaning via ZIP Extraction
For power users who need absolute certainty that every byte of metadata is gone, manual editing of the ODF package is the gold standard. Since an ODF file is just a ZIP archive, you can manipulate it directly.
- Rename your file extension from `.odt` to `.zip`.
- Extract the contents to a folder.
- Open `meta.xml` in a plain text editor. Look for tags like `<dc:creator>`, `<meta:initial-creator>`, and `<meta:editing-duration>`. Delete the content inside these tags or remove the tags entirely.
- Open `content.xml`. Search for `office:annotation` (comments) and `text:tracked-changes` (tracked changes in text documents), along with the `text:change`, `text:change-start`, and `text:change-end` markers that point to them. Remove these blocks carefully.
- Re-zip the files, ensuring the root level contains the XML files, not a subfolder. The `mimetype` file should be the first entry in the archive and stored without compression.
- Rename the extension back to `.odt`.
Warning: This method is risky. If you break the XML syntax or leave META-INF/manifest.xml out of step with the files actually in the package, the document may become corrupted or fail to open. Always keep a backup copy before attempting this. Furthermore, if the document was digitally signed, modifying the internal XML will invalidate the signature.
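The re-zip step is where hand-edited packages most often go wrong, so it can help to script it. The sketch below is one possible approach, with a hypothetical function name `repack_odf`: it refuses to build the archive if any XML file in the extracted folder is malformed, then writes the `mimetype` file first and uncompressed, followed by everything else. It does not check that META-INF/manifest.xml still matches the contents.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def repack_odf(folder, out_path):
    """Zip an extracted ODF folder back into a package.

    Hypothetical helper. Raises xml.etree.ElementTree.ParseError if an
    edited XML file is no longer well-formed, before anything is written.
    """
    root = Path(folder)
    mimetype = root / "mimetype"
    if not mimetype.is_file():
        raise FileNotFoundError("mimetype file missing from package root")

    # Collect the file list up front so the output is never zipped into itself.
    files = sorted(p for p in root.rglob("*") if p.is_file() and p != mimetype)

    # Fail early on broken XML instead of producing a corrupt document.
    for path in files:
        if path.suffix in {".xml", ".rdf"}:
            ET.parse(str(path))

    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as package:
        # mimetype goes first and is stored without compression.
        package.write(str(mimetype), "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        for path in files:
            # Archive names are relative to the folder, so XML files land
            # at the root of the package rather than inside a subfolder.
            package.write(str(path), path.relative_to(root).as_posix())
    return out_path
```

Point `out_path` at a location outside the extracted folder, and open the result in your office suite before deleting the backup copy.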
Comparison of Cleaning Methods
| Method | Effort Level | Thoroughness | Best For |
|---|---|---|---|
| GUI Properties Dialog | Low | Medium (misses comments/changes) | Quick individual files |
| Browser-Based Tool | Very Low | High (configurable) | Non-technical users, batch needs |
| Manual ZIP/XML Edit | High | Maximum | Developers, forensic sanitization |
| Python Scripts (odfpy) | Medium (setup required) | High | Automated enterprise workflows |
Frequently Asked Questions
Does resetting properties in LibreOffice remove all metadata?
No. Resetting properties via File > Properties clears general statistics like editing time and revision numbers, and allows you to clear description fields. However, it does not automatically remove tracked changes, comments, hidden paragraphs, or embedded versions. These structural elements must be addressed separately to ensure complete privacy.
Is it safe to use online tools to clean ODF metadata?
It depends on the tool. Many online converters upload your file to a remote server, process it, and send it back, which poses a privacy risk for confidential documents. To stay safe, use client-side tools that process files locally in your browser using JavaScript or WebAssembly. Verify this by checking your browser's network tab to ensure no files are uploaded during the cleaning process.
Can I prevent LibreOffice from adding my name to new documents?
Yes. Go to Tools > Options > LibreOffice > User Data and leave the name and initials fields blank, or enter generic pseudonyms. Additionally, enabling "Remove personal information on saving" in the Security options will help strip user identifiers from future saves, although manual checks are still recommended for external distribution.
What is the difference between meta.xml and content.xml in an ODF file?
meta.xml stores document-level properties such as the author, creation date, and keywords. content.xml holds the actual body of the document, including text, images, and importantly, structural annotations like comments and tracked changes. Effective cleaning requires addressing both files.
Do tracked changes disappear if I accept them?
Accepting tracked changes applies the edits to the text, but the history of who made those changes and when can remain in the document's XML structure. To fully sanitize a document, you should either accept/reject all changes and then verify no residual metadata remains, or use a dedicated cleaning tool that strips annotation data explicitly.