Clean up all the things
From Gender and Tech Resources
Revision as of 20:41, 17 July 2015 by Lilith2 (Talk | contribs) (→Permanently delete files (including data in RAM or swap))
“Granny bit her lip. She was never quite certain about children, thinking of them - when she thought about them at all - as coming somewhere between animals and people. She understood babies. You put milk in one end and kept the other as clean as possible. Adults were even easier, because they did the feeding and cleaning themselves. But in between was a world of experience that she had never really inquired about. As far as she was aware, you just tried to stop them catching anything fatal and hoped that it would all turn out all right.” ― Terry Pratchett, Equal Rites
The dirt on it
Metadata
Metadata is data about data.
To put it bluntly, metadata is hidden data that can fuck you over. Fuck you over real hard and rough like, savvy? Often defined as "data about data," metadata is information about a specific file that’s often included within the file itself but that’s often not readily visible or modifiable to the end-user when z is viewing the file in the standard application that z would typically use to view the file. In other words, metadata provides background information about a file. Chances are that every document you create, every digital photograph you take, every music file you download, and so on, all have little bits of metadata which can leak vital information about your identity. ~ The dangers of metadata, 2008[1]
Metadata is collected by corporations for psychological manipulation -- persuasion or advertising.
Metadata also plays a number of important roles in computer forensics:
- It can provide corroborating information about the document data itself.
- It can reveal information that someone tried to hide, delete, or obscure.
- It can be used to automatically correlate documents from different sources.
The Snowden leaks (see timeline masters of the internet) revealed a massive surveillance program including interception of email and other internet communications and phone call tapping. Some of it appears illegal, while other documents show the US spying on friendly nations during various international summits, and on its own citizens. The programs are enabled by two US laws, the Patriot Act and the FISA Amendments Act (FAA), and a side dish called Executive Order 12333.
Upstream collection, Hemisphere and XKeyScore by way of wealthycluster2 gobble up our metadata, and with interconnected systems such as ICReach that data can be shared and associated with other data. There are dozens of clever analyses you can perform with such linked databases. I'm sure that is what they're doing right now. If I can think of it, so did they.
And it is not only the NSA and the other agencies from the Five Eyes countries: these techniques are being used by many countries to intimidate and control their populations.
Headers
When your browser requests a page from a web server, the browser sends information about itself along with the request in value strings. These headers include values to indicate browser type (Firefox, Opera, Mozilla, etc.), browser version, and underlying platform (Windows XP, Linux, Mac OS X, etc.). See http://www.ericgiguere.com/tools/http-header-viewer.html (javascript) for the data your browser sends. The web server then uses this information to select an appropriate page format for the browser, since different browsers (and even different versions of the same browser) have varying incompatibilities in their support for HTML and JavaScript.
Sometimes the web server misinterprets or fails to recognize this information and sends you an incorrectly formatted page. The server may even deny you access to its pages, whether for political reasons, because you're using a browser that the site disapproves of, or because its pages have only been tested with specific browser versions. The solution is to fool the server by having your browser masquerade as another browser.
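The same masquerade can be tried outside the browser. A minimal sketch, assuming curl is installed (curl is not part of the original text); the helper name fetch_as and the example User-Agent string are made up for illustration:

```shell
# fetch_as USER_AGENT URL
# Request URL while presenting USER_AGENT in the User-Agent request header.
# Assumes curl is installed; fetch_as is a hypothetical helper name.
fetch_as() {
    curl --silent --show-error --user-agent "$1" "$2"
}

# Example call (needs network access, so it is left commented out):
# fetch_as 'Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0' 'http://example.org/'
```

Adding --verbose to the curl call prints the request headers that are actually sent, which is a quick way to check what the server gets to see.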
E-mail messages also contain a header containing the IP address of the sender as well as other information that may get attached to the header along the way, such as spam ratings that anti-spam software running on your e-mail server may apply to the message and other information added by the server. E-mail clients use this information to help identify spam messages [2].
According to RFC 821 (and its successor RFC 5321), an e-mail client is to identify itself in the HELO/EHLO command with its domain name or, failing that, its IP address; see http://www.samlogic.net/articles/smtp-commands-reference.htm. This value can be masqueraded as well.
Removing metadata
Images
Photos can contain hidden information, including the GPS coordinates of the location they were taken at, the date and time, camera shutter setting details, and possibly even the name of the program you used to edit them. This type of metadata can be useful, but you may want to remove it from your photos before sharing them online.
exiftool
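A minimal sketch of listing and then stripping photo metadata, assuming exiftool is installed. The helper name strip_photo is hypothetical. The -all= option asks exiftool to delete all metadata it can write, and -overwrite_original stops it from keeping a copy of the untouched file next to the cleaned one:

```shell
# strip_photo FILE
# Show the tags exiftool finds in FILE, then delete them in place.
# strip_photo is a hypothetical helper name; exiftool must be installed.
strip_photo() {
    exiftool "$1" || return 1                    # list the metadata found
    exiftool -all= -overwrite_original "$1"      # remove it, keeping no backup copy
}

# Example: strip_photo holiday.jpg
```

Running exiftool on the file again afterwards shows what, if anything, is left.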
jhead
imagemagick
exiv2
mat
MAT is a toolbox composed of a GUI application, a CLI application and a library, to anonymise/remove metadata https://mat.boum.org/.
Documents
Document metadata is information about one or more aspects of a document, spreadsheet or PDF file that is not always visible to the person creating it, but can be found by the person who receives it next. Comments, track changes, hidden text, markups, properties, attachments and bookmarks are all examples of document metadata. Metadata removal software identifies and removes the metadata contained within a document so that it is not shared along with the document.
hexedit
- Be sure to back up your data before changing anything with hexedit.
- In general, switch to ASCII mode, turn off “read only” mode, and start searching through the file. For navigation and commands see http://linux.die.net/man/1/hexedit.
For example, when scrubbing PDFs look for "created" (metadata appears in the PDF file more than once). If and when you find metadata, change it to fake data or delete it. Then repeat your search for the terms "create", "creation", "modified", and "modify", and similarly either replace or delete the dates, once again being sure to repeat each search so that any multiple instances of the field can be located and modified or blanked out.
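Before opening the hex editor it can help to know where to look. A minimal sketch, assuming a grep that supports -a, -b and -o (GNU and BSD grep do): -a treats the binary file as text, -b prints the byte offset of each match and -o prints only the matching text. The helper name find_pdf_dates is hypothetical, and ModDate is added to the search terms because it is the key PDF uses for the modification date:

```shell
# find_pdf_dates FILE
# Print "offset:text" for every creation/modification marker found in FILE.
# find_pdf_dates is a hypothetical helper name.
find_pdf_dates() {
    grep -a -b -o -i -E '(creat|modif|moddate)[a-z]*' "$1"
}

# Example: find_pdf_dates filename.pdf
```

The offsets give a starting point for jumping to the right place in hexedit.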
vi
You can also use vi as a hex editor. It isn’t a real "hex mode"; what happens is that vi’s buffer is streamed through the external program xxd, but it works well for some cases of scrubbing.
Open a file in vi as usual, hit escape and to switch into hex mode type:
:%!xxd
And when you're done, hit escape again and, before saving, convert the buffer back from hex with:
:%!xxd -r
pdftk and sed
For pdftk commands see http://linux.die.net/man/1/pdftk.
To look at the metadata that Adobe Reader does not show by default:
$ pdftk filename.pdf dump_data
To alter the metadata first put the metadata in a file:
$ pdftk filename.pdf dump_data output pdf-metadata
Open the pdf-metadata file in a text editor and remove the data you wish scrubbed.
Save the pdf-metadata file. Now you can use that data to scrub the metadata from your file:
$ pdftk filename.pdf update_info pdf-metadata output filename-no-metadata.pdf
And check the result with:
$ pdftk filename-no-metadata.pdf dump_data
The creation date is gone and a new modification date has appeared. And the iText producer string gives away the use of pdftk. These infokeys can be removed with sed:
$ sed -i 's/iText\ 2\.1\.7\ by\ 1T3XT//;s/D:20120409144213+02'\''00'\''//' filename-no-metadata.pdf
Each '\'' breaks out of the single-quoted string, adds an escaped single quote, and re-enters the quoted string.
PdfID0 and PdfID1 are file identifiers. They are an md5 of various info about the file so that it has a unique string to identify the doc without having to use the filename. If you want to scrub those too, use sed as above. There are no geeky tricks needed for cleaning those two from the metadata with sed; it’s pretty straightforward.
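The manual editing of the pdf-metadata file can also be scripted. A minimal sketch, assuming the file has the InfoKey/InfoValue line pairs that pdftk dump_data writes; the helper name blank_info_values is hypothetical. It only empties the chosen values in the text file. Whether pdftk update_info then drops the key or stores an empty string is an assumption to verify with dump_data afterwards:

```shell
# blank_info_values FILE KEY...
# Rewrite FILE (as written by "pdftk ... dump_data") so that the InfoValue
# that follows each listed InfoKey is left empty. Other lines are untouched.
# blank_info_values is a hypothetical helper name.
blank_info_values() {
    file=$1
    shift
    keys=" $* "
    awk -v keys="$keys" '
        /^InfoKey: /   { key = substr($0, 10) }       # remember the current key
        /^InfoValue: / {
            if (index(keys, " " key " ")) {           # key is on the scrub list
                print "InfoValue: "
                next
            }
        }
        { print }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example: blank_info_values pdf-metadata Creator Producer CreationDate ModDate
```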
mat
Once any metadata is detected by mat, "State" will be marked as "Dirty". You can double click the file to see detected metadata and click on the "Clean" button to empty all private metadata fields from the file.
You can also run mat from the command-line. Without options, the default action is to remove metadata from files.
To check files in a directory and its subdirectories:
$ mat -c .
To display the metadata detected in a file:
$ mat -d filename
To clean up all files and store the original files as "*.bak" files:
$ mat -b .
Masquerading
Browser
opera
The User-Agent Switcher extension adds a toolbar button and a menu to switch between predefined user-agents: https://addons.opera.com/en/extensions/details/user-agent-switcher/
You can add your own user-agents, and even mimic being a webspider. For a list of convincing user-agents see http://www.user-agents.org/index.shtml
chromium
A User-Agent Switcher is also available for Chromium: https://chrome.google.com/webstore/detail/user-agent-switcher-for-c/djflhoibgkdhkhhcedjiklpkjnoahfmg
firefox
A User-Agent Switcher for Firefox: https://addons.mozilla.org/en-US/firefox/addon/user-agent-switcher/
thunderbird
To view the header information in a Thunderbird e-mail message, select the message, then click on the View menu and select Headers > All. The header information for the message will replace the message in the Thunderbird window.
- The Return-path is allegedly the e-mail address of the sender, although that is not a reliable method for identifying the sender because most spammers use any return e-mail address that they can find on spammers' lists.
- The Received line is a bit more reliable, because that contains the IP address of the location from where the spam message was sent. That is, of course, unless the spammer hacked into an e-mail server or is using a relay server to disguise the true source of the message.
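To look at just these two headers outside Thunderbird, a message can be saved as a text file and filtered. A minimal sketch using awk, assuming the file holds a single message with its headers at the top; the helper name trace_headers is hypothetical:

```shell
# trace_headers FILE
# Print the Return-Path and Received headers of a message saved as text.
# Folded (indented) continuation lines are joined onto their header, and
# reading stops at the first blank line, where the message body begins.
# trace_headers is a hypothetical helper name.
trace_headers() {
    awk '
        /^$/ { exit }                       # end of the header block
        /^[ \t]/ {                          # continuation of the previous header
            if (keep) {
                sub(/^[ \t]+/, "")
                line = line " " $0
            }
            next
        }
        { if (keep) print line; keep = 0 }  # a new header starts: flush the old one
        tolower($0) ~ /^(return-path|received):/ { keep = 1; line = $0 }
        END { if (keep) print line }
    ' "$1"
}

# Example: trace_headers message.eml
```

Received lines are added by each server along the way, so the topmost one is the most recent hop and the last one is closest to the sender.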
Technically the IP address value in the hello string is not relevant for sending/receiving the mail, but because it might be used for spam scoring or simply out of courtesy I recommend entering a valid IP/hostname, like 127.0.0.1.
To change it globally (for all accounts) in Thunderbird Edit > Preferences > Advanced > Config Editor:
Create (or edit) the entry named "mail.smtpserver.default.hello_argument" (If you need to create it, use right-click > New > String):
Change the value to the desired IP or hostname (FQDN):
To change it per SMTP server create (or edit) the entry named mail.smtpserver.smtp<number>.hello_argument, where <number> is the ID for the SMTP server you would like to apply the setting to. Type "mail.smtpserver.smtp" in the search box up top in the Thunderbird configuration editor to see which ones are available and which ID they have. If you need to create the entry, use right-click > New > String.
TorBirdy is a plugin for Thunderbird. It tries to anonymize your connection (you need to have tor installed, see installing and configuring tor) and deletes and changes several information fields: https://trac.torproject.org/projects/tor/wiki/torbirdy/changes. Even if you do not install tor and torbirdy, the list has some excellent examples of other strings you may wish to change in the Config Editor.
TorBirdy enforces the preferences it sets. Attempts to change them using Thunderbird's settings or the configuration editor will not work, as all such changes are discarded when Thunderbird restarts. This is because the tor project believes that these preferences should not be changed, whether deliberately, by mistake, or due to another extension, as doing so can compromise your anonymity. There are however some preferences that can be changed and they can be accessed through TorBirdy's preferences dialog. Please note that if you are not an advanced user, you should NOT change any setting unless you are very sure of what you are doing. The preferences that TorBirdy changes are restored to their original values when it is uninstalled or disabled.
- Read more in Before using TorBirdy https://trac.torproject.org/projects/tor/wiki/torbirdy#BeforeusingTorBirdy.
- Also note the known issues https://trac.torproject.org/projects/tor/wiki/torbirdy#KnownTorBirdyIssues.
mutt
Mutt doesn't really speak SMTP, see the mutt mail concept http://dev.mutt.org/trac/wiki/MailConcept, and although a minimal SMTP is provided these days, the preferred way is still sending via a Mail Transfer Agent (MTA) such as sendmail, exim or postfix.
sendmail
exim
postfix
Freeing up disk space
apt
command-line
To delete all downloaded package files (.deb) from the local cache (they are no longer needed once the packages are installed):
$ sudo apt-get clean
To remove from your cache only the stored archives of packages that cannot be downloaded anymore (packages that are no longer in the repository or that have a newer version in the repository):
$ sudo apt-get autoclean
To remove unnecessary packages (After uninstalling an app there could be packages you don't need anymore):
$ sudo apt-get autoremove
To delete old kernel versions:
$ sudo apt-get remove --purge linux-image-X.X.XX-XX-generic
If you don't know which kernel version to remove:
$ dpkg --get-selections | grep linux-image
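Removing the kernel you are currently running is best avoided. A minimal sketch that filters the dpkg listing down to installed kernel images other than the running one; the helper name old_kernels is hypothetical, and it only looks at packages named linux-image-<version>, so metapackages and extras are left out:

```shell
# old_kernels RUNNING_RELEASE
# Read "dpkg --get-selections" output on stdin and print the installed
# linux-image-<version> packages whose name does not contain RUNNING_RELEASE.
# old_kernels is a hypothetical helper name.
old_kernels() {
    awk -v running="$1" '
        $2 == "install" && $1 ~ /^linux-image-[0-9]/ && index($1, running) == 0 { print $1 }
    '
}

# Typical use on a Debian-style system:
# dpkg --get-selections | old_kernels "$(uname -r)"
```

The output is only a list of candidates. Keeping at least one older kernel around as a fallback is a common precaution.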
bleachbit
Shredding files and deleting data
Even when you erase everything on your hard disk, sometimes it is possible to recover (pieces of) data with forensics software and/or hardware. If that data is confidential, delete files and data securely so that no-one will recover them. Solid State Drives (SSD) may have introduced dramatic changes to the principles of computer forensics ...
When encrypting and compressing files, the clear-text versions that existed before you compressed/encrypted the file, and the clear-text copies that are created after you decrypt/decompress it, remain on your hard drive unless you purge (not just delete) those clear-text files. There may also be "temp" files left behind.
Echoes of your personal data (swap files, temp files, hibernation files, erased files, browser artifacts, etc.) are likely to remain on any computer that you use to access (encrypted) data. It is a trivial task to extract those echoes. A hidden access trap. Purge, not just delete, echoes.
Shredding files
shred
Linux, FreeBSD and many other *nix systems come with a command line tool called shred installed. The shred command can be useful for destroying files so that their contents are very difficult to recover, even using high-sensitivity data recovery equipment. It repeatedly overwrites the data with random data. When used without options, current GNU coreutils versions of shred overwrite the given files or devices 3 times (older versions used 25 passes); the -n option sets the number of passes. A device can be a partition or an entire HDD, USB key drive, etc.
$ shred [option(s)] file(s)_or_devices(s)
For example
$ shred filename1 filename2
will shred both files, and
$ shred /dev/hda4
will shred the fourth partition on the first HDD.
By default, shred does not delete files or partitions after overwriting them. Overwritten files can be deleted by using the -u option.
$ shred -u filename1 filename2
This both frees up the disk space for other data and makes it harder to reconstruct the shredded data.
Shred relies on the assumption that the filesystem overwrites data in place. But journaling filesystems like Ext3 and ReiserFS, RAID-based filesystems, compressed filesystems, and filesystems that cache data in temporary locations do not satisfy this assumption. In addition, copies of files can be retained in filesystem backups and on remote mirrors. Shredding partitions is therefore more reliable than shredding files.
And even when shredding partitions, most HDDs map out bad sectors invisibly to application programs and utilities, and that includes shred. Sensitive data in such sectors will not be destroyed by shred.
Making deleted data hard to recover
dd
A hack that might work is to write zeroes or random data to a file on the drive until it fills up all of the available space, then delete it:
$ dd if=/dev/urandom of=/path/filename1
Then delete:
$ rm /path/filename1
This also works on partitions:
$ dd if=/dev/urandom of=/dev/sda4
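Then delete the partition. Note that fdisk is run on the disk (/dev/sda), not on the partition itself, and that w writes the changed partition table:
$ sudo fdisk /dev/sda
Command (m for help): d
Partition number (1-4): 4
Command (m for help): w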
Permanently delete files (including data in RAM or swap)
secure-delete tools
The Secure-Delete package comes with four commands:
- srm: Secure remove; used for deleting files or directories currently on your hard disk.
- smem: Secure memory wiper; used to wipe traces of data from your computer’s memory (RAM).
- sfill: Secure free space wiper; used to wipe all traces of data from the free space on your disk.
- sswap: Secure swap wiper; used to wipe all traces of data from your swap partition.
srm (secure remove) is a more advanced version of the “shred” command. It uses a combination of random data, zeros, and special values developed by cryptographer Peter Gutmann to make files irrecoverable. The shred tool allows you to specify the number of passes, while the secure-delete tools use a default of 38 passes. srm will also assign a random value for the filename, hiding that key piece of evidence:
$ srm filename
Removing a directory and all its subdirectories (recursive):
$ srm -r directory/
smem (secure memory wipe) removes residual traces of data that remain in memory. It is relatively easy for someone with the right tools to figure out what you had stored in RAM, which may be the contents of important files, internet activity, or whatever else it is you do with your computer. smem is slow. There are options to speed things up, but they increase risk by performing fewer overwrite passes. On Debian and Ubuntu the secure-delete package installs this command as sdmem.
Invoke with:
$ smem
sfill (secure free space wipe) wipes all the free space on your disk where past files have existed. This is particularly useful if you are getting rid of a hard disk for good; you can boot a LiveCD, delete everything on the disk, and then use sfill to make sure that nothing is recoverable (as root):
# sfill mountpoint/
NOTE: If you have /home/ on a separate partition and you try /home/hilarious/mistake as mountpoint, sfill will happily wipe the free space of the partition on which the mistake directory resides (the entire /home/ partition).
sswap (secure swap wipe) wipes swap partitions. Swap partitions store data of running programs when RAM is filled up.
Find your mounted swap devices by running:
$ cat /proc/swaps
Or look in your /etc/fstab file for filesystems of type swap. It can be /dev/sda5 or /dev/dm-1, etc.
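A minimal sketch for pulling just the device names out of that listing, so they can be fed to the commands that follow. The helper name swap_devices is hypothetical. It relies on the first line of /proc/swaps being a column header and the first column being the device or file name:

```shell
# swap_devices [FILE]
# Print the device or file name of every active swap area, one per line.
# FILE defaults to /proc/swaps, whose first line is a column header.
# swap_devices is a hypothetical helper name.
swap_devices() {
    awk 'NR > 1 { print $1 }' "${1:-/proc/swaps}"
}

# Example: swap_devices
```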
Disable the swap partition:
$ sudo swapoff /dev/sda6
Wipe:
$ sudo sswap /dev/sda6
Re-enable swap:
$ sudo swapon /dev/sda6
bleachbit
Removing malware
And then of course, there is the possibility of people having visited without explicit invitation, without explicit consent, who may have left things lying about in odd places. And of burglars leaving a payload or two to maintain access for continued pillaging and plundering of your private space. Or you may have downloaded something nasty somewhere.
Image exploit cleaning
PDF exploit cleaning
pdf2ps and ps2pdf
pdftops
I think my machine is infected. Now what?
Related
References
- [1] The dangers of metadata, 2008 http://www.textfiles.com/uploads/diz-usp3.txt
- [2] Spam Filtering for Mail Exchangers http://www.tldp.org/HOWTO/Spam-Filtering-for-MX/techniques.html