utf8.py: a script that fixes ownCloud non-UTF-8 filename issues

The utf8.py script

Some months ago I had to face an annoying issue that affected the ownCloud client during folder synchronization. As a result, I wrote a trivial Python script that helped me rename the files with non-UTF-8 names using the UTF-8 encoding. Today I had to deal with the very same issue, so I decided to add some functionality to the original script.

Pre-requisites

This script has been written in Python 2.7. This is what you will need in order to execute the script:

  • Python 2.7.
  • The convmv utility (# apt-get install convmv).
  • The Python Chardet module (# apt-get install python-chardet).
  • The script itself, utf8.py.

Using the script

./utf8.py -d PATH [-t THRESHOLD] [-l LOG] [-r]
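As an illustration of the synopsis above, here is a minimal sketch of how such a command line could be declared with argparse. It is not taken from utf8.py itself: the function name build_parser and the attribute names path, threshold, log and rename are assumptions made for this example. Only the flags and their defaults come from this post.

```python
from __future__ import print_function  # keeps print() consistent on Python 2.7

import argparse


def build_parser():
    """Build a parser mirroring the synopsis (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(prog="utf8.py")
    parser.add_argument("-d", dest="path", required=True,
                        help="directory to analyse")
    parser.add_argument("-t", dest="threshold", type=float, default=0.8,
                        help="minimum chardet confidence, between 0 and 1")
    parser.add_argument("-l", dest="log", default="utf8-log.txt",
                        help="path of the logfile")
    parser.add_argument("-r", dest="rename", action="store_true",
                        help="rename for real instead of doing a dry run")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["-d", "/home/data", "-t", "0.95", "-r"])
print(args.path, args.threshold, args.log, args.rename)
# prints: /home/data 0.95 utf8-log.txt True
```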

-d PATH

The directory to analyse and, if the -r flag is given, to fix (i.e., all the files and directories inside the PATH directory will be renamed according to the UTF-8 encoding standard).
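To give an idea of what analysing a directory involves, the sketch below walks a tree and reports the entries whose names are not valid UTF-8. The helper names is_utf8 and find_non_utf8 are made up for this example and are not necessarily how utf8.py does it. The one important assumption is that the root path is passed as a byte string, so that the names come back as raw bytes.

```python
import os


def is_utf8(name):
    """Return True if the raw byte string `name` decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8(root):
    """Yield (directory, name) pairs of bytes for names that are not UTF-8.

    `root` must be a byte string (e.g. b"/home/data") so that os.walk
    returns undecoded names.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_utf8(name):
                yield dirpath, name


print(is_utf8(b"caf\xc3\xa9.txt"))  # "café.txt" encoded as UTF-8
print(is_utf8(b"caf\xe9.txt"))      # "café.txt" encoded as ISO-8859-1
```

Pure ASCII names count as valid UTF-8 here, so they are never reported.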

-t THRESHOLD

The Chardet module reports a “confidence” value for every charset it detects: a number that expresses how sure the detector is of its guess. The -t flag sets the minimum confidence that a detected charset must reach before the script attempts to rename the file or directory using UTF-8. This is a numerical value in the range [0..1]. Default value: 0.8.
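The threshold check can be pictured roughly as follows. This is only a sketch: should_rename is a hypothetical helper, and skipping names already detected as ASCII or UTF-8 is an assumption of this example. The input dictionary has the same shape as the result of chardet.detect(), which contains the "encoding" and "confidence" keys; it is hard-coded here so the snippet runs without Chardet installed.

```python
from __future__ import print_function  # keeps print() consistent on Python 2.7


def should_rename(detection, threshold=0.8):
    """Decide whether a name qualifies for renaming (hypothetical helper).

    `detection` is a chardet-style result such as
    {"encoding": "ISO-8859-1", "confidence": 0.73}.
    """
    encoding = detection.get("encoding")
    confidence = detection.get("confidence") or 0.0
    if encoding is None:
        return False  # the detector could not guess a charset at all
    if encoding.lower() in ("ascii", "utf-8"):
        return False  # assumed: nothing to convert in this case
    return confidence >= threshold


# With Chardet installed, this would be: sample = chardet.detect(raw_name)
sample = {"encoding": "ISO-8859-1", "confidence": 0.73}
print(should_rename(sample))       # prints: False (0.73 is below 0.8)
print(should_rename(sample, 0.7))  # prints: True
```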

-l LOG

By default, the script creates a logfile called utf8-log.txt in the directory from which it is executed. With this flag, one can choose a different location and name for the logfile.

-r

By default, the execution of the utf8.py script is a dry run; i.e., the files and directories of PATH will not be renamed. Only when this flag is passed to the script will the files and directories inside PATH actually be renamed.
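The difference between a dry run and a real run can be sketched like this. Note the assumptions: fix_name is a hypothetical function, and it renames with Python's os.rename for the sake of a self-contained example, whereas the script itself relies on the convmv utility listed in the pre-requisites. All names are handled as byte strings.

```python
from __future__ import print_function  # keeps print() consistent on Python 2.7

import os


def fix_name(directory, name, source_encoding, rename=False):
    """Return `name` re-encoded as UTF-8 (hypothetical helper).

    Nothing on disk is touched unless `rename` is True, which plays the
    role of the -r flag. `directory` and `name` are byte strings.
    """
    new_name = name.decode(source_encoding).encode("utf-8")
    if rename and new_name != name:
        os.rename(os.path.join(directory, name),
                  os.path.join(directory, new_name))
    return new_name


# Dry run: only computes the new name, the filesystem is left alone.
print(fix_name(b"/home/data", b"caf\xe9.txt", "iso-8859-1"))
```

When applying this to a whole tree, the contents of a directory should be renamed before the directory itself (for instance with os.walk(root, topdown=False)); otherwise the paths of its children change midway.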

Examples

This command will analyse the directory /home/data and write the log file /tmp/analysis.log, detecting any non-UTF-8 charset with the default confidence threshold of 0.8. No file or directory will be renamed, so /home/data will remain unchanged:

./utf8.py -d /home/data -l /tmp/analysis.log

This command will rename every file and directory under /home/data whose charset is detected with a confidence of at least 0.95; the rest will not be renamed:

./utf8.py -d /home/data -l /tmp/renamed.log -t 0.95 -r

Download the script

You can get the latest version of this script right here.