Unicode utf8 multibyte characters

Hello,

We are running ONTAP 7.3.3 and are having an issue with file and directory names that contain non-ASCII characters. Once such a file or directory has been created by rsync, I cannot remove it anymore. I can remove a file with the rm command on the filer itself, but I do not know how to remove a directory (rmdir does not exist in the ONTAP console).

This is my language setting at the moment :

vol status /vol/vol1 -l
         Volume Language
         vol1 C (POSIX)

These are the characters that cause problems:

--- In ontap console ---

ls /vol/vol1/     :    René

But in Linux I see this : René

Do I need to enable UTF-8 encoding on the volume? Would this be enough?

Greetings .. Richard

Re: Unicode utf8 multibyte characters

This does not look NetApp related to me. It seems that rsync creates the file names in UTF-8 encoding, but the system where you try to read them defaults to ISO-8859-1.

In Linux where you see "garbage" try: ls /path/to/mount | iconv -f UTF-8

You should see "normal" file names.
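To make the suspected mismatch concrete, here is a minimal Python sketch (not part of the original reply) showing how the UTF-8 bytes of a name look when each byte is read as an ISO-8859-1 character. The sample name is taken from this thread; the variable names are made up for illustration.

```python
# Illustration only: UTF-8 bytes misread as ISO-8859-1.
name = "René"

# In UTF-8 the letter é is stored as two bytes, 0xC3 0xA9.
utf8_bytes = name.encode("utf-8")

# A reader that assumes ISO-8859-1 turns every byte into one character,
# so the two bytes of é show up as two separate characters ("Ã" and "©").
misread = utf8_bytes.decode("iso-8859-1")

# The mistake is reversible as long as no byte was lost on the way.
repaired = misread.encode("iso-8859-1").decode("utf-8")

# ascii() keeps the output printable on any terminal encoding.
print(ascii(utf8_bytes))
print(ascii(misread))
print(ascii(repaired))
```

The bytes on disk are the same in all three cases; only the assumption made when turning them into characters differs.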

Re: Unicode utf8 multibyte characters

No, this does not show any improvement.

ls  | iconv -f UTF-8
René

Re: Unicode utf8 multibyte characters

vol lang vol0    

Volume language is C (POSIX)

Translation Versions
    OEM Character set is ascii|cp1|Fri Oct  2 00:00:53 CEST 1998

    NFS Character set is iso-8859-1|iso-8859-1|Fri Oct  2 00:00:53 CEST 1998

vol lang vol1

Volume language is C (POSIX)

Translation Versions
    OEM Character set is ascii|cp1|Fri Oct  2 00:00:53 CEST 1998

    NFS Character set is iso-8859-1|iso-8859-1|Fri Oct  2 00:00:53 CEST 1998

Should this be UTF-8? We only use NFSv4 for our clients.

Re: Unicode utf8 multibyte characters

Since 7.3.3 we support UTF-8 for volumes mounted over NFSv4. You should change the volumes to UTF-8.

Re: Unicode utf8 multibyte characters

Yes, I get the same effect when my terminal emulation (in this case, PuTTY) is set to use translation.

So I created a file named René. Then I set PuTTY to use UTF-8 translation. I can correctly display this file name:

sles10:~/tst # ls

René

sles10:~/tst # ls | xxd

0000000: 5265 6ec3 a90a Ren...

sles10:~/tst # ls | iconv -f UTF-8

René
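As a cross-check of the xxd output above, the following Python sketch decodes the same six bytes. It is only an illustration; the hex string is copied from the dump and the variable names are hypothetical.

```python
# The six bytes shown by "ls | xxd" above: R, e, n, 0xC3 0xA9, newline.
raw = bytes.fromhex("52656ec3a90a")

# Decoding as UTF-8 succeeds and yields five characters:
# the pair 0xC3 0xA9 collapses into the single character é.
text = raw.decode("utf-8")

print(len(raw), "bytes")
print(len(text), "characters")
print(ascii(text))
```

Six bytes but five characters confirms that the name is stored as valid UTF-8 on disk, independent of how a terminal chooses to render it.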

Now I changed PuTTY to use ISO-8859-15 for translation (meaning that PuTTY assumes the remote system is using this character set). I get the same result as you:

sles10:~/tst # ls

René

sles10:~/tst # ls | iconv -f UTF-8

René

In both cases iconv does not actually do any conversion, because the system is already set to use UTF-8 by default.

I would check the settings of whatever terminal program you are using. If you are working directly on the Linux console, it could also be set up incorrectly.

Re: Unicode utf8 multibyte characters

Well, it looks like you are right.

If I change the PuTTY translation parameters, I can also reproduce it. But I also found the solution.

It seems that when I mount the NFS volume without parameters, it works as you said.

But when I use parameters, I cannot get it to work.

I will show:

-------

path : /vol/vol1/homes (qtree)

When I go into this mountpoint :

server:/vol/vol1/homes on /mnt/nfs type nfs4 (rw,user=root,nosuid,nodev,sec=krb5,clientaddr=1.1.1.1,addr=1.1.1.1)

?????????? ? ? ? ?                ? rené
?????????? ? ? ? ?                ? René

When I go into this mountpoint :

server:/vol/vol1 on /mnt/server type nfs (rw,sec=krb5,addr=1.1.1.1)

-rw-r--r-- 1 root root    0 2010-08-03 20:09 rené
drwx------ 2 root root 4096 2010-07-28 15:59 René

------

It looks like the way the volume is mounted makes a difference. I will look into this, but I have a solution.
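The two mount lines above differ in filesystem type (nfs4 versus nfs) as well as in options. As a rough aid for comparing mounts, here is a small Python sketch that splits such lines into fields. It assumes the "source on target type fstype (options)" layout shown above; the regular expression and the function name are made up for this example.

```python
import re

# Matches lines in the "mount" output format shown above.
MOUNT_LINE = re.compile(
    r"^(?P<source>\S+) on (?P<target>\S+) type (?P<fstype>\S+) "
    r"\((?P<options>[^)]*)\)$"
)

def parse_mount_line(line):
    """Return a dict with source, target, fstype and options, or None."""
    match = MOUNT_LINE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["options"] = info["options"].split(",")
    return info

# The two mount lines quoted in this post.
lines = [
    "server:/vol/vol1/homes on /mnt/nfs type nfs4 "
    "(rw,user=root,nosuid,nodev,sec=krb5,clientaddr=1.1.1.1,addr=1.1.1.1)",
    "server:/vol/vol1 on /mnt/server type nfs (rw,sec=krb5,addr=1.1.1.1)",
]
parsed = [parse_mount_line(line) for line in lines]

for entry in parsed:
    print(entry["target"], entry["fstype"], entry["options"])
```

Comparing the fstype field first and the option lists second narrows down which difference between the two mounts matters.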

Greetings.

Re: Unicode utf8 multibyte characters

I think it is the nodev mount option.

man mount :

nodev  Do  not interpret character or block special devices on the file system.

Re: Unicode utf8 multibyte characters

"I think it is the nodev mount options."

No. The filesystem that works is mounted using NFSv3; the filesystem that does not work is mounted using NFSv4. NFSv4 mandates UTF-8 for all protocol strings, which means that both server and client have to translate file names from their local character set to UTF-8.

Unfortunately I could not find any clear (actually, any at all) explanation of how the vol lang option on NetApp interacts with NFSv4 (anyone from NetApp to chime in here? Please ...), but I believe this is your problem: in your case vol lang is set to plain C (POSIX), while it should be set to indicate that your clients are actually using UTF-8.

Beware, changing language after files have been created may result in loss of file access.
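Before changing the volume language, it may help to know which existing names would be affected. The following Python sketch walks a directory tree using byte paths and reports every entry whose name is not plain ASCII, marking whether its bytes are valid UTF-8. The function names are hypothetical, and the check is only a heuristic: a name in another encoding can occasionally happen to be valid UTF-8 as well.

```python
import os

def classify_name(raw_name: bytes) -> str:
    """Classify a raw file name as 'ascii', 'utf-8' or 'other'."""
    try:
        raw_name.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw_name.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"

def scan_tree(root: bytes):
    """Yield (path, kind) for every non-ASCII entry name below root.

    Passing a bytes path makes os.walk return names as raw bytes,
    so nothing is decoded behind our back.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for entry in dirnames + filenames:
            kind = classify_name(entry)
            if kind != "ascii":
                yield os.path.join(dirpath, entry), kind
```

Calling scan_tree(b"/mnt/server") on the working NFSv3 mount from this thread and printing each (path, kind) pair would give an inventory of names to re-check after the language change.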

Re: Unicode utf8 multibyte characters

You are right again.

I have both NFSv3 and NFSv4 mounts.

Well, you can set the language for the volume with the "vol lang" command.

I am going to test this.

I will keep you up to date.

But any helpful answers are welcome.

Greetings