
Unicode utf8 multibyte characters

rsmits1074

Hello,

We are running ONTAP 7.3.3 and have a problem with file and directory names that contain non-ASCII characters. Once such a file or directory has been created by rsync, I can no longer remove it. I can remove a file with the rm command on the filer itself, but I do not know how to remove a directory (rmdir does not exist in the ONTAP console).

This is my current language setting:

vol status /vol/vol1 -l
         Volume Language
         vol1 C (POSIX)

These are the characters that cause problems:

--- In ontap console ---

ls /vol/vol1/     :    René

But in Linux I see this: RenÃ©

Do I need to enable UTF-8 encoding on the volume? Would that be enough?

Greetings, Richard

9 REPLIES

aborzenkov

This does not look NetApp-related to me. It seems that rsync creates file names in UTF-8 encoding, but the system where you try to read them defaults to ISO-8859-1.

On the Linux system where you see "garbage", try: ls /path/to/mount | iconv -f UTF-8

You should see "normal" file names.
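
What that iconv call would do when the local character set really is ISO-8859-1 can be sketched in Python. This is only an illustration: the byte string is an assumed UTF-8 name of the kind rsync might create, and the target charset is the ISO-8859-1 default mentioned above.

```python
# Assumed example: the UTF-8 bytes of the name "René" as rsync would write them.
utf8_name = b"Ren\xc3\xa9"

# Rough equivalent of `iconv -f UTF-8 -t ISO-8859-1`:
# decode from the source charset, then encode into the target charset.
text = utf8_name.decode("utf-8")
latin1_name = text.encode("iso-8859-1")

print(utf8_name.hex())    # 52656ec3a9
print(latin1_name.hex())  # 52656ee9
```

The two-byte sequence c3 a9 becomes the single byte e9, which an ISO-8859-1 terminal displays as é.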

rsmits1074

No, this does not show any improvement.

ls | iconv -f UTF-8
RenÃ©

aborzenkov

Yes, I get the same effect when my terminal emulator (in this case, PuTTY) is set to use a character set translation that does not match the system.

I created a file named René. Then I set PuTTY to use UTF-8 translation, and the file name is displayed correctly:

sles10:~/tst # ls

René

sles10:~/tst # ls | xxd

0000000: 5265 6ec3 a90a Ren...

sles10:~/tst # ls | iconv -f UTF-8

René

Then I changed PuTTY to use ISO-8859-15 for translation (that is, PuTTY assumes the remote system is using this character set). I get the same result as you:

sles10:~/tst # ls

RenÃ©

sles10:~/tst # ls | iconv -f UTF-8

RenÃ©

In both cases iconv does not actually do any conversion, because the system is already set to use UTF-8 by default, so the bytes reach the terminal unchanged.

I would check the settings of whatever terminal program you are using. If you are working directly on the Linux console, it could also be set up incorrectly.
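
The effect can be reproduced away from any terminal with a minimal Python sketch. It assumes nothing beyond the byte sequence from the xxd output above; the variable names are only illustrative.

```python
# Bytes of the file name exactly as shown in the xxd output (without the newline).
raw = bytes.fromhex("52656ec3a9")

# A terminal set to UTF-8 decodes c3 a9 as one character.
shown_utf8 = raw.decode("utf-8")

# A terminal set to ISO-8859-15 decodes c3 and a9 as two separate characters.
shown_latin9 = raw.decode("iso-8859-15")

# `iconv -f UTF-8` on a system whose locale is UTF-8 converts UTF-8 to UTF-8,
# so the bytes that reach the terminal are unchanged.
after_iconv = raw.decode("utf-8").encode("utf-8")

print(ascii(shown_utf8))    # 'Ren\xe9'
print(ascii(shown_latin9))  # 'Ren\xc3\xa9'
print(after_iconv == raw)   # True
```

The bytes on disk are the same in both cases; only the interpretation by the terminal differs.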

rsmits1074

Well, it looks like you are right.

If I change the PuTTY translation parameters, I can reproduce it too. But I also found the solution.

It seems that when I mount the NFS volume without parameters, it works as you said.

But when I use parameters, I cannot get it to work.

Let me show:

-------

path : /vol/vol1/homes (qtree)

When I go into this mount point:

server:/vol/vol1/homes on /mnt/nfs type nfs4 (rw,user=root,nosuid,nodev,sec=krb5,clientaddr=1.1.1.1,addr=1.1.1.1)

?????????? ? ? ? ?                ? rené
?????????? ? ? ? ?                ? René

When I go into this mount point:

server:/vol/vol1 on /mnt/server type nfs (rw,sec=krb5,addr=1.1.1.1)

-rw-r--r-- 1 root root    0 2010-08-03 20:09 rené
drwx------ 2 root root 4096 2010-07-28 15:59 René

------

It looks like the way the volume is mounted makes a difference. I will look into this, but I have a solution.

Greetings.

rsmits1074

I think it is the nodev mount option.

man mount:

nodev  Do not interpret character or block special devices on the file system.

aborzenkov

"I think it is the nodev mount option."

No. The filesystem that works is mounted using NFSv3; the one that does not work is mounted using NFSv4. NFSv4 mandates UTF-8 for all protocol strings, which means that both server and client have to translate file names from their local character set to UTF-8.

Unfortunately, I could not find any clear explanation (actually, any explanation at all) of how the vol lang option on NetApp interacts with NFSv4. Could anyone from NetApp chime in here, please? But I believe this is your problem: in your case vol lang is set to plain C (POSIX), while it should be set to indicate that your clients are actually using UTF-8.

Beware, changing language after files have been created may result in loss of file access.
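
One way such a mismatch could alter a name can be sketched in Python. This is a purely hypothetical model, not documented ONTAP behaviour: it assumes the filer treats the stored name bytes as ISO-8859-1 (a guess at what a C (POSIX) volume implies) and transcodes them to UTF-8 for NFSv4.

```python
# Name bytes as written through NFSv3 by a UTF-8 client (assumed stored as-is).
stored = "Ren\u00e9".encode("utf-8")

# HYPOTHETICAL server behaviour: interpret the stored bytes as ISO-8859-1
# and transcode them to UTF-8 before sending them over NFSv4.
sent_over_nfs4 = stored.decode("iso-8859-1").encode("utf-8")

print(stored.hex())              # 52656ec3a9
print(sent_over_nfs4.hex())      # 52656ec383c2a9
print(sent_over_nfs4 == stored)  # False
```

Under that assumption, the name seen by the NFSv4 client is no longer the byte sequence the NFSv3 client wrote. Whether the filer actually behaves this way is not established here.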

rsmits1074

You are right again.

I have both NFSv3 and NFSv4 mounts.

Well, you can set the language for the volume with the "vol lang" command.

I am going to test this.

I will keep you up to date.

Any helpful answers are still welcome.

Greetings

rsmits1074

vol lang vol0    

Volume language is C (POSIX)

Translation Versions
    OEM Character set is ascii|cp1|Fri Oct  2 00:00:53 CEST 1998

    NFS Character set is iso-8859-1|iso-8859-1|Fri Oct  2 00:00:53 CEST 1998

vol lang vol1

Volume language is C (POSIX)

Translation Versions
    OEM Character set is ascii|cp1|Fri Oct  2 00:00:53 CEST 1998

    NFS Character set is iso-8859-1|iso-8859-1|Fri Oct  2 00:00:53 CEST 1998

Should this be UTF-8? We only use NFSv4 for our clients.

bikash

Since 7.3.3 we support UTF-8 for volumes mounted over NFSv4. You can change the volume language to UTF-8.
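
Given the earlier warning about losing file access, it may be worth checking whether existing names are already valid UTF-8 before switching the language. Below is a minimal Python sketch of such a check, run from an NFS client. The function names are hypothetical, and the check is only a heuristic: it says nothing about how the filer itself will treat the names.

```python
import os
from typing import Iterable, List


def non_utf8_names(names: Iterable[bytes]) -> List[bytes]:
    """Return the names that are not valid UTF-8 byte sequences."""
    bad = []
    for name in names:
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


def scan_directory(path: str) -> List[bytes]:
    """List one directory as raw bytes and return the non-UTF-8 entries."""
    # Passing a bytes path makes os.listdir return undecoded byte names.
    return non_utf8_names(os.listdir(os.fsencode(path)))
```

For example, scan_directory("/mnt/server") would report a Latin-1 name such as b'Ren\xe9' but not the UTF-8 name b'Ren\xc3\xa9'.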
