ONTAP Language / Unicode (UTF-8/UTF-16) Questions

markkulacz · ‎2011-03-02

Questions regarding ONTAP handling of different Unicode encodings with different file protocols.

Don't ask why I'm looking for this info. Sometimes my curiosity drags me down these paths. Since installing NetApp systems is my job, I figure it can't hurt to understand the impact of the configuration settings I choose. This post was also placed on the support community.

If CIFS (using SMB 2.0) is UTF-16, then is ONTAP writing directory and file names using UTF-16?

If NFS v4 is UTF-8, does ONTAP use UTF-8 for writing directory and file names from NFS v4?

When a file name is “converted” to UTF-X on CIFS access, it is suggested this is a "one time" conversion, and that the name of the file is also converted on disk. Is this correct?

If CIFS is using UTF-16, why are the ONTAP volume language codes specified as “en.UTF-8”? If the file and directory names on disk are UTF-8, then does ONTAP always translate UTF-8 to UTF-16 when communicating with a CIFS client?

For environments that are predominantly (or exclusively) NFS, and include many NFS v2/v3, is it ill-advised to turn the convert_ucode option on? It would mean excessive conversion.

In the statement (which was found on the web) – “Data ONTAP converts a directory format from pre-Data ONTAP 4.0 to Unicode”, what is the “pre-Data ONTAP 4.0 format”?

Does the wafl.create_ucode option override the Volume-level create_ucode setting?

Does the wafl.create_ucode option apply to just directories (as suggested in the one doc I found on it), or files and directories?

The NetApp documentation says that when an iSCSI license is added, the volume "create_ucode" option is set to true.

Since each volume has its own create_ucode option, does adding an iSCSI license update every volume? Does the system-level (wafl) create_ucode option also get updated?

PARTIAL ANSWER - I just did a quick test, and wafl.create_ucode actually remains OFF when an iSCSI license is added.

markkulacz · ‎2011-04-15

With ONTAP 7.3.5.1 (on a FAS320 install), and using the 8.0 simulator, I have observed that if you license either FCP or iSCSI, the vol0 options for create_ucode and convert_ucode are both set to "off" before the license add, and remain "off" after the license add. This does not align with what is written in NetApp documentation.

For example, the page "How Data ONTAP Implements a Fibre Channel SAN" (http://now.netapp.com/NOW/knowledge/docs/ontap/rel727/html/ontap/bsag/3ov-f2.htm) states the following:

How Data ONTAP Implements a Fibre Channel SAN

How Data ONTAP supports FCP with clustered systems

Enabled options for cluster configurations

Clustered storage systems in a Fibre Channel SAN require that the following options are enabled to guarantee that takeover and giveback occur quickly enough so that they do not interfere with host requests to the LUNs. These options are automatically enabled (value set to on) when the FCP service is turned on. Do not change them.

vol options create_ucode
cf.takeover.on_panic

After enabling the iscsi service and/or enabling FCP, the options still remain "off".

Since the values of create_ucode and convert_ucode are inherited from vol0, all new volumes created also have create_ucode and convert_ucode set to "off".

Since NetApp has not described anywhere WHY this is best practice, it's not clear what the impact is. Will something not work? Will this simply lead to a potential performance problem? Must the LUN be destroyed and re-created if it was initially created on a volume with these options set to "off"? If the LUNs are never accessed with CIFS or NFS, does the LUN remain in a non-Unicode format?
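To check these two options across several volumes, the output of `vol options <volname>` can be parsed with a few lines of script. The sketch below assumes the output is a comma-separated list of name=value pairs with a few bare flags; the sample text is made up for illustration and was not captured from a real system, and `parse_vol_options` is a hypothetical helper name.

```python
def parse_vol_options(text: str) -> dict:
    """Parse comma-separated 'name=value' pairs; bare flags map to True."""
    options = {}
    for item in text.replace("\n", " ").split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        options[name.strip()] = value.strip() if sep else True
    return options


# Hypothetical sample, assumed layout only (not real system output).
sample = """root, nosnap=off, create_ucode=off,
convert_ucode=off, guarantee=volume"""

opts = parse_vol_options(sample)
for name in ("create_ucode", "convert_ucode"):
    print(name, "=", opts.get(name, "<not reported>"))
```

Running the same check before and after a license add would show directly whether either option changed.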

The following page on Guides for creating volumes that contain LUNs (http://now.netapp.com/NOW/knowledge/docs/ontap/rel707/html/ontap/bsagfcp/c2cr-f9.htm) provides a couple of additional pieces of information, but it still falls short of explaining WHY this is a requirement.

Guides for creating volumes that contain LUNs

If multiple hosts share the same volume, create a qtree on the volume to store all LUNs for the same host.
Ensure that the volume option create_ucode is enabled.

Data ONTAP requires that the path of a volume or qtree containing a LUN is in the Unicode format. This option is On by default when you create a volume, but it is important to verify that any existing volumes still have this option enabled before creating LUNs in them.

For detailed procedures, see Verifying and modifying the volume option create_ucode.

The character format of the metadata of a LUN object shouldn't matter for iSCSI or FCP access to the LUN. All that iSCSI and FCP hosts see is a block storage device, with a SCSI logical unit ID that is assigned with the lun map command. However, NAS protocols can make a LUN available to a host if the NAS protocols are licensed and enabled on the storage system.

My guess is that there is some NetApp Snap software that attaches to volumes with LUNs on them at the volume/qtree level, and performs some type of LUN snapshot management/reads/writes. In that case, the character format of the LUN metadata might come into play. And on any NFS or CIFS access to the LUN object (file), the metadata will be converted (if convert_ucode is "on").

aborzenkov · ‎2011-04-15

My guess is that there is some NetApp Snap software that attaches to volumes with LUNs on them at the volume/qtree level, and performs some type of LUN snapshot management/reads/writes.

My best guess is that the reason was SnapDrive (for Windows), which used SMB to manage LUNs and so required the LUN name to be in Unicode. It is not clear whether the HTTP(S) access method still has the same requirement.

Unicode conversion was in general required to provide CIFS access; a well-known puzzle is a file in a snapshot that cannot be accessed from CIFS even though the same file in the active file system (AFS) can, and permissions were not changed.

Darkstar · ‎2011-04-18

I hope I can answer some of your questions. Note that these answers are not official but are based on my understanding of these options, and info I gathered from NetApp workshops and by talking to NetApp people. I might be wrong on some of them.

If CIFS (using SMB 2.0) is UTF-16, then is ONTAP writing directory and file names using UTF-16?

No, ONTAP always uses UTF-8, as it occupies far less space (>90% of characters use only 1 byte). UTF-8 and UTF-16 are 100% compatible and can be converted into each other with no loss of information; UTF-8 encoded text is simply smaller for most Western languages, which is why it is used. Also, SMB 2.0 uses UCS-2 and not UTF-16.
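The size difference and the lossless round trip are easy to check outside ONTAP. The following is a small sketch using Python's standard codecs; the file names are made-up examples and `encoded_sizes` is a hypothetical helper, so it says nothing about how WAFL itself lays names out on disk.

```python
def encoded_sizes(name: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for a file name."""
    utf8 = name.encode("utf-8")
    # "utf-16-le" avoids the byte-order mark that plain "utf-16" would prepend.
    utf16 = name.encode("utf-16-le")
    # Round trip: UTF-8 bytes -> text -> UTF-16 bytes -> text, nothing lost.
    assert utf8.decode("utf-8").encode("utf-16-le").decode("utf-16-le") == name
    return len(utf8), len(utf16)


for name in ["report.txt", "résumé.doc", "日本語.txt"]:
    n8, n16 = encoded_sizes(name)
    print(f"{name!r}: {n8} bytes as UTF-8, {n16} bytes as UTF-16")
```

ASCII-only names take half the space in UTF-8, while names made mostly of CJK characters come out roughly the same size in both encodings.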

If NFS v4 is UTF-8, does ONTAP use UTF-8 for writing directory and file names from NFS v4?

Yes, if you set create_ucode and convert_ucode to on. Otherwise it will be converted.

When a file name is “converted” to UTF-X on CIFS access, it is suggested this is a "one time" conversion, and that the name of the file is also converted on disk. Is this correct?

That is what convert_ucode is for. If you have it set to off, the filename is converted every time a CIFS client accesses it.

If CIFS is using UTF-16, why are the ONTAP volume language codes specified as “en.UTF-8”? If the file and directory names on disk are UTF-8, then does ONTAP always translate UTF-8 to UTF-16 when communicating with a CIFS client?

The "en" is the "old-format" file name; the UTF-8 is used for NFS accesses. This has nothing to do with create_ucode and convert_ucode, which define the way file names are stored in WAFL (if one of these two volume options is ON, ONTAP saves two file names for each file: the "old" one and a UTF-8 encoded one, which gets used on UTF-8 accesses). AFAIK the "old-format" file name is not removed but retained, but I'm not sure.

For environments that are predominantly (or exclusively) NFS, and include many NFS v2/v3, is it ill-advised to turn the convert_ucode option on? It would mean excessive conversion.

It shouldn't matter, since the ".UTF-8" on the volume language defines the NFS character set, and if there's a UTF-8 file name (as created by create_ucode or convert_ucode), that one will be sent to the client. Otherwise it will be converted on the fly.

In the statement (which was found on the web) – “Data ONTAP converts a directory format from pre-Data ONTAP 4.0 to Unicode”, what is the “pre-Data ONTAP 4.0 format”?

The "old" file name (i.e. the non-unicode one)

Hope that helps

-Michael

aborzenkov · ‎2011-04-18

When a file name is “converted” to UTF-X on CIFS access, it is suggested this is a "one time" conversion, and that the name of the file is also converted on disk. Is this correct?

That is what convert_ucode is for. If you have it set to off, the filename is converted every time a CIFS client accesses it.

That's not what I have learned. AFAIK the name is converted on first access and stored persistently. If the name can't be stored (e.g. because the volume/qtree is read-only), conversion fails and access is denied.

Darkstar · ‎2011-04-18

I admit I don't know what error happens if conversion fails when convert_ucode is set to on.

The point I was trying to make is that with convert_ucode=off, the file name has to be converted every time. If you set convert_ucode=on, the converted name will be stored (and you might get an error if the volume is read-only).

At least that's how I understood it.

-Michael
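The two behaviours discussed in the last few replies can be put side by side in a toy model. This is only a sketch of what the posters describe, not of how WAFL actually works: `Entry`, `ToyVolume`, the `latin-1` charset and the `PermissionError` are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    legacy_name: bytes                  # name in the volume's legacy character set
    unicode_name: Optional[str] = None  # stored Unicode form, if any


@dataclass
class ToyVolume:
    charset: str = "latin-1"     # stand-in for the volume language
    convert_ucode: bool = False
    read_only: bool = False
    conversions: int = 0         # how many times a name had to be converted

    def cifs_name(self, entry: Entry) -> str:
        """Return the Unicode name a CIFS client would see."""
        if entry.unicode_name is not None:
            return entry.unicode_name       # already stored, no work needed
        self.conversions += 1
        name = entry.legacy_name.decode(self.charset)
        if self.convert_ucode:
            if self.read_only:
                # Assumed failure mode, following the read-only case above.
                raise PermissionError("cannot store converted name")
            entry.unicode_name = name       # one-time, persistent conversion
        return name


for setting in (False, True):
    vol = ToyVolume(convert_ucode=setting)
    entry = Entry(b"caf\xe9.txt")
    vol.cifs_name(entry)
    vol.cifs_name(entry)
    print(f"convert_ucode={setting}: {vol.conversions} conversion(s) for 2 accesses")
```

In this model the option only changes whether the converted name is kept, which is the point of agreement between the two replies; what really happens on a read-only volume or snapshot is the part that remains unconfirmed in the thread.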