cancel
Showing results for 
Search instead for 
Did you mean: 

locale questions

Former Member
0 Kudos

Running on DS 12.1.0.0

1) The customer is trying to read from an Oracle DB with NLS_CHARACTERSET set to WE8DEC.

We donu2019t list WE8DEC as a Datastore code page. Do you know if the Oracle clientu2019s NLS_LANG is set to UTF-8 if the data will be transcoded between WE8DEC and UTF-8 as it goes from the Server to the client? If so, then I could set the Datastore code page to UTF-8 and read the data. From what I see on the Web, WE8DEC is a multi-byte code page.

If the above wonu2019t work, what are my options?

2) The customer will have multiple Oracle sources and the NLS_CHARACTERSET is not the same on all servers. Iu2019m only going to have one NLS_LANG setting for the Oracle client on the Job Server machine. Iu2019m hoping that setting NLS_LANG to UTF-8 or AL32UTF8 will cause the data to get transcoded from the server code page to the Oracle client code page during the transmission. Iu2019ll just have to set my Oracle Datastores to UTF-8.

If the above will work, is there a limit to which code pages can be converted automatically to UTF-8? If that doesnu2019t work, the only other option I can think of is to NLS_LANG every time I want to read from a different NLS_CHARACTERSET.

3) Some of the Oracle servers may have different languages or territories. Iu2019m thinking I must change the NLS_LANG setting before I pull from a server with a different language or territory. Is this correct? If so, Iu2019m thinking I could use the set_env() to set the NLS_LANG environment before extracting from a Datastore that needs a specific language. I would only be able to pull from Oracle Datastores with the corresponding language while the environment was set. If I wanted to pull from Datastores with a different code page Iu2019d have to use set_env() again before doing so.

Would this work if I try to change the NLS_LANG setting using set_env() within the same Job? Iu2019m pretty sure the Datastore connections are tested at the beginning of the job, but I donu2019t know if those connections are dropped and then re-established once the Data Flow starts. If new connections are not made when the Data Flow starts, then my assumption is that Iu2019ll be restricted to one Oracle client locale setting for the life of the job.

4) How do you set the Oracle client locale on AIX? Is it simply setting the environment variable NLS_LANG or is there more to it?

Accepted Solutions (1)

Accepted Solutions (1)

werner_daehn
Active Contributor
0 Kudos

Short:set the NLS_LANG environment variable to UTF-8, set the Oracle datastore codepages property to UTF-8 as well and you will be fine.

Mid: Read this https://wiki.sdn.sap.com:443/wiki/display/BOBJ/Multiple+Codepages

Longer:

The database codepage defines what characters the database can store. If it is set to ASCII it does not matter what you do, you cannot store ÖÄÜ chars. If it is set to WE8DEC (Western Europe) it cannot store chinese chars.

The session codepage is the codepage the client tools talk to the database. If the session codepage is different from the database codepage, the SQL*Net layer tries to translate the chars as much as it can. So if your database codepage is UTF-8 and the session codepage is WE8DEC, the client character Ä is a one byte number and will be transcoded into a UTF-8 two byte number representing an Ä in this codepage. Obviously, if your database has a smaller codepage than the session codepage you are delivering all chars correctly but the database cannot store them. As a result, set the session codepage to utf-8. All source databases will be able to convert their characters to utf-8 as unicode can deal with all chars of the world. And the target database should be running in utf-8 as well otherwise it might not be able to store these characters.

The datastore codepage has to be identical to the NLS_LANG setting, e.g. utf-8=utf-8 or WE8DEC=iso-8859-P1. Otherwise we get the number 0xC0 from the database session which is an Ä in WE8DEC but the datastore believes that is an e.g. unicode char - that would be a mess.

The session codepage is set via the environment variable NLS_LANG, best would be to add that "export NLS_LANG=..." in the al_env.sh script.

Note: The engine runs in utf-16 mode if any unicode codepage is specified, so do not be surprised that your datastores are all utf-8 but the engine runs in utf-16 for no obvious reasons. That's done for performacne reasons.

Answers (0)