Skip to Content
author's profile photo Former Member
Former Member

locale questions

Running on DS

1) The customer is trying to read from an Oracle DB with NLS_CHARACTERSET set to WE8DEC.

We donu2019t list WE8DEC as a Datastore code page. Do you know if the Oracle clientu2019s NLS_LANG is set to UTF-8 if the data will be transcoded between WE8DEC and UTF-8 as it goes from the Server to the client? If so, then I could set the Datastore code page to UTF-8 and read the data. From what I see on the Web, WE8DEC is a multi-byte code page.

If the above wonu2019t work, what are my options?

2) The customer will have multiple Oracle sources and the NLS_CHARACTERSET is not the same on all servers. Iu2019m only going to have one NLS_LANG setting for the Oracle client on the Job Server machine. Iu2019m hoping that setting NLS_LANG to UTF-8 or AL32UTF8 will cause the data to get transcoded from the server code page to the Oracle client code page during the transmission. Iu2019ll just have to set my Oracle Datastores to UTF-8.

If the above will work, is there a limit to which code pages can be converted automatically to UTF-8? If that doesnu2019t work, the only other option I can think of is to NLS_LANG every time I want to read from a different NLS_CHARACTERSET.

3) Some of the Oracle servers may have different languages or territories. Iu2019m thinking I must change the NLS_LANG setting before I pull from a server with a different language or territory. Is this correct? If so, Iu2019m thinking I could use the set_env() to set the NLS_LANG environment before extracting from a Datastore that needs a specific language. I would only be able to pull from Oracle Datastores with the corresponding language while the environment was set. If I wanted to pull from Datastores with a different code page Iu2019d have to use set_env() again before doing so.

Would this work if I try to change the NLS_LANG setting using set_env() within the same Job? Iu2019m pretty sure the Datastore connections are tested at the beginning of the job, but I donu2019t know if those connections are dropped and then re-established once the Data Flow starts. If new connections are not made when the Data Flow starts, then my assumption is that Iu2019ll be restricted to one Oracle client locale setting for the life of the job.

4) How do you set the Oracle client locale on AIX? Is it simply setting the environment variable NLS_LANG or is there more to it?

Add a comment
10|10000 characters needed characters exceeded

Assigned Tags

Related questions

1 Answer

  • Best Answer
    Posted on Mar 19, 2009 at 08:49 AM

    Short:set the NLS_LANG environment variable to UTF-8, set the Oracle datastore codepages property to UTF-8 as well and you will be fine.

    Mid: Read this


    The database codepage defines what characters the database can store. If it is set to ASCII it does not matter what you do, you cannot store ÖÄÜ chars. If it is set to WE8DEC (Western Europe) it cannot store chinese chars.

    The session codepage is the codepage the client tools talk to the database. If the session codepage is different from the database codepage, the SQL*Net layer tries to translate the chars as much as it can. So if your database codepage is UTF-8 and the session codepage is WE8DEC, the client character Ä is a one byte number and will be transcoded into a UTF-8 two byte number representing an Ä in this codepage. Obviously, if your database has a smaller codepage than the session codepage you are delivering all chars correctly but the database cannot store them. As a result, set the session codepage to utf-8. All source databases will be able to convert their characters to utf-8 as unicode can deal with all chars of the world. And the target database should be running in utf-8 as well otherwise it might not be able to store these characters.

    The datastore codepage has to be identical to the NLS_LANG setting, e.g. utf-8=utf-8 or WE8DEC=iso-8859-P1. Otherwise we get the number 0xC0 from the database session which is an Ä in WE8DEC but the datastore believes that is an e.g. unicode char - that would be a mess.

    The session codepage is set via the environment variable NLS_LANG, best would be to add that "export NLS_LANG=..." in the script.

    Note: The engine runs in utf-16 mode if any unicode codepage is specified, so do not be surprised that your datastores are all utf-8 but the engine runs in utf-16 for no obvious reasons. That's done for performacne reasons.

    Add a comment
    10|10000 characters needed characters exceeded

Before answering

You should only submit an answer when you are proposing a solution to the poster's problem. If you want the poster to clarify the question or provide more information, please leave a comment instead, requesting additional details. When answering, please include specifics, such as step-by-step instructions, context for the solution, and links to useful resources. Also, please make sure that you answer complies with our Rules of Engagement.
You must be Logged in to submit an answer.

Up to 10 attachments (including images) can be used with a maximum of 1.0 MB each and 10.5 MB total.