Former Member

Problem in SAX Java mapping

Hi,

I'm using a SAX Java mapping in one scenario. The problem is that when the input contains certain Croatian characters, such as Đ or Š (U+0160), the output XML is not valid: XMLSpy complains, IE complains, and so on. The customer is sure that the data (XML in a CLOB field in an Oracle DB) is UTF-8. What could be the problem?

I read the entire XML into a string with the help of a BufferedReader, do some manipulation, and then write the string into a byte array with:

			byte[] bytes = file.toString().getBytes("UTF-8");
			saxParser.parse(new ByteArrayInputStream(bytes), handler);

and then, of course, parse the XML. The readLine method reads the data, and the problematic character is "Đ", which shows up as "Ä" followed by an unprintable character (bytes 0xC4 0x90).

In that form neither XMLSpy nor IE complains about the character. After the conversion it looks like "Ä?" (bytes 0xC4 0x3F), and that is no longer valid. Why?
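The byte values quoted above can be reproduced in isolation. The following is a minimal sketch of the mechanism, not a diagnosis of the exact code in the question: it shows that UTF-8 encodes U+0110 (Đ) as C4 90, that decoding those bytes with a single-byte charset splits them into two characters, and that encoding into a charset that lacks the character substitutes "?" (0x3F). The class name `EncodingRoundTrip` and the helper `hex` are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class EncodingRoundTrip {
    // Formats bytes as upper-case hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u0110"; // U+0110, capital D with stroke

        // Correct encoding: UTF-8 represents U+0110 as two bytes.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(utf8)); // prints C4 90

        // Wrong decoder: a single-byte charset turns each byte into its own character.
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length()); // prints 2

        // Re-encoding the misread text as UTF-8 doubles the damage.
        System.out.println(hex(misread.getBytes(StandardCharsets.UTF_8))); // prints C3 84 C2 90

        // A charset without the character replaces it with '?' (0x3F).
        System.out.println(hex(original.getBytes(StandardCharsets.ISO_8859_1))); // prints 3F
    }
}
```

A 0x3F in the second position, as in the question, suggests that somewhere in the chain a byte or character was replaced because the charset in use could not represent it.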

3 Answers

  • Best Answer
    Posted on Feb 26, 2009 at 12:23 PM

    What is file?

    You could simply use:

    saxParser.parse(in, handler);

    where in is the InputStream parameter of the method

    public void execute(InputStream in, OutputStream out)
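To illustrate the suggestion: when the parser receives the raw byte stream, it determines the encoding itself from the XML declaration, so no manual String conversion is involved. The sketch below stands outside the SAP mapping API; the class `DirectStreamParse` and the method `textOf` are hypothetical names, and a ByteArrayInputStream stands in for the `in` parameter.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class DirectStreamParse {
    // Parses the stream untouched and returns all character data found in it.
    static String textOf(InputStream in) {
        final StringBuilder text = new StringBuilder();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(in, new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // \u0110 is the capital D with stroke from the question.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u0110uro</name>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(textOf(in).equals("\u0110uro")); // prints true
    }
}
```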


    • Former Member

      Hi Stefan,

      I've finally done it. The code is as follows:

      	public void execute(InputStream in, OutputStream out)
      			throws com.sap.aii.mapping.api.StreamTransformationException {
      		DefaultHandler handler = this;
      		SAXParserFactory factory = SAXParserFactory.newInstance();
      		try {
      			SAXParser saxParser = factory.newSAXParser();
      			fStreamOut = new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
      			encoding = "UTF-8";
      			if (map != null) {
      				mappingTrace = (MappingTrace) map
      						.get(StreamTransformationConstants.MAPPING_TRACE);
      			}
      			InputStreamReader is = new InputStreamReader(in, "UTF8");
      			BufferedReader reader = new BufferedReader(is);
      			StringBuffer file = new StringBuffer();
      			String line = new String();
      			try {
      				while ((line = reader.readLine()) != null) {
      					file.append(line);
      				}
      			} catch (IOException e) {
      				e.printStackTrace();
      			} finally {
      				try {
      					in.close();
      				} catch (IOException e) {
      					e.printStackTrace();
      				}
      			}
      			// Strip the XML declaration before the document is written out again.
      			file = replaceREGEX(
      					"<\\?xml version=\"1\\.0\" encoding=\"UTF-8\"\\?>", "",
      					file);
      			Date filedat = new Date();
      			SimpleDateFormat df = new SimpleDateFormat("yyyyMMdd_HHmmss_SSS");
      			String fName = df.format(filedat) + "_El_Invoice.xml";
      			Writer out1 = new BufferedWriter(new OutputStreamWriter(
      					new FileOutputStream(fName), "UTF8"));
      			try {
      				out1.write(file.toString().toCharArray());
      				out1.close();
      			} catch (IOException e) {
      				// Report write failures instead of silently ignoring them.
      				e.printStackTrace();
      			}
      
      			saxParser.parse(fName, handler);
      			File outFile = new File(fName);
      			outFile.delete();
      		} catch (Throwable t) {
      			if (mappingTrace != null) {
      				mappingTrace.addInfo(t.toString());
      			}
      			t.printStackTrace();
      		}
      	}
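As a possible alternative to the temporary file in the code above: once the document is held as a Java String, it can be handed to the parser as a character stream, so no byte encoding takes part in that step at all. This is a sketch under the assumption that the manipulated XML is available as a String; `ParseFromString` and `textOf` are illustrative names, not part of the mapping.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParseFromString {
    // Parses XML held in a String and returns all character data found in it.
    static String textOf(String xml) {
        final StringBuilder text = new StringBuilder();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // A character stream is already decoded, so the parser does not
            // have to guess or trust a byte encoding.
            InputSource source = new InputSource(new StringReader(xml));
            parser.parse(source, new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // \u0110 and \u0160 are the two characters named in the question.
        String xml = "<invoice><buyer>\u0110uro \u0160imi\u0107</buyer></invoice>";
        System.out.println(textOf(xml).equals("\u0110uro \u0160imi\u0107")); // prints true
    }
}
```

Compared with writing a file and parsing it by name, this avoids the file system and the cleanup step, at the cost of keeping the whole document in memory, which the code above already does.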

      The problem was also in the method that writes to the output stream, so I've changed it:

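      	private void printOutPut(String sOP) {
      		try {
      			// Write characters through the UTF-8 writer. The old call,
      			// fStreamOut.write(sOP.getBytes()), encoded with the platform default charset.
      			fStreamOut.write(sOP);
      		} catch (IOException e) {
      			// e.notify() would throw IllegalMonitorStateException here; report the error instead.
      			e.printStackTrace();
      		}
      	}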
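The effect of that change can be checked in isolation. The sketch below writes a String through an OutputStreamWriter with an explicit UTF-8 charset, as the fixed method does through fStreamOut, and returns the bytes that reach the stream. The class `Utf8Output` and the method `writeAsUtf8` are hypothetical names, and a ByteArrayOutputStream stands in for the mapping's output stream.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8Output {
    // Writes the text through a UTF-8 writer and returns the resulting bytes.
    static byte[] writeAsUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and therefore flushed) at this point.
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = writeAsUtf8("\u0110"); // capital D with stroke
        System.out.println(bytes.length);                           // prints 2
        System.out.println(Integer.toHexString(bytes[0] & 0xFF));   // prints c4
        System.out.println(Integer.toHexString(bytes[1] & 0xFF));   // prints 90
    }
}
```

The result no longer depends on the default charset of the machine the mapping runs on, which is what a bare getBytes() call relied on.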

  • Former Member
    Posted on Feb 26, 2009 at 01:29 AM

    It's a result of the characters being read as CDATA.

    Change the parser to DOM.

    Regards

    Ravi Raman


  • Former Member
    Posted on Feb 26, 2009 at 01:47 PM

    Try getBytes() without a charset.

    Also check the encoding of the source file; it could be that the characters are already UTF-8.

    Or try getBytes("ISO 8859-2");


    • Former Member

      getBytes() without encoding : Character conversion error: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (line number may be too low).

      getBytes("ISO 8859-2") : java.io.UnsupportedEncodingException: ISO 8859-2

      getBytes("ISO-8859-2") : Character conversion error: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (line number may be too low).

      doesn't work ...:-(
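The UnsupportedEncodingException above comes from the name itself: a space is not a legal character in a Java charset name, so "ISO 8859-2" is rejected before any lookup takes place. The other two attempts most likely fail because the bytes are then no longer UTF-8 while the parser still expects UTF-8. Below is a small sketch for checking a charset name before using it; `CharsetNames` and `isUsable` are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

class CharsetNames {
    // Returns true if the JVM accepts the name and has a charset for it.
    static boolean isUsable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not allowed, e.g. a space.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"ISO 8859-2", "ISO-8859-2", "UTF-8"};
        for (String name : candidates) {
            System.out.println(name + " -> " + isUsable(name));
        }
    }
}
```

Whether "ISO-8859-2" is available depends on the JVM in use, so the sketch only reports it rather than assuming it.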
