Inexplicable SQL syntax error when using SAP HANA JDBC driver - why?

When using the SAP HANA JDBC driver (downloaded from here) to write data from Apache Spark to a HANA DB table, I'm getting a 'sql syntax error: incorrect syntax near "TEXT"' error that appears completely unrelated to what I'm doing.

Following the "JDBC To Other Databases" Spark documentation and the "Calling HANA Views from Apache Spark" blog post, I tried:

(df.write
    .format('jdbc')
    .option('url', f'jdbc:sap://{HANA_HOST}:{HANA_PORT}')
    .option('dbtable', f'{HANA_SCHEMA}.{HANA_TABLE}')
    .option('user', HANA_USER)
    .option('password', HANA_PWD)
    .mode('overwrite')
    .save())

with a Spark DataFrame that I created by loading a CSV file via

df = (spark.read
    .option('delimiter', ',')
    .option('header', True)
    .csv(file))

However, I'm getting this exception when running the Spark job:

com.sap.db.jdbc.exceptions.JDBCDriverException: SAP DBTech JDBC: [257] (at 60): sql syntax error: incorrect syntax near "TEXT": line 1 col 60 (at pos 60)
at com.sap.db.jdbc.exceptions.SQLExceptionSapDB._newInstance(SQLExceptionSapDB.java:209)
at com.sap.db.jdbc.exceptions.SQLExceptionSapDB.newInstance(SQLExceptionSapDB.java:42)
at com.sap.db.jdbc.packet.HReplyPacket._buildExceptionChain(HReplyPacket.java:837)
at com.sap.db.jdbc.packet.HReplyPacket.getSQLExceptionChain(HReplyPacket.java:195)
at com.sap.db.jdbc.packet.HPartInfo.getSQLExceptionChain(HPartInfo.java:39)
at com.sap.db.jdbc.ConnectionSapDB._receive(ConnectionSapDB.java:5436)
at com.sap.db.jdbc.ConnectionSapDB.exchange(ConnectionSapDB.java:2528)
at com.sap.db.jdbc.StatementSapDB._executeDirect(StatementSapDB.java:2249)
at com.sap.db.jdbc.StatementSapDB._execute(StatementSapDB.java:2223)
at com.sap.db.jdbc.StatementSapDB._execute(StatementSapDB.java:2184)
at com.sap.db.jdbc.StatementSapDB._executeUpdate(StatementSapDB.java:2152)
at com.sap.db.jdbc.StatementSapDB.executeUpdate(StatementSapDB.java:253)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.executeStatement(JdbcUtils.scala:1060)
...

I'm using spark-3.3.0 with Python (pyspark-3.3.0) and the ngdbc-2.13.9 JDBC driver.

print(df)
df.show()

displays

DataFrame[a: string, b: string, c: string]
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  2|  3|
|  2|  3|  4|
|  3|  4|  5|
+---+---+---+

The target table was created with:

CREATE TABLE "<HANA_SCHEMA>"."<HANA_TABLE>"(
"a" NVARCHAR(10),
"b" NVARCHAR(10),
"c" NVARCHAR(10)
);

The 'incorrect syntax near "TEXT"' error message seems to indicate that the DB connection is working but something's wrong with the generated SQL INSERT statement. I've tried running the job with "DEBUG" log level but that didn't provide any additional clues.

Why am I getting an 'incorrect syntax near "TEXT"' error which appears to be completely unrelated to what I'm doing?

Accepted Solutions (1)


richard_jones5
Explorer

I suspect Apache Spark is generating a SQL statement containing the TEXT keyword. TEXT is a column type in on-premise SAP HANA, but it is not supported in SAP HANA Cloud.
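To illustrate the suspicion above with a hypothetical sketch (this is not Spark's actual implementation, and the helper name and TEXT fallback are assumptions for illustration only): when no explicit column type mapping is supplied, the writer falls back to a default string type for every string column, which yields a CREATE TABLE statement that HANA Cloud rejects.

```python
# Hypothetical sketch of how the generated CREATE TABLE column list could
# come about -- NOT Spark's actual code. Every column falls back to a
# default type ("TEXT" here) unless an explicit override is supplied.
def column_ddl(columns, overrides=None, default="TEXT"):
    """Build the column list of a CREATE TABLE statement."""
    overrides = overrides or {}
    return ", ".join(f'"{c}" {overrides.get(c, default)}' for c in columns)

# Without overrides: the kind of statement HANA Cloud rejects.
print(column_ddl(["a", "b", "c"]))
# -> "a" TEXT, "b" TEXT, "c" TEXT

# With explicit types: a statement HANA Cloud accepts.
print(column_ddl(["a", "b", "c"], {c: "NVARCHAR(10)" for c in "abc"}))
# -> "a" NVARCHAR(10), "b" NVARCHAR(10), "c" NVARCHAR(10)
```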

We need a JDBC CONNECTIONS, API, and PACKET trace with timestamps to investigate further. This file is generated on the client machine, not on the server machine.

To turn on JDBC tracing, locate the HANA JDBC driver (the file name is ngdbc.jar) on the client machine, cd to that directory, and run the following commands to enable JDBC CONNECTIONS, API, and PACKET tracing with timestamps:

java -jar ngdbc.jar TRACE ON
java -jar ngdbc.jar TRACE CONNECTIONS ON
java -jar ngdbc.jar TRACE API ON
java -jar ngdbc.jar TRACE PACKET ON
java -jar ngdbc.jar TRACE SHOW TIMESTAMPS ON

Then verify the settings by running:

java -jar ngdbc.jar SHOW

which should produce output similar to this:

Driver version                               : 2.13.3-5e06dac15e86d540872186594eff7ab462b26292
Settings file name                           : C:\Users\I826649\.sdb\jdbctracesettings.ini
Shared memory file name                      : C:\Users\I826649\.sdb\jdbctrace.shm
Configuration
  Trace                                      : Enabled
  Trace file name                            : jdbctrace.prt
  TRACE CONNECTIONS                          : On
  TRACE API                                  : On
  TRACE PACKET                               : On
  TRACE DISTRIBUTION                         : Off
  TRACE STATISTICS                           : Off
  TRACE CLEANERS                             : On
  TRACE DEBUG                                : Off
  Show plain-text client-side encrypted values : Disabled
  Show timestamps                            : Enabled
  Show elapsed times                         : Disabled
  Trace file size                            : Unlimited
  Stop on error                              : Disabled
  Trace failure action                       : Ignore
  Performance trace                          : Disabled
  Performance trace file name                : jdbcperformancetrace.prt

When you run your application, the driver will produce a file named jdbctrace####.prt in the current directory. This is the file we need.
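As a convenience, the executed SQL statements can be pulled out of the trace file with a small script. This is my own sketch, not part of the driver tooling; the regex assumes the `executeUpdate("...")` line shape that TRACE API produces, with quotes inside the SQL doubled.

```python
import re

# Hypothetical helper: scan a JDBC API trace for executed SQL statements.
# Assumes lines of the form ...executeUpdate("...") / executeQuery("..."),
# where double quotes inside the SQL text appear doubled ("").
def executed_sql(trace_text):
    pattern = re.compile(r'execute(?:Update|Query)?\("(.*?)"\)')
    return [m.group(1).replace('""', '"') for m in pattern.finditer(trace_text)]

sample = ('com.sap.db.jdbc.HanaStatement@18e3b3c9.executeUpdate('
          '"CREATE TABLE S.T (""a"" TEXT , ""b"" TEXT ) ")')
print(executed_sql(sample))
```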


Thanks for taking this up! Unfortunately, I cannot attach .prt files here. I'm going to try and catch you on Teams.


richard.jones5, I looked at the trace file and found this line:

com.sap.db.jdbc.HanaStatement@18e3b3c9.executeUpdate("CREATE TABLE <HANA_SCHEMA>.<HANA_TABLE> (""a"" TEXT , ""b"" TEXT , ""c"" TEXT ) ")

which seems to confirm your suspicion. So, I guess I need to check whether there is some way to force Spark to use a known data type instead.


Thanks again, richard.jones5 , based on your hint I finally got it to work by adding

.option('createTableColumnTypes', 'a varchar(10), b varchar(10), c varchar(10)')

to the df.write...save() chain.
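For anyone hitting the same issue: the value of createTableColumnTypes is just a comma-separated "column type" list matching the DataFrame schema. A small helper (my own hypothetical convenience function, not part of the Spark API) can keep the string in sync with a column-to-type mapping:

```python
# Hypothetical helper (not part of the Spark API): build the value for the
# createTableColumnTypes option from a column -> SQL type mapping.
def create_table_column_types(mapping):
    return ", ".join(f"{col} {sql_type}" for col, sql_type in mapping.items())

types = create_table_column_types(
    {"a": "varchar(10)", "b": "varchar(10)", "c": "varchar(10)"})
print(types)  # -> a varchar(10), b varchar(10), c varchar(10)
```

The resulting string is then passed as .option('createTableColumnTypes', types) in the write chain.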

Answers (1)



Ways to fix Syntax Errors

Check keyword spelling by referring to the documentation for the type of SQL you are using.

Check table spelling by referring to the database schema.

Check column spelling by referring to the database schema, or run SELECT * FROM the table to inspect its column names.

Greetings,

Rachel Gomez