Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittend failure on multi failover configuration of sqlserver #606

Open
RMHogervorst opened this issue Aug 15, 2024 · 5 comments
Open

Comments

@RMHogervorst
Copy link

I'm having an issue connecting to a sqlserver database with multi failover configuration.
I'm using python with freetds as driver, and this happens both in pymssql and pyobc.

The database address (host/ domain name) resolves into 2 ip addresses.
I believe this commit should add functionality to auto-use a different address when it is available, but that is not what I see happening.

When I connect and re-connect multiple times the connection fails about half of the time, which I interpret as: I'm trying to connect to the inactive sqlserver copy. But the trace logs (reproduced below) indicate that a connection to both instances is made.

I'm using FreeTDS 1.3.6 on ubuntu. with unixODBC 2.3.9.

Are there settings I should change so that the connection is retried on a different ip adress?

details

in the freetds logs (trace) of a failing connection attempt I see the following events:

  • connection to both ip adresses ( tds_setup_socket: connect(2) returned "Operation now in progress" )
  • tds_open_socket() succeeded
  • sending packet (with details about the packet)
  • empty line
  • Changed query state from IDLE to DEAD
  • Read attempt when state is TDS_DEAD
  • login packet rejected
  • (tdserror etc. etc.)

On successful connections I see:

  • connection to both ip adresses ( tds_setup_socket: connect(2) returned "Operation now in progress" )
  • tds_open_socket() succeeded
  • sending packet (with details about the packet)
  • empty line
  • Received packet (with details about packet)
  • empty line
  • (details about the handshake etc etc)
@freddy77
Copy link
Contributor

Not sure what are the empty line but it looks like that both servers are accepting connection but the "failure" one then drops the connection. The current code assumes that if the server accepts the connection it should also reply to the login. If it's not the case than code needs updates. Can you send some more logs (maybe privately) to me to confirm?

@RMHogervorst
Copy link
Author

I'll talk to the DBAs to see of we I can get more info.

@RMHogervorst
Copy link
Author

Here is a log of failure

08:40:58.435239 508046 (log.c:187):Starting log file for FreeTDS 1.3.6
	on 2024-08-19 08:40:58 with debug flags 0xffff.
08:40:58.435283 508046 (iconv.c:367):tds_iconv_open(0x238d590, UTF-8)
08:40:58.435404 508046 (iconv.c:198):local name for ISO-8859-1 is ISO-8859-1
08:40:58.435424 508046 (iconv.c:198):local name for UTF-8 is UTF-8
08:40:58.435430 508046 (iconv.c:198):local name for UCS-2LE is UCS-2LE
08:40:58.435434 508046 (iconv.c:198):local name for UCS-2BE is UCS-2BE
08:40:58.435463 508046 (iconv.c:389):setting up conversions for client charset "UTF-8"
08:40:58.435470 508046 (iconv.c:391):preparing iconv for "UTF-8" <-> "UCS-2LE" conversion
08:40:58.435562 508046 (iconv.c:430):tds_iconv_open: done
08:40:58.435675 508046 (net.c:391):Connecting with protocol version 7.4
08:40:58.435703 508046 (net.c:318):Connecting to <IPADRESS1> port 1433
08:40:58.435851 508046 (net.c:340):tds_setup_socket: connect(2) returned "Operation now in progress"
08:40:58.435917 508046 (net.c:318):Connecting to <IPADRESS2> port 1433
08:40:58.435984 508046 (net.c:340):tds_setup_socket: connect(2) returned "Operation now in progress"
08:40:58.436005 508046 (net.c:528):tds_open_socket() succeeded
08:40:58.436045 508046 (packet.c:852):Sending packet
0000 12 01 00 3a 00 00 00 00-00 00 1a 00 06 01 00 20 |...:.... ....... |
0010 00 01 02 00 21 00 0c 03-00 2d 00 04 04 00 31 00 |....!... .-....1.|
0020 01 ff 09 00 00 00 00 00-00 4d 53 53 51 4c 53 65 |........ .MSSQLSe|
0030 72 76 65 72 00 8e c0 07-00 00                   |rver.... ..|

08:41:01.463945 508046 (util.c:179):Changed query state from IDLE to DEAD
08:41:01.464037 508046 (packet.c:539):Read attempt when state is TDS_DEAD
08:41:01.464048 508046 (login.c:599):login packet rejected
08:41:01.464056 508046 (util.c:333):tdserror(0x231f770, 0x238d960, 20002, 0)
08:41:01.464064 508046 (odbc.c:2369):msgno 20002 20003
08:41:01.464088 508046 (util.c:363):tdserror: client library returned TDS_INT_CANCEL(2)
08:41:01.464095 508046 (util.c:386):tdserror: returning TDS_INT_CANCEL(2)
08:41:01.464104 508046 (mem.c:656):tds_free_all_results()
08:41:01.464231 508046 (error.c:414):odbc_errs_add: "Unable to connect to data source"
08:41:01.464254 508046 (error_export.h:83):SQLErrorW((nil), 0x238a8b0, (nil), 0x7ffccabaf8b0, 0x7ffccabaf4ac, 0x7ffccabb10c0, 1026, 0x7ffccabaf4aa)
08:41:01.464267 508046 (error.c:559):SQLGetDiagRec: "[FreeTDS][SQL Server]Adaptive Server connection failed"
08:41:01.464286 508046 (error_export.h:83):SQLErrorW((nil), 0x238a8b0, (nil), 0x7ffccabaf8b0, 0x7ffccabaf4ac, 0x7ffccabb10c0, 1026, 0x7ffccabaf4aa)
08:41:01.464296 508046 (error.c:559):SQLGetDiagRec: "[FreeTDS][SQL Server]Unable to connect to data source"
08:41:01.464310 508046 (error_export.h:83):SQLErrorW((nil), 0x238a8b0, (nil), 0x7ffccabaf8b0, 0x7ffccabaf4ac, 0x7ffccabb10c0, 1026, 0x7ffccabaf4aa)
08:41:01.464318 508046 (odbc.c:4065):SQLFreeHandle(2, 0x238a8b0)
08:41:01.464324 508046 (odbc.c:4091):_SQLFreeConnect(0x238a8b0)
08:41:01.464330 508046 (bcp.c:673):_bcp_free_storage(0x238a8b0)
08:41:01.464338 508046 (odbc.c:4065):SQLFreeHandle(1, 0x22e5540)
08:41:01.464346 508046 (odbc.c:4138):_SQLFreeEnv(0x22e5540)

@freddy77
Copy link
Contributor

freddy77 commented Sep 5, 2024

I tried to reproduce having multiple IPs for a name with one accepting and closing connection. The MS driver behave similarly, if it hit the IP disconnecting it gives failure.

@RMHogervorst
Copy link
Author

Thank you so much for trying this. My deployment is in a complex network architecture and most likely culprits are istio or some other network layers rewriting some adresses and responding when they shouldn't.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants