Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I2C Master regression (IDFGH-12542) #13547

Closed
3 tasks done
ammaree opened this issue Apr 5, 2024 · 22 comments
Closed
3 tasks done

I2C Master regression (IDFGH-12542) #13547

ammaree opened this issue Apr 5, 2024 · 22 comments
Assignees
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally Type: Bug bugs in IDF

Comments

@ammaree
Copy link

ammaree commented Apr 5, 2024

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

v5.3-dev-3220-g9c99a385ad

Espressif SoC revision.

ESP32 rev 3

Operating System used.

macOS

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

None

Development Kit.

ESP32 WRover Kit

Power Supply used.

USB

What is the expected behavior?

I2C device discovery to work

What is the actual behavior?

I2C device discovery fail on ALL devices

Steps to reproduce.

Our code was working 100% on 6 different platforms configured with 2 ~ 6 different I2C devices per platform.

Using master up to 2024/03/18 all platforms all devices discovered and worked perfectly.

Having come back from leave, with NO changes to the code other than updating to latest master, all platforms fail trying to discover all devices. With the ESP-32 Rover devkit v4.1 test device only the first address (0x08) returns the correct status (ESP_ERR_NOT_FOUND) , no device found, all remaining addresses return ESP_OK

Debug Logs.

9.126 0 main i2c.master I2C hardware NACK detected
9.134 0 main i2c.master s_i2c_synchronous_transaction(833): I2C transaction failed
9.137 0 main i2c.master i2c_master_transmit(1035): I2C transaction failed
I2C -0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -a -b -c -d -e -f
0x: xx xx xx xx xx xx xx xx -- 09 0A 0B 0C 0D 0E 0F
1x: 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
2x: 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
3x: 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
4x: 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
5x: 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
6x: 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
7x: 70 71 72 73 74 75 76 77

More Information.

No response

@ammaree ammaree added the Type: Bug bugs in IDF label Apr 5, 2024
@espressif-bot espressif-bot added the Status: Opened Issue is new label Apr 5, 2024
@github-actions github-actions bot changed the title I2C Master regression I2C Master regression (IDFGH-12542) Apr 5, 2024
@ammaree
Copy link
Author

ammaree commented Apr 5, 2024

@mythbuster5 guess this would be for your attention

@mythbuster5
Copy link
Collaborator

mythbuster5 commented Apr 6, 2024

@ammaree Sorry by this. We have a testing here tests this similar usage https://github.com/espressif/esp-idf/blob/master/examples/peripherals/i2c/i2c_tools/main/cmd_i2ctools.c#L118 but it doesn't report anything wrong

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
50: -- -- -- -- -- -- -- -- -- -- -- 5b -- -- -- -- 
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

May I check your code and check in which case this regression will happen?

@mythbuster5
Copy link
Collaborator

Seeing from log, it might not be probe interface? Is it possible to be _transmit ? If so, what about change return to break https://github.com/espressif/esp-idf/blob/master/components/esp_driver_i2c/i2c_master.c#L428

@mythbuster5 mythbuster5 self-assigned this Apr 6, 2024
@ammaree
Copy link
Author

ammaree commented Apr 6, 2024

const halI2C_Init_t halI2C_Init[halI2C_NUM] = {
  {
	.i2c_port = I2C_NUM_0,
	.sda_io_num = GPIO_NUM_26,
	.scl_io_num = GPIO_NUM_27,
	.clk_source = I2C_CLK_SRC_DEFAULT,
	.flags.enable_internal_pullup = 1,
  }
}

void halI2C_BusConfig(u8_t eCh) {
	IF_myASSERT(debugPARAM, eCh < halI2C_NUM);
	ESP_ERROR_CHECK(__real_i2c_new_master_bus(&halI2C_Init[eCh], &hI2Cbus[eCh]));
}

/***
 * halI2C_CheckAddress() - check if ANY type of I2C device is present at a specific address
 */
bool halI2C_CheckAddress(u8_t eCh, u16_t Addr) {
	int iRV = i2c_master_probe(hI2Cbus[eCh], Addr, i2cTIMEOUT_DEFAULT);
	IF_SL_ERR(debugTRACK && iRV == ESP_ERR_TIMEOUT, "Timeout B=%d A=x%X, check bus termination", eCh, Addr);
	return (iRV == ESP_OK) ? 1 : 0;
}

@ammaree
Copy link
Author

ammaree commented Apr 6, 2024

Only 2 I2C devices in this specific config, DS2482 and DS2482 at addresses 0x18 and 0x19

The errors occur before any _transmit function in my code.

  1. Initialise bus
  2. Discover devices
  3. 0x08 returns correct status (0)
  4. 0x09~0x17 return incorrect status (1)
  5. 0x18 & 0x19 return correct status (1) but assume this is by accident
  6. 0x1A to 0x77 all return incorrect status (1)

@mythbuster5
Copy link
Collaborator

mythbuster5 commented Apr 6, 2024

Haven't reproduced currently🤨. Mine code is very simple. And quite as same as yours I think?

    i2c_master_bus_handle_t bus_handle;

    i2c_master_bus_config_t i2c_mst_config = {
        .clk_source = I2C_CLK_SRC_DEFAULT,
        .i2c_port = 0,
        .scl_io_num = 27,
        .sda_io_num = 26,
        .flags.enable_internal_pullup = true,
    };

    TEST_ESP_OK(i2c_new_master_bus(&i2c_mst_config, &bus_handle));

    esp_err_t ret = ESP_OK;
    for (int i = 0; i < 0x7f; i++) {
        ret = i2c_master_probe(bus_handle, i, 20);
        printf("ret val is %x\n", ret);
    }

So...(1) Could you please use my code under esp-idf environment and check whether it works or not.
(2) There are pretty few commits this days. May you find which commit is the cause?

Actually i2c.master i2c_master_transmit(1035): I2C transaction failed this log seems from i2c_master_transmit. What the pure log about i2c_master_probe interface? @ammaree

@ammaree
Copy link
Author

ammaree commented Apr 6, 2024

halI2C_BusConfig(eCh);
halI2C_GeneralCall(eCh, I2C_GENCALL_RESET_WRITE);
IF_EXEC_1(debugTRACK && ioB1GET(ioI2Cinit), halI2C_BusScan, eCh);

@mythbuster5
Copy link
Collaborator

Checked one more time. Still doesn't reproduce the issue. Poke me if more clue has been found~~

@ammaree
Copy link
Author

ammaree commented Apr 8, 2024

Have tested using i2c_tools example, works. Will test further

@ammaree
Copy link
Author

ammaree commented Apr 8, 2024

@mythbuster5

I have made some progress. The problem starts with

halI2C_GeneralCall(eCh, I2C_GENCALL_RESET_WRITE); which translates to

void halI2C_GeneralCall(u8_t eCh, u8_t cCmd) {
	// https://www.i2c-bus.org/addressing/general-call-address/
	const i2c_device_config_t dev_cfg = {
		.dev_addr_length = I2C_ADDR_BIT_LEN_7, 			// 0 = default
		.device_address = I2C_GENCALL_ADDRESS, 			// 00 = default
		.scl_speed_hz = 100000,
	};

	i2c_di_t * psI2C = &sI2C_DI[eCh];
	i2c_master_dev_handle_t hndl;
	ESP_ERROR_CHECK(__real_i2c_master_bus_add_device(hI2Cbus[psI2C->Port], &dev_cfg, &hndl));
	ESP_ERROR_CHECK(__real_i2c_master_transmit(hndl, &cCmd, sizeof(u8_t), i2cTIMEOUT_DEFAULT));
	ESP_ERROR_CHECK(__real_i2c_master_bus_rm_device(hndl));
}

With this API call removed the error message
i2c.master I2C hardware NACK detected
and the I2C devices are discovered correctly.

The DS2484 device at address 0x18 is identified correctly but identifying the DS2482-800 at 0x19 causes the following 2 error messages:
i2c.master s_i2c_synchronous_transaction(833): I2C transaction failed i2c.master i2c_master_transmit(1036): I2C transaction failed

So, with the I2C_GENCALL_RESET_WRITE https://www.i2c-bus.org/addressing/general-call-address/ removed we are a step forward, but we need to fix this problem and try determine the cause of the other 2 error messages.

Please note: In spite of the 2 error messages the DS2482-800 device seems to operate correctly.

@nebkat
Copy link
Contributor

nebkat commented Apr 9, 2024

@mythbuster5 Can we please get 8f84ab3 and the other recent changes merged into release/v5.2 ASAP and hasten the release of v5.2.2? We have updated our framework to 5.2 assuming this was a stable release but have had significant problems due to the new driver which has been extremely frustrating to deal with.

EDIT: I see the changes just got pushed to release/v5.2, thank you.

@ammaree
Copy link
Author

ammaree commented Apr 9, 2024

@mythbuster5

I can confirm that:

a) writing to I2C address 0x00 (General call) with data 0x06 (RESET) is definitely a problem, result in NACK error message as well as the device discovery failing. See https://www.nxp.com/docs/en/user-guide/UM10204.pdf section 3.1.14 for more info.

b) the other 2 error messages only seem to occur when a DS2482-800 device is being used. The actual device address seems to be irrelevant, and the errors do not occur with the (very similar) DS2482-100/101 nor the DS2484 devices.

@mythbuster5
Copy link
Collaborator

Thanks for this message. I will do more testing~

@mythbuster5
Copy link
Collaborator

I saw there is a sentence for 0x00+0x60 Following a General Call, (0000 0000), sending 0000 0110 (06h) as the second byte causes a software reset. This feature is optional and not all devices respond to this command. So that might be the reason why some of your device works, some are return NACK.

As for your issue title regression, the only thing I did is to add nack check. But I also reserve a back door for this change. That is in i2c_device_config_t for flags.disable_ack_check. That will disable the nack check and I think the code will work as same as before in your case

@ammaree
Copy link
Author

ammaree commented Apr 11, 2024

Have added the flag and it does take the error message away. Maybe the name flags.disable_ack_check is a bit misleading since it seems to disable both ACK and NACK checks?

But what I think should be looked at (and corrected) is the fact this this error I2C hardware NACK detected leaves the I2C environment is a state of complete disfunction, no other devices is detected properly afterwards.

Lastly, the new scp_wait_us member does not seem to have any getter/setter API's. If these are added it might help us to customise the configuration for a specific address once we have determined the actual device at the address.

It might also be able to help once errors with specific devices are discovered after having been added to the bus, without having to remove and re-add the device from the I2C bus.

@mythbuster5
Copy link
Collaborator

Thanks for that much useful information.

For writing to I2C address 0x00 (General call) with data 0x06 (RESET) is definitely a problem, result in NACK error message as well as the device discovery failing. See [https://www.nxp.com/docs/en/user-guide/UM10204.pdf](https://github.com/espressif/esp-idf/issues/url) section 3.1.14 for more info. It's a problem indeed. Will fix it soon.

As for the other 2 error messages only seem to occur when a DS2482-800 device is being used. The actual device address seems to be irrelevant, and the errors do not occur with the (very similar) DS2482-100/101 nor the DS2484 devices. The two fail messages. That because there is no device react the general call, and online data return NACK, which usually treated as an error.

image

@ammaree
Copy link
Author

ammaree commented Apr 11, 2024

9.069 0 i2cTM i2c.master s_i2c_synchronous_transaction(833): I2C transaction failed
9.097 0 i2cTM i2c.master i2c_master_transmit(1035): I2C transaction failed

Not sure I agree on the cause here. With the General Call Reset commented out these errors still occur 100% of the time on the DS2482-800 devices, not the -100/101 nor the DS2484 devices. These errors are as a result of normal read and/or write operations on the DS2482-800 device specifically.

Have just done a test and changed the scp_wait_ms from 2000 to 10000 and the error messages disappeared.

Maybe the default has to be increased, alternatively a get/set API to override selectively?

@ammaree
Copy link
Author

ammaree commented Apr 11, 2024

Apologies, ignore last message.

I will comment once the fix you are working on has been tested.

@mythbuster5
Copy link
Collaborator

This got closed based on commit above. Feel free to poke me if there is still something else.

@ammaree
Copy link
Author

ammaree commented Apr 17, 2024

Thanks, will do

@ammaree
Copy link
Author

ammaree commented Apr 17, 2024

@mythbuster5

OK, have tested the fix on devices with DS2482.

I have reverted the .flags.disable_ack_check back to false for both the General Reset call as well as the device initialisation with the following error codes as a result.
8.613 0 main i2c.master I2C hardware NACK detected
8.622 0 main i2c.master I2C transaction unexpected nack detected
8.627 0 main i2c.master s_i2c_synchronous_transaction(870): I2C transaction failed
8.635 0 main i2c.master i2c_master_transmit(1072): I2C transaction failed
8.661 0 i2cTM i2c.master I2C transaction unexpected nack detected
8.666 1 i2cTM i2c.master s_i2c_synchronous_transaction(870): I2C transaction failed
8.671 1 i2cTM i2c.master i2c_master_transmit(1072): I2C transaction failed

After the process the DS2482 seem to be working so my thought is that the functionality has been restore similar to what was working on the old I2C driver, but with new error messages.

  1. Am I missing something with the new "flag" that was added, should the bus/device initialisation be adapted?
  2. If these errors are to be expected under certain circumstance, are they then really errors?
  3. Should the level of these logs not be changed to INFO, DEBUG or VERBOSE?
  4. If I have to leave the .flags.disable_ack_check=true to get rid of the messages and have the device functioning, what have we then really achieved?

@mythbuster5
Copy link
Collaborator

Well. Firstly, usually when a add a flag, i try to keep flag.item = 0 as behavior as before and flag.item = 1 as a feature to avoid a breaking change. But like .flags.disable_ack_check user complains a lot that default behavior doesn't check nack properly. So I have to make some changes..

Secondly, I understand your concern。 LIke probe interface, it depends on nack but I didn't pring nack error log because it's expected. But for transmit, nack is unexpected for most of time. As a driver, pretty hard to know it is expected or not.

Thirdly. yes I agree the log now it too much. But they are log errors indeed. I'm now considering to make them clearer

Finally. I'm not very clear with this question. As I said before, we check nack in transmit most of time. But we also agree to bypass nack check for some situations. Like no device connected but want to see online data, etc. So I think it also could be useful in your case.

Have a good day and thanks for your opinions~

@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: Opened Issue is new labels Apr 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally Type: Bug bugs in IDF
Projects
None yet
Development

No branches or pull requests

4 participants