realsense-ros initial_reset problem when using a combination of D435 and D415

Hi, I’m developing a robot equipped with D435 and D415. It has been bothering me for some time that enabling initial_reset with this combination can cause startup failures depending on the startup order and connection method.

Specifications
CameraD435 and D415 (Firmware: 05.13.00.50)
ROSMelodic and Noetic
ComputerIntel cpu and NVIDIA Jetson TX2
Branchtag 2.3.2
arg "initial_reset"true
arg "device_type"d415
  • launch code
<launch>
  <arg name="serial_no_camera1"         default=""/>
  <arg name="camera1"                   default="camera1"/>
  <arg name="tf_prefix_camera1"         default="$(arg camera1)"/>
  <arg name="initial_reset"             default="true"/>
  <arg name="reconnect_timeout"         default="6.0"/>

  <group ns="$(arg camera1)">
    <include file="$(find realsense2_camera)/launch/includes/nodelet.launch.xml">
      <arg name="serial_no"             value="$(arg serial_no_camera1)"/>
      <arg name="tf_prefix"             value="$(arg tf_prefix_camera1)"/>
      <arg name="initial_reset"         value="$(arg initial_reset)"/>
      <arg name="reconnect_timeout"     value="$(arg reconnect_timeout)"/>
      <arg name="device_type"           value="d415"/>
    </include>
  </group>
</launch>
  • log
[ INFO] [1638718107.455778788]: Initializing nodelet with 8 worker threads.
[ INFO] [1638718107.731504760]: RealSense ROS v2.3.2
[ INFO] [1638718107.731596483]: Built with LibRealSense v2.50.0
[ INFO] [1638718107.731644363]: Running with LibRealSense v2.50.0
[ INFO] [1638718107.784830955]:
[ INFO] [1638718107.794028407]: Device with serial number 920312070332 was found.

[ INFO] [1638718107.794091575]: Device with physical ID /sys/devices/pci0000:00/0000:00:14.0/usb2/2-4/2-4:1.0/video4linux/video0 was found.
[ INFO] [1638718107.794117418]: Device with name Intel RealSense D435 was found.
[ INFO] [1638718107.794870521]: Device with port number 2-4 was found.
[ INFO] [1638718107.897403701]: Device with serial number 840412061919 was found.

[ INFO] [1638718107.897500653]: Device with physical ID /sys/devices/pci0000:00/0000:00:14.0/usb2/2-5/2-5:1.0/video4linux/video3 was found.
[ INFO] [1638718107.897537773]: Device with name Intel RealSense D415 was found.
[ INFO] [1638718107.898256042]: Device with port number 2-5 was found.
[ INFO] [1638718107.898347614]: Device USB type: 3.2
[ INFO] [1638718107.898405764]: Resetting device...
[ INFO] [1638718113.995645738]:
[ INFO] [1638718114.000380892]: Device with serial number 920312070332 was found.

[ INFO] [1638718114.000414794]: Device with physical ID /sys/devices/pci0000:00/0000:00:14.0/usb2/2-4/2-4:1.0/video4linux/video0 was found.
[ INFO] [1638718114.000428797]: Device with name Intel RealSense D435 was found.
[ INFO] [1638718114.000773426]: Device with port number 2-4 was found.
(Node will stop here permanently...)

I have investigated this problem in newer versions (lib:2.50.0 and 2.3.2, ROS1) and found a solution, which I will report.

This problem occurs in the following locations

https://github.com/IntelRealSense/realsense-ros/blob/f400d682beee6c216052a419f419e95b797255ad/realsense2_camera/src/realsense_node_factory.cpp#L99-101

When ros2::device is generated multiple times in a loop, it seems to fail to generate and go into a locked state. This problem occurs on the D415 side, since D435 is always stored first in the list.

The following fix has been found to solve this problem.

diff --git a/realsense2_camera/src/realsense_node_factory.cpp b/realsense2_camera/src/realsense_node_factory.cpp
index 3eb7e96..372d142 100644
--- a/realsense2_camera/src/realsense_node_factory.cpp
+++ b/realsense2_camera/src/realsense_node_factory.cpp
@@ -96,9 +96,9 @@ void RealSenseNodeFactory::getDevice(rs2::device_list list)
                {
                        bool found = false;
                ROS_INFO_STREAM(" ");
+                       rs2::device dev;
                        for (size_t count = 0; count < list.size(); count++)
                        {
-                               rs2::device dev;
                                try
                                {
                                        dev = list[count];

After the change, D415 will be detected after D435 and the node will boot correctly.

Since this does not occur when using GDB, the details of the problem have not been identified.

Since both the problem and the solution are puzzling, we cannot say that this fix is the right one. However, it is effective in dealing with problems, so I suggested it. I would be happy to get help from anyone who knows.

Thanks.

Asked Jan 08 '22 16:01
avatar nomumu
nomumu

10 Answer:

Hi @nomumu Thanks so much for sharing your knowledge with the RealSense ROS community!

Would it be practical for your project to launch the cameras individually in separate ROS terminals with rs_camera.launch if you are not using this method already, so that initial_reset instructions can be issued for individual cameras in a specific terminal instead of all of them? Intel provide RealSense ROS guides for launching cameras this way, either with one computer or two computers.

One computer https://github.com/IntelRealSense/realsense-ros/wiki/Showcase-of-using-2-cameras

Two computers https://github.com/IntelRealSense/realsense-ros/wiki/showcase-of-using-3-cameras-in-2-machines

1
Answered Dec 05 '21 at 18:23
avatar  of MartyG-RealSense
MartyG-RealSense

Thank you for your reply!

To simplify reporting, I used the basic example of rs_multi_devices.launch, and it looks like the same problem occurs with rs_camera.launch when using the initial_reset option. Splitting the startup file into two and not using D415's initial_reset will work. Essentially, I think the problem is that the D415 cannot reconnect after initial_reset in an environment where the D435 is connected. (You don't need to launch more than one device to reproduce this.)

I haven't heard of any similar cases, so maybe this is a problem unique to me...

It is difficult due to the structure of my robot, but I think the idea of splitting the camera to two computers is certainly valid.

1
Answered Dec 06 '21 at 00:50
avatar  of nomumu
nomumu

There used to be issues in the RealSense ROS wrapper with launching a D435 and a RealSense T265 Tracking Camera at the same time. In response to that, the ability was introduced to identify cameras by the ID of the USB port that they are plugged into with the usb_port_id instruction instead of the camera's unique serial number.

The wrapper documentation states: "usb_port_id: will attach to the device with the given USB port (usb_port_id). i.e 4-1, 4-2 etc. Default, ignore USB port when choosing a device".

You should be able to incorporate usb_port_id into a roslaunch instruction with a command such as the example below.

usb_port_id:=4-1

1
Answered Dec 06 '21 at 09:58
avatar  of MartyG-RealSense
MartyG-RealSense

Hi, First of all, thanks for the advice. I have checked your suggestion, although it does not match the part I am reporting. The usb_port_id that you suggested did not solve the problem.

Secondly, I was able to analyze the problem with GDB. I will report the details in the next comment.

1
Answered Dec 07 '21 at 16:25
avatar  of nomumu
nomumu

The problem I encountered turned out to be a deadlock in librealsense. I am attaching the stack trace log of the thread in problem.

2188_gdb_report.txt

The relationship between the two threads where the problem occurs is described below.

  • Thread 21
    This is the thread that executes getDevice(). This function executes rs2::device::~device() at the end of the loop block to release the dev. At this time, the mutexes are locked in the following order inside librealsense.

    [ _devices_changed_callbacks_mtx -> _dispatch_mutex ]

  • Thread 17 This thread is called when the connection status of the camera changes due to a reset, etc. At this time, on_device_changed() locks the mutex in the following order.

    [ _dispatch_mutex -> _devices_changed_callbacks_mtx ]

Due to these relationships, deadlock will occur if rs2::device is regenerated while issuing initial_reset. I am thinking of creating a PR with the changes I reported in the beginning, what do you think?

1
Answered Dec 07 '21 at 17:18
avatar  of nomumu
nomumu

I will refer your question to @doronhi the RealSense ROS wrapper developer. Thanks very much for the detailed information that you have provided during this case!

1
Answered Dec 07 '21 at 17:25
avatar  of MartyG-RealSense
MartyG-RealSense

Hi @nomumu , Thanks for your report and thorough investigation. How often do you estimate the issue occurs? I ask because I copied and run your example launch file several times but could not reproduce the issue. In my case, the D415 was discovered first so I modified the launch file to seek the D435i camera but that is unlikely to be the issue.

My command line is: roslaunch realsense2_camera my_rs_camera.launch - my_rs_camera.launch is the launch code you supplied. I am testing on a noetic system, using librealsense2 version 2.50.0, built with V4L backend. What is the source of the librealsense2 version you have? Is it possible that it uses the RSUSB backend? If you build it yourself, can you share the cmake command you used? If not, can you share the output of the following command: apt list --installed | grep realsense?

1
Answered Dec 08 '21 at 12:16
avatar  of doronhi
doronhi

Hi, @doronhi. Thanks, I really appreciate your quick and polite response!

First, It is some information about my environments about librealsense. Would the following information give you a clue?

~$ apt list --installed | grep realsense

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

librealsense2-dbg/focal,now 2.50.0-0~realsense0.6128 amd64
librealsense2-dev/focal,now 2.50.0-0~realsense0.6128 amd64
librealsense2-dkms/focal,now 1.3.18-0ubuntu1 all
librealsense2-gl/focal,now 2.50.0-0~realsense0.6128 amd64
librealsense2-net/focal,now 2.50.0-0~realsense0.6128 amd64
librealsense2-udev-rules/focal,now 2.50.0-0~realsense0.6128 amd64
librealsense2-utils/focal,now 2.50.0-0~realsense0.6128 amd64
librealsense2/focal,now 2.50.0-0~realsense0.6128 amd64

Next, there is an additional report. From your information, I also confirmed the combination of D415 and D435i. As a result, I realized that I had overlooked the reproduction condition.

In reality, the combination of cameras is not important. The order in which they are registered in the device_list is important. My report log at the top of this issue shows the camera connected as follows

USB Port 2-4 ==== D435 (first of device_list) USB Port 2-5 ==== D415 (second)

With this connection, I had problems reconnecting the D415, but when I connected 2-4 and 2-5 so that they were reversed, I was able to confirm that this now occurred with the D435 (and of course with the D435i). This problem seems to occur when initial_reset is executed for the second or later camera of multiple realsenses that has been assigned a USB port ID. Could you try changing the device_type of launch file to match the USB port ID situation?

1
Answered Dec 08 '21 at 13:51
avatar  of nomumu
nomumu

Hi @nomumu , Sorry for my late response. So far the issue does not reproduce on my machine and I could not yet find the time to investigate further. I have received a report of a similar issue with Dashing so I believe there is indeed an issue and it most likely stems from within librealsense but it may take me a while to confirm and come up with a meaningful response. I am happy that you have found a workaround that works for you and hope to get around to it soon and understand why it works and how to fix the underneath problem.

1
Answered Dec 20 '21 at 12:41
avatar  of doronhi
doronhi

Hi @doronhi, Thanks for your update. Hmmm...not reproducible is a difficult situation. For reference, I am using Intel NUC, Let's NOTE, and JetsonTX2 to reported this issue. Interesting information as I have not done any checking with ROS2. If there is anything I can do to help you solve this issue, please let me know.

1
Answered Dec 22 '21 at 16:02
avatar  of nomumu
nomumu