Catching and resuming from SIGSEGV

Discussion in 'Programming & Software Development' started by MadOnion87, Oct 21, 2008.

  1. MadOnion87

    MadOnion87 Member

    Joined:
    Jul 10, 2002
    Messages:
    3,232
    Location:
    Old Trafford, Manchester
    Hey all,

    I'm writing a massive system where we're basically assuming there will still be some bugs in the software when we deploy it. So now I'm trying to catch segfaults, stop the thread responsible, destroy its objects and restart the thread. However, it doesn't seem to be possible: the man page for signal says its behaviour in multithreaded processes is unspecified, and sigaction didn't seem to work either. Has anyone tried this?

    I'm using Linux btw.

    Cheers
     
  2. Thunder

    Thunder Member

    Joined:
    Jul 22, 2001
    Messages:
    782
    Location:
    Brisbane
    It's not possible to do that, as far as I understand. If there is a SIGSEGV then your program is going to be terminated. You can't restart individual threads like that; termination operates at the process level, not the thread level.

    What you can do is have another program which starts your main program via fork() and exec(), then calls waitpid() (wait until process termination). When your main program crashes, waitpid() will return and you can just do another fork(), exec() and waitpid() again.
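
    A minimal sketch of that supervisor loop, assuming Linux/POSIX. The names run_once and supervise and the max_restarts limit are made up for illustration; they are not from any library.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv as a child process and wait for it to finish.
 * Returns 1 if a signal (e.g. SIGSEGV) killed it, 0 if it exited
 * on its own, -1 on error. */
static int run_once(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        perror("execvp");       /* only reached if exec failed */
        _exit(127);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    if (WIFSIGNALED(status)) {
        fprintf(stderr, "child killed by signal %d\n", WTERMSIG(status));
        return 1;
    }
    return 0;
}

/* Hypothetical policy: restart after every crash, give up after
 * max_restarts crashes. Returns the number of crashes seen, or -1. */
static int supervise(char *const argv[], int max_restarts)
{
    assert(argv != NULL && argv[0] != NULL);
    int crashes = 0;
    while (crashes < max_restarts) {
        int result = run_once(argv);
        if (result < 0)
            return -1;
        if (result == 0)
            break;              /* clean exit: stop supervising */
        crashes++;
    }
    return crashes;
}
```

    A main() in the supervisor could simply call supervise(argv + 1, 10) to run whatever command it was given. This restarts the whole process, so any in-memory state is lost on each crash.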
     
  3. OP
    MadOnion87 Member

    Thanks for the suggestion.

    I'm trying to make it as robust as possible, which is why I want to restart the thread and keep the program running. I've done a bit more googling and it seems like it's possible, but quite complicated: I have to fix the problem and let it resume. I tried doing a pthread_kill in the signal handler, and it loops forever because it jumps back to the instruction that caused the segfault. Strange, because the thread shouldn't exist anymore.
     
  4. Thunder

    Thunder Member

    I believe it can be done ONLY if you can fix what caused the SIGSEGV in the signal handler, which is only useful in a very limited number of situations.

    Once the signal handler returns, execution goes back to where it was (the instruction that raised the SIGSEGV) and continues from there. By that point the problem needs to have been fixed in the signal handler. Shooting the offending thread in the head does /NOT/ count as fixing what caused the problem.
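
    One of the few cases where "fix it in the handler" works is a fault on a page the program protected itself. The sketch below assumes Linux; guarded_page, on_segv and demo_fixup are hypothetical names. Note that mprotect() is not on the POSIX list of async-signal-safe functions, although on Linux it is a plain system call and is commonly used this way.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guarded_page;                  /* mapped read-only at first */
static size_t page_size;
static volatile sig_atomic_t faults_fixed;  /* how many faults we repaired */

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    char *addr = (char *)info->si_addr;     /* address that faulted */

    if (addr >= guarded_page && addr < guarded_page + page_size &&
        mprotect(guarded_page, page_size, PROT_READ | PROT_WRITE) == 0) {
        faults_fixed++;
        return;     /* the faulting instruction is retried and now succeeds */
    }

    /* Not a fault we know how to fix: restore the default action, so the
     * retried instruction terminates the process as usual. */
    signal(sig, SIG_DFL);
}

/* Returns the number of faults repaired (expected: 1), or -1 on error. */
static int demo_fixup(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    guarded_page = mmap(NULL, page_size, PROT_READ,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded_page == MAP_FAILED)
        return -1;

    /* Write to the read-only page: this faults once, the handler makes
     * the page writable, and the same store is then re-executed. */
    ((volatile char *)guarded_page)[0] = 'x';

    int fixed = (int)faults_fixed;
    munmap(guarded_page, page_size);
    return fixed;
}
```

    If the handler returned without changing the protection, the store would fault again immediately, which is the infinite loop described earlier in the thread.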
     
  5. OP
    MadOnion87 Member

    The way we've structured it is that when the thread exits, all the memory goes with it. I've managed to get it to do what I wanted. In the signal handler I set a flag indicating that the thread is dead, then I do a longjmp back out to the parent thread. If the flag is set, I do pthread_cancel on the thread to kill it, reset the flag, reinitialise the memory (class) and restart the thread.

    Thanks for your reply.

    [edit] And yeah, I know that's not exactly fixing the problem, but at least we can keep the program going. And of course I will try to eliminate all the bugs if I can. [/edit]
     
  6. Thunder

    Thunder Member

    Interesting, I didn't even think it was possible. So it will restart the thread if you dereference NULL or scribble over memory?
     
  7. OP
    MadOnion87 Member

    Yep.

    And to scale up to many threads, we just have a map marking which threads have died and which haven't. We do the longjmp to the point where we check for dead threads. Also, we re-enable the signal handler inside the signal handler.
     
  8. Stinger

    Stinger Member

    Joined:
    Feb 15, 2002
    Messages:
    264
    haha that sounds so hackish :)
     
  9. OP
    MadOnion87 Member

    Lol, hey, it works.
    Now our system can handle both exceptions and segfaults. It's for a system that controls a robot in a competition. We would rather it restart the thread and (possibly) do funny things than have the whole system die and the robot fall to the ground and break.
     
  10. OP
    MadOnion87 Member

    Lol, I would imagine that you want the robot to do more than just collapse and die, though.
     
