Showing posts with label debug. Show all posts
Showing posts with label debug. Show all posts

Thursday, June 30, 2011

Setting up minicom and ckermit for u-boot serial file transfers

Took awhile to get this simple bit of legacy working. I'm working over a USB serial console to a dev board and needed to update some parts of flash but I don't have working ethernet yet. U-Boot allows for kermit binary file transfers, but default ckermit settings don't appear to work well. So to speed others along, here's what I did. Quite simply, add this to (or create) ~/.kermrc:

set carrier-watch off
set handshake none
set flow-control none
robust
set file type bin
set rec pack 1000
set send pack 1000
set window 5

Otherwise, ckermit expects some modem line signals and CONNECT before it starts the transfer. Now you can use minicom's send command.

Thursday, September 9, 2010

The Facebook Like button PITA: fb_xd_fragment=

So I have been using Facebook's like button for awhile now. I have it on a couple blogs, and a site I maintain using drupal. I started noticing that I was getting requests with a query of "?fb_xd_fragment=". This is a problem, because it shows a blank page.

I did some investigating, and this seems to be caused by two bugs (that I can tell). One, that Facebook's API causes a bug in IE that sends this query in the first place, and second that my website, because of the Facebook API, causes the page to become visibly blank because of it.

So, what to do? I decided to kill the query string at the source: my apache web server using mod_rewrite.

So here's what I added to the apache config for just this web server:

RewriteEngine On
RewriteCond %{QUERY_STRING} fb_xd_fragment=
RewriteRule (.*) $1? [R=301]

So this basically sends a 301 HTTP code (Moved Permanently) with the same request, minus the query string. Note that the "?" on the RewriteRule is needed so that mod_rewrite doesn't append the old query back on. Seems to be doing the trick, and now I'm not losing traffic due to a Facebook API bug.

Friday, May 28, 2010

Dear Users: Do not withhold information from developers, kthxbye

I accidentally stumbled upon a debugging case today with a user that seems to be a common problem. I wont call this user out directly, but he was a case study in what not to do when you want help from a developer like myself.

The basic volley started off with the usual chit chat in an IRC channel:

<User> Can someone help me compile a module for my kernel?
<Me> Sure, what seems to be the trouble?

So off we went with some IRC and PasteBin exchanges of his compile problem. I looked at the source code for the driver he was trying to compile, and it was a one-line obvious fix to get it working with a newer kernel such as the one found on the Ubuntu 10.04 Lucid system he was working on.

So now the module compiled, and he tried loading it. Hmm...the module disagreed with symbols from modules on his running system, videodev to be exact.

Weird. That shouldn't happen. I asked him if he had compiled or installed different versions of v4l than what his system came with. He didn't recall. However, after getting him to pastebin "ls -lR" of his modules directory, it was apparent that 3 days ago, he did in fact completely replace the drivers/media install.

This meant that those modules didn't match the stock headers that came with his running kernel. This took a very short time for him, but considerable time for me (volunteer time) to find out. After finding out, he admitted to replacing the modules.

Now it was obvious that he was embarrassed to admit he had junked up his system, and even more embarrassing that I caught him in a lie. He could have saved time for both of us. If I had given up after helping with his initial problem, he would have been stuck not knowing how to fix it.

So the moral of the story here is, don't hide information from people trying to help you. Tell all the gross details. If you fed your cat buttermilk waffles off your keyboard, it might help to know that if your 'H' is stuck.

Wednesday, May 26, 2010

Debugging: The elusive deadlock

It's very infrequent that I come across a deadlock bug in my code that 1) isn't easy to find, and 2) is very easy to reproduce.

I had a user report a bug in my solo6010 driver where he has two cards installed in the system. He is on a Core2Duo. If he starts mplayer up on each display for the two cards he has installed (2 mplayer instances), his machine instantly deadlocks and spews to the console.

At first I wasn't able to easily reproduce this. I'm on a Core2Quad, but since I have 4 cards installed I decided to start an mplayer instance for each display device for each card (4 mplayer instances). Oddly enough, I also deadlocked and spewed softlockup messages to the console.

Do you see where this is going? I decided, for clarity, to disable two of my cores:

echo 0 | sudo tee /sys/devices/system/cpu/cpu2/online
echo 0 | sudo tee /sys/devices/system/cpu/cpu3/online

Sure enough it only took two mplayer instances to deadlock my machine this time. Weird! Now my driver currently is able to pull 44 feeds from 4 cards at once for the MPEG feeds. Here, in this case, I am deadlocking with just two YUV feeds from the uncompressed video of the card. This code is much less complex, and the locking even less so. No parts of the driver share data between card instances (each card instance has it's own data and locks).

Upon further investigation I've noticed that this deadlock appears to happen in spin_unlock_irqrestore() during wake_up().

After carefully tracing the code, it was vaguely apparent that my logic around the wakeup routine for when it tries to grab a frame from the the hardware was a little off. I was using a different wake struct for each file handle, when I should have been using one per card. Not to mention, I was not taking advantage of the video sync IRQ to send a wakeup to the thread so that it knew a new frame was ready to grab (allowed me to spin less, and guaranteed the threads to be awoken when a new frame was ready).

Reworking this logic just a bit cleared the deadlock. Honestly, I'm not entirely sure of how the scenario caused a deadlock. It appears to be something in the underlying logic for wait/wake_up routines. I wont argue that it is fixed now, and my code is cleaner and more efficient, so I wont ask too many questions.