Friday 22 July 2011

Windows Media Player NUI

**WARNING** If you would like to try out this software, you may need to change this line:
            //set WMP source
            axWindowsMediaPlayer.URL = @"C:\Users\Public\Videos\Sample Videos\Wildlife.wmv";

That is where Windows keeps its sample videos on my computer, but the file may not exist on yours. If it doesn't, change the path to a video on your computer.

Demo 1
Demo 2/Tutorial

I've been interested in natural user interfaces (NUIs) for some time and often visit websites such as NUI Group. With the release of the Kinect SDK, I decided to try making a NUI for a media player, and released a demo a few weeks ago. Since then, Mark and I have made some changes, so I thought I'd post an update, show some code, and let others download and try it. We intend to make another demo soon showing the new features. The code requires working Kinect NUI and Audio SDKs, as well as the Windows Media Player SDK.

We started with a Windows Forms Application in Microsoft Visual C# 2010 Express, added a Windows Media Player control and a timer to the form, and followed all the setup required for the Kinect.

The meat of the code is here.

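        const int STOP_TIMER = 10;
        int timer = STOP_TIMER;   //watchdog counter, decremented on every timer1 tick

        void nui_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
        {
            //reset watchdog timer that stops the video from playing
            timer = STOP_TIMER;
        }

        private void timer1_Tick(object sender, EventArgs e)
        {
            //count down; only a new skeleton frame puts the counter back up
            timer--;
            if (timer < 0)
            {
                //no skeleton frame for STOP_TIMER + 1 ticks: allow voice commands and pause
                voice_enabled = true;
                axWindowsMediaPlayer.Ctlcontrols.pause();
            }
        }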

We declare a counter that the form's timer decrements on every tick. The counter is reset whenever the Kinect raises the SkeletonFrameReady event (handled by nui_SkeletonFrameReady), which happens whenever it thinks it sees a skeleton. The counter only drops below 0 once the Kinect has been tracking nothing for a while, at which point the program enables voice commands and pauses the media player.
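
If you want to play with the watchdog idea without a Kinect or the WMP control attached, it can be pulled out into its own class. This is a minimal sketch, not our actual code: `PresenceWatchdog` and its `onAbsent` callback are hypothetical names, with `onAbsent` standing in for the `voice_enabled = true; ... pause()` lines. Unlike the snippet above, it fires once instead of calling pause on every tick after the counter expires.

```csharp
using System;

// Hypothetical stand-alone version of the watchdog counter described above.
// Reset() plays the role of nui_SkeletonFrameReady; Tick() plays the role of timer1_Tick.
public class PresenceWatchdog
{
    private readonly int stopTicks;
    private readonly Action onAbsent; // stands in for enabling voice and pausing WMP
    private int remaining;
    private bool fired;

    public PresenceWatchdog(int stopTicks, Action onAbsent)
    {
        this.stopTicks = stopTicks;
        this.onAbsent = onAbsent;
        remaining = stopTicks;
    }

    // Call whenever a skeleton frame arrives.
    public void Reset()
    {
        remaining = stopTicks;
        fired = false;
    }

    // Call on every timer tick; returns true only on the tick that triggers the callback.
    public bool Tick()
    {
        if (fired) return false;   // already paused, so don't keep calling pause()
        remaining--;
        if (remaining < 0)
        {
            fired = true;
            onAbsent();
            return true;
        }
        return false;
    }
}
```

Because the counter has to go below zero, the pause happens STOP_TIMER + 1 ticks after the last skeleton frame. The timer's Interval and STOP_TIMER together decide how long the room must look empty.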

Essentially, we wanted the video to pause by itself if someone had to run off and answer the door, get the phone, or save a life, without them having to find the remote. This implementation currently pauses once no one is watching the video, but it could be adapted to pause when any one viewer leaves.
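
For the "pause when anyone leaves" variant, one approach is to count the tracked skeletons in each frame and react when the count drops. This is a sketch under assumptions: `ViewerCountMonitor` is a hypothetical class, and the count would come from counting skeletons whose TrackingState is Tracked in each frame. Because tracking can flicker for a frame or two, the lower count has to persist for a few consecutive frames before it counts as someone leaving. It only reacts to frames that actually arrive, so the watchdog above would still be needed for the case where frames stop coming.

```csharp
using System;

// Hypothetical sketch: pause when the number of tracked viewers drops,
// instead of waiting until nobody is tracked at all.
public class ViewerCountMonitor
{
    private readonly int framesRequired; // consecutive frames a lower count must persist
    private readonly Action onViewerLeft;
    private int settledCount;            // viewer count we currently believe
    private int lowerFrames;             // consecutive frames that showed fewer viewers

    public ViewerCountMonitor(int framesRequired, Action onViewerLeft)
    {
        this.framesRequired = framesRequired;
        this.onViewerLeft = onViewerLeft;
    }

    // Call once per skeleton frame with the number of tracked skeletons.
    public void Update(int trackedCount)
    {
        if (trackedCount >= settledCount)
        {
            settledCount = trackedCount; // someone joined, or nothing changed
            lowerFrames = 0;
            return;
        }
        lowerFrames++;
        if (lowerFrames >= framesRequired)
        {
            settledCount = trackedCount; // accept the lower count
            lowerFrames = 0;
            onViewerLeft();
        }
    }
}
```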


Another feature we created is a simple, hand-controlled remote on the screen. It has no configurable settings yet; its thresholds are set to what I found comfortable. We tried several approaches for an on-screen remote, such as having the mouse cursor follow the right hand, but found reacting to the hand's movement easier.


        public float z_last = 0;
        public float z_avg = 0;
        public float delta_z = 0;
        public float z_current = 0;

        void nui_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
        {
            SkeletonFrame skeletonFrame = e.SkeletonFrame;
            foreach (SkeletonData data in skeletonFrame.Skeletons)
            {
                if (data.TrackingState == SkeletonTrackingState.Tracked)
                {
                    z_current = data.Joints[JointID.WristRight].Position.Z;
                    delta_z = z_current - z_last;
                    if (data.Joints[JointID.WristRight].Position.Y > data.Joints[JointID.Spine].Position.Y)
                    {
                        //smooth the per-frame change: exponential moving average (70% history, 30% newest)
                        z_avg = (float)((z_avg * 0.7) + (delta_z * 0.3));
                    }
                    else
                    {
                        z_avg = 0;
                        //Part of the bump version of the remote
                        pictureBox1.Visible = false;
                        remote_enabled = false;
                    }
                    z_last = z_current;
                    .
                    .
                    .

We actually track the x, y, and z movements of the right wrist, but I'll only explain one. We keep track of the wrist's current position and last position, and find its change in position (delta) each frame. If the wrist is above the spine, we fold that change into a smoothed running average of its movement. We can then check whether the average exceeds some limit, which we use for the various remote controls.

                    if (z_avg < -remote_enabled_limit && !remote_enabled)
                    {
                        remote_enabled = true;
                        remote_enabled_timer = -50;
                        remote_option = "PLAY";
                    }

Here we check whether the average z movement (forward and backward) exceeds the limit that enables the remote. Pushing towards the Kinect makes z smaller, so we compare against the negative of the limit. In practice, holding your hand at shoulder level and pushing forward at a reasonable speed brings up an on-screen remote whose buttons can then be controlled with x, y, and z gestures.
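
Here is the z part of the remote trigger as a self-contained class, so the smoothing and threshold can be tried with made-up numbers. It is a sketch, not our code: `PushDetector` is a hypothetical name, the 0.7/0.3 weights come from the snippet above, and the limit is a constructor parameter because the post doesn't give the value of remote_enabled_limit. One deliberate difference is that the first frame's delta is treated as zero. In our code z_last starts at 0, so the first delta is a large positive number that briefly pushes the average the wrong way.

```csharp
using System;

// Hypothetical stand-alone version of the push detector described above.
// z is the distance from the sensor in metres, so it shrinks as the hand moves toward the screen.
public class PushDetector
{
    private readonly float limit;   // plays the role of remote_enabled_limit
    private float zLast;
    private float zAvg;
    private bool hasLast;

    public PushDetector(float limit) { this.limit = limit; }

    public float Average => zAvg;

    // Returns true when the smoothed forward movement exceeds the limit.
    public bool Update(float wristZ, bool wristAboveSpine)
    {
        float deltaZ = hasLast ? wristZ - zLast : 0f; // no meaningful delta on the first frame
        zLast = wristZ;
        hasLast = true;

        if (!wristAboveSpine)
        {
            zAvg = 0f;   // hand lowered: forget any accumulated movement
            return false;
        }

        // Exponential moving average: 70% history, 30% newest per-frame change.
        zAvg = zAvg * 0.7f + deltaZ * 0.3f;
        return zAvg < -limit;
    }
}
```

Each frame keeps only 70% of the previous average, so one jittery frame fades quickly, while a steady push accumulates and crosses the limit within a few frames.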


The remote has volume up, volume down, fast forward, rewind, play, and pause buttons. Bumping your hand up or down selects the volume controls.

To change the volume, you smoothly press forward and the volume changes rapidly; you do not need to keep shaking your hand forward and back, though you may if you like.

Fast forward and rewind are executed automatically when the hand gestures right or left. I assume that if you select fast forward or rewind, it's because you want to use them, not because you just want to hover your hand over the button.

Play/pause is briefly disabled (for less than a second) after other buttons are used, so that users don't trigger it by accident. It is activated by pushing forward over the button with a reasonable amount of movement.
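
The post doesn't show how the play/pause lockout is implemented, so here is one way it could be done: a frame-counting cooldown. Everything in it is an assumption: `PlayPauseGuard` is a hypothetical class, and the cooldown length is measured in skeleton frames. Assuming roughly 30 skeleton frames per second, a cooldown of 20 frames is about two thirds of a second.

```csharp
using System;

// Hypothetical cooldown: after any other button fires, play/pause ignores
// pushes for a fixed number of skeleton frames.
public class PlayPauseGuard
{
    private readonly int cooldownFrames;
    private int framesLeft;

    public PlayPauseGuard(int cooldownFrames) { this.cooldownFrames = cooldownFrames; }

    // Call when volume, fast forward or rewind is triggered.
    public void OtherButtonUsed() { framesLeft = cooldownFrames; }

    // Call once per skeleton frame; returns true if a push should toggle play/pause.
    public bool OnFrame(bool pushDetected)
    {
        if (framesLeft > 0)
        {
            framesLeft--;
            return false;   // still cooling down: ignore pushes
        }
        return pushDetected;
    }
}
```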

The remote disappears once the hand drops below the spine. I should probably change that to the waist, as it sometimes disappears when I select volume down.


The program also accepts voice commands. We've done nothing special with them and learned how to do it straight from the Kinect demo. Current commands include "play", "rewind", "pause", "stop", "fast forward", "volume up", "volume down", "up", "down", "mute", "fullscreen", "maximize", and "remote".
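
Once the recognizer hands back the recognized phrase as text, a lookup table keeps the command handling tidy. This is only a sketch: `VoiceCommandDispatcher` is a hypothetical class, the speech recognition setup itself (from the Kinect demo) is left out, and in the real form the registered actions would call axWindowsMediaPlayer rather than record strings. `VoiceEnabled` mirrors our voice_enabled flag.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dispatcher: maps a recognized phrase to a player action.
public class VoiceCommandDispatcher
{
    private readonly Dictionary<string, Action> commands =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);

    // Mirrors the voice_enabled flag: commands are ignored while false.
    public bool VoiceEnabled { get; set; }

    public void Register(string phrase, Action action) { commands[phrase] = action; }

    // Returns true if the phrase was recognized and executed.
    public bool Dispatch(string phrase)
    {
        if (!VoiceEnabled) return false;
        if (!commands.TryGetValue(phrase, out Action action)) return false;
        action();
        return true;
    }
}
```

Adding a command then means adding one Register call, and the on/off gating lives in a single place.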

While the video is playing, voice commands are disabled until the user makes a gesture.
Mark and I found gestures rather difficult to write (which may become our next adventure into the Kinect world) and have written only one gesture of this complexity:
                    //compare joints :(

                    //check for hands below head and above shoulders
                    if ((data.Joints[JointID.HandLeft].Position.Y < data.Joints[JointID.Head].Position.Y) && (data.Joints[JointID.HandLeft].Position.Y > data.Joints[JointID.ShoulderCenter].Position.Y))
                    {
                        if ((data.Joints[JointID.HandRight].Position.Y < data.Joints[JointID.Head].Position.Y) && (data.Joints[JointID.HandRight].Position.Y > data.Joints[JointID.ShoulderCenter].Position.Y))
                        {
                            //check hands are bent inwards
                            if ((data.Joints[JointID.HandLeft].Position.X > data.Joints[JointID.ElbowLeft].Position.X) && (data.Joints[JointID.HandRight].Position.X < data.Joints[JointID.ElbowRight].Position.X))
                            {
                                //check elbows are below shoulders
                                if ((data.Joints[JointID.ElbowLeft].Position.Y < data.Joints[JointID.ShoulderCenter].Position.Y) && (data.Joints[JointID.ElbowRight].Position.Y < data.Joints[JointID.ShoulderCenter].Position.Y))
                                {
                                    //Yay we did it!
                                    voice_enabled = true;
                                    voice_enabled_timer = 0;

                                }
                            }
                        }
                    }

This is the messy code we wrote to check, essentially, that the user has put a hand on each side of their mouth, as one would when shouting over a distance. The gesture enables voice commands while the video is playing, though only for a short time. While the video is not playing, voice commands work without this gesture.
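
The same four checks read more easily when each one gets a name and they are combined into a single predicate. This is a sketch of a tidier version, not what the program currently runs: `Point3` and `ShoutGesture` are hypothetical stand-ins, and in the real handler each argument would be `data.Joints[JointID.…].Position` for the matching joint.

```csharp
using System;

// Hypothetical plain joint position, standing in for a Kinect joint's Position.
public struct Point3
{
    public float X, Y, Z;
    public Point3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

public static class ShoutGesture
{
    // Same conditions as the nested ifs above, flattened into one predicate.
    public static bool IsDetected(Point3 head, Point3 shoulderCenter,
                                  Point3 handLeft, Point3 handRight,
                                  Point3 elbowLeft, Point3 elbowRight)
    {
        // hands below head and above shoulders
        bool leftHandAtMouth  = handLeft.Y  < head.Y && handLeft.Y  > shoulderCenter.Y;
        bool rightHandAtMouth = handRight.Y < head.Y && handRight.Y > shoulderCenter.Y;
        // hands bent inwards relative to the elbows
        bool handsBentInwards = handLeft.X > elbowLeft.X && handRight.X < elbowRight.X;
        // elbows below shoulders
        bool elbowsBelowShoulders = elbowLeft.Y < shoulderCenter.Y && elbowRight.Y < shoulderCenter.Y;
        return leftHandAtMouth && rightHandAtMouth && handsBentInwards && elbowsBelowShoulders;
    }
}
```

Naming the conditions also makes it easier to log which one fails when the gesture isn't recognized.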


Known Issues:
There were errors when closing the program. I haven't seen them in some time and kept putting off fixing them; since I've done nothing about them, I assume they are still there.
Fullscreen hides the remote, but leaves it operating.
Threading makes things difficult to control; it's a feature, not a bug.
Voice commands are not as top-notch as one may hope.

Required Software:
Kinect Audio SDK
Kinect NUI SDK
Windows Media Player SDK
Our code

Sunday 3 July 2011

Xbox Kinect SDK

So the Xbox Kinect SDK was released a little while back by Microsoft Research, and it is quite interesting. I've been playing around with it and learning some C# at the same time.

The SDK provides audio, depth, and skeletal tracking, among other things, and comes with a plethora of demos. In short, it can actively track two skeletons, each made of joints with X, Y, and Z coordinates, along with values describing how reliable the data is, options for smoothing, and much more.

For my first project, a friend of mine (Mark Arnott) and I messed around, testing each data field and putting it on the screen. We got it to understand some gestures, such as pushing a hand towards the display, and to enlarge, shrink, and move a picture using the position of and distance between our hands.
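
The picture trick boils down to a ratio: the picture's scale follows the current distance between the hands divided by the distance when the gesture started, and its position can follow the midpoint of the hands. This is a sketch of that idea under assumptions, not the code we wrote: `TwoHandZoom` is a hypothetical class, the coordinates are plain floats rather than Kinect joints, and the 0.1 to 10 clamp is an arbitrary choice.

```csharp
using System;

// Hypothetical two-hand zoom: scale follows the ratio between the current hand
// separation and the separation when the gesture started.
public class TwoHandZoom
{
    private float startDistance;
    private float startScale;

    public float Scale { get; private set; } = 1f;

    private static float Distance(float x1, float y1, float x2, float y2)
    {
        float dx = x2 - x1, dy = y2 - y1;
        return (float)Math.Sqrt(dx * dx + dy * dy);
    }

    // Call when the zoom gesture begins (e.g. both hands raised).
    public void Begin(float lx, float ly, float rx, float ry)
    {
        startDistance = Distance(lx, ly, rx, ry);
        startScale = Scale;
    }

    // Call every frame while the gesture is held.
    public void Update(float lx, float ly, float rx, float ry)
    {
        if (startDistance <= 0f) return;   // gesture not started: avoid dividing by zero
        float ratio = Distance(lx, ly, rx, ry) / startDistance;
        Scale = Math.Max(0.1f, Math.Min(10f, startScale * ratio)); // keep the picture a sane size
    }

    // Midpoint of the hands, usable to position the picture.
    public (float X, float Y) Center(float lx, float ly, float rx, float ry)
        => ((lx + rx) / 2f, (ly + ry) / 2f);
}
```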

I've thought for some time that a NUI would be great for a media player. I downloaded the WMP SDK, and Mark and I have managed to create some NUI controls for the WMP.

Exiting the frame will cause the media player to pause.
Saying "Rewind" when paused will cause the media player to rewind.
Saying "Play" when paused will cause the media player to play.

We are still considering other useful controls, such as proper gestures for volume, and maybe a control to rewind 3 seconds and start playing again. The main idea is that if the phone rings, you need the bathroom, or you have to step away for any reason, it is mildly annoying to find the remote, push the pause button, and then get on with what you wanted to do. Also, when no one is watching the TV or a movie, does it really need to keep playing?

Check out our demo here.