OpenCV

27.05.2021

A lot has been going on in computer vision lately, and great development tools are basically free to try. My latest discovery is the MediaPipe framework, which is available for many platforms and programming languages.

I have always been fascinated by computer vision systems. They are able to squeeze information out of both still and moving images. In most cases, I think the only limitation is your own imagination.

I started writing my own project in Python, since it gives you all the tools you need. My example application takes the image from a security camera and tracks people who are passing by. Since the resolution is quite low, people are unrecognizable. I am looking to buy a better camera, but this one is OK for testing.

Once pose detection recognizes a human form, you can export 33 coordinate points in real time, each with x, y and z coordinates. The coordinates are published in normalized form, but once you multiply them by the camera / video resolution (x by the frame width, y by the frame height) you will get the corresponding points in image pixels. The z value is more experimental, but seems to work fine too (depth value). Point values are relative and do not match any real-world dimension (meters / inches / feet). To get those, you would need to calculate a correction matrix (which gives you real-life values for the corresponding coordinate point). I might try that later.
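The scaling step can be sketched in a few lines. This is only a minimal illustration: the function name to_pixels and the clamping to the frame edges are my own assumptions, not something MediaPipe provides.

```python
def to_pixels(x_norm, y_norm, frame_width, frame_height):
    """Convert normalized landmark coordinates to integer pixel coordinates."""
    # Clamp the result in case a normalized value falls outside 0..1,
    # so the point always maps to a valid pixel inside the frame.
    px = min(max(int(x_norm * frame_width), 0), frame_width - 1)
    py = min(max(int(y_norm * frame_height), 0), frame_height - 1)
    return px, py


# Example: a point in the middle of a 640x480 frame.
print(to_pixels(0.5, 0.5, 640, 480))  # (320, 240)
```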

You have 33 coordinate points (x, y, z) at your disposal.
You will get normalized values for each axis (x, y, z).
  • Below you can see that it doesn’t actually take that many lines of code to produce this example.
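A minimal sketch of such a tracking loop could look like the one below. It is not my exact code: it assumes the opencv-python and mediapipe packages and the mp.solutions.pose interface, and the names extract_points and track are made up for this illustration.

```python
def extract_points(results):
    """Return a list of (x, y, z) tuples, or an empty list if no pose was found."""
    if results.pose_landmarks is None:
        return []
    return [(lm.x, lm.y, lm.z) for lm in results.pose_landmarks.landmark]


def track(source=0):
    """Yield the landmark list for every frame of a video file or webcam."""
    # Imported here so that extract_points can be used without these packages.
    import cv2
    import mediapipe as mp

    capture = cv2.VideoCapture(source)  # 0 = first webcam, or a file path
    try:
        with mp.solutions.pose.Pose(model_complexity=1) as pose:
            while capture.isOpened():
                ok, frame = capture.read()
                if not ok:
                    break
                # MediaPipe expects RGB, OpenCV delivers BGR.
                results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                yield extract_points(results)
    finally:
        capture.release()
```

Looping over track("video.mp4") (a hypothetical file name) would then give one list of up to 33 points per frame, and an empty list for frames where nobody was detected.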

This was quite educational and fun to play with. I also tried a webcam image instead of a recorded video file, and the results were very good. You can lower the model complexity to reduce the processing power needed, but at the same time you will lose tracking reliability. A value of 1 seemed to work just fine for me.

Changing min_detection_confidence and min_tracking_confidence also gives you some control over the results.
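To keep these three settings in one place, a small helper like the following could be used. The function pose_options is hypothetical; the sketch assumes that model_complexity accepts 0, 1 or 2 and that both confidence values lie between 0 and 1, with 0.5 as a starting point.

```python
def pose_options(model_complexity=1,
                 min_detection_confidence=0.5,
                 min_tracking_confidence=0.5):
    """Validate the tuning parameters and return them as keyword arguments."""
    if model_complexity not in (0, 1, 2):
        raise ValueError("model_complexity must be 0, 1 or 2")
    for name, value in (("min_detection_confidence", min_detection_confidence),
                        ("min_tracking_confidence", min_tracking_confidence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return {
        "model_complexity": model_complexity,
        "min_detection_confidence": min_detection_confidence,
        "min_tracking_confidence": min_tracking_confidence,
    }


# Lighter model, stricter detection threshold.
print(pose_options(model_complexity=0, min_detection_confidence=0.7))
```

The returned dictionary is meant to be passed on as mp.solutions.pose.Pose(**pose_options(...)), assuming that interface is the one in use.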

I have an idea of recording all the coordinate points (for example in CSV/JSON format) and then visualising the exact pose in Blender (using a skeleton model). Let’s see if it gets done.
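The recording part could be as simple as one CSV row per landmark per frame. This is a sketch of one possible layout; the column names and the function write_pose_rows are my own assumptions, and an in-memory buffer stands in for a real file.

```python
import csv
import io


def write_pose_rows(writer, frame_index, points):
    """Write one CSV row per landmark: frame, landmark index, x, y, z."""
    for landmark_index, (x, y, z) in enumerate(points):
        writer.writerow([frame_index, landmark_index, x, y, z])


# Demo with an in-memory buffer; use open("pose.csv", "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["frame", "landmark", "x", "y", "z"])
write_pose_rows(writer, 0, [(0.5, 0.25, -0.1), (0.6, 0.3, -0.2)])
print(buffer.getvalue())
```

A flat table like this should be easy to read back frame by frame when driving a skeleton later.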

I will probably remount my camera so that it is tilted around only one axis. In that case you can get by with simple trigonometry. Other variables might still cause problems, such as uneven terrain, lens distortion, poor visibility etc.
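As a rough idea of what that trigonometry might look like, here is a sketch that estimates the ground distance to a point from its pixel row. It assumes an ideal pinhole camera with no lens distortion, perfectly flat ground, a known mounting height, a known downward tilt and a known vertical field of view. All names and numbers are illustrative, and I have not tested this against my own camera.

```python
import math


def ground_distance(pixel_row, image_height, vertical_fov_deg, tilt_deg, camera_height):
    """Estimate the horizontal distance to a ground point seen at pixel_row.

    pixel_row is counted from the top of the image, tilt_deg is the downward
    tilt of the optical axis from horizontal, and the result is in the same
    unit as camera_height.
    """
    # Focal length in pixels, derived from the vertical field of view.
    focal_px = (image_height / 2) / math.tan(math.radians(vertical_fov_deg) / 2)
    # Angle of the pixel's ray below the optical axis (negative above it).
    offset = math.atan((pixel_row - image_height / 2) / focal_px)
    # Angle of the ray below the horizon.
    angle = math.radians(tilt_deg) + offset
    if not 0 < angle <= math.pi / 2:
        raise ValueError("ray does not hit the ground in front of the camera")
    return camera_height / math.tan(angle)


# Camera 3 m up, tilted 45 degrees down, 60 degree vertical field of view.
print(ground_distance(240, 480, 60, 45, 3.0))
```

The pixel row should belong to a point that actually touches the ground, for example a foot landmark, since the flat-ground assumption only holds there.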

Let’s see when this gets updated.

More info:

https://google.github.io/mediapipe/