Thoughts on FixPix, image AI and where it's all going...
Note: Comments are currently disabled. If you wish to send a comment, please mail it to firstname.lastname@example.org
Sep 28, 2023
FixPix used to create cover of Med journal!
Med journal is a highly reputable monthly journal of medicine.
Each month, a new image is selected for the cover, and the images are
both relevant to a medical topic and aesthetically
and artistically pleasing. The cover of the September issue of Med journal
was created by using the Fast Style Transfer plugin of FixPix.
You can see the issue on the Med journal site here
More plugins arriving soon
New plugins have been developed - various Super Resolution techniques and deblurring.
May 08, 2023
ChatGPT and AI - Game over for any desk job (software developers included)?
Over the past few months, I've been having many sessions with ChatGPT and the new Bing chat, which is based on the same technology. Some of the sessions occurred when I was taking a break from developing FixPix Home and just entertaining myself by having deep philosophical conversations with a computer program (it's still hard to grasp that I'm living in a time to witness this), and at other times I used it to help with various focused technical tasks related to the development of FixPix.
Now when I say focused technical tasks, I don't mean trivial technical tasks. I mean software development tasks that are well defined but can be complex and require a domain expert to perform correctly. Tasks that could take me hours to complete but ChatGPT writes the correct code (sometimes with minor fixes required) less than a second after I send it the specification. This can be for writing entire complex functions just from a specification in English, or converting existing code so that it uses different libraries or a different computer language.
So, at the moment for the purpose of helping with development tasks, it has the capabilities of a genius intern with no integrated large-system view. It can save hours of work, but I can't tell it to take a look at the entire codebase of FixPix and then give it high-level directives, or tell it to learn what is required to create the full infrastructure for the packaging logistics.
I don't believe there's a difference in principle between understanding the trees and understanding the forest. Both are manifestations of complex processes, and both have emergent behaviour that stems from the complexity of the processes that they are made of. But when you change the level of abstraction, the complexity of the forest does not require a higher intelligence to understand than the complexity of a tree. Since ChatGPT has already proved it is able to make abstractions and perform just as well as humans, if not better, in many tasks that humans thought they were unique at, I see no reason why large language models like ChatGPT won't be able to be the low level developer, the project designer, and the system architect all at once.
Can it also be the visionary?
One of the criteria for being a visionary is the ability to be creative, but creativity is not something that is well defined. When a chess master who is considered very creative in the game consistently loses to an AI program, does that mean the program is more creative in chess than a human? Also, I've asked ChatGPT many questions that appear in creativity tests, and it passed with flying colors.
But of course, there is probably more to creativity than that, and more to being a visionary than being creative (such as imagination, ambition, passion, confidence, open-mindedness, determination and adaptability - well, at least that is what ChatGPT claims...). ChatGPT still gets many things wrong, it still makes up incorrect facts many times, and it is still a very long way from being able to do the desk job that an experienced, creative human with integrative knowledge can do. However, I remain of the opinion that the difference between what an AI can do in this world and what a human can do is something that can be bridged, and at the exponential rate things are going, it might be bridged faster than most believe.
Some may find this a bleak view - a view that kills the magic in this world. While I definitely share the concern that this might disrupt a lot of things we are used to and cause anxiety due to the major readjustments that will be required of us, it will never kill the magic. I don't mean this in the sense that there is always something that we will be able to do better than an algorithm. My take on it is that what is magical about existence is not the uniqueness of human abilities, but rather existence itself. Just think a bit about what a miracle being is. We are amazing by being a manifestation of it. This magical being does not belong to us - we belong to it, in the same way that any other intelligence which might evolve beyond ours will also belong to it, and if this intelligence ends up being self-conscious and wondering about being - even more magic!
So, are we going to be replaced by AI?
From the incurable optimist point of view, it doesn't really matter - it's magical either way.
From the "will I be able to continue getting paid for what I spent many years of my life mastering?" point of view - the jury is still out, but I figure we'll find a way to adapt, just as we did after other revolutions.
I'd like to summarize with a part of a short passage by Raymond Smullyan which I really liked when I read it (too many years ago) and nicely captures the essence of the above:
"What is the difference between an optimist and an incurable optimist?" Answer: "An optimist is one who says, "Everything is for the best; mankind will survive." An incurable optimist is one who says, "Everything is for the best; mankind will survive. And even if mankind doesn't survive, it is still for the best."
December 19, 2022
FixPix Desktop, ChatGPT and Riffusion
FixPix Desktop has been released! (Check it out here: https://fixpix.net/download.php) FixPix Desktop is, in essence, a platform for running the latest and greatest AI on your local images. It is meant to be easy to use for anyone who wants to see the power of image processing in general and AI image processing in particular. The modular architecture makes it possible to add external plug-in components (FixPix Desktop plugins), each containing its own set of features. It was also made so that any developer who uses Python and has something to contribute (or sell) in the image processing field can easily create additional plugins.
If you follow the AI news, no doubt you've already heard about ChatGPT, the new and jaw-dropping chatbot from OpenAI (of DALL-E 2 fame). You might ask: why bring up AI in the field of natural language processing and generation in a blog about AI in the field of image processing and generation?
I believe the capabilities of ChatGPT and what they mean are just too mind-blowing to ignore. The emergent properties you get when you train a giant neural net on 300 billion words taken from just about every content type on the Internet - web texts, articles, books, Wikipedia etc. - are nothing short of mind-blowing. I had conversations with ChatGPT and was totally stunned by what a breakthrough neural networks have made in the field of AI. You can ask anything of it, and the speed and manner in which it answers make you realize that we're not in Kansas anymore. Write a poem on a particular subject? No problem. Explain a joke? Sure. Explain the famous double slit experiment the way a stand-up comedian might? OK. In fact I just tried the last one now. Here's ChatGPT's immediate response:
"Hey, have you heard about the double-slit experiment? It's this crazy physics thing that's supposed to show how particles can act like waves.
"So here's how it works: you shine a light through a wall with two slits in it, and then you measure what happens on the other side. And you know what you see? A bunch of frickin' waves! I mean, seriously, what the heck is going on here?
"I mean, I get it if you shine a light through water, you get waves. But this is a solid wall we're talking about! And yet, when you measure it, the light is acting like it's a bunch of waves.
"It's like, what kind of magic is this? Is the wall secretly made of Jell-O or something? I mean, I don't even know what to believe anymore.
"But the craziest part is when you cover up one of the slits. Suddenly, the waves disappear and the particles start acting like... well, particles. It's like they're like, 'Oh, you want us to act normal? Fine, we'll act normal.'
"I mean, I don't know about you, but I'm starting to feel like I'm living in the Matrix or something. It's like, what other crazy stuff is out there that we don't even know about?
"But hey, I guess that's the beauty of science, right? It's always full of surprises. Who knows what other weird stuff we're going to discover?
"Anyway, that's the double-slit experiment for you. It's like, one minute you think you know how the world works, and the next minute you're questioning everything you thought you knew.
"But hey, it's all part of the journey, right? The journey to understanding the mysteries of the universe. It's a trip, man."
I read some people's responses that nitpick about ChatGPT giving inaccurate or wrong replies or not being especially creative (or funny for that matter). The above sketch would probably fall in that category, but I believe that totally misses the point of what is happening here. It's basically like seeing the Wright brothers' first flight of 120 feet and complaining that it is not interesting because you can't fly overseas to your vacation with this technology.
No - I believe this marks the beginning of a time where everything changes, with way more far-reaching implications (technological, economic, social, ethical and philosophical) than how the ability to fly changed the world.
When you live in times of exponential technological growth and creativity, there's no telling what tomorrow will bring...
The second reason I bring up ChatGPT is that it's really all connected in various ways.
RIFFUSION (https://www.riffusion.com/about) is a great example of the creativity involved in taking AI from one field and using it to create amazing results in another. In this particular case, Seth Forsgren and Hayk Martiros took spectrograms, which are images that represent a piece of music. They then trained a model to create images of spectrograms from text and converted the spectrograms back to audio that you can listen to. Genius! 😃
November 07, 2022
So, How Big is Infinity?
Also, what does this question have to do with image processing and AI?
Before going into that, I'd like to introduce the open source Stable Diffusion image generation AI that's rivaling large commercial closed source players with the quality of the images it creates. (In fact I've already written about Stable Diffusion before but managed to lose it somewhere...)
If you haven't yet heard of Stable Diffusion, a quick internet search will return enough headlines, articles, tutorials, blogs etc. about this amazing piece of technology. You'll quickly realize the impact this software has had on the open source image generation and AI fields, as well as the future impact it will have on creating art and the obvious ethical concerns about the potential for bad actors to misuse it.
Entire books can be written on the subject, but I'd like to focus here on something quite profound which goes deeper than any particular technology, regardless how amazing it might be.
For starters, let's look at some simple examples of what Stable Diffusion (as well as other state of the art text to image generators) can do. Basically, you give it a descriptive text and in seconds (each of the examples below was produced in less than 10 seconds) it produces an image that captures the descriptive text that was entered (how this magic is performed is beyond the scope of this post, but you can easily look it up).
Example 1: Two Huskies playing chess by Monet
Example 2 (using the same text): Two Huskies playing chess by Monet
Example 3: A racoon playing the violin by Da Vinci
The above are just three simple examples. The number of possible examples that Stable Diffusion can create is limited only by your ability to imagine textual descriptions of what you want to see in a picture, and of course, as you've seen, giving the same description twice will create two different pictures that convey the essence of the description.
The possibilities are endless.
Or are they?
Let's get back to basics to reflect on this question.
What is a computer image?
Any computer image is basically just a rectangular combination of pixels, where each pixel is a rectangle that can be a certain color.
For example, below is an image composed of 90 pixels (9 rows, each containing 10 pixels), and in this case, each pixel can take only one of two colors: black or white.
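As a minimal sketch of this idea, the Python below stores a 10x9 black-and-white image as nine strings of ten characters each and converts it into a grid of 0s and 1s. The `SMILEY` pattern and the helper names `to_grid` and `render` are illustrative assumptions, not the exact picture or code behind this post.

```python
# A 10x9 black-and-white image: '#' is a black pixel, '.' is a white pixel.
# The pattern is a hypothetical smiley drawn purely for illustration.
SMILEY = [
    "..........",
    "..#....#..",
    "..#....#..",
    "..........",
    "..........",
    ".#......#.",
    "..#....#..",
    "...####...",
    "..........",
]


def to_grid(rows):
    """Convert rows of '#'/'.' characters into a grid of 1s (black) and 0s (white)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def render(grid):
    """Turn a grid of 0s and 1s back into printable text, one line per row."""
    return "\n".join("".join("#" if px else "." for px in row) for row in grid)


grid = to_grid(SMILEY)
pixel_count = sum(len(row) for row in grid)

print(render(grid))
print(pixel_count, "pixels")
```

Seen this way, a picture is nothing more than a list of numbers, which is what makes counting all possible pictures feasible.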
You probably recognize that the above particular combination of black pixels in the 10 x 9 grid of pixels is a smiley. We could also draw other recognizable patterns on this 10 x 9 grid, for example:
I'm sure you could get creative and draw many more images that I haven't included here, given the freedom to draw black pixels wherever you'd like on this 10x9 grid, but could you be endlessly creative?
To get the answer, let's take this to the extreme. Suppose that instead of having a 10x9 grid like the ones above, you only had a 1x1 grid, so you have only one pixel that you can paint. How creative can you get? Well, you have the freedom to draw either one black pixel or one white pixel, so in total you can only produce two "pictures" (if you can call one pixel a picture...)
So basically your creative freedom given only one pixel to paint boils down to two boring (and small) pictures:
That's it. Two is the limit to the number of pictures you can generate if you only have one pixel to work with (which can take on either a black or a white color).
Since that seems obvious, let's see how many pictures we can generate if we are given a grid of two by two pixels (a total of 4 pixels).
We can calculate this in a similar way that we used to understand how many pictures we can create with 1 pixel above:
So above is a 2x2 grid of pixels (numbered 1 to 4), and each pixel can take one of two colors, namely black and white. How many combinations of pictures can we create with this grid?
There are two possibilities to paint pixel 1 (black or white, just like the two possibilities when we only had one pixel to work with)
For each of the two possibilities to paint pixel 1, there are two possibilities to paint pixel 2, so we have 2x2=4 possibilities to paint combinations of pixel 1 and pixel 2
For each of the 4 possibilities to paint pixel 1 and 2, there are two possibilities to paint pixel 3, so we have 2x2x2=8 possibilities to paint combinations of pixel 1, pixel 2 and pixel 3
For each of the 8 possibilities to paint pixel 1, pixel 2, and pixel 3 there are two possibilities to paint pixel 4, so that gives us 2x2x2x2=16 possibilities to paint combinations of pixel 1, pixel 2, pixel 3 and pixel 4.
So, given 4 pixels, we are limited to creating a total of 16 different pictures. The following illustrates all the possible "pictures" that can be generated given 4 pixels.
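The same count can be reproduced by brute force. The sketch below is only an illustration (the name `as_text` and the layout of pixels 1 and 2 on the top row with 3 and 4 below are assumptions): it lists every way of painting four black-or-white pixels.

```python
from itertools import product

# Every way to colour 4 pixels, where 0 is white and 1 is black.
all_pictures = list(product([0, 1], repeat=4))


def as_text(pixels):
    """Lay out pixels 1-4 as a 2x2 picture: pixels 1 and 2 on top, 3 and 4 below."""
    chars = ["#" if p else "." for p in pixels]
    return "".join(chars[:2]) + "\n" + "".join(chars[2:])


for picture in all_pictures:
    print(as_text(picture))
    print()

print(len(all_pictures), "pictures in total")
```

Changing `repeat=4` to a larger number of pixels works the same way, although the list very quickly becomes too long to print.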
However, every additional pixel that we add to the grid multiplies the previous number of possible combinations by two, so the number of combinations grows very fast as the grid gets larger. In fact, this is exactly what exponential growth means.
To summarize, the number of possible pictures that can be generated can be calculated by multiplying 2 by itself as many times as there are pixels in the grid.
As we saw above, on a grid of 4 pixels, the number of possible images that can be generated is 2x2x2x2, also known as 2 to the power of 4, or 2^4 = 16.
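The rule can be written as a one-line function. This is a small sketch, and the name `count_pictures` and its `colors` parameter are made up for illustration:

```python
def count_pictures(width, height, colors=2):
    """Number of distinct pictures on a width x height grid with `colors` choices per pixel."""
    return colors ** (width * height)


print(count_pictures(1, 1))  # a single pixel
print(count_pictures(2, 2))  # the 2x2 grid

# Adding one more pixel doubles the number of possible pictures.
print(count_pictures(5, 1) == 2 * count_pictures(4, 1))
```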
Just to show you how fast the number of possible images that can be generated grows with the number of pixels in the image, let's get back to our simple 10x9 grid on which we drew some simple recognizable shapes. That grid has 90 pixels, and while the number of possible pictures we could generate with 4 pixels was 2^4 = 16, the number of possible pictures we can generate with 90 pixels is 2^90, which turns out to be the number below:
1,237,940,039,285,380,274,899,124,224
Or if you prefer it in words, with a 10x9 grid on which you can only draw white or black pixels, you are limited to generating one octillion two hundred thirty-seven septillion nine hundred forty sextillion thirty-nine quintillion two hundred eighty-five quadrillion three hundred eighty trillion two hundred seventy-four billion eight hundred ninety-nine million one hundred twenty-four thousand two hundred twenty-four different images.
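Python handles integers of any size, so this number is easy to check. A short sketch (the variable names are arbitrary):

```python
pixels = 10 * 9            # the 10x9 grid
pictures = 2 ** pixels     # two possible colours per pixel

print(f"{pictures:,}")     # the full number with thousands separators
print(len(str(pictures)), "digits")
```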
Note that if you randomly choose an image from the above huge number of possible images, there's a high probability that it will just look like unrecognizable noise and won't appear like any shape we are familiar with. Only a very tiny fraction of these images would have any resemblance to something we'd call a meaningful picture (like the samples shown above).
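You can try this yourself. The sketch below picks one 10x9 picture at random by flipping a coin for every pixel; the function names and the seed value are arbitrary choices made for illustration.

```python
import random


def random_picture(width=10, height=9, seed=None):
    """Pick one of the 2^(width*height) possible pictures by a coin flip per pixel."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]


def render(grid):
    """Show black pixels as '#' and white pixels as '.'."""
    return "\n".join("".join("#" if px else "." for px in row) for row in grid)


picture = random_picture(seed=2022)
print(render(picture))
```

Run it with a few different seeds and you will almost certainly see nothing but static.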
We can't do much if we are limited to working with 90 pixels. Let's make things a bit more interesting.
Below is a picture using a 200x200 grid of black and white pixels that I'm sure you'll recognize:
How many different pictures can we make using the above 200x200 grid?
We saw that the number of different pictures that can be created using black or white pixels on a 2 by 2 grid is 16.
We then saw that the number of different pictures that can be generated using black or white pixels on a 10 by 9 grid is already a huge number (with 28 digits)
The number of different pictures that can be created with the above 200x200 grid is a number which has a length of 12,042 digits!
If we increase our creative freedom and enable full color on the above 200x200 grid, then instead of being limited to black or white we'd have the freedom to choose, for each and every pixel in the grid, one of 16,777,216 different colors. This would increase the total number of different pictures we could generate to 16,777,216^40,000 - a number that has a length of 288,989 digits!
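These digit counts can be checked without ever writing the numbers out, because the number of decimal digits in base^exponent is the whole part of exponent x log10(base), plus one. A minimal sketch (the function name `digit_count` is made up; 16,777,216 is 256 x 256 x 256, the usual 24-bit color):

```python
import math


def digit_count(base, exponent):
    """Number of decimal digits in base ** exponent, computed via logarithms."""
    return math.floor(exponent * math.log10(base)) + 1


pixels = 200 * 200

print(digit_count(2, pixels))           # black-and-white 200x200 pictures
print(digit_count(16_777_216, pixels))  # full-color 200x200 pictures
```

Using logarithms also sidesteps the limit that recent Python versions place on converting very large integers to text.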
So the number of different images that one can create even for small 200x200 images is so large that we can't even begin to wrap our limited human minds around the sheer scale of it. Even the number of atoms in the entire observable universe becomes completely insignificant compared to these scales.
But the takeaway here is that as large as this number is, it is not infinite. A number with 288,989 digits is just as close to infinity as a number with 1 digit.
Our minds are not equipped to grasp the infinite. In fact, the very purpose of the mind is to make things finite so that we can discern between them and try to make sense of the relationships between different finite things. Still, it's an interesting exercise to try and grasp what pictures are included in the collection of 16,777,216^40,000 pictures which contains all the possible full-color pictures that can be generated on a grid of 200x200 pixels.
So let's imagine you had the super ability to instantly travel to any time you wanted, from the Big Bang 13.8 billion years ago to this day.
Let's also imagine you had the super ability to instantly travel to any place you wanted.
You would also be able to take a camera with you.
Also, suppose you are a person who likes to take pictures of everything, and by that I really mean everything - everything, everywhere and at every point in time.
Every event that has ever happened would be captured on your camera: You'd take all the possible pictures of everything that has ever existed. You'd take these pictures in all zoom levels and from every possible angle.
You'd take pictures showing the process of how every object in the universe formed and evolved - galaxies, stars, planets, life on planets.
Specifically, you'd also take pictures of everything and everyone that existed on earth at any point in time.
So, apart from pictures of every possible non-living thing in the history of our planet, and all living things in the history of our planet (we'd finally know exactly what all dinosaurs and other prehistoric animals looked like), the entire biography of every human who has ever lived on earth would be there in pictures, taken from every possible distance at every possible angle for every instant they existed. You could generate millions of different pictures for each second in the life of each and every person that has ever lived. Nothing would escape your camera.
Can you begin to imagine how many pictures that adds up to?
But all those pictures are in fact just a tiny, tiny fraction of the number of possible pictures in a 200x200 grid, because they are only pictures of all the events in the history of the universe (and Earth) that actually happened - whether it is Bach writing his first music piece, or a picture of the first stages of building the great pyramid of Giza.
For every event above that has actually happened (and which you so graciously took a picture of for posterity), we can generate an almost countless number of pictures of events that never happened - for example that time Napoleon Bonaparte (died 1821) met Charlie Chaplin (born 1889) at Admiral Edward Russell’s Punch Party (occurred in 1694). So in the collection of 16,777,216^40,000 pictures you would also find a vastly larger collection of pictures for any possible imagined event involving every possible imagined thing.
And for every single picture of an imagined event, you'd find a mind-bogglingly huge number of pictures that just seem like a random collection of pixels.
So, no - there isn't any image generation software that can create an infinite number of pictures, as the number of pictures (especially meaningful pictures) that can be generated is always finite. But that finite number is so vast that it is beyond our mind's ability to grasp, and it contains every visual thing we could ever experience or imagine.
November 07, 2022
FixPix image-processing application to be released soon!
After months of work, I'm happy to announce the coming release of the FixPix image-processing application, which will have several advantages:
One-time payment and it's yours (i.e., own rather than rent)
Unlimited access to the features provided in the FixPix site.
No Internet access required to use it after installation
Leverages the power of your computer's graphics card for faster results
A lifetime of free upgrades
The first release will be for the Windows platform, followed by Mac OS/X and Linux.
For a quick glance at what it is about, check out the help file below:
August 22, 2022
AI and Parkinson's Law
There's a short article by Tim Bryce from back in 2006 which is named "Parkinson's Law in IT". It's worth
reading. In a nutshell, it says that "As computer
hardware capacity increases, software becomes more bloated.". I'd expand that and say that in general we
tend to waste the things we take for granted. This can lead to not making the most of what we have, not being
grateful for what exists and instead focusing on what is missing, etc.
I could go deeper into how it is possible to balance the natural striving for more and better while at the same time
being happy with what already exists but this post is not going to go down that rabbit hole.
What I will bring up is the fact that the latest and greatest neural networks (i.e., those that can achieve the
amazing feats that Google, OpenAI and others have shown) still require huge resources in terms of GPU memory and power.
To experiment on your home PC using the open source version that simulates Google's Imagen network requires
a strong graphics card that has around 16GB of memory. It does have a smaller neural network that you can use
on graphics cards with 8GB of memory, but the generated images, though impressive (in terms of what we believed
were the limits of what a computer can do just a few years ago), could not be compared in any way with the results
that the commercial proprietary networks can achieve.
There is a cynical quote that says that every problem can be solved if you throw enough money at it. When there's
enough money to solve the problem, the companies that can afford to will usually use the simplest
solution (relatively speaking) to solve an AI task, which is to make use of larger and larger networks, using many
strong graphics processors, fed with incredible amounts of data to produce very smart, but perhaps also very bloated
deep neural networks.
Of course, the question is how much of the size of the resulting neural network is just bloat due to opting for
brute force to train it (for those that have the resources to apply such force) and how much of it is a theoretical
necessity derived from the amount of knowledge/information required to create an internal representation that can
create the jaw-dropping results that these neural networks can accomplish. This is a field of ongoing research
and hopefully with further research we will discover that while not everyone can buy a spaceship and fly to Mars,
everyone will be able to download a lightweight neural network to their own PC or even phone and create with it
amazing things which today we can only do by paying companies which have the resources to train and host these networks.
August 18, 2022
AI for Art - Proprietary and Open Source
A big part of the AI revolution is due to the success of deep neural networks (if the concept
is unfamiliar, there are many good simple introductions on the Web, for example here),
and neural networks are also the driver of the rapidly evolving field of AI art and image generation.
In a previous post, I mentioned two amazing text-to-image technologies that were developed by Google and OpenAI, namely Imagen and DALL-E 2,
but there are more technologies and companies in this area.
In the open source space:
Craiyon (formerly DALL-E mini)
Stable Diffusion (non-commercial license)
In the proprietary space:
The above list is not comprehensive by any means - there are dozens more (most of which are built using the
latest open source technologies), and I expect that with the freedom of information powered by
open source, the number of commercial and open source solutions will continue to grow exponentially.
August 14, 2022
Control vs. Choice or Dependency vs. Freedom
Note: The original post that was meant to be here took three hours to write and then was accidentally deleted 🤦.
This new post will not try to recover the scope of the original because there is a lot of other work to do here at FixPix, but
hopefully it will retain the main points of that magnificent post that is now in blog heaven...
The Internet began as a distributed network. What this means is that, unlike in a centralized system, no one person or entity owned or
controlled the Internet. Everyone had the freedom to add their own knowledge and the ability to share their own ideas,
or resources with anyone else on the network.
Having no central control over what people did on the Internet did not make it devolve into chaos, but instead made it evolve
into one of the most amazing phenomena in the history of mankind, and enabled an exponential growth of
freely accessible knowledge.
Of course, the soaring popularity of the Internet was quickly leveraged by commercial entities and also enabled business models
where centralized services are used, giving companies control over who gets to do what with the services offered.
Software is created using programming languages. There are many programming languages software developers can choose from to develop software,
but computers don't really understand programming languages. Computers can only run machine code. The programming languages are made
to be understood by humans and programs written in these languages are known as the "source code". This source code is translated into
machine code which your computer can then run.
For example, the famous Microsoft Word application is written in a programming language called C++. However, if you look inside the Word application
on your computer you won't find the C++ language it was created with but just illegible machine code. This is because Microsoft does not publish the
source code for Word and only provides you with the machine code that enables you to run the Word application.
This makes sense as a lot of money and time was invested by Microsoft in developing Word and giving the source code would mean anyone could then
instantly create their own version of the software, add to it and sell it.
And yet, over the years various "open source" movements and initiatives gained popularity. Open source means developers publish
the source code of the applications they develop. This gives other developers the ability to both fix and expand the original
application, and indeed that is what happened.
The popularity of open source has soared and today hundreds of thousands of open source projects exist, many of them as good as, if not better than
their closed source counterparts.
Yeah, great stories, but what does any of this have to do with FixPix?
We at FixPix are aware of the pros and cons of centralized vs distributed systems and open vs. closed source. While freedom comes with
a price (mainly its exploitation by bad actors) we firmly believe that freedom is what drives abundance and flourishing.
As such we strive to provide more freedom to anyone who wants to use the software:
Centralized vs. Distributed
FixPix is currently a SaaS (Software as a Service) product. This is a popular business model on the Internet where you basically rent the software
instead of buying it. This can be a recurring payment or a pay-per-use model, but either way continued use requires continued payment.
A different model is one where you purchase the software with a one-time payment, download it to your computer and then use it as much as you like.
Soon we will be releasing a standalone software version that you can purchase. You pay once for the features and use them as much as you like
afterwards. There are other advantages to buying the software, such as utilization of the power of your computer for faster processing times and the
ability to use the software even without Internet connectivity. There is another exciting feature related to freedom which will be in the
standalone version of the product but this will be left for a later post...
Open Source vs. Closed Source
A lot of software development work went into making FixPix software a reality. However, the variety of cutting edge AI technologies that FixPix
makes accessible is made possible by the contribution of developers that have made their technologies open source. To show the great appreciation and
respect we have for these developers, FixPix will credit the authors of open source technologies it leverages (even if their open source license does
not require it). We'll also strive to add our own contribution to the open source movement and this blog will point out amazing open source projects.
Get ready for a celebration of choices on how you want to use FixPix with our soon to be released downloadable software.
The popularity of open source enables us to quickly add more exciting features (and tell you who authored them).
Always backup your blog posts...
August 12, 2022
The first FixPix blog entry!
This marks the beginning of history for FixPix 😃
FixPix was designed from the start to be a place where everyone can learn hands-on what amazing things can be done with images
using the latest AI technologies. Until very recently, things like OpenAI's DALL-E 2 and Google's Imagen would have been considered pure science fiction. Seriously, who would have believed that in their lifetime they could give a computer an elaborate textual description of a picture they'd like to create and in seconds get an image that a graphic artist might need hours to create?
But these two amazing examples are just some of the jaw-dropping things AI can do today and it is anyone's guess what they'll be able to do tomorrow, bundled with technological, social, ethical and even philosophical implications...
The AI landscape in general and image AI in particular is evolving at an incredible rate, and seems to only be accelerating. This blog will contain updates on the hottest news in the field, explanations about various technologies and new features in FixPix that will enable you to easily run these technologies and be amazed at what they can do.