Friday, January 28, 2022

1/28/22: Presentation

  Agenda & minutes:

  • Welcome to the 3rd meeting! 
  • Any updates/news/inputs/comments?
    • VW: We should think about recording the meetings. 
  • Organizational issues
    • The agenda and minutes are posted on a blog, https://AIntwhatitusedtobe.blogspot.com. You can add comments if you like, just as on many other blogs.
    • This meeting will be automatically ended by Zoom after 40 min. 
      • Continuing the discussion in another meeting afterward has been suggested. This is something to consider in the future.
    • Future meetings: The consensus is that each meeting will focus on a presentation by one of us on a topic of their choosing. (This is instead of trying to cover a bunch of smaller things in each meeting.)
      • Today, Jan. 28: VK will present his current work. 
      • Feb. 4 meeting: Tentatively, one of MM's students will present. 
      • Feb. 11 meeting: We will focus on "Attention is All You Need" and read and discuss it together.
      • Feb. 18 meeting: Any ideas/requests?
  • Readings that have been suggested
    • "Deep learning—a first meta-survey of selected reviews across scientific disciplines, their commonalities, challenges and research impact," https://peerj.com/articles/cs-773
      • We read the abstract last time and scanned more this time. It is not clear whether we should continue reading material from it. Any opinions/thoughts/comments?
    • "Attention is all you need," https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
      • We read the first sentence of the abstract and discussed it quite a bit. This is a classic and important paper. We will devote a future meeting to reading through more of it, bit by bit. 
    • Some quantum computing references we could read as needed (from VW):
      • Quantum crossing threshold (free): https://www.nature.com/articles/s41586-021-04273-w
      • Crossing threshold in silicon: https://www.nature.com/articles/s41586-021-04182-y
      • Three-qubit donor processor in Si: https://www.nature.com/articles/s41586-021-04292-7
  • VW has "a Python Quick Reference that runs in Google collab that I can go over for real time ML demonstrations." He presented but did not finish last time, and we didn't get to it this time, so we could devote a future meeting to it.

1 comment:

  1. Enjoyed VK's presentation.

    One thing that leapt out in the Transformer discussion is that decomposing the image into equal-sized patches does not reflect the best economy in carrying semantic meaning through the computation. Fanny pointed out that you could spend a lot of time computing on things that don't matter. I proposed using image segments instead of patches. These segments would then be labeled, and would correspond to meaning better than ad hoc image patches, which depend on the camera position and field of view. The example of an image of a broken femur was brought up, and DB asked how you would know it was a broken femur. A short reflection yields this: one could train against broken-femur images that had been segmented, rather than broken-femur images that had been subdivided into patches. Only when the femur had been shattered into pixel-sized fragments would segmentation be less effective than the patch approach.
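
    As a rough sketch of what that patch decomposition amounts to (a minimal NumPy example; the 224x224 image and 16x16 patch size are just illustrative assumptions, not tied to any particular model):

        import numpy as np

        # Hypothetical 224x224 RGB image, split ViT-style into 16x16 patches.
        image = np.random.rand(224, 224, 3)
        patch = 16
        h, w, c = image.shape

        # Cut the image into a grid of equal-sized patches and flatten each one
        # into a vector, regardless of whether it contains anything meaningful.
        patches = image.reshape(h // patch, patch, w // patch, patch, c)
        patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)

        print(patches.shape)   # (196, 768): 196 patches, each a 768-dim vector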

    We talked about the flattening of a matrix from 2D to 1D and the fact that the lateral relationships from each pixel (horizontal, vertical, diagonal, and in the general case radial) are lost the moment the matrix is flattened. Flattening corresponds to a loss of the semantic information in the image that comes from these lateral relationships. It is interesting that as we go from processing an entire image to processing a single pixel in an image, the relations we care about shift from a Cartesian coordinate system to a radial one, but I digress.
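
    A toy illustration of that loss (the array size is arbitrary): two pixels that are vertical neighbors in 2D end up four positions apart once the matrix is flattened, which is part of why Transformers add positional encodings to reintroduce layout information.

        import numpy as np

        # A tiny 4x4 "image"; pixels (1, 2) and (2, 2) are vertical neighbors.
        img = np.arange(16).reshape(4, 4)
        flat = img.flatten()

        # After flattening, the vertical neighbors sit 4 positions apart, while
        # horizontal neighbors stay adjacent; diagonal and radial relations are
        # likewise no longer visible in the index alone.
        print(np.where(flat == img[1, 2])[0], np.where(flat == img[2, 2])[0])   # [6] [10]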

    One question I had about these language Transformers is, overall, how does the system figure out that certain word tokens need to be reordered? The English prepositional phrase "on the table" might well translate in another language to "the table on". French does this with adjectives: "the blue chicken" --> "le poulet bleu".
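
    A minimal sketch of the mechanism that lets a Transformer cope with such reordering (the embeddings below are random toy values, not a real model): the attention weights are computed from content similarity rather than from a fixed position-to-position mapping, with positional encodings supplying the order information separately.

        import numpy as np

        def attention(Q, K, V):
            # Scaled dot-product attention: each output is a weighted mix of all
            # value vectors. Because the weights come from content similarity,
            # the decoder can learn that "bleu" should attend to "blue" even
            # though the two languages order adjective and noun differently.
            scores = Q @ K.T / np.sqrt(Q.shape[-1])
            scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
            weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
            return weights @ V

        rng = np.random.default_rng(0)
        K = V = rng.normal(size=(3, 4))   # toy encoder states for "the", "blue", "chicken"
        Q = rng.normal(size=(3, 4))       # toy decoder states for "le", "poulet", "bleu"
        print(attention(Q, K, V).shape)   # (3, 4)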

    One principle used in coming up with the segmentation-vs.-patches approach was "explainability of AI", which is of course a hot research topic, along with "hyperparameter optimization".

    Here's why. When an image is segmented by silhouetting each object in it, a clear word or symbol can be associated with each silhouette. Additionally, one can obtain z-ordering, x/y position, and velocity information by observing, across a sequence of images, how the silhouettes of the segmented objects move. Doing this via a patch approach seems inefficient.
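
    A minimal sketch of that idea, assuming a simple threshold-plus-connected-components segmentation (the frames, threshold, and blob are toy placeholders): each labeled region is a silhouette, and tracking its centroid across frames yields position and velocity.

        import numpy as np
        from scipy import ndimage

        def silhouettes(frame, threshold=0.5):
            # Label connected foreground regions; each label is one "silhouette"
            # that a word or symbol could be attached to.
            return ndimage.label(frame > threshold)

        def centroids(labels, n):
            # x, y position of each segmented object in one frame.
            return np.array(ndimage.center_of_mass(labels > 0, labels, range(1, n + 1)))

        # Toy pair of frames: one bright blob that shifts between frames.
        f0 = np.zeros((64, 64)); f0[10:20, 10:20] = 1.0
        f1 = np.zeros((64, 64)); f1[12:22, 15:25] = 1.0

        c0 = centroids(*silhouettes(f0))
        c1 = centroids(*silhouettes(f1))
        print(c1 - c0)   # per-object displacement between frames, i.e. velocity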

    Two last items:
    It would be really helpful to see the pipeline diagrams that include 'multi-head attention' animated, to show functionally what they are doing in the computation and how that flows downstream. Animations such as those at playground.tensorflow.org are a first step in addressing the explainability problem, and they certainly help our research understanding. They also constitute a tangible proof-of-work product (a term borrowed from cryptocurrency mining) that we can post to sites such as YouTube for the edification of others.
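
    Short of an animation, here is a stripped-down sketch of what the multi-head block does functionally (projection matrices omitted so the data flow stays visible; purely illustrative, not the full "Attention is All You Need" layer):

        import numpy as np

        def softmax(x):
            e = np.exp(x - x.max(axis=-1, keepdims=True))
            return e / e.sum(axis=-1, keepdims=True)

        def multi_head_attention(X, heads=2):
            # Split each token's embedding into `heads` slices, let every slice
            # compute its own attention pattern over the sequence, then glue the
            # per-head outputs back together into one vector per token.
            outputs = []
            for h in np.split(X, heads, axis=1):          # each head sees d/heads dims
                weights = softmax(h @ h.T / np.sqrt(h.shape[1]))
                outputs.append(weights @ h)
            return np.concatenate(outputs, axis=1)        # back to shape (n, d)

        X = np.random.rand(5, 8)               # 5 tokens, 8-dim embeddings
        print(multi_head_attention(X).shape)   # (5, 8)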

    In this vein, I recall from my time at JPL a paper presentation by Charles Anderson, where he was able to detect movement in images using pyramidal techniques and Laplace transforms. Primitive by today's ambitious standards, but ahead of its time nonetheless. Image pyramids are useful for reducing training times: lower-resolution images are less expensive to segment, and as a solution is approached, higher resolutions can be invoked to refine a solution that has effectively converged.
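
    A minimal coarse-to-fine sketch of that pyramid idea (using OpenCV's pyrDown on a random placeholder image): processing starts on cheap low-resolution levels and moves up to finer ones only to refine a nearly converged solution.

        import numpy as np
        import cv2

        # Build a Gaussian pyramid: each level is blurred and downsampled 2x,
        # so segmentation (or any other processing) can start at the coarsest,
        # cheapest level and only move to finer levels near convergence.
        image = (np.random.rand(256, 256) * 255).astype(np.uint8)

        pyramid = [image]
        for _ in range(3):
            pyramid.append(cv2.pyrDown(pyramid[-1]))

        for level, img in enumerate(pyramid):
            print(level, img.shape)   # (256, 256), (128, 128), (64, 64), (32, 32)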


